hadcm3lb version 5.15 crashes when showing graphics.

Andris Pavenis

Joined: 22 Oct 05
Posts: 15
Credit: 2,066,409
RAC: 1,196
Message 26365 - Posted: 26 Jan 2007, 5:55:43 UTC

Processing workunit hadcm3lbm_azon_25282074 crashes when trying to show graphics. It didn't happen at the beginning, but only once processing reached year 1952. The graphics window shows only the Earth image up to the coastlines, and after that hadcm3lb version 5.15 crashes:

Fatal signal caught, cleanup CPDN run and restart...

Tried to restart from an earlier backup, but the problem reappeared when year 1952 was reached.

System info: Linux, Fedora Core 6, Pentium 4.

old_user81594
Joined: 11 Jun 05
Posts: 67
Credit: 1,222,916
RAC: 0
Message 27951 - Posted: 17 Apr 2007, 19:30:07 UTC - in response to Message 26365.  

I think the graphics take up a large share of CPU time, so with a slower machine or one with only 512 MB RAM (single-core CPU) you might push it over the edge!
Having said that, I have never completed a model either and am getting very frustrated with the lack of reliability of Climate Prediction. I've never had any other BOINC project's model crash, but CPDN is very, very fragile, I feel.

Neil.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 85
Message 27956 - Posted: 17 Apr 2007, 20:47:40 UTC

Having its origins in a supercomputer program, it's not used to (or intended for) competing with other Windows programs for hardware resources.

People who run other resource-heavy programs need to be a bit protective of their climate models at such times. Suspend BOINC (and the model) before running the other program(s).


mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,263,895
RAC: 300
Message 28015 - Posted: 19 Apr 2007, 20:41:50 UTC

Neil, I've just looked at the crash messages for your last model that crashed on your Windows computer - hadcm3inct_cmwh_1920_160_05864722_3. There's a selection of messages there that I've never seen before, and this computer's had 25 models altogether.

I don't know very much about hardware, but on the face of it, your computer looks very similar to mine. But, running two models in tandem, my computer is doing 1.58 sec/TS whereas yours is doing about 1.22. Is this machine overclocked? If so, I wonder whether that's the cause of the problems?

Are you backing up the contents of your BOINC folder so that you can restore and continue if the models crash?

My impression is that the models themselves are pretty robust as long as you follow the 'rules' in the README about crashes: items 1, 5 and 6. And never use the screensaver; only view the globe through the BOINC Manager button.

If another project's WUs last one day but a climate model lasts 100 days, all things being equal, the climate model must be 100 times more likely to crash. This is why backups are the ultimate solution.
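
The "100 times" estimate can be checked with a little arithmetic. A minimal sketch, assuming a constant and independent crash probability per day (the value 0.001 is purely illustrative, not a measured CPDN figure, and `crash_probability` is a made-up helper name): the chance of at least one crash in n days is 1 - (1 - p)^n, which is close to n * p when p is small.

```shell
# Probability of at least one crash over a run of N days, assuming each day
# fails independently with probability P (an illustrative assumption).
crash_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}

crash_probability 0.001 1     # one-day work unit
crash_probability 0.001 100   # 100-day climate model
```

With these inputs the two calls print 0.0010 and 0.0952, so under this assumption the 100-day run is roughly 95 times as likely to hit a crash, close to the factor of 100 quoted above.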


Andris Pavenis
Joined: 22 Oct 05
Posts: 15
Credit: 2,066,409
RAC: 1,196
Message 28795 - Posted: 19 May 2007, 6:43:43 UTC - in response to Message 27956.  

Is it really so fragile? Many times I have simultaneously compiled Mozilla Firefox development versions from source under Linux (both directly and in VMware virtual machines running different Linux distributions). I haven't seen crashes, though, except when trying to view graphics. Of course, none of these processes interacts directly with CPDN (which viewing graphics does).

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 94,916,150
RAC: 10
Message 28806 - Posted: 19 May 2007, 17:48:43 UTC

Your Firefox activity won't lock a CPDN file as some virus software does. CPDN doesn't react well when expected files are "missing".

Another issue arises when people try to squeeze this large and hungry software system into a too-small computer -- swapping and resultant timing relationships among OS/BOINC/CPDN can become problematic. The graphics layer triggers some of that, too. Much as we'd like to have the cake and eat it, too, these PCs won't behave like a Cray.

All in all, considering the array of machine and OS types on which CPDN runs, and the vast array of participants' run mixes, I see CPDN as remarkably robust/resilient.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Andris Pavenis
Joined: 22 Oct 05
Posts: 15
Credit: 2,066,409
RAC: 1,196
Message 28822 - Posted: 20 May 2007, 17:12:01 UTC - in response to Message 28806.  

Your Firefox activity won't lock a CPDN file as some virus software does. CPDN doesn't react well when expected files are "missing".

Another issue arises when people try to squeeze this large and hungry software system into a too-small computer -- swapping and resultant timing relationships among OS/BOINC/CPDN can become problematic. The graphics layer triggers some of that, too. Much as we'd like to have the cake and eat it, too, these PCs won't behave like a Cray.


I would not run CPDN if the computer were swapping like crazy...


All in all, considering the array of machine and OS types on which CPDN runs, and the vast array of participants' run mixes, I see CPDN as remarkably robust/resilient.


The problem appears when the model year reaches 1951 or 1952 (currently active model; after the error it restarts from the last checkpoint). I tried to get more information with the following steps:
1) backed up BOINC directory
2) disconnected from network (to avoid sending unnecessary information to the server)
3) started model
4) attached GDB to application (hadcm3transum_5.15_i686-pc-linux-gnu)
5) triggered error by trying to view graphics
6) tried to get backtrace in GDB
7) stopped BOINC
8) restored BOINC directory from backup

Unfortunately, there is not enough information in the executables for the backtrace to be of much use. At least one can see the exception (SIGFPE) that happens at the beginning (SIGSEGV follows after that). If I had an executable with at least a bit of debug info, I could get a more useful backtrace.

Andris

#0 0xb7b11e88 in ?? ()
#1 0x3f800000 in ?? ()
#2 0x3f800000 in ?? ()
#3 0x3f800000 in ?? ()
#4 0x3f800000 in ?? ()
#5 0x00001d80 in ?? ()
#6 0x00001f80 in ?? ()
#7 0xbf87ce00 in ?? ()
#8 0xb7b05f9e in ?? ()
#9 0x080502d8 in pthread_create ()
#10 0x00000080 in ?? ()
#11 0x00000000 in ?? ()
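
The steps listed in this post could be scripted roughly as below. This is only a sketch: the directory paths are supplied by the caller, the helper names `backup_boinc_dir` and `capture_backtrace` are made up for illustration, and the `GDB` variable exists only so the debugger command can be swapped out. It assumes `gdb` is installed and that you are allowed to attach to the model process.

```shell
# Step 1: copy the whole BOINC directory before experimenting.
# Refuses to overwrite an existing backup. Paths are supplied by the caller.
backup_boinc_dir() {
    src="$1"; dst="$2"
    if [ -e "$dst" ]; then
        echo "backup target $dst already exists" >&2
        return 1
    fi
    cp -a "$src" "$dst"
}

# Steps 4-6: attach GDB to a running process, let it continue until a signal
# such as SIGFPE stops it, then print a backtrace.
capture_backtrace() {
    pid="$1"
    "${GDB:-gdb}" -batch -p "$pid" \
        -ex "handle SIGFPE stop print" \
        -ex "continue" \
        -ex "bt"
}
```

For example, run `backup_boinc_dir /path/to/BOINC /path/to/BOINC.bak`, then `capture_backtrace "$(pgrep -f hadcm3transum | head -n 1)"`, and open the graphics window while GDB waits. Without debug symbols in the executable the frames will still show as `??`, as in the backtrace above.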

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 94,916,150
RAC: 10
Message 28860 - Posted: 21 May 2007, 17:03:39 UTC

5) triggered error by trying to view graphics

This has been a common problem. It often results from old video drivers or conflict with another program using heavy graphics. (Unfortunately, most of that experience has been in Windows.)

Does your machine have a graphics card or on-board graphics chip? In either case, the vendor might have an updated driver good for Linux.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Andris Pavenis
Joined: 22 Oct 05
Posts: 15
Credit: 2,066,409
RAC: 1,196
Message 28862 - Posted: 21 May 2007, 17:27:04 UTC - in response to Message 28860.  

On-board. I'm using the standard drivers that come with X11. I have had bad experiences with the ATI binary drivers for Radeon cards. They screw up the system so badly that I have to boot from a rescue CD to recover (I have not tested them for some time now; screwing up the graphics twice was enough).

Andris Pavenis
Joined: 22 Oct 05
Posts: 15
Credit: 2,066,409
RAC: 1,196
Message 29537 - Posted: 13 Jul 2007, 22:28:10 UTC - in response to Message 28862.  

I moved the project to a different computer (3.0GHz Pentium 4 HT, 1.5GB memory, Fedora Core 6). The problem remains the same - when I try to view graphics, the CPDN application crashes and restarts from the last checkpoint.

Video card is different - lspci says:
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 85
Message 29538 - Posted: 13 Jul 2007, 22:55:13 UTC
Last modified: 13 Jul 2007, 22:57:03 UTC

You only have 1 computer visible on the list, but that one has very slow timings for a 3.0GHz P4. It looks like something's wrong, perhaps overheating.

For instance, I'm getting this for a 3.20GHz P4:
Measured floating point speed 1770.95 million ops/sec
Measured integer speed 3391.74 million ops/sec


DJStarfox
Joined: 27 Jan 07
Posts: 288
Credit: 2,332,206
RAC: 11
Message 29594 - Posted: 18 Jul 2007, 0:15:26 UTC - in response to Message 29537.  

Andris:

Would you please post the output of the following command?
uname -a

Also, can you post the device section of your xorg.conf file? It should look something like this:
Section "Device"
    Identifier "Videocard0"
    Driver "nvidia"
    Option "NoLogo" "1"
EndSection
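
A small sketch of how the requested information could be gathered in one go. It assumes the X configuration lives at `/etc/X11/xorg.conf` (the usual location, though it may differ on your system); `print_device_sections` is a hypothetical helper name.

```shell
# Print every Section "Device" ... EndSection block from an X config file.
print_device_sections() {
    conf="${1:-/etc/X11/xorg.conf}"
    awk '
        /^[ \t]*Section[ \t]+"Device"/ { show = 1 }   # start of a Device block
        show                            { print }
        show && /^[ \t]*EndSection/     { show = 0 }   # end of that block
    ' "$conf"
}

# Kernel and architecture details, as requested above.
uname -a
```

Running `print_device_sections` with no argument reads the default path; pass another file name to inspect a different configuration.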

Also, try enabling task_debug in the boinc cc_config.xml file. Then examine the stderrdae.txt and stdoutdae.txt files in the boinc directory for clues to the error. Post anything that looks like the culprit.
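
For reference, enabling that flag might look like the sketch below. It assumes no `cc_config.xml` exists yet (the helper refuses to overwrite one), that the BOINC directory is passed in by the caller, and that the client is restarted afterwards so it picks up the new file. The function names and the search patterns are made up for illustration.

```shell
# Write a minimal cc_config.xml that turns on the task_debug log flag.
enable_task_debug() {
    boinc_dir="$1"
    if [ -e "$boinc_dir/cc_config.xml" ]; then
        echo "cc_config.xml already exists; edit it by hand instead" >&2
        return 1
    fi
    cat > "$boinc_dir/cc_config.xml" <<'EOF'
<cc_config>
  <log_flags>
    <task_debug>1</task_debug>
  </log_flags>
</cc_config>
EOF
}

# Show log lines that mention signals, errors or exits (patterns are guesses).
scan_boinc_logs() {
    boinc_dir="$1"
    grep -i -E 'signal|error|exit' \
        "$boinc_dir/stderrdae.txt" "$boinc_dir/stdoutdae.txt" 2>/dev/null
}
```

After restarting the client and reproducing the crash, `scan_boinc_logs /path/to/BOINC` gives a first pass over the two log files; reading them in full around the crash time is still worthwhile.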

DJStarfox
Joined: 27 Jan 07
Posts: 288
Credit: 2,332,206
RAC: 11
Message 29607 - Posted: 18 Jul 2007, 19:01:13 UTC - in response to Message 29537.  

Hmm...now that I read your original post more carefully, it may be your system is fine. The model may just be crashing on its own. It would be wise to verify your video card setup by running a GL program such as glxgears or gltron. If it runs fine with a good framerate, then your setup is probably fine.
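
One way to do that check from a terminal is sketched below. It assumes `glxinfo`, `glxgears` and `timeout` are installed and that `glxgears` prints lines of the form `N frames in 5.0 seconds = X FPS`; `parse_fps` and `check_gl` are hypothetical helper names.

```shell
# Extract the FPS figure from glxgears-style output lines.
parse_fps() {
    awk '/FPS/ { print $(NF - 1) }'
}

# Report whether direct rendering is on, then sample glxgears for ~12 seconds.
check_gl() {
    glxinfo | grep -i "direct rendering"
    timeout 12 glxgears | parse_fps
}
```

Call `check_gl` from within the X session. What counts as a good frame rate depends on the hardware, so treat the numbers as a rough sanity check only.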

The moderators may hurt me for saying this, but I'd say let it crash, cut your losses, and get another model to start. Or just run without graphics. The most recent one I got in early June has been much more stable (with or without graphics) than the others.