Comments for 'Generic solutions to models' sticky

old_user371461
Joined: 17 Mar 06
Posts: 4
Credit: 341,598
RAC: 0
Message 27734 - Posted: 5 Apr 2007, 11:52:52 UTC - in response to Message 21066.  

My laptop stopped for the first time, with the following error messages:
<core_client_version>5.4.11</core_client_version>
<stderr_txt>
(null): cannot open input file dataout/atmos_restart.day
(null): cannot open input file dataout/ocean_restart.day
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16

BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error - Return code = 16
Error in converting file dataout/bvpdfo.pjc5c10 to netcdf format.
Error in converting file dataout/bvpdfo.pic5c10 to netcdf format.
Error in converting file dataout/bvpdfo.pfc5c10 to netcdf format.
Error in converting file dataout/bvpdfa.phc5c10 to netcdf format.
Error in converting file dataout/bvpdfa.pgc5c10 to netcdf format.
Error in converting file dataout/bvpdfa.pec5c10 to netcdf format.
Error in converting file dataout/bvpdfa.pdc5c10 to netcdf format.
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
No heartbeat from core client for 31 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
No heartbeat from core client for 32 sec - exiting
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
No heartbeat from core client for 31 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
No heartbeat from core client for 32 sec - exiting
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
No heartbeat from core client for 31 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
No heartbeat from core client for 32 sec - exiting
No heartbeat from core client for 33 sec - exiting
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
No heartbeat from core client for 31 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
No heartbeat from core client for 32 sec - exiting
No heartbeat from core client for 33 sec - exiting
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Not a JPEG file: starts with 0x01 0xda
Fatal crash! :-(

</stderr_txt>
<message>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_13.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_14.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_15.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohf_bvpd_00954525_0_16.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

Any hope of recovery?!!!
ID: 27734
Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 27737 - Posted: 5 Apr 2007, 13:42:10 UTC

Haydn, since it says "Fatal crash :-(" I think the only way you can recover it is if you have a backup, so I hope you do!

Regards,
Visit the Scotland team
ID: 27737
Alex
Joined: 1 Apr 07
Posts: 5
Credit: 1,022,165
RAC: 0
Message 29206 - Posted: 9 Jun 2007, 21:26:06 UTC
Last modified: 9 Jun 2007, 21:47:54 UTC

Oops - hit the wrong key. See below...

ID: 29206
Alex
Joined: 1 Apr 07
Posts: 5
Credit: 1,022,165
RAC: 0
Message 29207 - Posted: 9 Jun 2007, 21:31:27 UTC
Last modified: 9 Jun 2007, 21:44:08 UTC

I've found that setting "keep in memory while suspended" to NO in the preferences seems to have cut the crash rate significantly. I was getting a lot of crashes after switching tasks to or from another project (either manually or automatically), and I figured maybe the RAM was getting corrupted or overwritten. Sure, I lose some processing time if I suspend it, but I was losing a lot more reloading backups so often. So far the evil spirits have been kept at bay for 7 days, so I figured it would be worth mentioning here. Or maybe it's just my lucky sweater...

BTW - Core 2 Duo E6300, 1G RAM, 256M video, WinXP Media Center
ID: 29207
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29208 - Posted: 9 Jun 2007, 21:47:51 UTC
Last modified: 9 Jun 2007, 21:48:37 UTC

A couple of Alex's recent crashes have been exit code 22 and 'The device does not recognise the command'. Has anyone discovered yet what this means?

Alex, have you been running AV scans with the model running or just suspended (rather than exiting from BOINC first)?
Cpdn news
ID: 29208
Alex
Joined: 1 Apr 07
Posts: 5
Credit: 1,022,165
RAC: 0
Message 29209 - Posted: 9 Jun 2007, 21:51:37 UTC - in response to Message 29208.  
Last modified: 9 Jun 2007, 22:16:51 UTC

Alex, have you been running AV scans with the model running or just suspended (rather than exiting from BOINC first)?

Good question. I've got a DSL modem with NAT and have Trend Micro Internet Security on 24/7 (so far the real-time protection seems to work pretty well) and haven't actually run a full system scan in a long time. I did run an Ad-Aware scan yesterday without even suspending BOINC and had no problems.

I kinda wondered what error code 22 was too, but I figured if I could reload from a backup it was no big whoop.

This PC is almost brand new so I haven't defragged it yet (but if I did, I'd exit first).
ID: 29209
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29212 - Posted: 10 Jun 2007, 12:29:32 UTC

Just a reminder to everybody that we do need to exit from BOINC before running AV scans. A scan can catch the model at just the wrong moment in its calculations and crash it.

Defrags will only tidy up the BOINC folder if we exit from BOINC first.

Personally, I feel it's safer for the model to exit from BOINC say once a week and then have a serious housekeeping session - all the updates, all the scans, a disk cleanup, a defrag, BOINC folder backup, other backups, and a full computer shutdown rather than just a restart. I'm possibly being overcautious, but this way I'm sure none of these procedures will affect the models.
Cpdn news
ID: 29212
Dr_Mabuse
Joined: 21 Feb 05
Posts: 24
Credit: 991,032
RAC: 0
Message 29285 - Posted: 26 Jun 2007, 9:48:19 UTC - in response to Message 21066.  


* Make backups about once per week (of course if the model doesn't get that far it's not worth bothering!). See http://www.climateprediction.net/board/viewtopic.php?t=2130 for information about backups.


My model keeps crashing after about 1,616,688.55 sec of CPU time.
So I want to follow your hint and make a backup.
The information you mentioned is from 2004 and describes some folders that new versions of the model no longer use. So I would like to know which files and folders the running model uses.
My newly started model is:
hadsm3fub_a46g_000471976_2, and 4 ZIP libraries and 1 EXE file were downloaded today, as well as 8 slideshow files.

But I have much more in my folder left over from earlier models that crashed. Could I delete it all? There is one folder named AdvancedVisualisation_V2 which contains files created between 2005-02-08 and 2006-03-14. Is that needed for the current graphics visualisation?

Thanks for your help
YS
Jochen from Old Germany
*** Since I'm a fool I prooved that the system is not foolproof ;-) ***
ID: 29285
old_user427067
Joined: 25 Jan 07
Posts: 4
Credit: 93,584
RAC: 0
Message 29286 - Posted: 26 Jun 2007, 12:21:04 UTC

So after 6 months running BOINC and 3 runs failing before they got a quarter of the way through, I see I have several dozen fixes to try, scattered throughout this thread and others. Sadly I do have to work occasionally to pay the mortgage and may not have the time to enjoy all the recommended procedures.

I used to run the United Devices program with the Oxford University cancer drug search. You couldn't kill that thing - whatever happened, it went back to a good save point and restarted. It never failed in 5 years. Couldn't BOINC/climateprediction be made as robust as that?

How should I get rid of the old task files? Take a guess based on mod date and filenames, or is there a per-task list somewhere, or a menu option I haven't discovered?
ID: 29286
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29301 - Posted: 26 Jun 2007, 17:38:14 UTC


The folders you need to delete have the same name as the models which crashed.
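
For anyone who would rather check before deleting by hand, here is a minimal Python sketch of that rule. It only lists the folders whose names match the crashed models and deletes nothing. The function name, the project folder path and the model names in the usage comment are made up for illustration; substitute the names shown on your own results pages, and exit from BOINC before removing anything.

```python
from pathlib import Path


def find_crashed_model_folders(project_dir, crashed_model_names):
    """Return the existing sub-folders named after crashed models.

    Nothing is deleted here: the caller reviews the list first.
    """
    project = Path(project_dir)
    return [project / name
            for name in crashed_model_names
            if (project / name).is_dir()]


# Hypothetical usage: the path and the names are placeholders.
# for folder in find_crashed_model_folders(
#         r"C:\path\to\BOINC\projects\climateprediction.net",
#         ["name_of_crashed_model_1", "name_of_crashed_model_2"]):
#     print("Candidate for deletion:", folder)
```

Only real folders are returned, so a ZIP file or a name that no longer exists is skipped.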

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29301
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29305 - Posted: 26 Jun 2007, 19:01:57 UTC
Last modified: 26 Jun 2007, 19:12:43 UTC

Bob, if you want to get rid of all the cpdn models and files you've worked on but keep BOINC at least for the time being (or keep other projects going), in the BOINC manager Projects tab you could set cpdn to No new work. Then, still in the Projects tab, you could highlight cpdn and press the Reset project button.

Bob, one of your models crashed with a -1 code, which could mean that you turned off the computer without exiting first from BOINC. Two others crashed with a -22 code. If you do want to continue with cpdn, let us know and I'll get someone to give you a diagnosis for this error, which has only occurred recently on cpdn.

The project READMEs are all here in one place:

http://www.climateprediction.net/board/viewforum.php?f=36


Cpdn news
ID: 29305
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29306 - Posted: 26 Jun 2007, 19:37:30 UTC

Hello Dr_Mabuse

I have been to Markgröningen twice, and I really liked the Swabian wine from that area.

Now your models. You've had several crashes with a -161 error code, so it would be a good idea to reread Mike's post at the top of this thread.

Mike's link to UKNick's backup and restore method still works, but it was, as you say, written for Classic. A more recent post is in the project READMEs

http://www.climateprediction.net/board/viewforum.php?f=36

In the Running the model README, look at item #1 by Les. That's the easiest manual method. Or if you prefer, there's a whole README with a selection of backup methods. The important thing is to exit from BOINC before backing up; if BOINC is running, the backup doesn't work.
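
The "exit first, then copy" rule can be sketched in a few lines of Python. This is an illustration only, not the method from the READMEs: the function name, the timestamped folder naming and the boinc_is_running check are all assumptions, and the check is left to the caller so that nothing here guesses how to detect the client on a particular system.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root, boinc_is_running):
    """Copy the whole BOINC folder into a new timestamped backup folder.

    boinc_is_running is a caller-supplied function (hypothetical) that
    returns True while the BOINC client is still running.
    """
    if boinc_is_running():
        raise RuntimeError("Exit from BOINC before backing up.")
    source = Path(boinc_dir)
    target = Path(backup_root) / time.strftime("boinc_backup_%Y%m%d_%H%M%S")
    # copytree refuses to overwrite, so an older backup is never clobbered.
    shutil.copytree(source, target)
    return target
```

Copying the folder as a whole, rather than picking individual files, keeps the model's files consistent with each other.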


Cpdn news
ID: 29306
old_user427067
Joined: 25 Jan 07
Posts: 4
Credit: 93,584
RAC: 0
Message 29308 - Posted: 26 Jun 2007, 19:50:41 UTC

Thanks for your replies mo.v and MikeMarsUK. I'm already running another task. I thought I should at least try the Norton Antivirus workaround (excluding the BOINC folder from the scan), because NAV and NIS have given me other problems in the past.

The -1 code could be because I have to run some fairly dangerous software - once in a long while I have to do a hard reset to get my machine back. Or it could be because my Dell Inspiron doesn't always come back from Standby/Hibernate. I've been trying to exit BOINC before putting the machine to sleep, but sometimes I forget. Sometimes it goes to sleep by itself.

I'd be interested to know what the -22 means - if it isn't NAV.
ID: 29308
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29311 - Posted: 26 Jun 2007, 22:17:07 UTC


The -22 appears instead of a normal error code with the recent batches of the model - the actual problem which caused the model to crash is hidden as a result. I think Tolu is looking at it.

The part of the log which looks suspect is this:

CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
CPDN process is not running, exiting, bRetVal = 1, checkPID=1624, selfPID=1624, iMonCtr=1



Do you recall what was happening with the PC at the time it crashed? Something using 100% of the CPU time for an extended period of time possibly?
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29311
old_user427067
Joined: 25 Jan 07
Posts: 4
Credit: 93,584
RAC: 0
Message 29486 - Posted: 9 Jul 2007, 10:02:19 UTC

Sorry - I got busy and then couldn't remember which message board I was posting in.

Yes, the software I'm running (supply chain planning, including linear and mixed integer programming) quite often grabs 100% of the CPU for a long time (can be overnight). It can sometimes lock the machine to the point where I can't even Ctrl-Alt-Del to get Task Manager up and reduce the process priority.

If that's a problem I'll stop BOINC running before I start a long run.

The latest task HADSM3 Slab Model 5.06 seems to be running fine now that I've stopped NAV from scanning the BOINC folder.
ID: 29486
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29493 - Posted: 9 Jul 2007, 16:01:57 UTC


That should solve it, or at least 'suspend' (in the 'activity' menu).

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29493
old_user69295
Joined: 6 Apr 05
Posts: 17
Credit: 744,057
RAC: 0
Message 29831 - Posted: 4 Aug 2007, 0:28:06 UTC - in response to Message 29311.  


The -22 appears instead of a normal error code with the recent batches of the model - the actual problem which caused the model to crash is hidden as a result. I think Tolu is looking at it.


I would also be interested in what causes a -22. My model crashed at 80%+ after over 5,000,000 CPU seconds (that's 2 months, folks) on an AMD X2 5600+.
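
For anyone who wants to check the arithmetic, a quick Python sketch follows. The function name is made up for illustration, and the result is CPU time only; the wall-clock time will be longer unless the model ran flat out around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def cpu_seconds_to_days(cpu_seconds):
    """Convert CPU seconds into days of continuous computing."""
    return cpu_seconds / SECONDS_PER_DAY


# The 5,000,000 CPU seconds quoted in this post:
print(round(cpu_seconds_to_days(5_000_000), 1))     # 57.9 days, about 2 months
# The 1,616,688.55 CPU seconds reported earlier in this thread:
print(round(cpu_seconds_to_days(1_616_688.55), 1))  # 18.7 days
```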

Very discouraging.

--Mike
ID: 29831
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29846 - Posted: 5 Aug 2007, 9:41:27 UTC
Last modified: 5 Aug 2007, 9:42:06 UTC

An error code 22 is meaningless in itself - most crashes end with that error code. The important bit is the error text on that model's result page (which is indeed the same 'Missing data in ocean UV fields' but preceded by a single NEGATIVE THETA DETECTED, which is a new combination to me and indicates that the model did a day or month restart prior to the main crash).


http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6506387
Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: ATM_DYN : NEGATIVE THETA DETECTED. A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A
Sorry, too many model crashes! :-(
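
When the error text is long and repetitive, a small script can pull out the lines that matter. The sketch below is one possible approach in Python, with made-up function names: it counts identical lines and then keeps only those starting with 'Model crashed:'. It assumes the text has been copied from the result page into a string.

```python
from collections import Counter


def summarise_log(stderr_text):
    """Count identical non-blank lines, in first-seen order."""
    counts = Counter()
    for line in stderr_text.splitlines():
        line = line.strip()
        if line:
            counts[line] += 1
    return counts


def crash_reasons(stderr_text):
    """Keep only the counted lines that start with 'Model crashed:'."""
    return {line: n
            for line, n in summarise_log(stderr_text).items()
            if line.startswith("Model crashed:")}


# A shortened excerpt of the log above, used as sample input.
sample = """Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A

Model crashed: umshell1.f: ATM_DYN : NEGATIVE THETA DETECTED. A

Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields A
Sorry, too many model crashes! :-(
"""

for line, n in crash_reasons(sample).items():
    print(n, "x", line)
```

The same summarise_log function also collapses long runs of repeated messages, such as the 'Not a JPEG file' lines seen in other logs in this thread, into a single count.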


Is the CPU temperature OK on your PC, or have you overclocked your PC by any chance? It can sometimes cause these problems. Take a look at the README files from my signature, and try running Prime95's torture test on it for a day or so.

Another thing to remember is that the model uploads its climate data at intervals to the server. While you've been running for 5.8M CPU seconds, only the last 8 hours of CPU time has been lost - the climate data you have already uploaded will be extremely useful to the scientists. The data uploads happened to have included all 3 'restart' dumps (1960, 2000 and 2040), so one day in the future your model could be resurrected in the form of a 2040-2080 run (using the 120 years from your PC).


I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29846
old_user217043
Joined: 3 Jan 07
Posts: 10
Credit: 634,737
RAC: 0
Message 29848 - Posted: 5 Aug 2007, 16:56:52 UTC

Hi!

Something awful happened to my last wu so here I am, but I am not particularly programming savvy, so please be gentle ;-)

The below has repeated itself many times and BOINC manager blew up with a Windows pop-up about a buffer overrun in boinc.exe.

I have wanted to upgrade BOINC for a while, so I did that and then downloaded another wu.

It's been running for about 24 hours and still no trickle up.

Am I running OK now?

Thanks!

D


CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN NetCDF Err #2 - No such file or directory
CPDN NetCDF Err #2 - No such file or directory
CPDN NetCDF Err #2 - No such file or directory
CPDN NetCDF Err #2 - No such file or directory
MainError: 12:15:26 AM No files match the supplied pattern.
MainError: 12:15:26 AM No files match the supplied pattern.
Fatal crash! :-(

</stderr_txt>
<message>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadcm3ohe_1ze6_05740864_0_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>

Validate state OK
Claimed credit 14,774.40
Granted credit 14,774.40
application version 5.15
Trickle Click here
ID: 29848
old_user217043
Joined: 3 Jan 07
Posts: 10
Credit: 634,737
RAC: 0
Message 29849 - Posted: 5 Aug 2007, 18:38:53 UTC

P.S. to my previous....

I just noticed that there is something new in CPDN preferences. When I went there neither the HADSM3 nor the HADCM3 box was checked. I checked the HADCM3 box because that is what I am running, yes?

D
ID: 29849

©2024 climateprediction.net