Posts by paulf

1) Questions and Answers : Windows : Final file upload failing for HADAM3P 6.09 (European and PNW) (Message 47667)
Posted 26 Nov 2013 by paulf
Post:
Hello Geophi,

Thank you for the informative and pragmatic reply. It is encouraging to know that the problems some of the models showed on my computers aren't as bad as first made out.

Here is some background that you (and others following this thread) may find useful. Both computers run BOINC when in use.

The 660 computer runs 24 hours a day, but it is somewhat underpowered and can churn or freeze briefly at times. BOINC is set to run while the computer is in use and to suspend only when CPU usage is over 50%. I feel this balances keeping BOINC running as much as possible against impacting the machine's performance too much. It has three active BOINC threads, and I have enabled the option to keep models in memory when suspended.

The 960 computer runs from early evening to morning; it is hibernated during the day. It has plenty of RAM (12 GB) and usually runs six active BOINC threads, as I don't notice any performance impact. As with the 660, I have enabled the option to keep models in memory when suspended.

Your findings match what I had noticed: the hadcm3/RAPIT aborts tend to happen around the 10-year (25%/50%/75%) points, and I have rarely seen an abort at other points. I looked into it earlier this year because I was concerned that a computer issue was causing models to fail, even though it was happening on two otherwise unrelated computers. The information I found (sorry, I don't have the link to hand) noted that the 10-year (25%/50%/75%) points were the most likely abort points, as that is when an unstable model is detected (i.e. one whose starting seed made it unstable). I put my model failures down to that. A brief browse through the failed hadcm3/RAPIT models on my computers shows they also failed on other computers. I would be more worried if models were aborting only on my computers while consistently passing on others.

I note the comment that the hadcm3/RAPIT models are sensitive to suspends.
When CPDN runs out of work I run an alternative project, and if CPDN work becomes available again I may have to suspend the CPDN work temporarily to clear the work for the other project. In my experience BOINC is rather poor at scheduling: other projects have short (2-week) reporting deadlines, so their work units often go into panic mode to finish in time, randomly bumping CPDN work units out of the way. I prefer to suspend the CPDN work units and clear the other work rather than rely on BOINC's haphazard scheduling.

BOINC projects target standard desktop computers, where suspends are more likely to happen (as opposed to dedicated server farms, where projects can run uninterrupted as the primary task). Given that, and as Jonny also noted below, perhaps the models need to be made more tolerant of suspends to cope with the desktop environment, where suspends are inevitable (but the processing time is free to the project). Since CPDN targets the desktop environment, the models should be able to handle suspends gracefully.

thanks
P
2) Questions and Answers : Windows : Final file upload failing for HADAM3P 6.09 (European and PNW) (Message 47649)
Posted 25 Nov 2013 by paulf
Post:
Les,
Thanks for the reply. I'm in no hurry to upload as long as someone is looking at it.

I didn't expect to be accused of wasting data sets or be threatened with being blocked though. I joined in June 2003 as one of the Beta testers and I've been running as many models as possible since then. I'm sorry if my participation is not beneficial to the project.

P
3) Questions and Answers : Windows : Final file upload failing for HADAM3P 6.09 (European and PNW) (Message 47646)
Posted 25 Nov 2013 by paulf
Post:
Hi
I'm trying to upload the final output files from two HADAM3P models: one is European and the other is Pacific North Western. I'm running on Windows, but I'm not convinced this problem is Windows-specific.

Both files upload to 100% complete and then the upload fails; the BOINC output is below. Is there a problem with this destination server?
thanks
P

Mon, 25 Nov 13 18:21:50 climateprediction.net [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_q6vh_2009_1_008332044_0_13.zip: No such file or directory
Mon, 25 Nov 13 18:21:50 climateprediction.net Temporarily failed upload of hadam3p_eu_q6vh_2009_1_008332044_0_13.zip: transient upload error
Mon, 25 Nov 13 18:21:50 climateprediction.net Backing off 13 min 50 sec on upload of hadam3p_eu_q6vh_2009_1_008332044_0_13.zip
Mon, 25 Nov 13 18:21:53 climateprediction.net [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_pnw_qag3_2043_1_008356940_0_13.zip: No such file or directory
Mon, 25 Nov 13 18:21:53 climateprediction.net Temporarily failed upload of hadam3p_pnw_qag3_2043_1_008356940_0_13.zip: transient upload error
Mon, 25 Nov 13 18:21:53 climateprediction.net Backing off 1 min 56 sec on upload of hadam3p_pnw_qag3_2043_1_008356940_0_13.zip

©2024 climateprediction.net