WAH2 CREDITS SET TOO LOW
Author | Message |
---|---|
Joined: 31 Dec 07 Posts: 1134 Credit: 20,798,831 RAC: 4,794 |
|
Volunteer moderator Joined: 15 May 09 Posts: 2949 Credit: 5,696,404 RAC: 16,804 |
Noted on the moderators' email list. I can't comment personally as these tasks aren't available for my OS. Dave |
Misty Joined: 14 Feb 06 Posts: 50 Credit: 7,976,305 RAC: 227 |
For other models too, credits relate to crunching effort only very loosely. For example, some recent WUs on this machine worked out as follows, in run time (s) per credit: HadAM3P-HadRM3P Africa v7.22: 36; HadAM3P-HadRM3P Australia v6.10: 85; HadAM3P-HadRM3P Pacific North West v7.27: 108; HadCM3 short v7.24: 160; WAH2 v7.05: 460. That's nearly a 13-fold range. |
Volunteer moderator Joined: 15 May 09 Posts: 2949 Credit: 5,696,404 RAC: 16,804 |
Based on experience, it will almost certainly get sorted at some point now that the project people are aware of it. I should check on my own tasks some time and see how they compare for value. Something I have noticed is that some of the short tasks initially show up as giving only half the credits they should, but a few hours later or the following day they show the full credit. |
jrapdx Joined: 4 Jul 15 Posts: 63 Credit: 3,220,720 RAC: 577 |
I haven't checked the wah2 tasks running on my Windows system too closely. Quite possibly they are slower than others. What I have noticed is the new model, hadam3pm2_*, that started ~3.5 hours ago under Linux. BOINC gave an initial estimate of remaining time of 77:05:26, but more than 3 hours later, the time left was 77:01:24. For this interval, progress was ~0.09%. The salient issue is seeing only 4 minutes of reduction in time to completion vs. the task running for >3 hours. IOW, going by the situation at this early point, achieving 100% progress (reaching no remaining hours) will require >3000 hours wall time, or ~2500 hours estimated elapsed time. Either way, this exceeds by a factor of 10 what, for example, hadam3prm3pm2t_eu tasks have used. It appears the hadam3pm2 model badly underestimates run time, perhaps due to a "bug" or logic error. Anyway, if my initial estimates hold up, I think it could be very discouraging for volunteers to observe such glacial progress, let alone the effects of greatly reduced credits as a result. I believe the research is important and I want to support it. Unexplained inconsistency of the sort under discussion wouldn't help convince people to continue to participate. Certainly, if I've misconstrued something I hope someone will tell me what I'm missing. |
Joined: 31 Dec 07 Posts: 1134 Credit: 20,798,831 RAC: 4,794 |
|
Volunteer moderator Joined: 15 May 09 Posts: 2949 Credit: 5,696,404 RAC: 16,804 |
If you are talking about hadam3pm2 (hadam3p model with MOSES II land scheme) (currently no graphics) (Linux only), as opposed to the one below it on the server status page, hadam3prm3pm2t_eu (hadam3p global model with hadrm3p regional model with MOSES II land scheme and TRIFFID available) (currently no graphics) (Linux only), then unless you are going to run the computer 24/7 with no risk of the task being interrupted, it may be better to abort it. Even suspending the tasks often leads to them completing but not producing the data needed. |
jrapdx Joined: 4 Jul 15 Posts: 63 Credit: 3,220,720 RAC: 577 |
"Completing in a reasonable amount of time" is exactly what I was wondering about. At this point, the hadam3pm2 task has been running for ~10.5 hours and shows only 13 minutes drop in remaining time to completion. (And just 0.289% progress.) Definitely not a "strong" start. I understand that the estimates of time remaining are simply approximations, subject to big change as the computation goes on. I'm willing to let it continue to see how it shapes up. So far though the pattern has been consistent: at the rate it's going, it will take roughly 3633 hours (real time) to finish. That's 151 days, assuming the computer is operational 24/7 and not used for other compute-intensive work such as compiling software, etc. Definitely not the typical course of model-running I've so far seen, but I don't know, maybe it's more "normal" than I would expect. |
jrapdx Joined: 4 Jul 15 Posts: 63 Credit: 3,220,720 RAC: 577 |
Yes, it's the hadam3pm2 I was concerned about. My thought is to let it go on for a while to see if its "slowness" persists. It's possible I'm wrong, or getting excited prematurely; in any case it should be clearer after a few more days. Actually, I'm really curious to know if anyone else running the model has encountered the same issues--or not. |
Volunteer moderator Joined: 15 May 09 Posts: 2949 Credit: 5,696,404 RAC: 16,804 |
If you are talking about Computer 1373652 with only 1/2 GB of memory, that leaves very little to run BOINC and means the swap file will be in constant use, slowing things down a lot. I have seen higher amounts of memory suggested, but 500MB/task plus 500MB for the OS and anything else going on is really the absolute minimum. The CPU is also a bit on the slow side for running CPDN, though I have successfully run tasks on an Atom-powered netbook. They do, however, take thousands of hours to complete, and that is with 2GB of memory between the two cores. |
Volunteer moderator Joined: 15 May 09 Posts: 2949 Credit: 5,696,404 RAC: 16,804 |
This task has failed because of lack of RAM: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18287630 Click on the plus sign by stderr to see the error message. |
Joined: 6 Jul 06 Posts: 98 Credit: 1,664,649 RAC: 0 |
"Yes, it's the hadam3pm2 I was concerned about. My thought is to let it go on for a while to see if its "slowness" persists. It's possible I'm wrong, or getting excited prematurely; in any case it should be clearer after a few more days." G'Day jrapdx, I have noticed the same on my AMD 1090 Phenom; the first estimate was for 96 Hours to completion. After running for 8 and 3/4 hours it is now up to 113 Hours to run with just 0.257% completed. That's an estimate of 3,284 Hours run time. That is getting back to the old days when Climate ran for months to get a WU to complete. I doubt that it will take this long and expect that the original 96 Hours is closer to the mark; however, going on a previous failed WU, it will take over 250 hours. Conan |
jrapdx Joined: 4 Jul 15 Posts: 63 Credit: 3,220,720 RAC: 577 |
That computer (1373652) has been out of service for weeks, not involved at all. I'm talking about a task running on 1373243, namely "hadam3pm2_d7ni_*" (18995341), entirely distinct from what you commented on. |
jrapdx Joined: 4 Jul 15 Posts: 63 Credit: 3,220,720 RAC: 577 |
Conan, Thanks for that info, it's good to know I've not missed the boat altogether. Today, about 26 hours after the task started, it's reached 0.699% progress, and still shows 76:33 time to completion, down from the initial 77:06 estimate. The task is running very consistently at this rate, so it will still take >3600 hours to finish. Curiously, it is communicating with the servers frequently. There have been 10 trickles reported since yesterday, occurring every 2 to 2.5 hours or so! To this point it's incremented my total credits by 168.28. Very interesting. I hope I don't upset it too much when I have to move the computer a few hours from now. (Remodeling of the space where the computer normally resides is about done, and I need to move it back.) I'm keeping a good thought that shutting down and restarting the machines doesn't cause any problems. Jules. |
Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7355 Credit: 23,425,081 RAC: 1 |
Shutting down: 1) Check the Transfers tab. If there are any uploads/downloads, wait. 2) In the menu under Activity, click on Network activity suspended. This prevents any more transfers from starting while you're doing the following. 3) In the menu under Activity, click on Suspend. This shuts down BOINC. 4) In the menu under File, click Exit BOINC Manager. 5) In the pop up window, click on the option to stop running tasks. (I forget the wording.) 6) Wait until BOINC disappears from the desktop. 7) Shut down the computer. Reverse 3) and 2) to start running again. ********************* It's been said that BOINC takes about 10 tasks from a given project to work out times. On cpdn, it needs to be "several" of EVERY different model type, to work out how long that type of model is going to take. |
Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
|
Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7355 Credit: 23,425,081 RAC: 1 |
Hi Byron, That first screen shot is interesting to me, as it shows how long those new models are taking, something that I'm never going to see now that I don't run Windows. That's something like 10 days, as against the 6 days that the hadam3prm3pm2t_eu are taking me. And for anyone who missed it, see "Which experiment is this work unit for?", where it says: "with improved land surface models and at a 25km resolution over Europe". That means higher resolution modelling, with more data returned, and s l o w e r running. |
jrapdx Joined: 4 Jul 15 Posts: 63 Credit: 3,220,720 RAC: 577 |
Thanks for instructions re: shutting down the computer. Your timing was perfect, the remodeling work wrapped up late in the day prompting me to postpone the shutdown until morning, so I will soon get to use your good advice... |
Richard Haselgrove Joined: 1 Jan 07 Posts: 387 Credit: 10,590,320 RAC: 16,269 |
"It's been said that BOINC takes about 10 tasks from a given project to work out times. On cpdn, it needs to be "several" of EVERY different model type, to work out how long that type of model is going to take." The remark about projects needing 10 completed tasks (actually 11 - "more than 10") before calculating realistic initial runtime estimates applies to projects running the 'CreditNew' version of the server code. It doesn't matter whether they actually give credit that way - it's the runtime estimation which is all done on the server in those cases. But we don't have that server code at this project. Any adjustment to the initial estimates is done locally by the old DCF mechanism - and that can only keep track of one value at a time. Unless the task sizes estimated by the project are accurately in proportion to their eventual total running time, different model types will pull DCF in different directions, and it'll never be able to settle to a common value which is right for all models. |
Richard Haselgrove Joined: 1 Jan 07 Posts: 387 Credit: 10,590,320 RAC: 16,269 |
"my computer 1364207" Byron, I'd be wary about taking too much timing data from that machine - it looks to be under considerable stress, and is throwing a lot of errors (see "Error tasks for computer 1364207"). Even with 256 GB of RAM to support the 40 running models - 6 GB each should be plenty - can the hard disk system cope with all 40 models checkpointing and preparing upload files in quick succession? That could be a big bottleneck, since all the disk accesses have to be to the same drive (or presumably RAID array) over the same interface. CPDN models can be sensitive to delays around those upload generation moments. |
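
Several posts in this thread extrapolate a total run time from the elapsed time and the percent progress shown in BOINC Manager. The arithmetic behind those figures can be sketched roughly as below. This assumes progress is linear in wall time, which the thread itself questions, and the function names `projected_total_hours` and `projected_remaining_hours` are hypothetical helpers for illustration, not part of BOINC or CPDN.

```python
def projected_total_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Linear projection of total run time from progress so far.

    Assumption: the task advances at a constant rate, so
    total = elapsed / (progress fraction).
    """
    if not 0 < progress_percent <= 100:
        raise ValueError("progress_percent must be in the range (0, 100]")
    return elapsed_hours * 100.0 / progress_percent


def projected_remaining_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Projected hours still to run under the same linear assumption."""
    return projected_total_hours(elapsed_hours, progress_percent) - elapsed_hours


if __name__ == "__main__":
    # Figures quoted in the thread: ~10.5 hours elapsed at 0.289% progress.
    total = projected_total_hours(10.5, 0.289)
    print(f"{total:.0f} hours total, about {total / 24:.0f} days")
    # prints "3633 hours total, about 151 days"
```

With the 10.5 hours and 0.289% quoted above, this gives the same roughly 3633 hours (about 151 days) mentioned in the thread. A projection like this is only as good as the linearity assumption, so early readings on a new model type should be treated with caution.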
©2021 climateprediction.net