WAH2 CREDITS SET TOO LOW

Message boards : Number crunching : WAH2 CREDITS SET TO LOW

Profile JIM

Joined: 31 Dec 07
Posts: 1134
Credit: 20,798,831
RAC: 4,794
Message 52604 - Posted: 22 Sep 2015, 17:38:50 UTC


ID: 52604
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2938
Credit: 5,696,404
RAC: 16,804
Message 52605 - Posted: 22 Sep 2015, 17:42:39 UTC - in response to Message 52604.  

Noted on the moderators' email list. I can't comment personally as these tasks aren't available for my OS.

Dave
ID: 52605
Misty

Joined: 14 Feb 06
Posts: 50
Credit: 7,976,305
RAC: 227
Message 52606 - Posted: 22 Sep 2015, 19:42:48 UTC
Last modified: 22 Sep 2015, 19:49:54 UTC

For other models too, credits relate to crunching effort only very loosely.

For example some recent WUs on this machine worked out as follows:

Model                                       Run time (s) per credit

HadAM3P-HadRM3P Africa v7.22                36
HadAM3P-HadRM3P Australia v6.10             85
HadAM3P-HadRM3P Pacific North West v7.27    108
HadCM3 short v7.24                          160
WAH2 v7.05                                  460

That's nearly a 13-fold range.
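
For anyone who wants to repeat the comparison with their own tasks, here is a minimal sketch of the arithmetic. The numbers are the seconds-per-credit figures listed above; the dictionary name and layout are just illustrative.

```python
# Seconds of run time per credit, copied from the figures in this post.
seconds_per_credit = {
    "HadAM3P-HadRM3P Africa v7.22": 36,
    "HadAM3P-HadRM3P Australia v6.10": 85,
    "HadAM3P-HadRM3P Pacific North West v7.27": 108,
    "HadCM3 short v7.24": 160,
    "WAH2 v7.05": 460,
}

# Ratio between the worst-paid and best-paid model types.
fold_range = max(seconds_per_credit.values()) / min(seconds_per_credit.values())

print(f"{fold_range:.1f}-fold range")  # 12.8-fold range
```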
ID: 52606
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2938
Credit: 5,696,404
RAC: 16,804
Message 52607 - Posted: 22 Sep 2015, 21:47:00 UTC - in response to Message 52606.  

Based on experience, it will almost certainly get sorted at some point now that the project people are aware of it. I should check on my own tasks some time and see how they compare for value. Something I have noticed is that some of the short tasks initially show only half the credit they should give, but a few hours later or the following day they show the full credit.
ID: 52607
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,220,720
RAC: 577
Message 52608 - Posted: 23 Sep 2015, 0:24:47 UTC
Last modified: 23 Sep 2015, 0:35:41 UTC

I haven't looked too closely at the wah2 tasks running on my Windows system. Quite possibly they are slower than others.

What I have noticed is the new model, hadam3pm2_*, that started ~3.5 hours ago under Linux. BOINC gave an initial estimate of remaining time of 77:05:26, but more than 3 hours later, the time left was 77:01:24. Over this interval, progress was ~0.09%. The salient issue is that the time to completion dropped by only 4 minutes while the task ran for more than 3 hours.

In other words, going by the situation at this early point, achieving 100% progress--reaching no remaining hours--will require >3000 hours wall time, or ~2500 hours estimated elapsed time. Either way, this exceeds what hadam3prm3pm2t_eu tasks, for example, have used by a factor of 10.

It appears the hadam3pm2 model badly underestimates run time, perhaps due to a "bug" or logic error. Anyway, if my initial estimates hold up, I think it could be very discouraging for volunteers to observe such glacial progress, let alone the effects of greatly reduced credits as a result.

I believe the research is important and I want to support it. Unexplained inconsistency of the sort under discussion wouldn't help convince people to continue to participate.

Certainly, if I've misconstrued something I hope someone will tell me what I'm missing.
ID: 52608
Profile JIM

Joined: 31 Dec 07
Posts: 1134
Credit: 20,798,831
RAC: 4,794
Message 52609 - Posted: 23 Sep 2015, 4:11:34 UTC - in response to Message 52608.  


ID: 52609
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2938
Credit: 5,696,404
RAC: 16,804
Message 52611 - Posted: 23 Sep 2015, 6:18:05 UTC

If you are talking about the hadam3pm2 (hadam3p model with MOSES II land scheme) (currently no graphics) (Linux only), as opposed to the ones below on the server status page, hadam3prm3pm2t_eu (hadam3p global model with hadrm3p regional model with MOSES II land scheme and TRIFFID available) (currently no graphics) (Linux only), then unless you are going to run the computer 24/7 with no risk of the task being interrupted, it may be better to abort it. Even suspending these tasks often leads to them completing without producing the data needed.
ID: 52611
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,220,720
RAC: 577
Message 52612 - Posted: 23 Sep 2015, 7:08:21 UTC - in response to Message 52609.  

"Completing in a reasonable amount of time" is exactly what I was wondering about. At this point, the hadam3pm2 task has been running for ~10.5 hours and shows only a 13-minute drop in remaining time to completion (and just 0.289% progress). Definitely not a "strong" start.

I understand that the estimates of time remaining are simply approximations, subject to big change as the computation goes on. I'm willing to let it continue to see how it shapes up. So far though the pattern has been consistent: at the rate it's going, it will take roughly 3633 hours (real time) to finish.

That's 151 days, assuming the computer is operational 24/7 and not used for other compute-intensive work such as compiling software, etc. Definitely not the typical course of model-running I've so far seen, but I don't know, maybe it's more "normal" than I would expect.
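
For reference, the projection above is just a linear extrapolation from elapsed time and reported progress. A minimal sketch, assuming progress stays linear for the whole run (which may well not hold); the function name is made up for illustration:

```python
def projected_total_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Linearly extrapolate total run time from elapsed time and percent done."""
    if progress_percent <= 0:
        raise ValueError("progress must be greater than zero to extrapolate")
    return elapsed_hours / (progress_percent / 100.0)


# Figures from this post: ~10.5 hours elapsed, 0.289% progress.
total_hours = projected_total_hours(10.5, 0.289)

print(round(total_hours))       # 3633 hours
print(round(total_hours / 24))  # 151 days
```

The early progress figure is tiny, so small rounding differences in the reported percentage shift the projection by hundreds of hours.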
ID: 52612
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,220,720
RAC: 577
Message 52613 - Posted: 23 Sep 2015, 7:39:09 UTC - in response to Message 52611.  

Yes, it's the hadam3pm2 I was concerned about. My thought is to let it go on for a while to see if its "slowness" persists. It's possible I'm wrong, or getting excited prematurely; in any case it should be clearer after a few more days.

Actually, I'm really curious to know if anyone else running the model has encountered the same issues--or not.
ID: 52613
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2938
Credit: 5,696,404
RAC: 16,804
Message 52614 - Posted: 23 Sep 2015, 7:40:14 UTC - in response to Message 52612.  

If you are talking about Computer 1373652 with only 1/2 GB of memory, that leaves very little to run BOINC and means the swap file will be in constant use, slowing things down a lot. I have seen higher amounts of memory suggested, but 500 MB per task plus 500 MB for the OS and anything else going on is really the absolute minimum. The CPU is also a bit on the slow side for running CPDN, though I have successfully run tasks on an Atom-powered netbook. They do, however, take thousands of hours to complete, and that is with 2 GB of memory between the two cores.
ID: 52614
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2938
Credit: 5,696,404
RAC: 16,804
Message 52615 - Posted: 23 Sep 2015, 8:19:34 UTC - in response to Message 52614.  

This task has failed because of a lack of RAM: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18287630 Click on the plus sign by stderr to see the error message.
ID: 52615
Profile Conan

Joined: 6 Jul 06
Posts: 98
Credit: 1,664,649
RAC: 0
Message 52616 - Posted: 23 Sep 2015, 11:04:00 UTC - in response to Message 52613.  
Last modified: 23 Sep 2015, 11:12:23 UTC

Yes, it's the hadam3pm2 I was concerned about. My thought is to let it go on for a while to see if its "slowness" persists. It's possible I'm wrong, or getting excited prematurely; in any case it should be clearer after a few more days.

Actually, I'm really curious to know if anyone else running the model has encountered the same issues--or not.


G'Day jrapdx,

I have noticed the same on my AMD 1090 Phenom: the first estimate was 96 hours to completion. After running for 8 and 3/4 hours, the estimate is now up to 113 hours to run with just 0.257% completed. That's an estimate of 3,284 hours run time.
That is getting back to the old days when Climate ran for months to get a WU to complete.
I doubt that it will take this long; I suspect the original 96 hours is closer to the mark. However, going on a previous failed WU, it will take over 250 hours.

Conan
ID: 52616
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,220,720
RAC: 577
Message 52622 - Posted: 23 Sep 2015, 22:14:27 UTC - in response to Message 52614.  

That computer (1373652) has been out of service for weeks, not involved at all.

I'm talking about a task running on 1373243, namely "hadam3pm2_d7ni_*" (18995341), entirely distinct from what you commented on.
ID: 52622
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,220,720
RAC: 577
Message 52623 - Posted: 23 Sep 2015, 22:36:13 UTC - in response to Message 52616.  

Conan,

Thanks for that info, it's good to know I've not missed the boat altogether.

Today, about 26 hours after the task started, it has reached 0.699% progress and still shows 76:33 time to completion, down from the initial 77:06 estimate. The task is running very consistently at this rate, so it will still take >3600 hours to finish.

Curiously, it is communicating with the servers frequently. There have been 10 trickles reported since yesterday, occurring every 2 to 2.5 hours or so! To this point it's incremented my total credits by 168.28. Very interesting.

I hope I don't upset it too much when I have to move the computer a few hours from now. (Remodeling of the space where the computer normally resides is about done, and I need to move it back.) I'm hoping that shutting down and restarting the machine doesn't cause any problems.

Jules.
ID: 52623
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7352
Credit: 23,425,081
RAC: 1
Message 52624 - Posted: 24 Sep 2015, 0:34:12 UTC - in response to Message 52623.  

Shutting down:

1) Check the Transfers tab.
If there are any uploads/downloads, wait.

2) In the menu under Activity, click on Network activity suspended.
This prevents any more transfers from starting while you're doing the following.

3) In the menu under Activity, click on Suspend.
This stops BOINC from processing tasks.

4) In the menu under File, click Exit BOINC Manager.

5) In the pop up window, click on the option to stop running tasks.
(I forget the wording.)

6) Wait until BOINC disappears from the desktop.

7) Shut down the computer.

Reverse 3) and 2) to start running again.
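
For anyone without the Manager GUI, the steps above can be roughly approximated with the `boinccmd` command-line tool. This is only a sketch, assuming `boinccmd` is on the PATH and allowed to talk to the local client (it may need to be run from the BOINC data directory or given a password); the function names are made up for illustration. It defaults to a dry run that only prints the commands, and it does not wait for transfers, so check the output of the first command yourself before going further.

```python
import subprocess


def shutdown_commands(boinccmd: str = "boinccmd") -> list[list[str]]:
    """Command-line approximation of the manual shutdown steps above."""
    return [
        [boinccmd, "--get_file_transfers"],        # step 1: look for active transfers
        [boinccmd, "--set_network_mode", "never"],  # step 2: suspend network activity
        [boinccmd, "--set_run_mode", "never"],      # step 3: suspend computation
        [boinccmd, "--quit"],                       # steps 4-6: tell the client to exit
    ]


def run_shutdown(dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for cmd in shutdown_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_shutdown(dry_run=True)
```

To start running again after a restart, the reverse would be `--set_run_mode auto` followed by `--set_network_mode auto`.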


*********************

It's been said that BOINC takes about 10 tasks from a given project to work out times. On cpdn, it needs to be "several" of EVERY different model type, to work out how long that type of model is going to take.

ID: 52624
Profile Byron Leigh Hatch @ team Carl Sagan

Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 52626 - Posted: 24 Sep 2015, 5:58:04 UTC - in response to Message 52604.  
Last modified: 24 Sep 2015, 6:17:45 UTC

ID: 52626
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7352
Credit: 23,425,081
RAC: 1
Message 52627 - Posted: 24 Sep 2015, 6:38:47 UTC - in response to Message 52626.  

Hi Byron

That first screenshot is interesting to me, as it shows how long those new models are taking, something that I'm never going to see now that I don't run Windows.

That's something like 10 days, as against the 6 days that the hadam3prm3pm2t_eu tasks are taking me.

And for anyone who missed it:
Which experiment is this work unit for?

where it says:
with improved land surface models and at a 25km resolution over Europe

That means higher resolution modelling, with more data returned, and s l o w e r running.


ID: 52627
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,220,720
RAC: 577
Message 52628 - Posted: 24 Sep 2015, 7:30:30 UTC - in response to Message 52624.  

Thanks for the instructions re: shutting down the computer. Your timing was perfect: the remodeling work wrapped up late in the day, prompting me to postpone the shutdown until morning, so I will soon get to use your good advice...
ID: 52628
Richard Haselgrove

Joined: 1 Jan 07
Posts: 387
Credit: 10,590,320
RAC: 16,269
Message 52629 - Posted: 24 Sep 2015, 8:25:36 UTC - in response to Message 52624.  

It's been said that BOINC takes about 10 tasks from a given project to work out times. On cpdn, it needs to be "several" of EVERY different model type, to work out how long that type of model is going to take.

The remark about projects needing 10 completed tasks (actually 11 - "more than 10") before calculating realistic initial runtime estimates applies to projects running the 'CreditNew' version of the server code. Doesn't matter whether they actually give credit that way - it's the runtime estimation which is all done on the server in those cases.

But we don't have that server code at this project. Any adjustment to the initial estimates is done locally by the old DCF mechanism - and that can only keep track of one value at a time. Unless the task sizes estimated by the project are accurately in proportion to their eventual total running time, different model types will pull DCF in different directions, and it'll never be able to settle to a common value which is right for all models.
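
To illustrate why a single shared value can't settle, here is a toy simulation. It is not BOINC's actual DCF (duration correction factor) update rule: the simple smoothing step, the two model names, and the ratios are all assumptions made up for the sketch. Each "true ratio" stands for actual run time divided by the project's estimate for that model type.

```python
def update_factor(factor: float, observed_ratio: float, weight: float = 0.1) -> float:
    """Move the single shared factor part of the way toward the latest observed ratio."""
    return factor + weight * (observed_ratio - factor)


# Hypothetical actual/estimated run-time ratios for two model types.
true_ratio = {"model_a": 0.5, "model_b": 4.0}

factor = 1.0
history = []
for i in range(200):
    # Completed tasks alternate between the two model types.
    model = "model_a" if i % 2 == 0 else "model_b"
    factor = update_factor(factor, true_ratio[model])
    history.append(factor)

recent = history[-20:]
print(f"factor hovers between {min(recent):.2f} and {max(recent):.2f}")
```

With these made-up numbers the factor ends up hovering at a little over 2, so estimates stay roughly four times too high for the first type and nearly two times too low for the second, however many tasks complete.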
ID: 52629
Richard Haselgrove

Joined: 1 Jan 07
Posts: 387
Credit: 10,590,320
RAC: 16,269
Message 52630 - Posted: 24 Sep 2015, 9:21:54 UTC - in response to Message 52626.  

my computer 1364207

Byron, I'd be wary about taking too much timing data from that machine - it looks to be under considerable stress, and is throwing a lot of errors.

Error tasks for computer 1364207

Even with 256 GB of RAM to support the 40 running models - 6 GB each should be plenty - can the hard disk system cope with all 40 models checkpointing and preparing upload files in quick succession? That could be a big bottleneck, since all the disk accesses have to be to the same drive (or presumably RAID array) over the same interface. CPDN models can be sensitive to delays around those upload generation moments.
ID: 52630