climateprediction.net home page
Welcome back/checking if everything is working?

Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62751 - Posted: 5 Oct 2020, 13:13:14 UTC - in response to Message 62745.  

Hi Guys,
Re.
UK Met Office Coupled Model Full Resolution Ocean
Weather At Home 2 (wah2) (region independent)
UK Met Office HadAM4 at N144 resolution

Just a thought.
Now that there have not been any new projects for some months, the 'In progress' numbers for these applications are obviously defunct. They were actually at this level a long time before the pandemic hit the world.
They must be hoarded WUs that are past their useful date. None of the users have registered for months.
Couldn't they be zeroed?

Les has contacted the project. Some cleaning up will be done, but probably not before some more work appears. That work will be part of the new season's MSc programme, which should have work for both Windows and Linux machines in the next few weeks. (Not sure about Mac.)
ID: 62751
Rayburner

Joined: 17 Jan 05
Posts: 10
Credit: 23,525,643
RAC: 14
Message 62752 - Posted: 5 Oct 2020, 15:07:32 UTC - in response to Message 62750.  

Should be OK now.


yes, they went through.

Thank You.
ID: 62752
nairb

Joined: 3 Sep 04
Posts: 75
Credit: 3,797,238
RAC: 6,015
Message 62754 - Posted: 5 Oct 2020, 23:32:34 UTC - in response to Message 62751.  


Les has contacted the project. Some cleaning up will be done, but probably not before some more work appears. That work will be part of the new season's MSc programme, which should have work for both Windows and Linux machines in the next few weeks. (Not sure about Mac.)



I presume these will still require some 32-bit libs and not the full-blown 64-bit jobbies, for the Linux w/u at least. I had better make sure the Fedora 30 hard disk is plugged in.

Assuming I am "lucky" to snare a w/u that is.
ID: 62754
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62755 - Posted: 6 Oct 2020, 6:23:15 UTC

I presume these will still require some 32-bit libs and not the full-blown 64-bit jobbies, for the Linux w/u at least. I had better make sure the Fedora 30 hard disk is plugged in.


For Linux the tasks will be N216 HadAM4 tasks and HadCM3, so yes, do make sure the relevant 32-bit libraries for your distribution are installed.

That said, the last testing HadAM4 tasks I ran on my new box, with a clean install of Xubuntu 20.04, ran without my installing them explicitly. I did go on and install the ones that were not there. It may have been that I installed what was needed for them while installing everything including the kitchen sink to enable me to compile BOINC from source.
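
For anyone unsure whether a given model executable really is a 32-bit binary (and so needs the 32-bit libraries), a rough check is to read the ELF header of the file. This is only a minimal sketch: the function names are made up for illustration, and it says nothing about which specific libraries a task needs.

```python
def elf_word_size(header: bytes):
    """Return 32 or 64 for an ELF header, or None if it is not ELF.

    Byte 4 of an ELF file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit.
    """
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(header[4])


def binary_word_size(path):
    """Read the first bytes of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return elf_word_size(f.read(5))
```

Calling `binary_word_size()` on an executable in the BOINC project directory (the exact path depends on your install) should give 32 for a 32-bit binary, 64 for a 64-bit one, and `None` for anything that is not an ELF file.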
ID: 62755
bernard_ivo

Joined: 18 Jul 13
Posts: 396
Credit: 18,843,957
RAC: 11,667
Message 62773 - Posted: 9 Oct 2020, 7:09:24 UTC - in response to Message 62746.  

There's quite a few batches of those, with only a small number left in each one.
I had a look at a few; some are "stuck", but some have just started running, and are returning trickles.

I'll see what the project thinks about wiping everything.


It would be great if some clean up happens. I have one orphaned Full Resolution Ocean since 2014 in my "In progress" web tab and set to expire in 2023. I'm almost there.
ID: 62773
Harri Liljeroos
Joined: 9 Dec 05
Posts: 89
Credit: 11,584,566
RAC: 0
Message 62774 - Posted: 9 Oct 2020, 8:16:12 UTC - in response to Message 62773.  

There's quite a few batches of those, with only a small number left in each one.
I had a look at a few; some are "stuck", but some have just started running, and are returning trickles.

I'll see what the project thinks about wiping everything.


It would be great if some clean up happens. I have one orphaned Full Resolution Ocean since 2014 in my "In progress" web tab and set to expire in 2023. I'm almost there.

Can they do the wiping, given that they need to have all the old tasks available when counting the credits? So nothing can be purged?
ID: 62774
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62775 - Posted: 9 Oct 2020, 9:52:46 UTC

Can they do the wiping, given that they need to have all the old tasks available when counting the credits? So nothing can be purged?


If everything past the deadline was wiped, anything being wiped would, I guess, miss out on credits, making CPDN policy on this the same as most other projects, where credit is not granted after the deadline. My personal opinion (so not with my moderator hat on, or expressing any views of the project) is that, while we have the very long deadlines currently in use, this would be no bad thing.
ID: 62775
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7503
Credit: 23,674,684
RAC: 3,244
Message 62776 - Posted: 9 Oct 2020, 10:53:32 UTC

The answer is that the old data isn't going to be removed.
So, just don't stare at those numbers for long periods. :)
ID: 62776
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62777 - Posted: 9 Oct 2020, 11:31:45 UTC - in response to Message 62776.  

The answer is that the old data isn't going to be removed.
So, just don't stare at those numbers for long periods. :)


Shucks :)
ID: 62777
Peter Hucker

Joined: 9 Oct 20
Posts: 156
Credit: 1,761,668
RAC: 0
Message 62853 - Posted: 5 Nov 2020, 11:56:03 UTC - in response to Message 62777.  

The answer is that the old data isn't going to be removed.
So, just don't stare at those numbers for long periods. :)


Shucks :)


On the server status page, are those numbers real? Have people still got all those tasks, and have they not passed the deadline yet?

And why do we have a year's deadline for tasks that take 1-3 weeks? Rosetta for example has a 3 day deadline for 8 hour tasks. Most projects have a 2-3 week deadline. I see no advantage to the project, the scientists, or the volunteers, in letting people just store away tasks and never get round to doing them.
ID: 62853
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7503
Credit: 23,674,684
RAC: 3,244
Message 62860 - Posted: 6 Nov 2020, 3:43:44 UTC - in response to Message 62853.  
Last modified: 6 Nov 2020, 3:45:19 UTC

Some parts of the numbers are not real. They're leftovers from the old system we had, and are there for historical reasons.
The project people are happy with the way that page is, so that's how it will stay.

The 1 year "deadline", as has been pointed out many times, doesn't apply to the tasks; it's there to stop BOINC from having problems when computers are also running other projects, most of which have much shorter task run times.
The deadline here is: ASAP!

The project controls things by closing a batch when the researcher has enough data to work with.
This prevents computers from returning more results, and from getting more credits.
People learn sooner or later.
ID: 62860
Peter Hucker

Joined: 9 Oct 20
Posts: 156
Credit: 1,761,668
RAC: 0
Message 62861 - Posted: 6 Nov 2020, 12:39:13 UTC - in response to Message 62860.  
Last modified: 6 Nov 2020, 13:12:40 UTC

Some parts of the numbers are not real. They're leftovers from the old system we had, and are there for historical reasons.
The project people are happy with the way that page is, so that's how it will stay.
Isn't the page to show the volunteers how much is available etc? It seems to serve no purpose at all, since the numbers bear no relation to anything.

The 1 year "deadline", as has been pointed out many times, doesn't apply to the tasks; it's there to stop BOINC from having problems when computers are also running other projects, most of which have much shorter task run times.
The deadline here is: ASAP!
I see. I did think they were rather long deadlines and I'd get them done much quicker than that. So I guess it's to allow Boinc to pause your huge tasks to finish off one on another project that has to be done by tomorrow? Yip, I can see that happening right now, to some extent, with some Rosetta that are due shortly.

Although Boinc isn't very good at managing multicore and singlecore tasks at the same time. I also run Primegrid, and Boinc has managed to get a computer running a 4 core Primegrid and 2 singlecore Rosettas on a 4 core CPU..... You see, if it was sensible, it could see the Primegrid has oodles of time to finish, the two Rosettas are urgent, so just run the Rosettas and download two other single core tasks to keep it busy, leaving the Primegrid till later.

The project controls things by closing a batch when the researcher has enough data to work with.
This prevents computers from returning more results, and from getting more credits.
I see it's trickling up partial results from my tasks and crediting me. And a couple of my tasks failed due to a computer restart which seems to corrupt something. So are the partial results useful? Will the remainder of that task be sent out as a retread, or the whole thing from the start?

People learn sooner or later.

So if someone leaves it too late to send it back, can the server tell their computer to abort? Or does it sit crunching for weeks pointlessly and without credit? If the latter, the user will probably never know unless they're keeping a very close eye on their credits.
ID: 62861
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62862 - Posted: 6 Nov 2020, 13:39:35 UTC - in response to Message 62861.  

I see it's trickling up partial results from my tasks and crediting me. And a couple of my tasks failed due to a computer restart which seems to corrupt something. So are the partial results useful? Will the remainder of that task be sent out as a retread, or the whole thing from the start?


The whole task is sent out again.

So if someone leaves it too late to send it back, can the server tell their computer to abort? Or does it sit crunching for weeks pointlessly and without credit? If the latter, the user will probably never know unless they're keeping a very close eye on their credits.


The server can but it doesn't. About the only time CPDN uses that feature of the BOINC server code is when there are serious problems with a batch. I don't have much sympathy for those who will miss out on the credits as they will be people who don't look at their computers often enough to notice anyway.
ID: 62862
Peter Hucker

Joined: 9 Oct 20
Posts: 156
Credit: 1,761,668
RAC: 0
Message 62863 - Posted: 6 Nov 2020, 13:54:32 UTC - in response to Message 62862.  

I see it's trickling up partial results from my tasks and crediting me. And a couple of my tasks failed due to a computer restart which seems to corrupt something. So are the partial results useful? Will the remainder of that task be sent out as a retread, or the whole thing from the start?
The whole task is sent out again.
That's a shame. If my computer can store a checkpoint and continue after switching tasks, running an exclusive application, or restarting the computer, can't that be used to tell someone else's PC how to continue? Or is the checkpoint CPU-specific?

So if someone leaves it too late to send it back, can the server tell their computer to abort? Or does it sit crunching for weeks pointlessly and without credit? If the latter, the user will probably never know unless they're keeping a very close eye on their credits.
The server can but it doesn't. About the only time CPDN uses that feature of the BOINC server code is when there are serious problems with a batch. I don't have much sympathy for those who will miss out on the credits as they will be people who don't look at their computers often enough to notice anyway.
Why not send out a cancel message? You're just wasting the CPU time on someone's computer, doing work that will never be used.

The trouble is your 1 year deadline is making my Boinc client put them on the back burner and do other tasks with shorter deadlines. I've had to manually suspend Primegrid tasks to let yours continue. Mind you that's also the fault of Boinc/Primegrid for giving me 3 weeks of processing to do when my buffer is set to 3 hours!
ID: 62863
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62864 - Posted: 6 Nov 2020, 14:08:13 UTC

That's a shame. If my computer can store a checkpoint and continue after switching tasks, running an exclusive application, or restarting the computer, can't that be used to tell someone else's PC how to continue? Or is the checkpoint CPU-specific?


CPDN tasks are a bit strange in that the same task, sent to two different computers that each complete it, may not produce exactly the same data. Statistical methods are used to determine which results are useful and which are not. There can be differences between AMD and Intel processors, and even differences between different CPUs by the same manufacturer. There used to be tasks that would go out to both Windows and Linux machines, but this was stopped in order to reduce the number of variables. Having a task start on one machine and finish on another could cause more problems.

Another issue is that the code for these tasks is proprietary to the Met Office, and the license Oxford has from them doesn't let them mess about with it to any great extent, so they would have to write their own code to interface with the Met Office code to produce the partial tasks.
ID: 62864
Peter Hucker

Joined: 9 Oct 20
Posts: 156
Credit: 1,761,668
RAC: 0
Message 62865 - Posted: 6 Nov 2020, 14:17:51 UTC - in response to Message 62864.  

That's a shame. If my computer can store a checkpoint and continue after switching tasks, running an exclusive application, or restarting the computer, can't that be used to tell someone else's PC how to continue? Or is the checkpoint CPU-specific?


CPDN tasks are a bit strange in that the same task, sent to two different computers that each complete it, may not produce exactly the same data. Statistical methods are used to determine which results are useful and which are not. There can be differences between AMD and Intel processors, and even differences between different CPUs by the same manufacturer. There used to be tasks that would go out to both Windows and Linux machines, but this was stopped in order to reduce the number of variables. Having a task start on one machine and finish on another could cause more problems.

Another issue is that the code for these tasks is proprietary to the Met Office, and the license Oxford has from them doesn't let them mess about with it to any great extent, so they would have to write their own code to interface with the Met Office code to produce the partial tasks.


That's not unique to CPDN. I've seen the same problem at other projects, specifically with GPU and CPU versions of the same task. I think it was a programmer at WCG, working on a GPU version (which they don't currently have), who was saying there was an additional 6% error margin on the GPU version. I always thought computers were precise, so I don't understand how a program can give an approximate answer! I guess it's compounding of rounding errors?
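
The rounding-error guess can be illustrated with a small sketch. It is not CPDN or Met Office code: the first part shows that floating-point addition depends on the order of operations, and the second uses the logistic map (a standard toy chaotic system, chosen here purely as a stand-in for a sensitive model) to show how a difference in the last decimal places grows until two runs disagree completely.

```python
# Floating-point addition is not associative: the same three numbers
# summed in a different order can give slightly different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def logistic_max_gap(x0, delta, steps=100, r=4.0):
    """Run the logistic map x -> r*x*(1-x) from x0 and from x0 + delta.

    Returns the largest gap seen between the two runs, showing how a
    tiny initial difference is amplified in a chaotic system.
    """
    a, b = x0, x0 + delta
    max_gap = 0.0
    for _ in range(steps):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        max_gap = max(max_gap, abs(a - b))
    return max_gap


if __name__ == "__main__":
    print("order-of-operations difference:", abs(left - right))
    print("largest gap between runs:", logistic_max_gap(0.2, 1e-15))
```

Different CPUs, compilers, or maths libraries can legitimately round or order operations differently, so in a long-running sensitive calculation the two machines can end up with different, equally valid, trajectories.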
ID: 62865
Harri Liljeroos
Joined: 9 Dec 05
Posts: 89
Credit: 11,584,566
RAC: 0
Message 62868 - Posted: 6 Nov 2020, 19:12:49 UTC - in response to Message 62863.  

The trouble is your 1 year deadline is making my Boinc client put them on the back burner and do other tasks with shorter deadlines. I've had to manually suspend Primegrid tasks to let yours continue. Mind you that's also the fault of Boinc/Primegrid for giving me 3 weeks of processing to do when my buffer is set to 3 hours!

Boinc is designed to run tasks in FIFO order so that shorter deadline tasks don't take over the resources (CPU/GPU). The only exception is if Boinc thinks that a task is going to miss the deadline; then that task is expedited. But this happens only about 1 to 1½ days before the deadline.
ID: 62868
Peter Hucker

Joined: 9 Oct 20
Posts: 156
Credit: 1,761,668
RAC: 0
Message 62871 - Posted: 6 Nov 2020, 20:00:41 UTC - in response to Message 62868.  

The trouble is your 1 year deadline is making my Boinc client put them on the back burner and do other tasks with shorter deadlines. I've had to manually suspend Primegrid tasks to let yours continue. Mind you that's also the fault of Boinc/Primegrid for giving me 3 weeks of processing to do when my buffer is set to 3 hours!

Boinc is designed to run tasks in FIFO order so that shorter deadline tasks don't take over the resources (CPU/GPU). The only exception is if Boinc thinks that a task is going to miss the deadline; then that task is expedited. But this happens only about 1 to 1½ days before the deadline.
That's not what I've seen. It does FIFO order within each project, but it uses what they call "short term debt" to decide to take a task from project A or project B if you have several tasks queued for each. You can see it happen if you change the weighting of a project, so Boinc tries to meet that new weighting by only doing the tasks it has for the higher weighting project.

As for the panic mode, it always does that slightly too late! It kicks in at approximately the time it needs to complete the task, e.g. if a task needs 5 hours to run, it will start it 6 hours before the deadline, which is no good if the computer is turned off or is used to play a game!

Anyway, mine is in panic mode because Primegrid gave me too much work to do. So Boinc has correctly assumed that CPDN doesn't need them back for a year so was only doing Primegrid until I intervened. I really can't see the problem in changing the deadline to say 1 month (or whatever is long enough for most people to be able to do them).
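
The panic-mode arithmetic can be put as a rough sketch. This is not BOINC's actual scheduler; the function name and the `on_fraction` and `safety_margin_hours` parameters are assumptions made for illustration only. It just works out how long before the deadline a task would have to start if the machine only crunches for part of each day.

```python
def latest_safe_start(cpu_hours_needed, on_fraction=1.0, safety_margin_hours=0.0):
    """Hours before the deadline by which a task has to be started.

    cpu_hours_needed:    estimated remaining run time of the task
    on_fraction:         share of wall-clock time the machine actually
                         spends crunching (1.0 = always on, never gaming)
    safety_margin_hours: extra slack added on top
    """
    if not 0.0 < on_fraction <= 1.0:
        raise ValueError("on_fraction must be in (0, 1]")
    return cpu_hours_needed / on_fraction + safety_margin_hours


if __name__ == "__main__":
    # The 5-hour task from the example, started with 1 hour of slack:
    print(latest_safe_start(5.0, safety_margin_hours=1.0))
    # The same task on a machine that only crunches half the time:
    print(latest_safe_start(5.0, on_fraction=0.5, safety_margin_hours=1.0))
```

With the numbers from the example, a 5-hour task started 6 hours before the deadline has only 1 hour of slack, whereas a machine that crunches half the time would need to start the same task 11 hours ahead to keep that slack.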
ID: 62871
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3176
Credit: 8,795,408
RAC: 7,129
Message 62873 - Posted: 6 Nov 2020, 20:11:34 UTC - in response to Message 62871.  

How projects play together is something I know little about, because I only run CPDN tasks except when none are available, so my knowledge of it is nearly all from reading posts here and on the BOINC fora.
ID: 62873
Peter Hucker

Joined: 9 Oct 20
Posts: 156
Credit: 1,761,668
RAC: 0
Message 62875 - Posted: 6 Nov 2020, 20:18:58 UTC - in response to Message 62873.  
Last modified: 6 Nov 2020, 20:19:29 UTC

How projects play together is something I know little about, because I only run CPDN tasks except when none are available, so my knowledge of it is nearly all from reading posts here and on the BOINC fora.


Do you manage to have CPDN running non-stop? Is there enough Linux work to keep it busy? Or do you just let the computer doze off in between? I like my 66 CPU cores and 4 GPUs to be doing something all the time. My wallet does not.
ID: 62875

©2021 climateprediction.net