climateprediction.net (CPDN) home page
Thread 'Scheduler request too recent'

theapc
Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61815 - Posted: 25 Dec 2019, 16:18:51 UTC
Last modified: 25 Dec 2019, 16:19:27 UTC

What's the delay set for scheduling requests? I had a large number of tasks die with computation errors and am trying to troubleshoot, but I keep getting denied new work:

12/25/2019 10:16:11 AM | climateprediction.net | Not sending work - last request too recent: 2328 sec

Seems a bit excessive.
ID: 61815
Bryn Mawr
Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 61816 - Posted: 25 Dec 2019, 16:30:26 UTC - in response to Message 61815.  

What's the delay set for scheduling requests? I had a large number of tasks die with computation errors and am trying to troubleshoot, but I keep getting denied new work:

12/25/2019 10:16:11 AM | climateprediction.net | Not sending work - last request too recent: 2328 sec

Seems a bit excessive.


It appears to be set at an hour plus a few seconds
ID: 61816
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 61819 - Posted: 25 Dec 2019, 20:52:40 UTC - in response to Message 61816.  

It appears to be set at an hour plus a few seconds

Right. The client will get new work only after 1 hour without communication with the project. So clicking the Update button in the Projects tab of BOINC Manager with climateprediction.net selected is counterproductive when requesting new work, unless it has been over an hour since the last scheduler request.
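As a rough illustration of the wait described above, here is a minimal Python sketch that reads the "last request too recent" log line and estimates how long to leave the Update button alone. It assumes the number in the message is the time elapsed since the previous scheduler request, and it uses 3600 seconds as a stand-in for "an hour plus a few seconds"; the function and constant names are hypothetical, not part of BOINC.

```python
import re

# Assumed minimum interval between scheduler requests, in seconds.
# The thread only says "an hour plus a few seconds"; 3600 is a stand-in.
ASSUMED_MIN_INTERVAL_SEC = 3600

# Matches the elapsed-seconds figure in the client log message.
LOG_PATTERN = re.compile(r"last request too recent: (\d+) sec")


def seconds_until_next_request(log_line, min_interval=ASSUMED_MIN_INTERVAL_SEC):
    """Estimate the remaining wait in seconds, or None if the line does not match."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    # Assumption: the reported value is the time since the previous request.
    elapsed = int(match.group(1))
    return max(0, min_interval - elapsed)


line = ("12/25/2019 10:16:11 AM | climateprediction.net | "
        "Not sending work - last request too recent: 2328 sec")
print(seconds_until_next_request(line))
```

Under those assumptions, each extra Update click before the wait has elapsed would restart the clock rather than shorten it.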
ID: 61819
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 61820 - Posted: 25 Dec 2019, 20:54:46 UTC - in response to Message 61815.  

Seems a bit excessive.

I run into it all the time. Most projects use a couple of minutes. I would think that would be sufficient to prevent server overload.
ID: 61820
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61822 - Posted: 25 Dec 2019, 21:05:05 UTC - in response to Message 61820.  

It's an attempt to limit serial killers from grabbing new work to kill too quickly.
ID: 61822
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 61827 - Posted: 25 Dec 2019, 23:15:53 UTC - in response to Message 61822.  

It's an attempt to limit serial killers from grabbing new work to kill too quickly.

To flesh this out a bit more: if a computer with 48 cores is missing the 32-bit libraries, crashes its tasks in about six seconds or less, and then requests more, a 2-minute delay could easily result in one machine crashing a large number of tasks. And there have been more than a few such machines recently. An hour following a crash doesn't seem excessive to me. I don't know whether the interval could be reduced when the request does not follow a crash. I suspect that would need changes to the BOINC code.
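The arithmetic behind this can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes every scheduler request refills all cores and that every task crashes before the next request is allowed. The function name is hypothetical.

```python
def tasks_crashed_per_hour(cores, request_interval_sec):
    """Upper-bound estimate of tasks one broken machine can crash in an hour."""
    # Assumption: one full refill of all cores per permitted scheduler request.
    requests_per_hour = 3600 // request_interval_sec
    return cores * requests_per_hour


print(tasks_crashed_per_hour(48, 120))   # 2-minute delay: 1440
print(tasks_crashed_per_hour(48, 3600))  # 1-hour delay: 48
```

With these assumptions, the one-hour delay cuts the damage from a single faulty 48-core machine by a factor of 30.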
ID: 61827
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 61828 - Posted: 25 Dec 2019, 23:41:59 UTC - in response to Message 61827.  

OK, it is for the special needs of this project. I can live with it if it is necessary.
Thanks.
ID: 61828
crashtech
Joined: 1 Jun 17
Posts: 13
Credit: 30,531,193
RAC: 28,969
Message 61866 - Posted: 31 Dec 2019, 22:53:00 UTC

So, I'm getting this 3600+ second wait even though NO work has been sent to the client. I'm wondering how many hours it will sit idling, waiting for work that may never come?
ID: 61866
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 61867 - Posted: 1 Jan 2020, 0:08:42 UTC - in response to Message 61866.  

So, I'm getting this 3600+ second wait even though NO work has been sent to the client. I'm wondering how many hours it will sit idling, waiting for work that may never come?

According to the server status page, there is no work for Windows computers at this time (WAH2 models). https://www.cpdn.org/server_status.php

These come in batches and we as moderators seldom know ahead of time when a batch will be released. With a lot of Windows computers attached, the tasks in the queue don't take long to be snapped up once the batch is out there. Sorry I can't bring better news at this time.
ID: 61867
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61868 - Posted: 1 Jan 2020, 1:06:04 UTC - in response to Message 61866.  

100 processors and only 15.94 GB of RAM?!!!
Come on! You need at least 2 Gigs per core for this project these days.
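To put numbers on that, a small Python sketch using the figures quoted in this post (100 processors, 15.94 GB of RAM, 2 GB per core). The function name is hypothetical, and the estimate ignores memory used by the operating system and other programs.

```python
def max_concurrent_tasks(total_ram_gb, ram_per_task_gb, cores):
    """Number of tasks that fit at once, limited by cores and by memory."""
    by_memory = int(total_ram_gb // ram_per_task_gb)
    return min(cores, by_memory)


# Figures from the post: 15.94 GB of RAM, 2 GB per task, 100 processors.
print(max_concurrent_tasks(15.94, 2.0, 100))
```

On these figures, memory rather than core count is the limit, by a wide margin.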
ID: 61868
lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61869 - Posted: 1 Jan 2020, 1:14:27 UTC
Last modified: 1 Jan 2020, 1:17:10 UTC

The people pushing models to the download servers seem to post about new batches in this thread:

https://www.cpdn.org/forum_thread.php?id=8272&sort_style=8&start=750

And you can cross reference project names, numbers, and supported platforms with these two links:

https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8401&postid=61476#61476 and https://www.cpdn.org/apps.php

I have noticed that the only batches released since I started CPDN (a few weeks ago) have been for Windows clients and the work units rarely remain for more than an hour or two. If you have no work then I congratulate you for getting the data back to the scientists as fast as possible and not hoarding months and months of models on a few systems. So take pride in knowing that when you do get models to crunch you are returning the results as fast as you can and that is helping the scientists create newer, better models faster.

<soapbox> Which begs the question: If the people hoarding work units are not in this for the science but are in it for the points, and if the points are meaningless outside of a scientific context, then why are they bothering at all? The entire point of distributed computing projects is to get as many people as possible working to produce usable data as fast as possible. That is why I think CPDN specifically and BOINC in general need to take a step away from awarding points solely based on raw computational requirements and take a step towards giving bonus points for quick turnaround times. Look at Folding@Home. They have a Quick Return Bonus for certain workloads. This QRB helps them ensure a high level of constant turnaround from the people that are in it just for the points. If someone comes along that is just doing DC projects to advance the science (like me and several others on these forums) then they will be just as happy with the QRB as they would be without it or even without any points at all. </soapbox>
ID: 61869
wolfman1360
Joined: 18 Feb 17
Posts: 81
Credit: 14,062,567
RAC: 2,946
Message 61870 - Posted: 1 Jan 2020, 1:28:48 UTC - in response to Message 61869.  

The people pushing models to the download servers seem to post about new batches in this thread:

https://www.cpdn.org/forum_thread.php?id=8272&sort_style=8&start=750

And you can cross reference project names, numbers, and supported platforms with these two links:

https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8401&postid=61476#61476 and https://www.cpdn.org/apps.php

I have noticed that the only batches released since I started CPDN (a few weeks ago) have been for Windows clients and the work units rarely remain for more than an hour or two. If you have no work then I congratulate you for getting the data back to the scientists as fast as possible and not hoarding months and months of models on a few systems. So take pride in knowing that when you do get models to crunch you are returning the results as fast as you can and that is helping the scientists create newer, better models faster.

<soapbox> Which begs the question: If the people hoarding work units are not in this for the science but are in it for the points, and if the points are meaningless outside of a scientific context, then why are they bothering at all? The entire point of distributed computing projects is to get as many people as possible working to produce usable data as fast as possible. That is why I think CPDN specifically and BOINC in general need to take a step away from awarding points solely based on raw computational requirements and take a step towards giving bonus points for quick turnaround times. Look at Folding@Home. They have a Quick Return Bonus for certain workloads. This QRB helps them ensure a high level of constant turnaround from the people that are in it just for the points. If someone comes along that is just doing DC projects to advance the science (like me and several others on these forums) then they will be just as happy with the QRB as they would be without it or even without any points at all. </soapbox>

Couldn't agree more here. It's something I've also seen, as an almost exclusively WCG cruncher. A lot of the number-based projects have ridiculously inflated credit compared to others, like Rosetta and World Community Grid. I've been trying to get interested in other projects, but I keep coming back here, to Rosetta, and to WCG almost exclusively. I think the deadline for this project should be drastically reduced.
I liked folding, though with tons of CPU power and only one GPU, I felt like a tiny drop of water in an ocean.
I do have one of my machines fubar. Will the workunit be gone forever or will it be sent to someone else eventually?
ID: 61870
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 61871 - Posted: 1 Jan 2020, 8:05:21 UTC - in response to Message 61870.  

I do have one of my machines fubar. Will the workunit be gone forever or will it be sent to someone else eventually?


Depends on whether it was its last chance or not. If the task name ended in _0 or _1, indicating the first or second attempt for the task, it will eventually go to another computer once it times out. If it ends in _3 (or _4 in the case of some Linux tasks), it was on its last chance, having already failed on two computers.
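The suffix rule can be sketched in Python as follows. The task names used here are made up for illustration, the function names are hypothetical, and the last-chance suffix is passed in by the caller because it differs between task types, as noted above.

```python
def attempt_suffix(task_name):
    """Return the numeric suffix after the final underscore of a task name."""
    _, sep, suffix = task_name.rpartition("_")
    if not sep or not suffix.isdigit():
        raise ValueError(f"no numeric attempt suffix in {task_name!r}")
    return int(suffix)


def may_be_reissued(task_name, last_chance_suffix):
    """True if the task has not yet reached its last-chance suffix."""
    return attempt_suffix(task_name) < last_chance_suffix


# Hypothetical task names, not real CPDN tasks.
print(may_be_reissued("example_model_0001_1", last_chance_suffix=3))
print(may_be_reissued("example_model_0001_3", last_chance_suffix=3))
```

This covers only the naming rule; it says nothing about whether the project is still issuing tasks from that batch.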
ID: 61871
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,788,938
RAC: 4,014
Message 61872 - Posted: 2 Jan 2020, 10:34:55 UTC - in response to Message 61866.  
Last modified: 2 Jan 2020, 11:09:50 UTC

answer to Message 61866 (user: crashtech) 31 Dec 2019, 22:53:00 UTC


You have crashed 77 workunits from 1 Aug 2018 to 3 Oct 2018, so in less than 70 days.


You have 6 valid results in 2019.


Your <stderr_txt> is:

Signal 11 received: Segment violation
Signal 11 received: Software termination signal from kill
Signal 11 received: Abnormal termination triggered by abort call
Signal 11 received, exiting...
09:40:15 (405892): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=409660, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=405892, selfPID=387212, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_ain::Monitor...
09:40:19 (387212): called boinc_finish(0)


Do you shut down the BOINC software correctly?


Have a nice day and hopefully god will help Australia,


Bonsai911

Ps: Why will this task
https://www.cpdn.org/cpdnboinc/result.php?resultid=21824256

not be reissued?
ID: 61872
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61873 - Posted: 2 Jan 2020, 11:24:11 UTC - in response to Message 61872.  


Ps: Why will this task
https://www.cpdn.org/cpdnboinc/result.php?resultid=21824256

not be reissued?


That batch has been closed.
ID: 61873
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,788,938
RAC: 4,014
Message 61874 - Posted: 2 Jan 2020, 11:30:57 UTC - in response to Message 61873.  

Thanks for your quick answer!

...and once more, all good wishes to Australia.


Bonsai911
ID: 61874
Harri Liljeroos
Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 61875 - Posted: 2 Jan 2020, 11:32:21 UTC - in response to Message 61869.  

<soapbox> Which begs the question: If the people hoarding work units are not in this for the science but are in it for the points, and if the points are meaningless outside of a scientific context, then why are they bothering at all? The entire point of distributed computing projects is to get as many people as possible working to produce usable data as fast as possible. That is why I think CPDN specifically and BOINC in general need to take a step away from awarding points solely based on raw computational requirements and take a step towards giving bonus points for quick turnaround times. Look at Folding@Home. They have a Quick Return Bonus for certain workloads. This QRB helps them ensure a high level of constant turnaround from the people that are in it just for the points. If someone comes along that is just doing DC projects to advance the science (like me and several others on these forums) then they will be just as happy with the QRB as they would be without it or even without any points at all. </soapbox>

I disagree with you. In distributed computing the goal is to get results with the lowest possible cost. Time is not important. If you want to do it fast, use your own supercomputer or cluster.
ID: 61875
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,788,938
RAC: 4,014
Message 61876 - Posted: 2 Jan 2020, 12:14:12 UTC - in response to Message 61875.  

Time is not important.


The suffering people in South Africa, Brazil, Iceland, Greenland, Australia, also Germany, and some dozen more countries disagree.
For example: the glaciers and ice in Greenland are melting five times faster today than in the early 1990s. And the loss of ice is accelerating.
Is it correct to conclude that, after 30 years of ignorance, there is not much time left?
ID: 61876
Harri Liljeroos
Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 61879 - Posted: 2 Jan 2020, 12:55:25 UTC - in response to Message 61876.  

The suffering people in South Africa, Brazil, Iceland, Greenland, Australia, also Germany, and some dozen more countries disagree.
For example: the glaciers and ice in Greenland are melting five times faster today than in the early 1990s. And the loss of ice is accelerating.
Is it correct to conclude that, after 30 years of ignorance, there is not much time left?

The timescale we are talking about in distributed computing is irrelevant compared to how long it takes to make decisions about acting on climate change and its consequences.
ID: 61879
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 61880 - Posted: 2 Jan 2020, 14:50:36 UTC - in response to Message 61879.  

The suffering people in South Africa, Brazil, Iceland, Greenland, Australia, also Germany, and some dozen more countries disagree.
For example: the glaciers and ice in Greenland are melting five times faster today than in the early 1990s. And the loss of ice is accelerating.
Is it correct to conclude that, after 30 years of ignorance, there is not much time left?

The timescale we are talking about in distributed computing is irrelevant compared to how long it takes to make decisions about acting on climate change and its consequences.


It may be irrelevant compared to how long these decisions are taking at the moment but not to how quickly they need to be taken.

And clearly for many projects the goal is to get results in quickly or they wouldn't have such short deadlines.
ID: 61880

©2024 cpdn.org