Message boards : Number crunching : Scheduler request too recent
Author | Message |
---|---|
Joined: 27 Jul 16 Posts: 10 Credit: 55,923 RAC: 0 |
What's the delay set for scheduling requests? I had a large number of tasks die with computation errors and am trying to troubleshoot, but I keep getting denied new work: 12/25/2019 10:16:11 AM | climateprediction.net | Not sending work - last request too recent: 2328 sec Seems a bit excessive. |
Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
What's the delay set for scheduling requests? I had a large number of tasks die with computation errors and am trying to troubleshoot, but I keep getting denied new work: It appears to be set at an hour plus a few seconds. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
It appears to be set at an hour plus a few seconds Right. It will request work only after 1 hour of no communication with the project. So clicking the Update button in the Projects tab of BOINC Manager with climateprediction.net selected is counterproductive for requesting new work, unless it has been over an hour since the last scheduler request. |
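For anyone wanting to estimate the remaining wait from the log message, here is a minimal sketch. It assumes the number in the message is the seconds elapsed since the last scheduler request and that the minimum interval is roughly 3600 seconds (the thread only says "an hour plus a few seconds"); the function name and the constant are illustrative, not part of BOINC.

```python
import re

# Assumption: the project's minimum interval between scheduler requests is
# about one hour. The thread says "an hour plus a few seconds", so the real
# value may be slightly larger than 3600.
ASSUMED_MIN_INTERVAL_SEC = 3600


def seconds_until_next_request(log_line, min_interval=ASSUMED_MIN_INTERVAL_SEC):
    """Estimate the remaining wait from a 'last request too recent' message.

    Assumes the reported number is the time elapsed since the last request.
    Returns None if the line is not a 'too recent' message.
    """
    match = re.search(r"last request too recent: (\d+) sec", log_line)
    if match is None:
        return None
    elapsed = int(match.group(1))
    return max(0, min_interval - elapsed)


# The log line quoted in the first post of this thread.
line = ("12/25/2019 10:16:11 AM | climateprediction.net | "
        "Not sending work - last request too recent: 2328 sec")
remaining = seconds_until_next_request(line)
print(f"Wait roughly {remaining} more seconds (about {remaining // 60} minutes)")
```

Under these assumptions, the practical upshot matches the post above: leave the Update button alone until the estimated wait has passed.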
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Seems a bit excessive. I run into it all the time. Most projects use a couple of minutes. I would think that would be sufficient to prevent server overload. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's an attempt to limit serial killers from grabbing new work to kill too quickly. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
It's an attempt to limit serial killers from grabbing new work to kill too quickly. To flesh this out a bit more, if a computer with 48 cores is missing the 32-bit libraries, crashes its tasks in about six seconds or less and then requests more, a 2-minute delay could easily result in one machine crashing a large number of tasks. And there have been more than a few such machines recently. An hour following a crash doesn't seem excessive to me. I don't know if there is a way it could be managed to reduce the time interval when the request is not following a crash. I suspect that would need changes to the BOINC code. |
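To put rough numbers on the 48-core scenario above, here is a back-of-the-envelope sketch. The assumptions are mine, not the project's: every request is granted one task per core, every task crashes before the next request, and no daily quota or other server-side limit steps in. The function name is illustrative.

```python
def crashed_tasks_per_hour(cores, request_interval_sec):
    """Upper bound on tasks a broken host could crash in one hour.

    Assumptions (not confirmed by the project): each scheduler request is
    granted one task per core, each task crashes well before the next
    request, and no quota limits the host.
    """
    requests_per_hour = 3600 // request_interval_sec
    return cores * requests_per_hour


# Compare a 2-minute delay with the 1-hour delay discussed in this thread.
for interval in (120, 3600):
    print(f"{interval:>4} s between requests: "
          f"up to {crashed_tasks_per_hour(48, interval)} crashed tasks per hour")
```

With these assumptions a 2-minute delay allows up to 1440 crashed tasks per hour from a single machine, against 48 with a 1-hour delay, which is the difference the post is pointing at.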
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
OK, it is for the special needs of this project. I can live with it if it is necessary. Thanks. |
Joined: 1 Jun 17 Posts: 13 Credit: 30,531,193 RAC: 28,969 |
So, I'm getting this 3600+ second wait even though NO work has been sent to the client. I'm wondering how many hours it will sit idling, waiting for work that may never come? |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
So, I'm getting this 3600+ second wait even though NO work has been sent to the client. I'm wondering how many hours it will sit idling, waiting for work that may never come? According to the server status page, there is no work for Windows computers at this time (WAH2 models). https://www.cpdn.org/server_status.php These come in batches and we as moderators seldom know ahead of time when a batch will be released. With a lot of Windows computers attached, the tasks in the queue don't take long to be snapped up once the batch is out there. Sorry I can't bring better news at this time. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
100 processors and only 15.94 GB of RAM?!!! Come on! You need at least 2 GB per core for this project these days. |
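As a quick check of that rule of thumb, here is a minimal sketch of how many tasks such a machine could reasonably run at once. The 2 GB per task figure is the poster's estimate, the function name is illustrative, and the calculation ignores memory used by the operating system and other programs.

```python
import math


def max_concurrent_tasks(ram_gb, cores, gb_per_task=2.0):
    """Number of tasks that fit in RAM, capped at the number of cores.

    gb_per_task is the rule of thumb quoted in this thread, not an
    official project requirement.
    """
    return min(cores, math.floor(ram_gb / gb_per_task))


# The machine described above: 100 processors, 15.94 GB of RAM.
print(max_concurrent_tasks(15.94, 100))
```

By this estimate only 7 of the 100 processors could be kept busy with this project's tasks without running short of memory.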
Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 |
The people pushing models to the download servers seem to post about new batches in this thread: https://www.cpdn.org/forum_thread.php?id=8272&sort_style=8&start=750 And you can cross-reference project names, numbers, and supported platforms with these two links: https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8401&postid=61476#61476 and https://www.cpdn.org/apps.php I have noticed that the only batches released since I started CPDN (a few weeks ago) have been for Windows clients and the work units rarely remain for more than an hour or two. If you have no work then I congratulate you for getting the data back to the scientists as fast as possible and not hoarding months and months of models on a few systems. So take pride in knowing that when you do get models to crunch you are returning the results as fast as you can and that is helping the scientists create newer, better models faster. <soapbox> Which begs the question: If the people hoarding work units are not in this for the science but are in it for the points, and if the points are meaningless outside of a scientific context, then why are they bothering at all? The entire point of distributed computing projects is to get as many people as possible working to produce usable data as fast as possible. That is why I think CPDN specifically and BOINC in general need to take a step away from awarding points solely based on raw computational requirements and take a step towards giving bonus points for quick turnaround times. Look at Folding@Home. They have a Quick Return Bonus for certain workloads. This QRB helps them ensure a high level of constant turnaround from the people that are in it just for the points. If someone comes along that is just doing DC projects to advance the science (like me and several others on these forums) then they will be just as happy with the QRB as they would be without it or even without any points at all. </soapbox> |
Joined: 18 Feb 17 Posts: 81 Credit: 14,062,567 RAC: 2,946 |
The people pushing models to the download servers seem to post about new batches in this thread: Couldn't agree more here. It's something I've also seen as an almost exclusively WCG cruncher: a lot of the number-based projects have ridiculously inflated credit compared to others, like Rosetta and World Community Grid. I've been trying to get interested in other projects, but I keep coming back here, Rosetta and WCG almost exclusively. I think the deadline for this project should be drastically reduced. I liked folding, though with tons of CPU power and only one GPU, I felt like a tiny drop of water in an ocean. I do have one of my machines fubar. Will the workunit be gone forever or will it be sent to someone else eventually? |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I do have one of my machines fubar. Will the workunit be gone forever or will it be sent to someone else eventually? Depends on whether it was its last chance or not. If the task name ended in _0 or _1, indicating the first or second attempt for the task, it will eventually go to another computer once it times out. If it ends in _3 (or _4 in the case of some Linux tasks), it was on its last chance. |
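Reading the suffix off a task name can be sketched as below. The task name used is made up for illustration, the function names are mine, and the last-chance suffix is passed in as a parameter taken from the post above (3, or 4 for some Linux tasks) rather than from any project documentation.

```python
def attempt_suffix(task_name):
    """Return the numeric suffix after the last underscore of a task name."""
    suffix = task_name.rsplit("_", 1)[-1]
    if not suffix.isdigit():
        raise ValueError(f"no numeric attempt suffix in {task_name!r}")
    return int(suffix)


def is_last_chance(task_name, last_suffix=3):
    """True if the suffix has reached the value described as the last chance.

    last_suffix defaults to 3 as stated in the post above; pass 4 for the
    Linux tasks it mentions.
    """
    return attempt_suffix(task_name) >= last_suffix


# Hypothetical task name, for illustration only.
name = "example_model_0001_1"
print(attempt_suffix(name), is_last_chance(name))
```

A task whose suffix is below the last-chance value would, per the post above, be reissued to another computer once it times out.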
Joined: 9 Sep 04 Posts: 228 Credit: 30,788,938 RAC: 4,014 |
Answer to Message 61866 (user: crashtech) 31 Dec 2019, 22:53:00 UTC: You crashed 77 workunits from 1 Aug 2018 to 3 Oct 2018, so in less than 70 days. You have 6 valid results in 2019. Your <stderr_txt> is: Signal 11 received: Segment violation Signal 11 received: Software termination signal from kill Signal 11 received: Abnormal termination triggered by abort call Signal 11 received, exiting... 09:40:15 (405892): called boinc_finish(193) Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=409660, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=405892, selfPID=387212, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_ain::Monitor... 09:40:19 (387212): called boinc_finish(0) Do you shut down the BOINC software correctly? Have a nice day and hopefully god will help Australia, Bonsai911 PS: Why will this task https://www.cpdn.org/cpdnboinc/result.php?resultid=21824256 not be reissued? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
That batch has been closed. |
Joined: 9 Sep 04 Posts: 228 Credit: 30,788,938 RAC: 4,014 |
Thanks for your quick answer! ...and once more, all good wishes to Australia. Bonsai911 |
Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738 |
<soapbox> Which begs the question: If the people hoarding work units are not in this for the science but are in it for the points, and if the points are meaningless outside of a scientific context, then why are they bothering at all? The entire point of distributed computing projects is to get as many people as possible working to produce usable data as fast as possible. That is why I think CPDN specifically and BOINC in general need to take a step away from awarding points solely based on raw computational requirements and take a step towards giving bonus points for quick turnaround times. Look at Folding@Home. They have a Quick Return Bonus for certain workloads. This QRB helps them ensure a high level of constant turnaround from the people that are in it just for the points. If someone comes along that is just doing DC projects to advance the science (like me and several others on these forums) then they will be just as happy with the QRB as they would be without it or even without any points at all. </soapbox> I disagree with you. In distributed computing the goal is to get results at the lowest possible cost. Time is not important. If you want to do it fast, use your own supercomputer or cluster. |
Joined: 9 Sep 04 Posts: 228 Credit: 30,788,938 RAC: 4,014 |
Time is not important. The suffering people in South Africa, Brazil, Iceland, Greenland, Australia, also Germany, and some dozen more countries disagree. For example: the glaciers and ice in Greenland are melting today five times faster than in the early 1990s, and the loss of ice is accelerating. To recap, is it not correct that after 30 years of ignorance there is not much time left? |
Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738 |
The suffering people in South Africa, Brazil, Iceland, Greenland, Australia, also Germany, and some dozen more countries disagree. The timescale we are talking about in distributed computing is irrelevant compared to how long it takes to make decisions about acting on climate change and its consequences. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The suffering people in South Africa, Brazil, Iceland, Greenland, Australia, also Germany, and some dozen more countries disagree. It may be irrelevant compared to how long these decisions are taking at the moment, but not to how quickly they need to be taken. And clearly for many projects the goal is to get results in quickly, or they wouldn't have such short deadlines. |
©2024 cpdn.org