Message boards : Number crunching : Batch 995 has been closed
Author | Message |
---|---|
Send message Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
Batch 995 has now been closed. This was the NZ25 configuration, submitted in July 2023. Closed: means no resends will go out. If anyone does have any tasks for 995 on the machine they should abort it. --- CPDN Visiting Scientist |
Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,577,798 RAC: 8,230 |
"Closed: means no resends will go out." If this is true, why did I just download the 995 task listed below a few minutes ago? Interestingly, the original task for the workunit is listed as "Didn't need" and was never sent out to anyone; maybe this indicates CPDN has a bug in whatever "closing" algorithm is used... Name: wah2_nz25_21cs_209505_25_995_012296583_1; Workunit: 12296583; Created: 7 Oct 2024, 11:12:44 UTC; Sent: 7 Oct 2024, 11:44:10 UTC |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
Likewise with wah2_nz25_22as_209805_25_995_012297807 - workunit 12297807 - extra task created and sent out today. I won't be able to cancel it until I get home from Italy ... |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,290,179 RAC: 15,651 |
"If anyone does have any tasks for 995 on the machine they should abort it." Done for the four I got. |
Send message Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
...and five months later a bout of NZ 995 have emerged from somewhere or other |
Send message Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
That's interesting. I don't know what's going on there. It's still showing as closed on the server. I will investigate. --- CPDN Visiting Scientist |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Closed: means no resends will go out. If anyone does have any tasks for 995 on the machine they should abort it." OK: I just aborted the two I received earlier today. |
Send message Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
I got four of them, all showing as being the first time out, after being marked as "didn't need" at some time in the past (no date given). https://main.cpdn.org/workunit.php?wuid=12296891 |
Send message Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
Rob, definitely abort them. That batch is not needed any more. I've reported the issue to CPDN and will find out what's going on. I wonder if the recent "rebranding" has upset things. I could try re-closing the batch with added 'abort' sent to running tasks but that tends to annoy people. --- CPDN Visiting Scientist |
Send message Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
Thanks for the update, Glen. I didn't want to abort them just in case they were really wanted. |
Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,577,798 RAC: 8,230 |
"with added 'abort' sent to running tasks but that tends to annoy people." Personally, at least, I find it much better that such a task is, worst case, aborted at the first trickle with 4.27% done than to waste maybe 10 days crunching it only for the result to be dumped at the end. The main downside with multiple aborts on same computer is the computer is yet again back with quota of 1 task per day, meaning when the next "good" batch of work is released it can take over a week to get one task per real core. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
"The main downside with multiple aborts on same computer is the computer is yet again back with quota of 1 task per day, meaning when the next 'good' batch of work is released it can take over a week to get one task per real core." +1. I see I picked up two more batch 995 tasks yesterday afternoon. I can't abort them, and it's not easy to sensibly block the requests: I left the machines accepting new work while I'm away, since there have been hints of at least two new batches being prepared for release 'soon'. Most BOINC projects nowadays (those that use multiple applications) allow users to pick and choose, through preferences, which of the sub-applications to run. I understand the argument here, that the 'set and forget' brigade wouldn't opt in to new applications on release, but to disallow choices places an even greater responsibility on the project team to understand and get the best they can out of the wild horse they're riding. |
Send message Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
I've identified the problem. I've now reclosed the batch and told the server to abort any tasks in progress on volunteers' machines (since we don't need the results). Resends are prevented for closed batches by setting the workunit's number of allowed task attempts to the current number of task attempts. But when a batch is aborted, that part doesn't happen: the script in question has to be run again with a different setting. If I remember right, I closed and aborted the tasks in batch 995 because it was going wrong. These tasks are going out now because I forgot to do that extra step (mea culpa). Normally, CPDN does not abort workunits in progress because the results can still be used, so we don't have this problem. I've modified the script so this doesn't happen again. Thanks a lot for reporting this - this is a great group! --- CPDN Visiting Scientist |
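The resend rule described in the post above can be sketched roughly as follows. This is only an illustration of the idea, not CPDN's script or the BOINC database schema: the `Workunit` class, its field names (`attempts_so_far`, `max_attempts`) and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    """Hypothetical stand-in for a server-side workunit record."""
    wuid: int
    attempts_so_far: int  # tasks already created for this workunit
    max_attempts: int     # total task attempts the server may create


def close_workunit(wu: Workunit) -> None:
    """Close a workunit: cap allowed attempts at the current count."""
    wu.max_attempts = wu.attempts_so_far


def resend_after_abort(wu: Workunit) -> bool:
    """Return True if an aborted task would be replaced by a new one."""
    if wu.attempts_so_far < wu.max_attempts:
        wu.attempts_so_far += 1  # a replacement task is created
        return True
    return False


# Aborted but never capped: a replacement task still goes out.
open_wu = Workunit(wuid=1, attempts_so_far=1, max_attempts=3)
print(resend_after_abort(open_wu))    # True

# Capped first: the abort produces no replacement.
closed_wu = Workunit(wuid=2, attempts_so_far=1, max_attempts=3)
close_workunit(closed_wu)
print(resend_after_abort(closed_wu))  # False
```

In this sketch, skipping `close_workunit` is the "extra step" that was forgotten: the abort alone leaves room for further attempts, so replacement tasks keep being generated.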
Send message Joined: 9 Dec 05 Posts: 116 Credit: 12,550,440 RAC: 491 |
I have two tasks from batch 995. They now show on the server as 'Didn't need' (https://main.cpdn.org/results.php?hostid=1552491), but they are still running on my computer. I don't see any message from the server telling BOINC to abort them. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
The server can't "send" a message to abort them by itself - that would get blocked by the firewall or the address translation in your home router. Instead, it has to wait until your own computer sends a request, and then it can add the 'abort' message to the reply. That will happen automatically the next time a trickle update is triggered, or you could speed it up by updating the project manually. |
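The pull model described in the post above can be sketched like this. It is a toy illustration under stated assumptions, not the BOINC scheduler: the `SchedulerSketch` class, its method names and the shape of the reply are all hypothetical.

```python
from collections import defaultdict


class SchedulerSketch:
    """Toy server that can only answer requests, never initiate contact."""

    def __init__(self):
        # host id -> names of tasks the server wants that host to abort
        self._pending_aborts = defaultdict(set)

    def mark_for_abort(self, host_id, task_name):
        """Queue an abort; nothing is sent to the host at this point."""
        self._pending_aborts[host_id].add(task_name)

    def handle_request(self, host_id, tasks_on_host):
        """Answer a client request, piggybacking any queued aborts."""
        queued = self._pending_aborts[host_id]
        to_abort = sorted(queued & set(tasks_on_host))
        queued.difference_update(to_abort)
        return {"abort": to_abort}


server = SchedulerSketch()
server.mark_for_abort(host_id=1, task_name="task_a")

# The abort is only delivered once the client contacts the server.
print(server.handle_request(host_id=1, tasks_on_host=["task_a", "task_b"]))
# {'abort': ['task_a']}
```

The queue is drained per request, so a second request from the same host gets an empty abort list.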
Send message Joined: 9 Dec 05 Posts: 116 Credit: 12,550,440 RAC: 491 |
Both tasks have now sent new zip files and trickle-up messages to the server and got credit for them as well. Still no request to abort from the server.
46871 climateprediction.net 08-10-2024 19:30 [sched_op] Starting scheduler request
46872 climateprediction.net 08-10-2024 19:30 Sending scheduler request: To send trickle-up message.
46873 climateprediction.net 08-10-2024 19:30 Not requesting tasks: don't need (CPU: max concurrent job limit; NVIDIA GPU: no applications; AMD/ATI GPU: no applications)
46874 climateprediction.net 08-10-2024 19:30 [sched_op] CPU work request: 0.00 seconds; 0.00 devices
46875 climateprediction.net 08-10-2024 19:30 [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
46876 climateprediction.net 08-10-2024 19:30 [sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices
46877 climateprediction.net 08-10-2024 19:30 Scheduler request completed
46878 climateprediction.net 08-10-2024 19:30 [sched_op] Server version 721
46879 climateprediction.net 08-10-2024 19:30 Project requested delay of 3636 seconds
46880 climateprediction.net 08-10-2024 19:30 [sched_op] Deferring communication for 01:00:36
46881 climateprediction.net 08-10-2024 19:30 [sched_op] Reason: requested by project
46882 climateprediction.net 08-10-2024 19:30 Started upload of wah2_nz25_21l9_209505_25_995_012296888_1_r951423997_7.zip
46883 climateprediction.net 08-10-2024 19:30 Finished upload of wah2_nz25_21l9_209505_25_995_012296888_1_r951423997_7.zip (90221240 bytes)
47135 climateprediction.net 08-10-2024 20:35 [sched_op] Starting scheduler request
47136 climateprediction.net 08-10-2024 20:35 Sending scheduler request: To send trickle-up message.
47137 climateprediction.net 08-10-2024 20:35 Not requesting tasks: don't need (CPU: max concurrent job limit; NVIDIA GPU: no applications; AMD/ATI GPU: no applications)
47138 climateprediction.net 08-10-2024 20:35 [sched_op] CPU work request: 0.00 seconds; 0.00 devices
47139 climateprediction.net 08-10-2024 20:35 [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
47140 climateprediction.net 08-10-2024 20:35 [sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices
47141 climateprediction.net 08-10-2024 20:35 Scheduler request completed
47142 climateprediction.net 08-10-2024 20:35 [sched_op] Server version 721
47143 climateprediction.net 08-10-2024 20:35 Project requested delay of 3636 seconds
47144 climateprediction.net 08-10-2024 20:35 [sched_op] Deferring communication for 01:00:36
47145 climateprediction.net 08-10-2024 20:35 [sched_op] Reason: requested by project
47146 climateprediction.net 08-10-2024 20:35 Started upload of wah2_nz25_21wv_209705_25_995_012297306_1_r862382279_7.zip
47147 climateprediction.net 08-10-2024 20:35 Finished upload of wah2_nz25_21wv_209705_25_995_012297306_1_r862382279_7.zip (90418684 bytes) |
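As a small check on the log above, the "Project requested delay of 3636 seconds" and "Deferring communication for 01:00:36" entries describe the same interval. The helper name `format_delay` below is made up for this illustration; it is not a BOINC function.

```python
def format_delay(seconds: int) -> str:
    """Format a delay in whole seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_delay(3636))  # 01:00:36
```

This matches the roughly one-hour gap between the two scheduler requests in the log (19:30 and 20:35), so the client is contacting the server about once an hour, and each of those contacts is an opportunity for an abort to be attached to the reply.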
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
Looking at my batch 995 tasks, I think the first one has crashed (the computer, that is, not the CPDN task - it isn't contacting other projects either). The other two still seem to be going strong and are reporting trickles - 7 each so far, with the last ones being this morning. Good for my stats, not so much for the science. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 43,257,301 RAC: 72,605 |
On the web results page, I see the batch 995 tasks marked as "Didn't need". Meanwhile, the tasks happily sat in the queue on my host, even though they hadn't started running yet. I recorded my "Max tasks per day" for the host and aborted them manually anyway. Interestingly, even after reporting the aborted tasks, the web status didn't change. It hasn't updated the reported date, nor did I get "Max tasks per day" or "Consecutive valid tasks" reset. Does this mean that they can be aborted without side effects, but one can also keep crunching them for credits if one wishes? (Unless there is a delay in updating web pages, but from my past experience, the results page is real-time.) This behavior is a bit different from what I've seen in other projects, where a "server abort" will actually abort a task on the host. I've seen Asteroids@Home doing that pretty frequently for resends when the initial results show up late. |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 15,000,104 RAC: 9,568 |
I'm not sure that re-closing the batch worked. New tasks were generated for the ones I aborted and sent to other users; they are being processed and are returning trickles. |
Send message Joined: 23 Jul 23 Posts: 1 Credit: 1,548,235 RAC: 2,990 |
I am also still getting trickle-up credits for 995 tasks. While no new tasks are available, we would lose credits if we aborted. |