climateprediction.net (CPDN) home page
Thread 'Batch 995 has been closed'

Message boards : Number crunching : Batch 995 has been closed

Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,655,437
RAC: 10,602
Message 70656 - Posted: 18 Mar 2024, 11:50:56 UTC

Batch 995 has now been closed. This was the NZ25 configuration, submitted in July 2023.

Closed: means no resends will go out. If anyone does have any tasks for 995 on the machine they should abort it.
---
CPDN Visiting Scientist
Ingleside

Joined: 5 Aug 04
Posts: 127
Credit: 24,577,798
RAC: 8,230
Message 71562 - Posted: 7 Oct 2024, 12:03:37 UTC - in response to Message 70656.  

"Closed: means no resends will go out."
If this is true, why did I just download the batch 995 task listed below a few minutes ago?
Interestingly, the original task for this workunit is listed as "Didn't need" and was never sent out to anyone. Maybe this indicates that CPDN has a bug in whatever "closing" algorithm is used...

Name wah2_nz25_21cs_209505_25_995_012296583_1
Workunit 12296583
Created 7 Oct 2024, 11:12:44 UTC
Sent 7 Oct 2024, 11:44:10 UTC
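Going by the task and upload-file names quoted in this thread, the batch number appears to be the sixth underscore-separated field and the zero-padded workunit ID the seventh: task wah2_nz25_21cs_209505_25_995_012296583_1 belongs to workunit 12296583. The Python sketch below uses that inferred layout to pick out batch 995 tasks from a list of names (for example, copied from the BOINC Manager's Tasks tab). The field positions are an assumption drawn from these examples, not a documented CPDN naming scheme, and the function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class TaskName(NamedTuple):
    batch: int
    workunit: int
    replica: int


def parse_task_name(name: str) -> TaskName:
    """Split a wah2 task or upload-file name into batch, workunit and replica.

    Field positions are inferred from names quoted in this thread, e.g.
    wah2_nz25_21cs_209505_25_995_012296583_1; they are an assumption,
    not a documented CPDN naming scheme.
    """
    fields = name.split("_")
    if len(fields) < 8:
        raise ValueError(f"unexpected task name: {name!r}")
    # fields[5] = batch, fields[6] = zero-padded workunit ID, fields[7] = replica
    return TaskName(int(fields[5]), int(fields[6]), int(fields[7]))


def tasks_in_batch(names: Iterable[str], batch: int) -> List[str]:
    """Return only the names that belong to the given batch."""
    return [n for n in names if parse_task_name(n).batch == batch]
```

Calling tasks_in_batch(names, 995) on the tasks in a host's queue would list the candidates to abort by hand.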
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 71563 - Posted: 7 Oct 2024, 12:25:50 UTC - in response to Message 71562.  

Likewise with wah2_nz25_22as_209805_25_995_012297807 (workunit 12297807): an extra task was created and sent out today. I won't be able to cancel it until I get home from Italy...
Alan K

Joined: 22 Feb 06
Posts: 491
Credit: 31,290,179
RAC: 15,651
Message 71564 - Posted: 7 Oct 2024, 18:25:30 UTC - in response to Message 70656.  

"If anyone does have any tasks for 995 on the machine they should abort it."

Done for the four I got.
rob

Joined: 5 Jun 09
Posts: 97
Credit: 3,736,855
RAC: 4,073
Message 71565 - Posted: 7 Oct 2024, 19:30:13 UTC - in response to Message 70656.  

...and nearly seven months later, a bunch of NZ batch 995 tasks have emerged from somewhere or other.
Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,655,437
RAC: 10,602
Message 71567 - Posted: 7 Oct 2024, 22:46:06 UTC - in response to Message 71565.  

That's interesting. I don't know what's going on there. It's still showing as closed on the server. I will investigate.
---
CPDN Visiting Scientist
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 71569 - Posted: 7 Oct 2024, 23:37:41 UTC - in response to Message 70656.  

"Closed: means no resends will go out. If anyone does have any tasks for 995 on the machine they should abort it."


OK: I just aborted the two I received earlier today.
rob

Joined: 5 Jun 09
Posts: 97
Credit: 3,736,855
RAC: 4,073
Message 71571 - Posted: 8 Oct 2024, 6:41:15 UTC - in response to Message 71567.  

I got four of them, all showing as being sent out for the first time, after being marked as "didn't need" at some point in the past (no date given).
https://main.cpdn.org/workunit.php?wuid=12296891
Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,655,437
RAC: 10,602
Message 71573 - Posted: 8 Oct 2024, 10:50:40 UTC - in response to Message 71571.  

Rob, definitely abort them. That batch is not needed any more. I've reported the issue to CPDN and will find out what's going on. I wonder if the recent "rebranding" has upset things.

I could try re-closing the batch with added 'abort' sent to running tasks but that tends to annoy people.
---
CPDN Visiting Scientist
rob

Joined: 5 Jun 09
Posts: 97
Credit: 3,736,855
RAC: 4,073
Message 71574 - Posted: 8 Oct 2024, 11:09:10 UTC - in response to Message 71573.  

Thanks for the update, Glenn. I didn't want to abort them just in case they were really wanted.
Ingleside

Send message
Joined: 5 Aug 04
Posts: 127
Credit: 24,577,798
RAC: 8,230
Message 71575 - Posted: 8 Oct 2024, 11:48:17 UTC - in response to Message 71573.  

"...with added 'abort' sent to running tasks but that tends to annoy people."
Personally, I find it much better for such a task to be aborted, at worst, at the first trickle with 4.27% done, than to waste maybe 10 days crunching it only for the result to be dumped at the end.

The main downside of multiple aborts on the same computer is that the computer is once again back to a quota of 1 task per day, meaning that when the next "good" batch of work is released, it can take over a week to get one task per real core.
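The arithmetic behind "over a week" is simple if the quota really stays at 1 task per day: a host with 8 real cores needs 8 days to get one task per core. A minimal Python sketch, assuming a constant daily quota (the server may in fact raise the quota again as the host returns valid results, so treat the figure as a worst case); the function name is hypothetical:

```python
import math


def days_to_one_task_per_core(real_cores: int, tasks_per_day: int = 1) -> int:
    """Days until a host holds one task per real core, at a fixed daily quota.

    Assumes the quota never changes; in practice it may rise as valid
    results come back, so this is a worst-case estimate.
    """
    if real_cores < 0 or tasks_per_day < 1:
        raise ValueError("need real_cores >= 0 and tasks_per_day >= 1")
    # Partial days round up: 5 cores at 2 tasks/day still needs a third day.
    return math.ceil(real_cores / tasks_per_day)
```

For example, days_to_one_task_per_core(8) gives 8, and a 16-core host at the same quota would wait 16 days.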
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 71577 - Posted: 8 Oct 2024, 13:08:34 UTC - in response to Message 71575.  

"The main downside of multiple aborts on the same computer is that the computer is once again back to a quota of 1 task per day, meaning that when the next 'good' batch of work is released, it can take over a week to get one task per real core."
+1

I see I picked up two more batch 995 tasks yesterday afternoon. I can't abort them, and it's not easy to block the requests sensibly: I left the machines accepting new work while I'm away, because there have been hints that at least two new batches are being prepared for release 'soon'.

Most BOINC projects that use multiple applications nowadays allow users to choose which of the sub-applications to run through their preferences. I understand the argument here, that the 'set and forget' brigade wouldn't opt in to new applications on release, but disallowing choice places an even greater responsibility on the project team to understand the wild horse they're riding and get the best they can out of it.
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 1051
Credit: 16,655,437
RAC: 10,602
Message 71578 - Posted: 8 Oct 2024, 14:01:02 UTC
Last modified: 8 Oct 2024, 14:24:12 UTC

I've identified the problem. I've now re-closed the batch and told the server to abort any tasks in progress on volunteers' machines (since we don't need the results).

Resends are prevented for closed batches by setting each workunit's number of allowed task attempts to its current number of task attempts. But when a batch is aborted, that step doesn't happen; the script in question has to be run again with a different setting. If I remember rightly, I closed batch 995 and aborted its tasks because it was going wrong, then forgot to do the extra step (mea culpa). That's why these tasks are now going out. Normally, CPDN do not abort workunits in progress, because the results can still be used, so we don't hit this problem. I've modified the script so this doesn't happen again.
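A toy Python model of the mechanism described above may make the failure clearer. It assumes a workunit record holding a count of task attempts and a cap on allowed attempts; the field names and the default cap are hypothetical (in the BOINC schema the cap may correspond to something like max_total_results, but the post doesn't say which field the script changes). Closing caps the attempts; aborting only stops the running task, so a resend can still be generated unless the cap is applied as well.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    """Toy workunit; field names and the default cap are hypothetical."""
    name: str
    attempts: int = 1        # tasks created so far for this workunit
    max_attempts: int = 5    # allowed task attempts (hypothetical default)
    task_running: bool = True


def close(wu: Workunit) -> None:
    """Close: cap allowed attempts at the current count, so no resend is possible."""
    wu.max_attempts = wu.attempts


def abort_running_task(wu: Workunit) -> None:
    """Abort: the running task is no longer needed, but the cap is left untouched."""
    wu.task_running = False


def close_and_abort(wu: Workunit) -> None:
    """Corrected procedure: abort the running task and also cap attempts."""
    abort_running_task(wu)
    close(wu)


def maybe_resend(wu: Workunit) -> bool:
    """Scheduler side: create a replacement task if none is running and attempts remain."""
    if not wu.task_running and wu.attempts < wu.max_attempts:
        wu.attempts += 1
        wu.task_running = True
        return True
    return False
```

With abort_running_task() alone, maybe_resend() returns True and a new task goes out, which is the behaviour volunteers reported in this thread; after close_and_abort() it returns False. Closing without aborting also blocks resends while leaving the running task alone, which is the normal CPDN case.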

Thanks a lot for reporting this - this is a great group!
---
CPDN Visiting Scientist
Harri Liljeroos

Joined: 9 Dec 05
Posts: 116
Credit: 12,550,440
RAC: 491
Message 71581 - Posted: 8 Oct 2024, 15:56:51 UTC

I have two tasks from batch 995. They now show here on the server as 'Didn't need' (https://main.cpdn.org/results.php?hostid=1552491), but they are still running on my computer. I don't see any message from the server telling BOINC to abort them.
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 71582 - Posted: 8 Oct 2024, 16:26:28 UTC - in response to Message 71581.  

The server can't "send" an abort message on its own; that would be blocked by the firewall or the network address translation in your home router.

Instead, it has to wait until your own computer sends a request; then it can add the 'abort' instruction to its reply. That will happen automatically the next time a trickle-up is triggered, or you can speed it up by updating the project manually.
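The point here is that BOINC communication is always started by the client. A toy Python sketch of that pull model, with hypothetical class and method names: the server only queues the abort for a host and attaches it to its reply the next time that host makes a scheduler request (for a trickle-up or a manual update). It models the intended path only, not why an abort might fail to arrive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class Server:
    """Toy scheduler: it never opens a connection to a host, it only answers requests."""

    def __init__(self) -> None:
        self.pending_aborts: Dict[int, Set[str]] = defaultdict(set)

    def request_abort(self, host_id: int, task_name: str) -> None:
        # Only queued here; nothing reaches the host yet.
        self.pending_aborts[host_id].add(task_name)

    def handle_scheduler_request(self, host_id: int, trickles: List[str]) -> dict:
        # Whatever the host asked for, queued aborts ride along in the reply.
        aborts = sorted(self.pending_aborts.pop(host_id, set()))
        return {"trickles_acknowledged": len(trickles), "abort": aborts}


class Host:
    def __init__(self, host_id: int, tasks: Iterable[str]) -> None:
        self.host_id = host_id
        self.tasks = set(tasks)

    def contact(self, server: Server, trickles: Iterable[str] = ()) -> dict:
        """Scheduler request, e.g. to send a trickle-up or after a manual project update."""
        reply = server.handle_scheduler_request(self.host_id, list(trickles))
        self.tasks -= set(reply["abort"])  # act on any piggybacked aborts
        return reply
```

Until Host.contact() is called, request_abort() has no visible effect on the host, which is why a manual project update can speed things up.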
Harri Liljeroos

Joined: 9 Dec 05
Posts: 116
Credit: 12,550,440
RAC: 491
Message 71583 - Posted: 8 Oct 2024, 17:46:02 UTC - in response to Message 71582.  

Both tasks have now sent new zip files and trickle-up messages to the server, and have received credit for them too. Still no abort request from the server:

46871	climateprediction.net	08-10-2024 19:30	[sched_op] Starting scheduler request	
46872	climateprediction.net	08-10-2024 19:30	Sending scheduler request: To send trickle-up message.	
46873	climateprediction.net	08-10-2024 19:30	Not requesting tasks: don't need (CPU: max concurrent job limit; NVIDIA GPU: no applications; AMD/ATI GPU: no applications)	
46874	climateprediction.net	08-10-2024 19:30	[sched_op] CPU work request: 0.00 seconds; 0.00 devices	
46875	climateprediction.net	08-10-2024 19:30	[sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices	
46876	climateprediction.net	08-10-2024 19:30	[sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices	
46877	climateprediction.net	08-10-2024 19:30	Scheduler request completed	
46878	climateprediction.net	08-10-2024 19:30	[sched_op] Server version 721	
46879	climateprediction.net	08-10-2024 19:30	Project requested delay of 3636 seconds	
46880	climateprediction.net	08-10-2024 19:30	[sched_op] Deferring communication for 01:00:36	
46881	climateprediction.net	08-10-2024 19:30	[sched_op] Reason: requested by project	
46882	climateprediction.net	08-10-2024 19:30	Started upload of wah2_nz25_21l9_209505_25_995_012296888_1_r951423997_7.zip	
46883	climateprediction.net	08-10-2024 19:30	Finished upload of wah2_nz25_21l9_209505_25_995_012296888_1_r951423997_7.zip (90221240 bytes)	
47135	climateprediction.net	08-10-2024 20:35	[sched_op] Starting scheduler request	
47136	climateprediction.net	08-10-2024 20:35	Sending scheduler request: To send trickle-up message.	
47137	climateprediction.net	08-10-2024 20:35	Not requesting tasks: don't need (CPU: max concurrent job limit; NVIDIA GPU: no applications; AMD/ATI GPU: no applications)	
47138	climateprediction.net	08-10-2024 20:35	[sched_op] CPU work request: 0.00 seconds; 0.00 devices	
47139	climateprediction.net	08-10-2024 20:35	[sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices	
47140	climateprediction.net	08-10-2024 20:35	[sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices	
47141	climateprediction.net	08-10-2024 20:35	Scheduler request completed	
47142	climateprediction.net	08-10-2024 20:35	[sched_op] Server version 721	
47143	climateprediction.net	08-10-2024 20:35	Project requested delay of 3636 seconds	
47144	climateprediction.net	08-10-2024 20:35	[sched_op] Deferring communication for 01:00:36	
47145	climateprediction.net	08-10-2024 20:35	[sched_op] Reason: requested by project	
47146	climateprediction.net	08-10-2024 20:35	Started upload of wah2_nz25_21wv_209705_25_995_012297306_1_r862382279_7.zip	
47147	climateprediction.net	08-10-2024 20:35	Finished upload of wah2_nz25_21wv_209705_25_995_012297306_1_r862382279_7.zip (90418684 bytes)	
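For anyone wanting to check a long event log like this one systematically, here is a small Python sketch. It assumes each line has the layout pasted above (sequence number, project name without spaces, timestamp, message, separated by tabs or spaces) and pulls out the reasons for scheduler requests, the requested delays, the deferral periods in seconds, and the upload sizes. The function and pattern names are hypothetical. It does not try to detect abort instructions, since the exact wording BOINC logs for them isn't shown in this thread.

```python
import re
from typing import Dict, List

# Layout as pasted in this thread: sequence, project (no spaces), timestamp, message.
LINE = re.compile(r"^(\d+)\s+(\S+)\s+(\d{2}-\d{2}-\d{4} \d{2}:\d{2})\s+(.*?)\s*$")
DELAY = re.compile(r"Project requested delay of (\d+) seconds")
DEFER = re.compile(r"Deferring communication for (\d+):(\d{2}):(\d{2})")
UPLOAD = re.compile(r"Finished upload of (\S+) \((\d+) bytes\)")


def summarize(log_text: str) -> Dict[str, object]:
    """Collect request reasons, delays, deferrals (in seconds) and upload sizes."""
    reasons: List[str] = []
    delays: List[int] = []
    defers: List[int] = []
    uploads: Dict[str, int] = {}
    for raw in log_text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue  # skip blank or differently formatted lines
        msg = m.group(4)
        if msg.startswith("Sending scheduler request: "):
            reasons.append(msg.split(": ", 1)[1])
        elif d := DELAY.search(msg):
            delays.append(int(d.group(1)))
        elif d := DEFER.search(msg):
            hours, minutes, seconds = map(int, d.groups())
            defers.append(hours * 3600 + minutes * 60 + seconds)
        elif u := UPLOAD.search(msg):
            uploads[u.group(1)] = int(u.group(2))
    return {"reasons": reasons, "delays": delays, "defers": defers, "uploads": uploads}
```

Run on the log above, it should report two scheduler requests, both "To send trickle-up message.", each requested delay of 3636 seconds matching the 01:00:36 deferral, and two uploads of about 90 MB each.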

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 71587 - Posted: 9 Oct 2024, 14:59:21 UTC

Looking at my batch 995 tasks, I think the first one has crashed (the computer, that is, not the CPDN task: it isn't contacting other projects either).

The other two seem to be going strong still, and are reporting trickles - 7 each so far, with the last ones being this morning. Good for my stats, not so much so for the science.
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 43,257,301
RAC: 72,605
Message 71588 - Posted: 9 Oct 2024, 23:22:38 UTC
Last modified: 9 Oct 2024, 23:32:16 UTC

On the web results, I see the batch 995 tasks marked as "Didn't need", like this. Meanwhile, the task sat happily in the queue on my host, although it hadn't started running yet. I recorded my "Max tasks per day" for the host and aborted them manually anyway. Interestingly, even after I reported the aborted tasks, the web status didn't change: the reported date wasn't updated, and neither "Max tasks per day" nor "Consecutive valid tasks" was reset. Does this mean they can be aborted without side effects, but one can also keep crunching them for credit if one wishes? (Unless there is a delay in updating the web pages, but in my experience the results page is real-time.)

This behavior is a bit different from what I've seen in other projects, where a "server abort" actually aborts the task on the host. I've seen Asteroids@Home do that quite frequently for resends when the initial results show up late.
AndreyOR

Joined: 12 Apr 21
Posts: 318
Credit: 15,000,104
RAC: 9,568
Message 71589 - Posted: 10 Oct 2024, 6:13:37 UTC - in response to Message 71578.  

I'm not sure that re-closing the batch worked. The tasks I aborted got new tasks generated and sent to other users and are being processed and trickles returned.
UBT - wbiz

Joined: 23 Jul 23
Posts: 1
Credit: 1,548,235
RAC: 2,990
Message 71618 - Posted: 15 Oct 2024, 13:57:21 UTC
Last modified: 15 Oct 2024, 13:59:41 UTC

I am also still getting trickle-up credits for batch 995 tasks. With no new tasks available, we would lose credit if we aborted them.


©2024 cpdn.org