Posts by zombie67 [MM]

1) Message boards : Number crunching : EAS batches 1001-4 (Message 70314)
Posted 26 days ago by zombie67 [MM]
Post:
Great news! Thanks!
2) Message boards : Number crunching : EAS batches 1001-4 (Message 70304)
Posted 28 days ago by zombie67 [MM]
Post:
CPDN don't abort tasks from the server. They used to do it years ago but users complained.


With trickles, there are no (valid) reasons to complain any more. No need to complete the tasks to get paid for work done. They already get paid for partial work. So doing server-side aborts should be resumed.
3) Message boards : Number crunching : Batches closed (Message 70291)
Posted 29 days ago by zombie67 [MM]
Post:
It is not CPDN policy to remotely abort workunits on volunteer computers.


This is a bad policy and needs to be changed. If the project is not going to use the results of any tasks out in the wild, they should be aborted from the server side. Otherwise, electricity and time are just being wasted. That seems contrary to the goal of CPDN, right?
4) Message boards : Number crunching : EAS batches 1001-4 (Message 70289)
Posted 29 days ago by zombie67 [MM]
Post:
EAS Batches 1002, 1003, 1004 have been closed


I just tried an update for the machines with those tasks, but nothing happened. Will a server-side abort be issued? That seems like the right way to do this.
5) Message boards : Number crunching : EAS batches 1001-4 (Message 70186)
Posted 23 Jan 2024 by zombie67 [MM]
Post:
Thanks. I am not sure if someone fixed something, or if they just magically resolved themselves. In any case, all my pending uploads are gone now.
6) Message boards : Number crunching : EAS batches 1001-4 (Message 70181)
Posted 22 Jan 2024 by zombie67 [MM]
Post:
FWIW, I am having a problem uploading a completed task. It has failed 7 times now:

1/22/2024 9:07:45 AM	[error] Error reported by file upload server: [wah2_eas25_n1z2_201412_24_1002_012238572_0_r1345862682_out.zip] locked by file_upload_handler PID=2762255	


Edit: I now have a total of 6 uploads across 3 machines, with this same problem.
7) Message boards : Number crunching : New Work 2024 (Message 70139)
Posted 17 Jan 2024 by zombie67 [MM]
Post:
Other projects don't use these same impossible rules.
I can't remember which but at least one other project does. I don't know enough about BOINC server to say whether it is possible to turn this feature on for some task types and not others but given the issue with the missing libraries problem for Linux tasks, even when the number of resends was upped to five, there were still a lot of tasks going to hard fail because of machines that crash everything.


The 1/day rule doesn't change the amount of tasks that "hard fail". It just delays the eventual result. And delays the valid completion of the rest.
8) Message boards : Number crunching : New Work 2024 (Message 70136)
Posted 17 Jan 2024 by zombie67 [MM]
Post:
The server side rules for this need to be modified. Other projects don't use these same impossible rules.
I disagree - the rule protects the server from wasting time sending out tasks to a machine likely to break the next one. It doesn't matter if it's the task's fault, the point is it's better off sending the next task to another machine.

I disagree with your disagreement. There are still 16k tasks just waiting to be sent. There are no "other machines" at this point.

Edit: And there is no harm sending a task to a bad machine. It just gets resent to the next. This is a feature, not a fault.
9) Message boards : Number crunching : New Work 2024 (Message 70129)
Posted 17 Jan 2024 by zombie67 [MM]
Post:
Now a few of my computers according to the error log are limited to a quota of 1 task for the day. I don't believe my computers are the issue. I believe a batch of bad tasks are what put me in this position. Any way to fix this?

The limitation will be lifted once your boxes return some completed tasks.

Yeah, this is a real problem. I have a 16-core machine with this issue. Even the faster machines take 6 days to complete a task. So that's 16 cores with a single task, and each day it gets to add one more. Except it got an error on the second day. Now it will be 16 days before it fills up the cores, unless there are more errors in the meantime.

Perhaps after 6 days+, something changes? But with task errors mixed in who knows?

The server side rules for this need to be modified. Other projects don't use these same impossible rules.
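
To make the arithmetic above concrete, here is a rough sketch of how slowly a host fills its cores under a daily quota. It is a toy model, not CPDN's or BOINC's actual scheduler logic: the function name `busy_cores_by_day` and all of its parameters are made up for illustration, every task is assumed to take exactly `task_days` days on one core, and no task errors out.

```python
def busy_cores_by_day(cores=16, task_days=6, new_tasks_per_day=1, horizon=20):
    """Return the number of busy cores at the end of each simulated day.

    Assumed toy model (NOT the real server rules): the host may start at
    most `new_tasks_per_day` tasks each day, each task occupies one core
    for exactly `task_days` days, and nothing fails.
    """
    finish_days = []  # first day on which each running task is no longer running
    history = []
    for day in range(1, horizon + 1):
        # Drop tasks whose run has ended by the start of today.
        finish_days = [d for d in finish_days if d > day]
        # Start new tasks, limited by both the daily quota and free cores.
        free = cores - len(finish_days)
        for _ in range(min(new_tasks_per_day, free)):
            finish_days.append(day + task_days)
        history.append(len(finish_days))
    return history


if __name__ == "__main__":
    # Quota stuck at 1 task/day with 6-day tasks on a 16-core host.
    print(busy_cores_by_day(horizon=10))
```

Under these assumptions, a quota that stays at 1 per day with 6-day tasks never gets past 6 busy cores, because from day 7 onward one task finishes for every one that starts. Filling all 16 cores depends entirely on the quota being raised once completed tasks are returned.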
10) Message boards : Number crunching : New Work 2024 (Message 70118)
Posted 16 Jan 2024 by zombie67 [MM]
Post:
Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more.

This is great news!
11) Message boards : Number crunching : Resends. (Message 70055)
Posted 19 Nov 2023 by zombie67 [MM]
Post:
Is it possible to (say) just cancel any tasks that have not reported a trickle in the past month? For those batches where any results are no longer useful, of course.
12) Message boards : Number crunching : Resends. (Message 70051)
Posted 17 Nov 2023 by zombie67 [MM]
Post:
I am confused by this. According to the server status page, there are way more than two batches in progress.
13) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69947)
Posted 19 Oct 2023 by zombie67 [MM]
Post:
My 40+ uploads finally went through overnight. First time the transfers tab has been empty in weeks.
14) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69918)
Posted 17 Oct 2023 by zombie67 [MM]
Post:
The max number of connections to the Korean upload server has been increased from 256 to 1000. At the time Andy@CPDN made the change there were 116 active connections. IT in Korea are investigating further and I'll report back if they find anything.


FWIW, no change from my end. Pending uploads is now up to 36 for me.
15) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69884)
Posted 16 Oct 2023 by zombie67 [MM]
Post:
The IT guy in Korea is keen to find out why people are having problems. So again, it's getting looked at. Korea are a new server for CPDN.


Do any of the back end guys not run the projects on their own home machines? Shouldn't they be seeing the same thing?

FWIW, I am up to 20 pending transfers, with the largest being 50 attempts. I tried setting each of the machines to 100k upload speed, with no luck. But I am not sure that helps, since all my machines are coming from my single IP address. So even if the speed is limited per machine, the speed per IP address can be larger if there are multiple attempts at the same time.
16) Questions and Answers : Windows : Upload issue. (Message 69823)
Posted 13 Oct 2023 by zombie67 [MM]
Post:
For 'stuck' files, what is the retry number people are getting?

I have 15 stuck uploads. The retries range from 7 to 42 at this time.
17) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69779)
Posted 12 Oct 2023 by zombie67 [MM]
Post:
Of the 54 tasks I have in progress, 3 have now completed. But they are stuck uploading, and cannot report. Is this what is going to happen to all 54 tasks? Am I wasting my time and energy crunching these tasks? Or is there a solution in the works, about to be rolled out?
18) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69765)
Posted 11 Oct 2023 by zombie67 [MM]
Post:
In addition to 12 zip files, I now have a completed task that cannot upload.
19) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69745)
Posted 10 Oct 2023 by zombie67 [MM]
Post:
FWIW, I have 7 zips that cannot upload. "transient HTTP error"
Andy's just informed me that he's restarted the httpd server on the Korean machine. It was running and not out of space, but there were a lot of uploads and most likely stale connections. Hope that's got stuck uploads moving again.

If it misbehaves again, pls post it here.


I have 9 now stuck. I just now tried to upload, with no success:

1182081	climateprediction.net	10/10/2023 9:08:03 AM	Started upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip	
1182214	climateprediction.net	10/10/2023 9:08:52 AM	Temporarily failed upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip: transient HTTP error	
1182215	climateprediction.net	10/10/2023 9:08:52 AM	Backing off 05:00:54 on upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip	
1182216			10/10/2023 9:08:53 AM	Project communication failed: attempting access to reference site	
1182217			10/10/2023 9:08:54 AM	Internet access OK - project servers may be temporarily down.	
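
For anyone wanting to tally stuck uploads from a saved copy of the event log, a minimal sketch follows. It assumes the log has been copied to text in the same layout as the lines above, and it only looks for the "Temporarily failed upload of" message. The helper name `count_failed_uploads` is hypothetical and is not part of BOINC.

```python
import re
from collections import Counter

# Matches the message part of lines such as
# "... Temporarily failed upload of <file>: transient HTTP error"
FAILED_UPLOAD = re.compile(r"Temporarily failed upload of (\S+): (.+?)\s*$")


def count_failed_uploads(log_text):
    """Count 'Temporarily failed upload' events per file name.

    Hypothetical helper: it works on event log text pasted into a string
    and does not talk to the BOINC client in any way.
    """
    failures = Counter()
    for line in log_text.splitlines():
        match = FAILED_UPLOAD.search(line)
        if match:
            failures[match.group(1)] += 1
    return failures


if __name__ == "__main__":
    sample = (
        "1182214\tclimateprediction.net\t10/10/2023 9:08:52 AM\t"
        "Temporarily failed upload of "
        "wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip: "
        "transient HTTP error\t\n"
    )
    for name, count in count_failed_uploads(sample).items():
        print(count, name)
```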
20) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69711)
Posted 9 Oct 2023 by zombie67 [MM]
Post:
FWIW, I have 7 zips that cannot upload. "transient HTTP error"


©2024 climateprediction.net