Posts by Richard Haselgrove

41) Questions and Answers : Getting started : Sign Up broken (Message 70326)
Posted 6 Feb 2024 by Richard Haselgrove
Post:
Thanks for letting us know. I've just added one more bog-standard Windows 7 machine, from the address offered by the 'Attach to project' wizard, and that sailed through too.
42) Message boards : Number crunching : Batch 1005 WAH2 NZ region (Message 70324)
Posted 6 Feb 2024 by Richard Haselgrove
Post:
Putting the upload11 url into a browser just gives you the Apache test page:

This page is used to test the proper operation of the Apache HTTP server after it has been installed. If you can read this page it means that this site is working properly. This server is powered by CentOS.
43) Message boards : Number crunching : Batch 1005 WAH2 NZ region (Message 70317)
Posted 5 Feb 2024 by Richard Haselgrove
Post:
BOINC reports that CPDN is using 1.71 GB for one eas25 task and one nz25 task (and probably including some residual program files from older runs).

The nz25 task itself (at a little over 40% done) has a working set size of 263.69 MB.

Check those figures against the amount of space remaining on your BOINC data drive, and check what proportion of the available space BOINC is allowed to use.
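
As a rough illustration of that check, here is a minimal sketch. It assumes a single "use no more than X% of the disk" limit; the function name and parameters are hypothetical, and the real BOINC client applies its own disk preferences, which may differ.

```python
def disk_headroom_gb(total_gb: float, free_gb: float,
                     boinc_used_gb: float, max_used_pct: float) -> float:
    """Space BOINC could still claim, in GB (simplified, hypothetical model).

    Assumes one limit only: BOINC may use at most max_used_pct of the
    whole drive. Headroom is capped by whichever is smaller: the free
    space left on the drive, or the unused part of BOINC's allowance.
    """
    allowed_gb = total_gb * max_used_pct / 100.0
    return min(free_gb, allowed_gb - boinc_used_gb)


# Illustrative numbers: a 250 GB drive with 1.2 GB free, BOINC already
# holding the 1.71 GB quoted above, and a 50% allowance.
headroom = disk_headroom_gb(total_gb=250, free_gb=1.2,
                            boinc_used_gb=1.71, max_used_pct=50)
print(f"BOINC can still grow by about {headroom:.2f} GB")
```

In this example the drive's free space, not the percentage allowance, is the binding limit. The totals could be read with `shutil.disk_usage()` on the BOINC data directory.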
44) Message boards : Number crunching : Trickles stop new work arriving (Message 70310)
Posted 4 Feb 2024 by Richard Haselgrove
Post:
A sidebar on all this. When a task finishes, it first reports a final trickle, and (a few seconds later) starts to upload the final data file. Here's the timing on one of my machines:

04/02/2024 09:27:28 | climateprediction.net | Sending scheduler request: To send trickle-up message.
04/02/2024 09:27:29 | climateprediction.net | Project requested delay of 3636 seconds
04/02/2024 09:29:52 | climateprediction.net | Computation for task wah2_nz25_n2fo_200705_25_1005_012257314_2 finished
04/02/2024 09:29:58 | climateprediction.net | Finished upload of wah2_nz25_n2fo_200705_25_1005_012257314_2_r1186546406_out.zip (1265821 bytes)
If you're using the project setting for the maximum number of tasks in progress, and you have that number running, you can't have a spare task ready to start immediately - that machine will have to wait for about 58 minutes before reporting/fetching.
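
The 58-minute figure can be checked from the log timestamps. This is a minimal sketch that assumes the requested delay is counted from the timestamp of the 'Project requested delay' line; the function name is only illustrative.

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%d/%m/%Y %H:%M:%S"  # day/month/year, as in the log lines above


def wait_after_upload(request_time: str, delay_seconds: int,
                      upload_time: str) -> timedelta:
    """Time left on the project-requested delay once the final upload is done."""
    next_contact = (datetime.strptime(request_time, LOG_FORMAT)
                    + timedelta(seconds=delay_seconds))
    return next_contact - datetime.strptime(upload_time, LOG_FORMAT)


remaining = wait_after_upload("04/02/2024 09:27:29", 3636, "04/02/2024 09:29:58")
minutes, seconds = divmod(int(remaining.total_seconds()), 60)
print(f"{minutes} min {seconds} s")  # prints "58 min 7 s"
```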

I'm thinking of trying a project setting of n+1 tasks, and using other tools like 'no new tasks' to control the actual resource use.
45) Questions and Answers : Windows : //climateprediction.net/ failed: transient HTTP error (Message 70307)
Posted 3 Feb 2024 by Richard Haselgrove
Post:
No.

This problem is generic, not limited to Windows. See the related discussion in Getting started.
46) Message boards : Number crunching : Trickles stop new work arriving (Message 70302)
Posted 3 Feb 2024 by Richard Haselgrove
Post:
Then suspend a few unstarted tasks for the lower priority project, and let it work off the cache for a while. If you've suspended enough for a work fetch to be needed, it will be done alongside any new trickle reports at the end of the backoff hour.

CPDN doesn't need new work often enough to make that an onerous chore.
47) Message boards : Number crunching : Trickles stop new work arriving (Message 70300)
Posted 3 Feb 2024 by Richard Haselgrove
Post:
Actually, forget that. I think your basic premise is wrong.

If a trickle becomes due while scheduler contact is backed off (perhaps because of a trickle report from another task), it will be held in a queue until the backoff time has passed.

Then, the scheduler will be contacted and all pending operations will be completed in a batch - a work fetch request (if deemed necessary), and all pending trickles reported.
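
The queue-then-batch behaviour described above can be modelled with a toy sketch. This is not BOINC client code: the class, its method names, and the fixed 3636-second delay are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SchedulerContact:
    """Toy model: trickles queue up while contact is backed off."""
    backoff_until: float = 0.0
    pending_trickles: list = field(default_factory=list)

    def trickle_due(self, task: str, now: float) -> Optional[dict]:
        """Queue a trickle, then report it at once if no backoff is active."""
        self.pending_trickles.append(task)
        return self.try_contact(now, need_work=False)

    def try_contact(self, now: float, need_work: bool,
                    delay: float = 3636) -> Optional[dict]:
        """Send one batched request, or return None while still backed off."""
        if now < self.backoff_until:
            return None
        request = {"trickles": list(self.pending_trickles),
                   "work_fetch": need_work}
        self.pending_trickles.clear()
        self.backoff_until = now + delay  # project-requested delay restarts
        return request


contact = SchedulerContact()
first = contact.trickle_due("task_a", now=0)      # sent immediately
held = contact.trickle_due("task_b", now=600)     # backed off: queued
batch = contact.try_contact(now=3636, need_work=True)
print(first, held, batch)
```

In this model the second trickle is not lost: it rides along with the work fetch in the single request made once the backoff expires.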
48) Message boards : Number crunching : Trickles stop new work arriving (Message 70299)
Posted 3 Feb 2024 by Richard Haselgrove
Post:
Suspend network activity until the timer runs down, and everything can be done in a single burst when you allow it again.
49) Message boards : Number crunching : Are the relevant people aware www.climateprediction.net is down? (Message 70250)
Posted 31 Jan 2024 by Richard Haselgrove
Post:
Not for me:

DNS_PROBE_FINISHED_NXDOMAIN
It may be intermittently available while they're working on it?
50) Message boards : Number crunching : Multithread - why not? (Message 70239)
Posted 30 Jan 2024 by Richard Haselgrove
Post:
A programmer could have a quick read-through of comment lines 18 - 52 of https://github.com/BOINC/boinc/blob/master/client/cpu_sched.cpp.

In particular, we could look at the implementation of lines 44, 49 - 52:

//      Don't run a job if
//      - its memory usage would exceed RAM limits
//          If there's a running job using a given app version,
//          unstarted jobs using that app version
//          are assumed to have the same working set size.
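
One way to read those comment lines is sketched below. This is a hypothetical illustration of the stated rule, not the logic in cpu_sched.cpp: the dictionary keys, the function name, and the byte figures are all assumptions.

```python
def can_start(job: dict, running_jobs: list, ram_limit_bytes: int) -> bool:
    """Return True if starting `job` would keep total memory within the limit.

    An unstarted job has no measured working set of its own, so if a
    running job uses the same app version, its working set size is
    borrowed as the estimate; otherwise the job's own estimate is used.
    """
    estimate = job["wss_estimate"]
    for running in running_jobs:
        if running["app_version"] == job["app_version"]:
            estimate = running["wss"]
            break
    in_use = sum(running["wss"] for running in running_jobs)
    return in_use + estimate <= ram_limit_bytes


MB = 1024 * 1024
running = [{"app_version": "wah2", "wss": 264 * MB}]
candidate = {"app_version": "wah2", "wss_estimate": 100 * MB}

print(can_start(candidate, running, ram_limit_bytes=600 * MB))  # prints True
print(can_start(candidate, running, ram_limit_bytes=400 * MB))  # prints False
```

The second call fails only because the candidate inherits the 264 MB working set of the running task; judged on its own 100 MB estimate it would have fitted.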
51) Message boards : Number crunching : Multithread - why not? (Message 70234)
Posted 30 Jan 2024 by Richard Haselgrove
Post:
I heard LHC maintains the client code now?
Not quite. LHC oversees the final testing and release/deployment of finished server packages, but the raw code and client package release is still under the direction of David Anderson in Berkeley.
52) Questions and Answers : Getting started : Sign Up broken (Message 70225)
Posted 29 Jan 2024 by Richard Haselgrove
Post:
Likely to be the same problem. Last week it was just the server rejecting connections: now the whole climateprediction.net domain seems to be missing from DNS. And CPDN can't fully function without it.
53) Questions and Answers : Getting started : Sign Up broken (Message 70195)
Posted 24 Jan 2024 by Richard Haselgrove
Post:
Failing to attach a new test machine to the main project. It baulks at 'Scheduler list fetch', and the given url is https://www.climateprediction.net/index.php

That's the same as the banner address above, and that's giving a 503 error in a browser.
54) Message boards : Cafe CPDN : Top participants page (Message 70159)
Posted 19 Jan 2024 by Richard Haselgrove
Post:
An alternative view of the same data is at https://www.boincstats.com/stats/2/user/list/16/0/0
55) Message boards : Cafe CPDN : Top participants page (Message 70156)
Posted 19 Jan 2024 by Richard Haselgrove
Post:
The stats for any given computer or user are only updated when a 'credit event' happens - basically, whenever a computer on the account reports a trickle or a completed task. If a computer doesn't contact the server, their last known readings are preserved for ever. You're probably seeing left-over figures from the massive OpenIFS run last Christmas.

That's why you can sometimes see active RAC figures on projects which have closed down and stopped sending new work, like SETI@home.
56) Message boards : Number crunching : New Work Announcements 2024 (Message 70149)
Posted 18 Jan 2024 by Richard Haselgrove
Post:
So, yes, there is 'harm' in sending tasks that are known to likely fail on machines. We end up with more hard fails.
Not to mention that internet bandwidth is not a zero-cost resource, in either climate or financial terms.
57) Message boards : Number crunching : New Work Announcements 2024 (Message 70134)
Posted 17 Jan 2024 by Richard Haselgrove
Post:
Adding another failure:

wah2_eas25_g3ue_201812_24_1003_012247044_0

This is my travelling laptop, so I don't usually run CPDN on it (because of the restarts). I'm not going anywhere until winter is over, so I thought I'd try it static to observe the 'daily quota' issue.

Task failed after 2 minutes with

<stderr_out>
<![CDATA[
<stderr_txt>
Signal 11 received: Segment violation
Signal 11 received: Software termination signal from kill 
Signal 11 received: Abnormal termination triggered by abort call
Signal 11 received, exiting...
14:30:48 (16964): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=16884, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=16964, selfPID=6608, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_ain::Monitor...
14:30:52 (6608): called boinc_finish(0)
</stderr_txt>
'Signal 11' might be expected under Linux, but this is native Windows 11. Poking around before reporting it, I found a file

boinc_ufs_cpdnout_out.zip

in the slot directory, but the name is confusing: it's actually a plain-text file containing the single line

<status>0</status>

Which doesn't help, either. Willing to delve deeper if anyone has any questions/suggestions.
58) Message boards : Number crunching : New tasks issued December 2023 (Message 70090)
Posted 5 Dec 2023 by Richard Haselgrove
Post:
OK
59) Message boards : Number crunching : New tasks issued December 2023 (Message 70088)
Posted 5 Dec 2023 by Richard Haselgrove
Post:
Did you want one in particular on the main site or would something on the dev site do?
Thanks, Glenn.

You've seen the discussion between me and head office. They want to run a few tasks through with the extra server-side logging, to confirm the latest theory, and I offered to monitor the external reports again for the same task, to check that both methods point in the same direction. For that, it could be any task, but it would be good to use an agreed task number, so we knew that the comparison was fully justified.

I'm also mindful of your post 70057, where you say

A new Weather-at-Home NZ25 batch is being prepared & tested. It will go out before the Christmas break
- it would be nice to get the credit issue sorted once and for all before that goes out.

With regard to the dev site - I did actually receive several tasks from your Cubic Octahedral tco95 batch on 26 Nov, just as we were looking at the server logs. But they were shortened tasks, running about 90 minutes, and - to coin a couple of phrases - they produced five "upload trickles", but only one "credit trickle", at the very end. That won't do for this test - we need the intermediate credit trickles too.

What might work: use the dev site for the test, but for a WaH2 task - a full-length one, lasting around 8 days and producing 24 of each type of trickle. The server log should confirm the theory within two or three days, and we could then test the proposed revision to the credit script, while the same task is still running. That might be messy, but informative.
60) Message boards : Number crunching : New tasks issued December 2023 (Message 70086)
Posted 5 Dec 2023 by Richard Haselgrove
Post:
Anyone reading here received one of the 38 IFS tasks shown as 'in progress' on the main site server status page this morning?

I'd like to monitor the progress of at least one, to check progress on the credit issue: I've been in touch with the project team, and we think we've identified a possible cause, but none of my machines picked up a job from this release.

A task, workunit, or host ID number would be most helpful. Many thanks.



©2024 climateprediction.net