Posts by Richard Haselgrove

1) Message boards : Number crunching : Should full credit be given for time on non successful tasks? (Message 70683)
Posted 1 day ago by Richard Haselgrove
Post:
I think CPDN is a particularly difficult case for BOINC. Although it seems straightforward from the outside, this is the sort of project that doesn't respond well to the "Fish some second-hand heavy metal out of a skip, power it up, turn all the knobs up to 11, and walk away" approach to crunching.

I've mentioned host 1549227 on the board before: this week, host 1548623 came to my attention as well. They both fall into the 'Heavy Metal' category, with 16 thread and 32 thread capacity respectively. But they're a bit skinny on memory, with just 1GB per thread, and the hyperthreading will hit floating point speed something rotten. You can't just walk away from a machine like that, and think "job done".

Part of the problem is that BOINC Central - the main developers - seem to have adopted an approach, over the last 10 years or so, that human volunteers just get in the way: the computers know how it all works, right out of the box, and can be left to work it out for themselves. The most egregious example of this, of course, is Science United.

But BOINC isn't perfect, and doesn't cope perfectly under all conditions. Any programmer will know that instinctively, even if they don't want to talk about it in polite company. Glenn drew my attention to 1548623, and asked if I could throw some light on the tasks which had been aborted with 'EXIT_TIME_LIMIT_EXCEEDED'. I ran the numbers yesterday, and the problem turns out to be that BOINC has benchmarked the CPU at 47.15 GFlops. That's something that CPDN can't cure, and I strongly suspect that BOINC won't cure either (and because the machines are anonymously registered, we - as users - have no way of making contact and offering to help).
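A minimal sketch of the arithmetic behind that kind of abort, assuming the client sets a task's elapsed-time limit by dividing the workunit's rsc_fpops_bound by the speed it believes the host can deliver. Only the 47.15 GFlops figure comes from the post; the bound and the "realistic" speed are made-up numbers for illustration, not CPDN's actual values.

```python
# Sketch: an over-optimistic speed estimate shrinks the time a task is allowed.
# Assumption: time limit (seconds) = rsc_fpops_bound / assumed speed (flops).

RSC_FPOPS_BOUND = 4.0e15     # hypothetical per-task operation bound
BENCHMARKED_GFLOPS = 47.15   # figure quoted in the post
REALISTIC_GFLOPS = 5.0       # hypothetical sustained speed of one thread


def time_limit_hours(fpops_bound, gflops):
    """Return the allowed elapsed time in hours for a given assumed speed."""
    return fpops_bound / (gflops * 1e9) / 3600.0


limit_benchmarked = time_limit_hours(RSC_FPOPS_BOUND, BENCHMARKED_GFLOPS)
limit_realistic = time_limit_hours(RSC_FPOPS_BOUND, REALISTIC_GFLOPS)

print(f"Limit with benchmarked speed: {limit_benchmarked:.1f} hours")
print(f"Limit with realistic speed:   {limit_realistic:.1f} hours")
```

With the same bound, the allowed time scales inversely with the assumed speed, so a task that really runs at the slower rate can hit the limit long before it finishes.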

No, I think that the original idea of BOINC - to get volunteers interested and involved in the science, as well as enjoying the competition and the social sides - was on the right lines. But there's a lot of noise out there competing for our attention.
2) Message boards : Number crunching : Top participants RAC (Message 70665)
Posted 5 days ago by Richard Haselgrove
Post:
And his computers are hidden, too - so we have no idea what operating system he's running, and hence what task types he's likely to have processed.

But you're right - there was a credit glitch last summer. The earliest IFS runs for Linux weren't fully credited in real time, under the old system. When the new system was activated for the first time, everyone's RAC was calculated as if the work had all been processed on a single recent day. That made for a huge spike in RAC. He may have decided to retire from this project and rest on his laurels at that point ...
3) Message boards : Number crunching : Top participants RAC (Message 70663)
Posted 5 days ago by Richard Haselgrove
Post:
Is that what happens with most projects with the server code Richard?
Yes. I have a current RAC of 874.09 at SETI@home - which stopped issuing new work in the Spring of 2020.

Ironically, this project was an outlier until last year. It used a special, bespoke, credit handling system, which did keep the figures updated during a work drought - but that system was desperately inefficient, and placed a heavy load on the project's servers. It was replaced (with a few bugs, which we've now ironed out) with a new bespoke system which brought us closer to the standard BOINC framework.

But that framework has an unspoken assumption that no BOINC project ever runs out of work or closes down ...

Edit - SETI RAC table
4) Message boards : Number crunching : Top participants RAC (Message 70661)
Posted 5 days ago by Richard Haselgrove
Post:
The list is kept up to date with each individual's current figures.

The problem is that the "current figure" only changes when the user in question is actively processing new work - and there hasn't been a lot of that recently. Hopefully, that will change soon.
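A minimal sketch of why a displayed RAC can stay frozen, assuming the usual BOINC scheme in which the average decays exponentially with a one-week half-life but is only recalculated when new credit is granted. The function name, starting figure and idle period are illustrative, not the server's actual code or anyone's real numbers.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumption: standard one-week half-life


def decayed_rac(stored_rac, days_since_update):
    """RAC as it would read if the decay were applied up to today."""
    weight = math.exp(-days_since_update * math.log(2) / HALF_LIFE_DAYS)
    return stored_rac * weight


stored = 10000.0   # hypothetical RAC at the last credit grant
idle_days = 28     # hypothetical time with no new work returned

shown = stored                           # no new credit, so no recalculation
actual = decayed_rac(stored, idle_days)  # what the figure "should" be by now

print(f"Shown in the list: {shown:.2f}")
print(f"After decay:       {actual:.2f}")
```

Four half-lives without new work would cut the figure to one sixteenth, but the stored value never gets the chance to change.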
5) Message boards : Number crunching : New Work 2024 (Message 70531)
Posted 23 Feb 2024 by Richard Haselgrove
Post:
What can we do about systems like host 1549227?

The anonymous owner grabbed 16 tasks for their 8 core Ryzen on 19 February: 13 of them have returned just one trickle each since then. One has crashed, showing many, many quit requests from BOINC: I got the resend, which is how it came to my attention.
6) Message boards : Number crunching : New Work 2024 (Message 70502)
Posted 21 Feb 2024 by Richard Haselgrove
Post:
Or, since all CPDN applications will be floating point intensive, and will all suffer from FPU congestion on a hyperthreaded CPU, you could use the single project-level tag instead:

<project_max_concurrent>N</project_max_concurrent>
For a full list of the available options, see the BOINC user manual.
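A minimal sketch of generating an app_config.xml containing that tag. The limit of 4 is only an example value; the resulting text would need saving as app_config.xml in this project's folder under the BOINC data directory, and the client told to re-read its config files (or restarted) before it takes effect.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent):
    """Return app_config.xml text limiting concurrent tasks for one project."""
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Example value only: pick a number that suits the host's real cores and memory.
xml_text = build_app_config(4)
print(xml_text)
```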
7) Questions and Answers : Getting started : Stats missing (Message 70487)
Posted 20 Feb 2024 by Richard Haselgrove
Post:
From your account home page here, there is a link to "Preferences for this project". Or you can get there directly from

https://www.cpdn.org/prefs.php?subset=project

(you can't get there directly from the navigation bar above these posts, which is a BOINC mess-up)

In the first group, there is a check-box for "Do you consent to exporting your data to BOINC statistics aggregation Web sites?". That must be checked before the stats appear externally.

Stats are only exported once a day, so they may take a couple of days to show up - but they will go back to the beginning of your time here when they do show up.
8) Message boards : Number crunching : couldn't start app: CreateProcess() failed. Check your antivirus. (Message 70442)
Posted 19 Feb 2024 by Richard Haselgrove
Post:
I have 11 batch 1006 tasks in progress, and 2 batch 1007. The leader has reached 76%, and tail-end Charlie is approaching 28%. None has shown any sign of distress so far, and I have no stuck uploads. Fingers crossed that continues.
9) Message boards : Number crunching : couldn't start app: CreateProcess() failed. Check your antivirus. (Message 70428)
Posted 18 Feb 2024 by Richard Haselgrove
Post:
Download a new task, then suspend either the task or all computation before it gets a chance to run. Look to see if the .exe was quarantined immediately on download. If so, restore it and use whatever tools the AV program provides to say "I trust this file".

If it wasn't flagged immediately on download, look to see if the AV provides an option to scan a single file on request (most do). Scan it, and respond to any warnings/options it generates. Again, the idea is to get to the point where you can say "I trust this file".

Only then allow the task to start running.
10) Message boards : Number crunching : Are the relevant people aware www.climateprediction.net is down? (Message 70397)
Posted 15 Feb 2024 by Richard Haselgrove
Post:
Did you try clicking the banner? Your browser may be able to give you more evidence of the nature of the problem.
11) Message boards : Number crunching : New Work 2024 (Message 70394)
Posted 15 Feb 2024 by Richard Haselgrove
Post:
Can these half done workunits be handed out to others to complete? I've never been given a smaller workunit. Sounds like it should be possible. If my CPU can continue where it left off after rebooting the computer, why can't your CPU take my half done unit and continue it?
Restarting after a reboot relies on the presence of checkpoint files on your local hard drive.

CPDN uploads data zips (intermediate weather reports) 24 times during each task in the current batch, but only one 'restart' file at about the mid-point. Glenn can confirm the technicals, but I would guess that a short completion run would only be feasible if the original host had uploaded that restart file before failing.
12) Message boards : Number crunching : Are the relevant people aware www.climateprediction.net is down? (Message 70393)
Posted 15 Feb 2024 by Richard Haselgrove
Post:
There were problems at the beginning of the month, as discussed in this thread, but they seemed to have been cured.

It is possible that there was another temporary glitch just as you were trying to attach: the quickest way to check the status of https://climateprediction.net is to click on the banner at the top of this page. It opened for me just now (although slowly): it might be worth another attempt now.
13) Message boards : Number crunching : EAS batches 1001-4 (Message 70387)
Posted 15 Feb 2024 by Richard Haselgrove
Post:
Particularly plausible because yesterday was 'patch Wednesday' in UTC: the Wednesday after the second Tuesday of the month. That's the usual day Microsoft releases major update packages - even large security update packages for Windows 7, which is otherwise out of support.
14) Message boards : Number crunching : couldn't start app: CreateProcess() failed. Check your antivirus. (Message 70382)
Posted 14 Feb 2024 by Richard Haselgrove
Post:
Had a quick grok before going out to the pub, and found a relevant message board post (SETI NC 1641898 - nine years ago today!).

I did look into the possibility of whitelisting when this question was raised after the release of v0.43 last year. It turned out to require considerably more identity and security checks than opening a bank or PayPal business account, and I found I couldn't possibly qualify in any event (I don't run a personal website with my home address checkable through a WhoIs lookup on the domain name).
15) Message boards : Number crunching : couldn't start app: CreateProcess() failed. Check your antivirus. (Message 70381)
Posted 14 Feb 2024 by Richard Haselgrove
Post:
It would be most helpful if we could identify which anti-virus products are reporting a detection.

That won't ever be identifiable in your server logs: we would probably have to crowd-source it by an appeal here. So: if anyone here has noticed an AV alert relatable to CPDN since the new tasks were released, please post details here.

I don't have any recent experience of a central, industry-wide, reporting point, although I have written and released software installation packages in the past which have triggered similar problems.

Unfortunately the BOINC message board most likely to have searchable records of such events (SETI@home) has been offline all day, although it's back now. I'll put my thinking cap on in the morning.
16) Message boards : Number crunching : couldn't start app: CreateProcess() failed. Check your antivirus. (Message 70379)
Posted 14 Feb 2024 by Richard Haselgrove
Post:
I think a more likely explanation is that the potential wrong 'un is being detected by heuristics, rather than by pattern matching.

Modern AV products often have a module which examines the behaviour of the beastie in question, asking things like:

  • Does it have a user interface? (no = stealth module)
  • Does it use a lot of resources? (yes = could be doing something nasty)
  • Does it communicate with external sites? (not a problem here, but is a problem for BOINC)
  • Have I ever heard of it before? (no = could be a threat - handle with care)

The remedy is the same - check your AV messages and logs, and report any false positives as quickly and as often as you can. They usually get the message after a few days.

Obviously, an official message from the affected institution carries more weight than any number of end-user reports.

Edit - I've reported my 8.29 executable to https://www.virustotal.com/gui/file/2bc8155ce0a9f3a44cae5a0376f6d662a393f8e21c2aff377c7f23ae307fd9f4

For comparison, 8.24 is at https://www.virustotal.com/gui/file/ba4288b2d84f24a7ae47d01f75bed804edac36d8c7eff5ded7c9bd1bf740440e
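
The long hexadecimal string at the end of each VirusTotal link is the file's SHA-256 hash, so anyone can check whether their own copy of an executable matches one already listed. A minimal sketch; the helper names are illustrative, and the path passed in would be wherever the executable sits on your own machine.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def virustotal_url(sha256_hex):
    """Build a report link in the same form as the two quoted above."""
    return "https://www.virustotal.com/gui/file/" + sha256_hex
```

Calling virustotal_url(sha256_of_file(path)) on a downloaded executable gives the address to compare against the links above.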

17) Questions and Answers : Getting started : Communication deferred (Message 70365)
Posted 13 Feb 2024 by Richard Haselgrove
Post:
The project-set deferral period at this project is 1 hour (3636 seconds), whatever the outcome of the update. New work will never be issued if you request it during this period.

You need to look more deeply (BOINC's Event Log is recommended) to see the state of play.
18) Message boards : Number crunching : New Work 2024 (Message 70361)
Posted 13 Feb 2024 by Richard Haselgrove
Post:
I mentioned some time ago that my travelling laptop crashed a test task with the old app, with a signal 11 at startup. That host is approaching 10% on wah2_eas25_h0k1_201012_24_1006_012259529_0.

I also have a tiny, low power, Celeron box (about the size and shape of a portable CD library) - picked up to test a 64-bit BOINC error on some low power processors, now resolved (host 1548871). That one is also running a task successfully, but has only reached 3% over the same timescale.
19) Message boards : Number crunching : New Work 2024 (Message 70349)
Posted 12 Feb 2024 by Richard Haselgrove
Post:
Not sure what happened to it or where to look to find out.
You can look in your computer's task list from your home page on this website.

All tasks for computer 1367467

Unfortunately, in this particular case, not much evidence has been preserved.
20) Message boards : Cafe CPDN : Top participants page (Message 70342)
Posted 9 Feb 2024 by Richard Haselgrove
Post:
Looking at a few more of the names on the 'top participants' page, there are plenty whose computers aren't hidden. You can usually work out why the figures haven't changed: some only run CPDN on Linux, others have multiple computers registered but they haven't contacted the servers for a long time.

My own figures (buried several pages down the list!) seem to be fully up-to-date.


©2024 climateprediction.net