climateprediction.net home page
Posts by Aurum

21) Message boards : Number crunching : New work discussion - 2 (Message 68867)
Posted 7 Jun 2023 by Aurum
Post:
The researcher for the EAS tasks has discussed the results with her professor. There were some concerns about the spin-up results, but they have decided everything is within range and the mainsite tasks will be released "very soon."

Looking forward to it. Which app would that be?
22) Message boards : Number crunching : Big credit jump! (Message 68824)
Posted 28 May 2023 by Aurum
Post:
The HADAM4 task I am running has just sent its first data and trickle uploads. I see that the new credit script has instantly granted the credit, and it is commensurate with the amount of credit granted for these tasks in the past, i.e. 1/5 of the total credit for a 5-month N216 resolution task of that type. It is good to see the credit script works.

Me too. I've got 3 HADAM4 WUs running and one sequestered on a retired computer that I'll either finish or abort.
23) Message boards : Number crunching : Server Status page questions (Message 68606)
Posted 19 Mar 2023 by Aurum
Post:
What is happening with these work units?
In the case of the region-independent tasks, I doubt anything is happening. The research that used these is long finished. However, I do see the very occasional user returning one on the server status page. CPDN has in the past granted credit for work done after the deadline. At a time when tasks, even on a reasonably fast machine of the day, could take over six months, I don't think this was unreasonable. I hope this is not happening on more recent tasks, but I have no idea whether it is or not.

Which applications exactly are the "region-independent tasks"?
I keep getting HADAM4 WUs and I sure do NOT want to spend my electric bill on useless garbage.
If there are obsolete WUs circulating, then the project should issue "server aborts" for all of them and clear the decks of the flotsam and jetsam.
24) Message boards : Number crunching : New work discussion - 2 (Message 66069)
Posted 7 Sep 2022 by Aurum
Post:
Will the new work have user-friendly checkpointing?
I sure would love to run climate & weather models. I searched for "checkpoint" and found nothing about it.
As I recall, checkpoints were 4 hours or so apart. That makes it very difficult to deal with heatwaves and time-of-use (TOU) metering.
It also looks like CPDN wants to use every CPU thread on your computer. Hopefully they'll fix that bug too.

Edit: Found some info but not sure how old it is: https://www.climateprediction.net/getting-started/support/technical-faq/#no_tasks_available
How long does a Timestep take in real time?
"A Timestep represents a 1/2 hour of model time (not realtime)."
"Climateprediction.net checkpoints every 144 Timesteps..."
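Put together, the two FAQ figures fix the checkpoint spacing in model time; the wall-clock spacing then depends on how fast your machine gets through a timestep. A minimal Python sketch, assuming the quoted figures still apply (the age of that FAQ page is unknown). The seconds-per-timestep rate is something you would have to measure yourself; the function names are made up for illustration.

```python
# Figures quoted from the CPDN technical FAQ (they may be out of date).
MODEL_HOURS_PER_TIMESTEP = 0.5   # "A Timestep represents a 1/2 hour of model time"
TIMESTEPS_PER_CHECKPOINT = 144   # "checkpoints every 144 Timesteps"


def model_hours_between_checkpoints(timesteps=TIMESTEPS_PER_CHECKPOINT,
                                    hours_per_step=MODEL_HOURS_PER_TIMESTEP):
    """Model (not real) time simulated between two checkpoints, in hours."""
    return timesteps * hours_per_step


def wall_clock_hours_between_checkpoints(seconds_per_timestep):
    """Real time between checkpoints, given a measured seconds-per-timestep rate."""
    return TIMESTEPS_PER_CHECKPOINT * seconds_per_timestep / 3600.0


hours = model_hours_between_checkpoints()
print(f"{hours:g} model hours = {hours / 24:g} model days per checkpoint")
# prints: 72 model hours = 3 model days per checkpoint
```

For example, `wall_clock_hours_between_checkpoints(100)` gives 4.0, so a hypothetical machine needing 100 s per timestep would see about the 4-hour gap recalled above.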

How do we make backups of a WU in-progress?
"More worrying is that a computation error loses more work. What is the appropriate reaction to this? Complaining is unlikely to be useful as trying to make the Work Unit smaller has been considered and rejected as not practical. A better reaction would be to decide to make a backup from time to time so if you do suffer an error, you can recover without losing too much work."
25) Message boards : Number crunching : New work Discussion (Message 64774)
Posted 3 Nov 2021 by Aurum
Post:
So how do I know that any of my work will actually be used??? How do I avoid wasting my time and money on futile work???
26) Message boards : Number crunching : New work Discussion (Message 64773)
Posted 3 Nov 2021 by Aurum
Post:
Not only does this project have the worst-performing work server in all BOINCdom, it's also so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned.
That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done.
Are you saying it's BOINC's fault that Oxford did not send out a Server Abort signal when they abandoned the N144 project???
27) Message boards : Number crunching : New work Discussion (Message 64772)
Posted 3 Nov 2021 by Aurum
Post:
And the researchers are well aware that these models take a long time to run.
This "BOINC stuff" is only a small part of the research, more "a special treat", rather than "the main course(s)".
And it really shows in how poorly they run a BOINC server.
They're so lazy they don't even send out a Server Abort when they abandon a project. Last night I completed 7 N144 WUs and they called them Abandoned. That's shameless. That's about seven CPU months of work I could've done for a project that actually cares.
28) Message boards : Number crunching : New work Discussion (Message 64771)
Posted 3 Nov 2021 by Aurum
Post:
or I get a year's worth of work in one delivery and must abort almost all of it.
I have never received anywhere close to six months of work, even when my work cache was set to maximum.

"Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer.
In the past, CPDN allowed users to specify which types of task they could receive (N216, N144, etc.), though this was before those particular task types made it onto the drawing board, but you get what I mean. I and at least one or two of the other moderators would like this, but we have been told it isn't going to be changed, at least in the short term.

I assume I have never had some of the scheduling problems you have because I only run projects other than CPDN when there is no work available here.

Windows tasks all get snapped up within a couple of days of appearing, or even less, so on that front the only way more work can be done is for there to be more scientists who want to do research in the areas suited to that task type.

I've gotten a year's worth of work several times, most recently a couple of days ago.
The main point is to specify the number of WUs to send.
29) Message boards : Number crunching : New work Discussion (Message 64769)
Posted 3 Nov 2021 by Aurum
Post:
Not only does this project have the worst-performing work server in all BOINCdom, it's also so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned.
30) Message boards : Number crunching : New work Discussion (Message 64768)
Posted 3 Nov 2021 by Aurum
Post:
Then if it's at all possible make the checkpoints closer together.
In the computing preferences menu item in "Options" there is a box: "checkpoint at most every ... seconds".
That does nothing. Mine is set to 10 minutes.
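Why a 10-minute setting can change nothing: in the BOINC API that preference is a lower bound on checkpoint spacing. An app calls `boinc_time_to_checkpoint()` at points where it is able to save state, and that call only says yes once the interval has elapsed; it cannot make the app save state anywhere else. The toy simulation below is a hypothetical model, not CPDN's code: it assumes the model can only checkpoint every 144 timesteps and that a timestep takes a made-up 100 s of real time.

```python
def checkpoint_times(total_steps, steps_per_opportunity, seconds_per_step, pref_seconds):
    """Return the elapsed wall-clock seconds at which checkpoints actually happen.

    Hypothetical model: the app can only save state every `steps_per_opportunity`
    timesteps, and at each such point it checks whether `pref_seconds` (the
    "checkpoint at most every ... seconds" preference) have passed since the
    last checkpoint, in the spirit of boinc_time_to_checkpoint().
    """
    times = []
    last = 0
    for step in range(steps_per_opportunity, total_steps + 1, steps_per_opportunity):
        now = step * seconds_per_step  # elapsed wall-clock seconds at this step
        if now - last >= pref_seconds:
            times.append(now)
            last = now
    return times


# 10-minute preference, but the app only offers a checkpoint every 144 steps:
print(checkpoint_times(432, 144, 100, 600))  # [14400, 28800, 43200]
```

Checkpoints still land every 14,400 s (4 hours). Lowering the preference below the app's own spacing has no effect; raising it above that spacing only makes checkpoints rarer.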
31) Message boards : Number crunching : New work Discussion (Message 64762)
Posted 2 Nov 2021 by Aurum
Post:
What improvements do you have in mind?
Nothing even comes close to fixing the CPU cache issue but a few upgrades could make this project a whole lot more user-friendly.
I'd start by fixing the work delivery bugs. Several projects use the "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. Also fix the perpetual 60-minute project backoff. It makes no sense how work is delivered; it's just feast or famine. I either go days or weeks getting no WUs on a particular computer, even though the Server Status page says there's work available and another computer is getting work, or I get a year's worth of work in one delivery and must abort almost all of it. I can't think of another BOINC project that behaves this way.
16946 climateprediction.net 11/2/2021 2:14:19 PM update requested by user
16950 climateprediction.net 11/2/2021 2:14:25 PM Sending scheduler request: Requested by user.
16951 climateprediction.net 11/2/2021 2:14:25 PM Not requesting tasks: don't need (CPU: ; NVIDIA GPU: )
16952 climateprediction.net 11/2/2021 2:14:27 PM Scheduler request completed
16953 climateprediction.net 11/2/2021 2:14:27 PM Project requested delay of 3636 seconds
"Don't need" is not true. I have one 921 WU running and would like to run another. If I did get lucky and were blessed with a second WU, I'd switch to "No new work" and switch back after one completed.
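Whether the backoff is always the same can be checked by pulling every "Project requested delay" value out of saved event-log text. A minimal sketch that relies only on the wording of the log line quoted above; how you save the log text is up to you, and `requested_delays` is a made-up helper name.

```python
import re

# Matches the scheduler line shown in the BOINC event log excerpt above.
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def requested_delays(log_text):
    """Return every project-requested delay (in seconds) found in the log text."""
    return [int(m.group(1)) for m in DELAY_RE.finditer(log_text)]


log = """16952 climateprediction.net 11/2/2021 2:14:27 PM Scheduler request completed
16953 climateprediction.net 11/2/2021 2:14:27 PM Project requested delay of 3636 seconds"""

for delay in requested_delays(log):
    print(f"{delay} s = {delay / 60:.1f} min")  # 3636 s = 60.6 min
```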

Then if it's at all possible make the checkpoints closer together.
32) Message boards : Number crunching : New work Discussion (Message 64761)
Posted 2 Nov 2021 by Aurum
Post:
This project could easily do ten or twenty times as much work if they'd just make some improvements.
Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.
The main problem with that is not owning the source code - they’re not allowed to make changes to most of it.
I assume the UK MetOffice owns the code. Or is it someone else?
The biggest problem I see is CPU cache congestion. Running too many WUs on a computer slows it down to a snail's pace. I keep playing around trying to figure out the most CP work units I can run on a computer. I've tried disabling hyperthreading, and that works better, but I still can't run all CPUs because it still slows down. Besides, if I can't run every CPU thread with CP, then I'd like to support ARP etc. Right now, as my older WUs complete, I detach from CP and then reattach to sweep up the debris it leaves behind. Then I specify a max of two CPUs and, under BOINC preferences, use at most 33/36 = 92% of the CPUs. That leaves some headroom, but it's still noticeably faster if I run only one CP WU. It's frustrating when I know I could be running 18 or more if not for the CPU cache congestion issue.
Last time I suggested this, someone said they'd have to rewrite a million lines of Fortran. I'm not a coder, but I would think they'd only need to modify aspects of the code.
https://www.ibm.com/docs/en/aix/7.2?topic=implementation-design-coding-effective-use-caches
"Repackaging techniques can yield significant improvements without recoding..."
https://hackernoon.com/programming-how-to-improve-application-performance-by-understanding-the-cpu-cache-levels-df0e87b70c90
This guy says his code ran 50x faster after optimizing for CPU cache usage.
I've even seen a book dedicated to efficient CPU cache coding.
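The 33/36 = 92% CPU setting mentioned above generalizes: to keep a given number of threads busy, pick the smallest whole percentage that still yields that thread count. A sketch, assuming the client computes total threads times percent / 100 and rounds down (never below one CPU); the helper names are hypothetical.

```python
def cpu_percent_for(threads_wanted, total_threads):
    """Smallest whole 'use at most X% of the CPUs' value that gives threads_wanted.

    ASSUMPTION: the client computes total_threads * percent / 100 and rounds
    down, never going below one CPU (see threads_used below).
    """
    if not 1 <= threads_wanted <= total_threads:
        raise ValueError("threads_wanted must be between 1 and total_threads")
    # Integer ceiling of 100 * threads_wanted / total_threads (no float rounding).
    return -((-100 * threads_wanted) // total_threads)


def threads_used(percent, total_threads):
    """Threads the client would use at `percent`, under the same round-down assumption."""
    return max(1, total_threads * percent // 100)


pct = cpu_percent_for(33, 36)
print(pct, threads_used(pct, 36))  # 92 33
```

On machines with more than 100 threads a one-percent step covers more than one thread, so it is worth checking the result with `threads_used`.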
33) Message boards : Number crunching : New work Discussion (Message 64756)
Posted 1 Nov 2021 by Aurum
Post:
This project could easily do ten or twenty times as much work if they'd just make some improvements.
Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.
34) Message boards : Number crunching : New work Discussion (Message 64754)
Posted 1 Nov 2021 by Aurum
Post:
I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. I well remember two bits of advice that they drilled into me:
1) A number is useless unless the units are specified (watch out for those binary/decimal definitions of a megabyte).
2) Do every calculation twice. Once, to the highest precision the available hardware (a slide rule, in my day) is capable of. And again, on the back of an envelope or beer mat, to order-of-magnitude accuracy only. Catches those wayward decimal points.
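The binary/decimal megabyte trap in point 1 is easy to quantify: a binary "megabyte" (MiB) is a little under 5% bigger than a decimal one, so a size or speed quoted without units can be off by that much.

```python
MB = 10**6    # decimal (SI) megabyte
MiB = 2**20   # binary mebibyte, often loosely called a "megabyte" too


def percent_difference(bigger, smaller):
    """How much larger `bigger` is than `smaller`, in percent."""
    return 100 * (bigger - smaller) / smaller


print(f"1 MiB = {MiB:,} bytes, {percent_difference(MiB, MB):.2f}% more than 1 MB")
# prints: 1 MiB = 1,048,576 bytes, 4.86% more than 1 MB
```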
When I got my physics degree using my dad's slide rule, we learned that the three things physicists do most are: add and subtract zero, multiply and divide by one, and call it something else. :-)
It's amazing how much math my generation can do in our heads compared to kids today who need a calculator to do the most rudimentary arithmetic.

This project could easily do ten or twenty times as much work if they'd just make some improvements.
35) Message boards : Number crunching : New work Discussion (Message 64740)
Posted 31 Oct 2021 by Aurum
Post:
Put this in your cc_config and it won't happen:
<max_file_xfers_per_project>1</max_file_xfers_per_project>
36) Message boards : Number crunching : New work Discussion (Message 64710)
Posted 26 Oct 2021 by Aurum
Post:
The trick with these is to stagger the completion times.
Suspend all but one, give it an hour's head start, resume one and wait another hour, and so on.
That way all of the files won't get bunched up waiting for a turn to upload.
I already decided that I'm only going to run one CP WU per computer. So I've already got that covered.
And make sure that nothing else wants to use your net connection at an upload time.
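The stagger can be planned rather than timed by hand. A sketch that gives each task a resume time one hour after the previous one and prints the matching `boinccmd --task <url> <name> resume` command. The task names and project URL are placeholders (use the exact ones your client shows), and it assumes the tasks otherwise progress at the same rate, so equal start gaps give equal finish gaps.

```python
from datetime import datetime, timedelta

PROJECT_URL = "https://climateprediction.net/"  # ASSUMED; copy the URL your client shows


def stagger_plan(task_names, start, gap=timedelta(hours=1)):
    """Pair each task with the time it should be resumed, spaced `gap` apart."""
    return [(start + i * gap, name) for i, name in enumerate(task_names)]


tasks = ["task_A", "task_B", "task_C"]  # hypothetical task names
for when, name in stagger_plan(tasks, datetime(2021, 10, 26, 9, 0)):
    # The first task was never suspended, so resuming it is harmless.
    print(when.strftime("%H:%M"), f"boinccmd --task {PROJECT_URL} {name} resume")
```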
Now I'm confused. I thought the error under discussion is:
Output file hadam4h_h02w_200802_4_920_012115322_0_r75796790_4.zip for task hadam4h_h02w_200802_4_920_012115322_0 exceeds size limit.
Now, instead of a file exceeding a size limit, you're talking about how many files are being uploaded at the same time. I'm now running 3,201 WUs from various projects, so that will be next to impossible.
One of these options in one's cc_config file may be useful:
<max_file_xfers>32</max_file_xfers>
<max_file_xfers_per_project>32</max_file_xfers_per_project>
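These tags only take effect inside the `<options>` section of `cc_config.xml` in the BOINC data directory. A minimal sketch that builds a file with that nesting using Python's standard library; `build_cc_config` is a made-up helper. If you already have a `cc_config.xml`, add the two lines inside its existing `<options>` section instead of overwriting it, then have the client re-read its config files (or restart it).

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_file_xfers, max_file_xfers_per_project):
    """Return cc_config.xml text with both transfer limits nested under <options>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_file_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(max_file_xfers_per_project)
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(32, 32))
```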
37) Message boards : Number crunching : New work Discussion (Message 64707)
Posted 26 Oct 2021 by Aurum
Post:
I have a few 920s running. Should I abort them and lose a week's work now or let them fail at the end and lose a month's work?
What is this catch and set a new limit you guys are talking about? Is that something we civilians can do?
38) Message boards : Number crunching : Site problems (Message 64695)
Posted 25 Oct 2021 by Aurum
Post:
I think it's a problem in the western US from this big storm that just hammered us. The ULs keep moving, and that's the important thing.
39) Message boards : Number crunching : Site problems (Message 64691)
Posted 25 Oct 2021 by Aurum
Post:
Is the upload speed normally capped at 21 kBps?
40) Message boards : Number crunching : Site problems (Message 64659)
Posted 20 Oct 2021 by Aurum
Post:
Check the batch number. If it's closed, then that's the reason.
Those who know how to properly run a BOINC server system would issue a Server Abort signal, and that would never happen.



©2024 climateprediction.net