Posts by marmot

1) Message boards : Number crunching : WU won't upload (Message 61607)
Posted 25 Nov 2019 by marmot
Post:

2: That is a hadcm3s.
Up until recently, when the problem was found and fixed, that type of model had a fault in the trickle_up code, whereby the 1st trickle was generated with the correct info, but all subsequent trickles were identical.
So the server code would get the 2nd (and all others) trickle, compare it with what it already had, then discard it as a duplicate.
So people only got credit for one trickle for this type of model.
And, as you've already received that, you won't be getting more credit.
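The duplicate check described in the quote can be sketched roughly as follows. This is only an illustration of the idea, not the actual CPDN server code; the function name and the payload strings are hypothetical.

```python
def accept_trickles(trickles):
    """Keep a trickle only if its payload differs from every payload already stored."""
    seen = set()
    accepted = []
    for payload in trickles:
        if payload in seen:
            continue  # duplicate of something already stored: discarded, no credit
        seen.add(payload)
        accepted.append(payload)
    return accepted


# Faulty client (hypothetical payloads): every trickle repeats the first one.
faulty = ["timestep=1", "timestep=1", "timestep=1"]
# Fixed client: each trickle carries new progress information.
fixed = ["timestep=1", "timestep=2", "timestep=3"]

print(len(accept_trickles(faulty)))  # 1
print(len(accept_trickles(fixed)))   # 3
```

With the faulty payloads only the first trickle survives, which matches the "credit for one trickle" outcome described above.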



After aborting, likely because of the above error and subsequent patch, the WU is marked completed.

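Task 21710685 | Work unit 11381390 | Sent 15 Jun 2019, 6:10:47 UTC | Reported 21 Nov 2019, 8:00:36 UTC | Status Completed | Run time (sec) 779,944.57 | CPU time (sec) 779,508.50 | Credit 3,111.26 | Application UK Met Office HadCM3 short v8.34 windows_intelx86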
2) Message boards : Number crunching : WU won't upload (Message 61594)
Posted 22 Nov 2019 by marmot
Post:

1: The so-called deadline is just a number to keep BOINC happy. It's NOT when the task is required, which is: ASAP.


If the required return is ASAP, then maybe shorten the deadline from over 200 days (IIRC) to 10-20 days.
People actually USE the deadline to make decisions when manually prioritizing WU's.

I would have let the WU complete or aborted it before shutting that machine down for 3 months during the summer (climate crisis: a/c to cool servers in a home when we're in the hottest global streak of years on record is just species' suicide).
3) Message boards : Number crunching : no new work units (Message 61588)
Posted 21 Nov 2019 by marmot
Post:
"even though it's set to additional 0.01 days of work (which should equate to every 14 minutes and appears to be the smallest increment BOINC accepts in that field)."

I thought that "set to additional days work" (actually "Store additional Days work" in my BOINC manager version), enabled you to download and store additional work units for your computer to crunch on future days.

Surely it has nothing to do with how frequently the requests for work are made, which is controlled by CPDN, not the BOINC Manager, to no more frequently than once an hour, as Les pointed out.



I asked at the main BOINC forums how to get boinc.exe to request tasks at its fastest rate, and setting additional work to 0.01 was the response. You can check the forums; you'll likely find my handle 'marmot' in the search.

Other ways to increase the frequency of requests were to exert control over the work cache to limit WU's to very few in number, such as setting resource share to 0 on each project (some don't accept less than 1, and some projects are still so aggressive that their WU's dominate all other projects vying for spots) and, for projects with newer BOINC server software and the option implemented, explicitly setting the number of downloaded WU's.

Local app_config.xml settings of <max_concurrent> or <project_max_concurrent> aren't reported to the server and can worsen cache flow, as the project sends enough work to fill all available cores yet only gets to run the max_concurrent number at once. Resource share 0 can be a crucial supplement to max_concurrent settings when you are assigning cores to multiple projects.
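For reference, a minimal sketch of what such an app_config.xml could contain, built here with Python's standard library. The app name "example_app" and the limits 2 and 4 are placeholders, not values from this thread; the real app name has to match what the project uses.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent, project_max_concurrent):
    """Return app_config.xml text with a per-app and a per-project concurrency limit."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # placeholder app name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: at most 2 tasks of one app, at most 4 tasks for the project.
xml_text = build_app_config("example_app", 2, 4)
print(xml_text)
```

These limits only take effect on the client, which is why the server can still send more work than the client is allowed to run at once.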
4) Message boards : Number crunching : WU won't upload (Message 61586)
Posted 21 Nov 2019 by marmot
Post:
WU was aborted.

Thanks for the answer.
5) Message boards : Number crunching : no new work units (Message 61501)
Posted 9 Nov 2019 by marmot
Post:


For Windows work, there's a "window of opportunity" which lasts for around 30 to 90 minutes, because of the huge numbers of Windows machines waiting for work.
Then it's back to waiting for a few weeks.


So we need to have a machine that has no work units running at all, CPDN the only project accepting work and set to only request particular WU's to have a shot at WU's we've personally never crunched before.

Even if I use 0 resource share on all the other projects and there is 1 WU running per core, with no WU's in queue, BOINC slows the request for work down to every 60 minutes even though it's set to additional 0.01 days of work (which should equate to every 14 minutes and appears to be the smallest increment BOINC accepts in that field).
I've played with leaving 1 or 2 of 8 cores open and the request rate is still slowed but maybe my brain is misremembering that test. I should try it again and see if the requests are every 14-15 minutes with 2 cores always open (only possible on a project with server controlled number of WU's downloaded).
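The "14 minutes" figure is just the 0.01-day setting converted to minutes. The sketch below shows only that unit conversion; whether the setting actually governs how often work is requested is the poster's assumption, not something the code demonstrates.

```python
def days_to_minutes(days):
    """Convert a duration in days to minutes."""
    return days * 24 * 60


setting_minutes = days_to_minutes(0.01)
observed_minutes = 60  # request interval reported in the post

print(round(setting_minutes, 1))                     # 14.4
print(round(observed_minutes / setting_minutes, 1))  # 4.2
```

So the observed hourly requests are roughly four times less frequent than the 0.01-day figure would suggest.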
6) Message boards : Number crunching : WU won't upload (Message 61499)
Posted 8 Nov 2019 by marmot
Post:
BOINC has been retrying to upload the results on https://www.cpdn.org/result.php?resultid=21710685 for 10 days now.

Restarted the BOINC install but every other thing to try (reset project, detach/reattach) will lose the WU.

The WU is not past its deadline but sat idle during the summer while the computer was down for the hot months.
7) Message boards : Number crunching : Free-DC reports negative credits today for CPDN (Message 60419)
Posted 24 Jun 2019 by marmot
Post:
Any idea what happened with the export?

My CPID shows -19,129 today.
8) Questions and Answers : Getting started : Avatar issue (Message 60240)
Posted 29 May 2019 by marmot
Post:

And personally, I think that this board looks a lot cleaner without them.



You're missing out.

Majestic Alpine Marmot Surveys His Alpine Meadow painting is beautiful.

Woah, your security even broke IMG tag posting.

OK, an 8k JPG on IMGUR looks horrible because of magnification pixelation.

URL link https://imgur.com/MUW9i3I
9) Message boards : Number crunching : Error 22 on machine that successfully ran same WU type in April. (Message 60233)
Posted 28 May 2019 by marmot
Post:
Credits are awarded each time data is received from a task (unlike other projects, which require completed tasks). Your task apparently failed to report the first reporting point. Sorry about that - we all lose some minutes of processing that way...



I understand.

How hard would it be to write a script that recognizes "Model crashed: ATM_DYN : INVALID THETA DETECTED", awards a base 100 credits for the failed model, then lists these WU's as invalid?

Guess the researchers are getting their Invalid Theta percentages, and scrutinizing other various failures, from a separate script that gathers statistics on all failed and invalid WU's.

It's just a thought from the standpoint that people getting the error won't waste time at helpdesk diagnostics trying to discover some issue with their machines. Minimal credit and marked invalid; people might just say "huh, that's odd" and not bother the helpdesk staff (like I did).
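A rough sketch of the kind of script suggested above, assuming the server has the task's stderr text available. The function name, the returned fields and the flat 100 credits are hypothetical; none of this is existing CPDN or BOINC server code.

```python
CRASH_MARKER = "Model crashed: ATM_DYN : INVALID THETA DETECTED"
BASE_CREDIT = 100  # flat credit proposed in the post (hypothetical policy)


def classify_failed_task(stderr_text):
    """Mark a failed task 'invalid' with base credit if the model itself crashed."""
    if CRASH_MARKER in stderr_text:
        return {"outcome": "invalid", "credit": BASE_CREDIT}
    # Any other failure keeps the ordinary error outcome with no credit.
    return {"outcome": "error", "credit": 0}


sample_stderr = (
    "Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048\n"
    "Sorry, too many model crashes! :-(\n"
)
print(classify_failed_task(sample_stderr))
print(classify_failed_task("some other failure"))
```

The point is only that the crash message is a fixed string, so telling a model instability apart from a machine fault is a simple text match.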
10) Message boards : Number crunching : Daily scores = more crunchers (Message 60232)
Posted 28 May 2019 by marmot
Post:
Ok, then I'll leave.


BYE BYE


Andy was working on an alternative script that wouldn't use so many resources but I have heard nothing of that since the big crash a few months ago.

.
.
.

A bit like Brexit, whatever happens, many will not be happy! Personally I think the current system is acceptable. When other important work is completed, it may be that Andy has time to return to work on the new script but until/unless that happens I doubt if there will be any change.


That alternate script would be appreciated.

I'd love to crunch this regularly; this was my first project when joining BOINC in 2007 after leaving SETI@Home in 1999. As of right now I've compromised: I maintain 300 magnitude for GRC as a way to pay for the other projects that aren't on our whitelist. The donated time went to RakeSearch, Primegrid and, in a vanity goal, an attempt to get my name on the 180 list at WUProps by giving 100 hours to a bunch of projects' WU's.

Hot season approaches and our local utility still has some gas and coal fired plants, so I'm shutting down machines rather than using A/C. Crypto currency shouldn't hasten climate change; and when it uses electricity, it should heat homes and provide useful information rather than attempting to crack senseless codes.

Aurum's attitude doesn't reflect the entire GRC community. We chose to focus on GRC, instead of other cryptocurrencies, because the science is important. Many off topic discussions, in our chat rooms, are about the advancements in science.
Many of us recognize an oncoming job loss to AI, and an ever widening world-wide wealth gap, and see that our managing machines for science has value. We should get paid for our work and are building a method to do so.

I see the woman heating her house in Siberia with computers grinding for BitCoin as a wasted opportunity, since she's gained heat and some money but the community didn't gain any advancement in scientific knowledge.

https://qz.com/1117836/bitcoin-mining-heats-homes-for-free-in-siberia/
11) Questions and Answers : Getting started : Avatar issue (Message 60229)
Posted 28 May 2019 by marmot
Post:
Still no success today after letting the server clear its cache.

Avatar won't upload and no error message; just a blank white screen.
12) Questions and Answers : Getting started : Avatar issue (Message 60215)
Posted 27 May 2019 by marmot
Post:
My avatar had been up here for several years and I noticed today it was black and white. I deleted it and tried to upload a color jpg, 87x99, <4kb, and all I get is a blank, white website page when clicking update.

The page is:

https://www.cpdn.org/cpdnboinc/edit_forum_preferences_action.php

So I can't successfully update the avatar and my old one is gone.
13) Message boards : Number crunching : Error 22 on machine that successfully ran same WU type in April. (Message 60214)
Posted 27 May 2019 by marmot
Post:
Just have to add that there was no error on the user-client side. Nothing the BOINC user has any control over.

The work unit is at worst invalid because of a failure in the data set. And even the failure of the model because of initial conditions is something learned. A failed experiment can still teach the scientist something about their research.

As such, these work units should complete as invalid WITH credit given.

The WU did take up a minimum of 30 minutes of a slot that another project could have had reserved time for.

I've worked on 170+ different work units now and can't remember another WU ending as a 0 credit error because the calculation ended in a null result.
This would be akin to assigning 0 credit because we didn't find a prime number in a SRBase search.
14) Message boards : Number crunching : Error 22 on machine that successfully ran same WU type in April. (Message 60196)
Posted 22 May 2019 by marmot
Post:
Model crashed: ATM_DYN : INVALID THETA DETECTED


They now know that the starting values used in that model run lead to an instability.

So it's an error from data set variable starting values.


This seemed to be a configuration error, but if this error can occur from data set conditions, then all is fine and just keep crunching.
<![CDATA[
<message>
The device does not recognize the command.
(0x16) - exit code 22 (0x16)</message>
<stderr_txt>



This particular WU is hard to get and it's disappointing that it ended so quickly. Thanks for the response.
15) Message boards : Number crunching : Error 22 on machine that successfully ran same WU type in April. (Message 60186)
Posted 22 May 2019 by marmot
Post:
The only change to the machine is an added RX 550 and it's downclocked from 24x to 8x due to warming weather.
No OS changes but the AMD driver.
Running at 31.5 GB commits of 32GB RAM and 28.7GB occupied private/working.
All 32 threads in use.
Plenty of free disk space.
Currently also running, and turning in valid results for, Amicable Numbers(GPU+cores), Sixtrack (LHC), RakeSearch and SRBase long.

Did some requirement change for this WU type?

From machine: https://www.cpdn.org/cpdnboinc/results.php?hostid=1347460
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
The device does not recognize the command.
(0x16) - exit code 22 (0x16)</message>
<stderr_txt>

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
02:12:54 (1396): called boinc_finish(22)

</stderr_txt>
]]>
16) Message boards : Number crunching : Total Credit (Message 53833)
Posted 27 Mar 2016 by marmot
Post:
@Dave Jackson
As I said a few posts earlier, Andy thinks it is the pre-2011 credit that has gone; as you have been crunching since well before that, it has hit you. If it isn't fixed soon, because of the weekend and public holidays, not much is likely to change till Tuesday.


I missed that. The amounts lost do seem to be about all the credit my old machine got in 2005-2008. Will see what happens later this week. My machines are focused on other projects now.

Are you all going to consider sending some kind of summary notification to our clients detailing what is the issue with credit so that people out there are updated on this issue?


@Iain Inglis

These WU are responsible for 70 to 95% of the power usage on any machine that they are run upon. It's not MARGINAL at all.


True enough, but in winter time you won't need to use the heaters so much, since the PC's heat your rooms :) And the research is valuable. I think it was marginal many years ago? Back when Athlon64's were the thing?

I read somewhere that a search on Google uses as much electricity as a 60 watt light bulb does in 17 seconds.

Now back to topic :)

As an aside and not to divert the thread too greatly, the word "marginal" is sometimes used in marmot's sense of "small", but also means "at the margin" as in the expression "marginal cost". In addressing the question of 24/7 running, the word was used in the second sense and not the first: in effect, marmot argues that the marginal cost of running a CPDN model is "large" relative to the cost of running an idle PC - that's actually a good thing, as it means the PC is efficient. However, it only produces a disagreement if "marginal" is allowed to have only one meaning, which it doesn't.


None of the included definitions, including the one from economics, would use "marginal" when the value talked about becomes 4 times (6 hours of gaming vs 24 hours of BOINC) to 20 times (100% vs 5% wattage at idle) the original base cost, which is what happens to the wattage when BOINC WU's run 100% of the time on 100% of the cores. BOINC energy usage overrides all other considerations and becomes the primary cost. But this isn't the topic of the thread; I just had to address a major fault of perception that seems common.
BOINC WU's are valuable but also the primary cost of electricity usage on computers running CPU/GPU intensive WU's.
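The two multipliers quoted above are plain ratios, worked through below with the post's own figures (24 vs 6 hours, 100% vs 5% wattage). The function names are illustrative only, and the "share above idle" number is a derived figure, not one stated in the post.

```python
def times_baseline(value, baseline):
    """How many times larger 'value' is than 'baseline'."""
    return value / baseline


def share_above_idle(load_pct, idle_pct):
    """Fraction of the loaded draw that is added on top of the idle draw."""
    return (load_pct - idle_pct) / load_pct


print(times_baseline(24, 6))     # 4.0  (24 hours of BOINC vs 6 hours of gaming)
print(times_baseline(100, 5))    # 20.0 (100% wattage vs 5% wattage at idle)
print(share_above_idle(100, 5))  # 0.95
```

On the 5% idle figure, 95% of the draw under full load is extra consumption caused by the WU's, which lines up with the upper end of the "70 to 95%" claim.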


17) Message boards : Number crunching : Total Credit (Message 53791)
Posted 24 Mar 2016 by marmot
Post:

2. The project itself only asks volunteers to run models in the background, so the additional energy expenditure is the "marginal" increase in electricity consumed and not the entire energy consumption of a PC that the user chooses to leave on 24/7.


This is actually quite wrong.

I have spent many years watching the temperatures and energy used by my machines as they heat my house. A CPU sitting idle will use a couple of watts and hover just above room temperature. A machine used for viewing TV shows or browsing uses about 15 to 25% of the CPU and pushes the draw up to 10 watts (on a 35W profile) or 40 watts (on a 135W TDP). An intensive game will maybe drive the cores up to an average 35 to 70% usage and into the 25W (35W profile) to 75W (135W profile) range.

Running 4 to 8 climate WU's takes the CPUs up to 100%, drives the power profile up to maximum output (35W or 135W) and sends the temperature of the CPUs up to 70 to 80C.

GPU crunching is similar or even more extreme, as a machine left on 24/7 doing nothing will have its GPU at a black screen and nearly idle. Watching videos or playing games doesn't drive a GPU the way a 100% usage WU does.

These WU are responsible for 70 to 95% of the power usage on any machine that they are run upon. It's not MARGINAL at all.

In 2013, global computing power consumed 10% of the world's electric power generation.

Cloud computing centers are supposedly (company advert is suspect, need a better source) using more power than the entire country of India in 2016.
18) Message boards : Number crunching : Total Credit (Message 53790)
Posted 24 Mar 2016 by marmot
Post:
If you were to look at the News and Announcements thread, I made a post about this a few hours ago.
We've been telling people for years to subscribe to that thread.




Some of us have 30 to 60 projects we are involved in, plus lives outside of BOINC and we aren't going to be checking the News threads.

If you want to get the information to us then the ONLY reliable method is a notice to the BOINC client.


FYI, instead of restoring credit the repair scripts took away even more a few minutes ago.
19) Questions and Answers : Windows : Visual Fortran Run-Time Error (Message 52684)
Posted 6 Oct 2015 by marmot
Post:
@Les Bayliss:

And climateprediction.net does NOT write the code.
It all comes from the UK Met Office, where it normally runs on their super-computers, for daily weather modelling to long term climate modelling.
All of which has been posted about many times over the years.



I'm not sure why you took this tack. You can see that I have 11 posts on these forums and obviously am not deeply involved with these projects, so going after my ignorance of years' worth of posts was unusual.

@Les Bayliss:
Climate models don't like being interrupted.
Some model types are more prone to various failures than others.



This kind of fault intolerance after 10 years of climateprediction.net running on BOINC shows some failure in the project. Probably from lack of funding leading to programmers not being able to spend appropriate amounts of time hardening their code for the BOINC environment across a heterogeneous selection of user machines. I have trouble believing that FORTRAN itself hasn't been hardened to run in a multi-core modern OS.

@ryan:
The issue is further compounded because the processes are not properly cleaned up. They stick around taking up memory until the user ends them manually, logs out, or reboots the machine.

I can consistently repeat this problem by suspending tasks then taking up a bunch of extra memory (browsers, office programs, etc) then closing them and resuming the tasks. I get a slew of fortran errors but the tasks stay in Windows process viewer.

Even if models do not like being interrupted, I don't think it should be too hard to take the few extra milliseconds or seconds to reach a safe stopping point.


Some younger coders need to take some time looking over the apps being sent out to BOINC machines and improve the fault tolerance of the code.
Maybe some student loan forgiveness could be offered.

Maybe these comments need to be taken to a UK Met Office forum or representative since they write the code, might never read any of these forums, and ClimatePrediction.net has no power to make any changes to correct these errors.

20) Questions and Answers : Windows : Visual Fortran Run-Time Error (Message 51857)
Posted 20 Apr 2015 by marmot
Post:
All that is a bit irrelevant, as the Met Office only has apps for desktops/laptops using the x86 instruction set.
There may never be any ARM/RISC version, as professionals want the results of their daily work fast, not in a few weeks/months, as provided by a lot of BOINC users.



Your comment makes little sense, as the deadline for WU's on ClimatePrediction is 1 YEAR, which is the longest deadline of any project I've ever seen. If work is required from the BOINC network more quickly, then smaller slices of work need to be put out and the deadline severely decreased.

If ClimatePrediction wants to ignore the quickly growing ARM market then they are making a huge mistake, as there will be a growing number of people going without desktops or laptops and using only ARM-based phones and pads in the next decade.

It's already happening among the college and under crowd. Who needs a laptop when you have a Samsung Note with a writing stylus, which a student can get discounted? If you want people to run BOINC WU's for you for years to come then catch them young and get them involved.

Politics and name recognition are also considerations as climate modeling is crucial to the future of our species.

