climateprediction.net home page
Posts by bernard_ivo

21) Message boards : Number crunching : Server status page shows different numbers for tasks in progress (Message 62336)
Posted 22 Apr 2020 by bernard_ivo
Post:
That would be great. Would it be possible to remove orphaned WUs as well?
These are WUs still "In progress" but no longer on the user's system (even after detach/attach).
22) Message boards : Number crunching : No trickles on webpage (Message 62275)
Posted 1 Apr 2020 by bernard_ivo
Post:
Trickles seem to have appeared today. Thanks.
23) Message boards : Number crunching : No trickles on webpage (Message 62270)
Posted 30 Mar 2020 by bernard_ivo
Post:
Hi
It seems there might be a problem with trickles after 21-22 March.
I have at least 3 N216s that do not show their 3rd and 4th trickles on the web, even though they finished successfully and the upload queues are empty.

Here is an example: https://www.cpdn.org/cpdnboinc/result.php?resultid=21871312
24) Message boards : Number crunching : New work Discussion (Message 62163)
Posted 27 Feb 2020 by bernard_ivo
Post:
I still believe one way to go is to shorten the WU deadline. The output of completed Windows tasks per 24h is small compared to the number of tasks in progress. Linux boxes, though currently fewer, send back a higher percentage of tasks than Windows boxes relative to tasks in progress. This might suggest that even if a user is not hoarding, tasks may still sit idle because other projects take priority.

Edit: And yes, there are whole model categories, on both Linux and Windows, that haven't received ready tasks recently despite queued tasks in progress (sure, there are ghost WUs as well).
25) Message boards : climateprediction.net Science : Climate change in the News (Message 62131)
Posted 18 Feb 2020 by bernard_ivo
Post:
Isambard 2 at UK Met Office to be largest Arm supercomputer in Europe

The UK Met Office has been awarded £4.1m by EPSRC to build Isambard 2, the largest Arm-based supercomputer in Europe. The powerful new £6.5m facility, to be hosted by the Met Office in Exeter and utilized by the universities of Bath, Bristol, Cardiff and Exeter, will double the size of GW4 Isambard, to 21,504 high performance cores and 336 nodes......

Isambard 2 will continue to support their efforts in developing future systems for weather forecasting and climate predictions...

Details here
https://insidehpc.com/2020/02/isambard-2-at-uk-met-office-to-be-largest-arm-supercomputer-in-europe/

Shall we see CPDN for ARM?
26) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 62048)
Posted 26 Jan 2020 by bernard_ivo
Post:
On my i7-4790 I run only 4 N216s; they checkpoint every 38-40 minutes and run at 30-31 sec/TS (12.5 days to complete). Some WUs reached 39 sec/TS, but at that time I was running 6 or 8 cores alongside WCG. So for the moment I do not go over 4 real cores. Reading the other thread, even with a Ryzen 3600 (6C, 12T), going beyond 4-5 WUs decreases performance a lot. Completion time seems faster though.

On my other machine, with an i7-3520M, I run one N216 and one WCG task. The N216 ran at 24 sec/TS and completed in 10 days.
27) Message boards : Number crunching : Scheduler request too recent (Message 61912)
Posted 5 Jan 2020 by bernard_ivo
Post:


Based on what Les said a few posts ago it seems that the Resource Share option may be one of the root causes of the problem. If BOINC sees one project with jobs that have a one week deadline and another project with jobs that have a one year deadline and it also sees the BOINC client has a full queue for both projects it will try to get as many jobs done as it can before the deadlines in either project. In this case it means that CPDN jobs will sit idle so long that they become useless to the researchers before they are finished crunching unless the user does some very careful micromanagement of the Resource Share settings. Micromanagement of a GUI setting is not just counterintuitive, it usually points to a large underlying problem.


I run WCG and CPDN with resource shares of 12.5% and 75% (plus 12.5% for WUProp) and I rarely have idle CPDN WUs. To be sure, I often set "no new tasks" for WCG in order to ensure a full CPDN load when the hopper is full. I also sometimes suspend and resume WCG when CPDN is left idle at 98%, perhaps because of the long deadline. Yes, this is micromanagement, but with shorter CPDN deadlines I may need to do less of it.

I doubt I would invest time to learn to launch multiple BOINC instances on my current 6 machines and micromanage them. One per machine should suffice.
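
For reference, BOINC treats resource share as a relative weight, so each project's long-run slice of CPU time is its share divided by the sum of all shares. Below is a minimal sketch of that arithmetic for the 12.5 / 75 / 12.5 split mentioned above. The function name and the 8-thread figure are assumptions for illustration only, and the real client scheduler also weighs deadlines and work availability, which this ignores.

```python
def share_fractions(shares):
    """Normalise BOINC-style resource shares into fractions of CPU time."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: value / total for name, value in shares.items()}


# Shares quoted in the post; THREADS is an assumed host size, not from the post.
shares = {"CPDN": 75.0, "WCG": 12.5, "WUProp": 12.5}
THREADS = 8

fractions = share_fractions(shares)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of CPU time, roughly {frac * THREADS:.1f} of {THREADS} threads")
```

Because the shares are only weights, 6 / 1 / 1 would give exactly the same split as 75 / 12.5 / 12.5.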
28) Message boards : Number crunching : Scheduler request too recent (Message 61907)
Posted 4 Jan 2020 by bernard_ivo
Post:
... it used to be the case that the argument was: "if CPDN reduces deadlines then CPDN grabs all CPUs at the expense of other projects, which is not being a good BOINC citizen".


Is it still the case? Resource share is a viable option to overcome this.
29) Message boards : Number crunching : Scheduler request too recent (Message 61905)
Posted 4 Jan 2020 by bernard_ivo
Post:
We've discussed deadlines numerous times. Can't we finally get WUs with a deadline of 2-3 months at most? This would accommodate older hardware and clean up the queue significantly.
30) Message boards : Number crunching : UK Met Office HadAM4 at N144 resolution (Message 61595)
Posted 22 Nov 2019 by bernard_ivo
Post:
Ok, I will experiment on my i7-4790 at 75% or 6 cores.
Currently I have two N216 and four N144, so I would not push it to 100%. I will monitor how sec/TS changes.
On 4 cores only, N144 runs for 3d22h at 13 sec/TS, while N216 is ready in 12 days at 30-31 sec/TS.


So the two N216s run differently, as expected:
the old one (4 real cores) still runs at around 30 sec/TS after 3 trickles, might drop for the 3rd
the new one (6 HT) runs at 39 sec/TS and will finish in 16.4 days (12 on 4 cores)

The four N144s also run differently, as expected:
the two old ones started at 13 sec/TS and are now at 18, so 28% slower
the two new ones have been at 20 sec/TS from the start, so >35% slower

Not sure whether it is worth running HT.
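
The percentages above match if "slower" is read as lost throughput (timesteps per second) rather than extra time per timestep. Here is a quick sketch that checks both readings and adds a rough run-length estimate. The helper names are made up for illustration, and the assumption that total run time scales linearly with sec/TS is mine, so the estimate only lands near the 16.4 days quoted, not exactly on it.

```python
def throughput_loss(base_sec_per_ts, new_sec_per_ts):
    """Fraction of timesteps-per-second lost when sec/TS rises."""
    return 1 - base_sec_per_ts / new_sec_per_ts


def time_increase(base_sec_per_ts, new_sec_per_ts):
    """Fractional increase in wall-clock time per timestep."""
    return new_sec_per_ts / base_sec_per_ts - 1


def scaled_days(base_days, base_sec_per_ts, new_sec_per_ts):
    """Run length if total time scales linearly with sec/TS (an assumption)."""
    return base_days * new_sec_per_ts / base_sec_per_ts


# N144: 13 sec/TS on 4 real cores versus 18 and 20 sec/TS with HT in use.
for label, new in (("old N144 pair", 18), ("new N144 pair", 20)):
    print(f"{label}: {throughput_loss(13, new):.0%} less throughput, "
          f"{time_increase(13, new):.0%} more time per timestep")

# N216: 12 days at about 30 sec/TS, rescaled to 39 sec/TS.
print(f"N216 at 39 sec/TS: about {scaled_days(12, 30, 39):.1f} days")
```

Measured as extra time per timestep, the HT penalty looks noticeably larger than the throughput figures suggest, which is worth keeping in mind when weighing 6 slower tasks against 4 faster ones.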
31) Message boards : Number crunching : UK Met Office HadAM4 at N144 resolution (Message 61572)
Posted 18 Nov 2019 by bernard_ivo
Post:
Ok, I will experiment on my i7-4790 at 75% or 6 cores.
Currently I have two N216 and four N144, so I would not push it to 100%. I will monitor how sec/TS changes.
On 4 cores only, N144 runs for 3d22h at 13 sec/TS, while N216 is ready in 12 days at 30-31 sec/TS.
32) Message boards : Number crunching : UK Met Office HadAM4 at N144 resolution (Message 61546)
Posted 16 Nov 2019 by bernard_ivo
Post:
Would using HT with 848 benefit the output, or should I keep using real cores only?
33) Message boards : Number crunching : Lost tasks: Can they be reactivated? (Message 61535)
Posted 14 Nov 2019 by bernard_ivo
Post:
Hi, try detaching the host from the CPDN project and reattaching it: in BOINC Manager, remove CPDN and then add it again. This usually works, though I am not sure it will in your case. I still have one from 2013 with a 2023 deadline - almost there.
34) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 61532)
Posted 14 Nov 2019 by bernard_ivo
Post:
https://www.cpdn.org/cpdnboinc/results.php?hostid=1395986

https://www.cpdn.org/cpdnboinc/results.php?hostid=1472944

https://www.cpdn.org/cpdnboinc/results.php?hostid=1457047

https://www.cpdn.org/cpdnboinc/results.php?hostid=1493995

Missing 32-bit libraries, crashing everything.
35) Message boards : Number crunching : Slow progress rate for HadAM4 at N216 (Message 61409)
Posted 27 Oct 2019 by bernard_ivo
Post:
the HDD write is around 145 GB for 20 hours.

I need to correct this to 14.5 GB (2x7400 MB) for 20 h, which is much better for the HDD.
Checkpoints come around every 2.5h; with no UPS, that is too long for my taste.
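
As a sanity check on the corrected figure, here is a small sketch of the arithmetic, reading "2x7400 MB" as two amounts of 7400 MB written over 20 hours. Decimal units (1 GB = 1000 MB) are an assumption, since the post does not say which convention the monitoring tool used, and the function name is hypothetical.

```python
def write_stats(mb_per_write, writes, hours, mb_per_gb=1000):
    """Return (total GB written, average GB per hour)."""
    total_gb = mb_per_write * writes / mb_per_gb
    return total_gb, total_gb / hours


total_gb, gb_per_hour = write_stats(7400, 2, 20)
print(f"total: {total_gb:.1f} GB, average: {gb_per_hour:.2f} GB/h")

# Same data with binary units (1 GB = 1024 MB), for comparison.
total_gib, _ = write_stats(7400, 2, 20, mb_per_gb=1024)
print(f"total with 1024 MB per GB: {total_gib:.2f} GB")
```

Either convention gives a total in the region of 14.5 GB, about a tenth of the 145 GB first reported.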
36) Message boards : Number crunching : Slow progress rate for HadAM4 at N216 (Message 61406)
Posted 27 Oct 2019 by bernard_ivo
Post:
I run 4 HadAM4h on my i7-4790 with 16GB RAM.

They all are above 75%, have run >9 days with estimated 3 days remaining.

the HDD write is around 145 GB for 20 hours.
37) Message boards : Number crunching : Credits (Message 61388)
Posted 26 Oct 2019 by bernard_ivo
Post:
And for a month there has been a message in the Notices tab of BOINC Manager. A final reminder could be sent via e-mail if that is GDPR compliant.
38) Message boards : climateprediction.net Science : RCMIP Reduced Complexity Model Intercomparison (Message 61334)
Posted 22 Oct 2019 by bernard_ivo
Post:
RCMIP is about reduced-complexity, simple climate models and emulators. It focusses on testing and comparing their ability to emulate a range of CMIP6 coupled models. This Reduced Complexity Model Intercomparison Project (RCMIP) is hence not one of the standard "MIPs" that are part of the Sixth Coupled Model Intercomparison Project's (CMIP6) (see CMIP6 protocols by the World Climate Research Programme here).
Summary

Assessing how humans change the climate is a complex task, best investigated by complex Earth System Models. However, coupled atmosphere-ocean-biogeochemistry models are computationally expensive. Thus there is a need for emulators that are able to replicate some aggregate response characteristics of Earth System Models at a fraction of the cost. With such emulators, we can investigate uncertainties and simulate hundreds of possible future emission scenarios, rather than only a handful.

These emulators, ranging from one-line climate models to models with tens of thousands of lines of code, are only useful if they can emulate more complex model results with a reasonable degree of accuracy. As a result, a systematic way to assess the ability of these emulators to replicate the results of complex climate models is required. This is what RCMIP is about.

RCMIP provides a standard protocol for one-line models, simple and reduced complexity models (henceforth we refer to the whole basket of models as RCMs) to be compared to the latest CMIP6 results. This provides a standardised test of their ability to replicate ESM projections for e.g. surface air temperatures, ocean heat uptake, gas cycles, effective radiative forcing and sea level rise.

Given the ongoing AR6 assessment cycle of the Intergovernmental Panel on Climate Change (IPCC), RCMIP envisages taking place over multiple stages. The first phase will run until mid November 2019, with a second phase planned for 2020. More phases can be added thereafter. Participation in RCMIP is open to all modelling groups who have peer-reviewed scientific papers that document their models or applications. All data submitted to RCMIP will be published on this website and will be published under an open access license (most likely Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)).

To our knowledge, RCMIP constitutes the first systematic intercomparison project among reduced-complexity climate models. The longest tradition of systematic intercomparisons is without doubt those by the coupled model community. Models of intermediate complexity (EMICs) have also performed systematic intercomparisons in the past. Reduced complexity climate models were often compared in the scientific literature, e.g. by van Vuuren et al. Climatic Change 2011. Nonetheless, RCMIP represents the first attempt to standardise this process in a systematic way.

More on https://www.rcmip.org/

EDIT: This might have been better under the "Climate change in the News" thread, but it is too late, I guess.
39) Questions and Answers : Unix/Linux : hadcm3s errors (Message 61196)
Posted 7 Oct 2019 by bernard_ivo
Post:
Hi there,

Though my hadcm3s batch 835 unit finished with success I wanted to report this error from the stderr output.
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
MainError:	08:33:20 PM	No files match the supplied pattern.
MainError:	08:33:20 PM	No files match the supplied pattern.
................
MainError:	03:46:23 AM	No files match the supplied pattern.
MainError:	11:16:05 AM	No files match the supplied pattern.
MainError:	11:16:05 AM	No files match the supplied pattern.
Processing restart Year 1935 Month 12 Day 1
MainError:	06:45:38 PM	No files match the supplied pattern.
MainError:	06:45:38 PM	No files match the supplied pattern.
21:45:57 (28890): called boinc_finish(0)
</stderr_txt>
]]>
40) Message boards : Number crunching : Upload failures (Message 61162)
Posted 3 Oct 2019 by bernard_ivo
Post:
I spoke slightly too soon, it seems. I am running 4 cam25 models on a Windows 7 system. All the zips are uploading correctly and regularly except for a restart.zip which repeatedly gets stuck at 84.5% with transient HTTP error/project servers may be temporarily down. I've had this now for a couple of days.

Many [edit: ALL] of my CAMs have now cleared. Thanks, all.

I had the same issue with 4 CAMs after the upload server was fixed. I had a few zips stuck for more than 5 days after that, and I cancelled them all. Two WUs were reported as successful and two errored out with an upload failure. These were on their 1st attempt, but were not reissued.



©2021 climateprediction.net