Posts by bernard_ivo

21) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 62429)
Posted 18 May 2020 by bernard_ivo
Post:
[quote]This is interesting. The 1107 errors are not surprising, but how did he get a valid UK Met Office HadCM3 short v8.34 i686-pc-linux-gnu?
Don't they require the 32-bit libraries too?
https://www.cpdn.org/results.php?hostid=1472944

This one is still crashing around 12 WUs a day, 99.999% (1494 in total)

https://www.cpdn.org/cpdnboinc/results.php?hostid=1499785 New machine 24 WUs in the last two days
https://www.cpdn.org/cpdnboinc/results.php?hostid=1504413 New machine 25 WUs in the last two days
https://www.cpdn.org/cpdnboinc/results.php?hostid=1473091 2018 machine 100% WUs crashed (43 in total)
22) Message boards : Number crunching : Download errors on UK Met Office HadAM4 at N216 resolution v8.52 tasks (Message 62428)
Posted 15 May 2020 by bernard_ivo
Post:
I have a stuck download of batch 867 WU

Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] HTTP_OP::init_get(): http://download.cpdn.org/download//batch_867/workunits/hadam4h_a17c_209511_4_867_012013459.zip
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | Started download of hadam4h_a17c_209511_4_867_012013459.zip
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] HTTP_OP::init_get(): http://download.cpdn.org/download//batch_867/ancils/a17c_867_atmos.gz
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | Started download of a17c_867_atmos.gz
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: Connection 3859 seems to be dead!
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: Closing connection 3859
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: TLSv1.2 (OUT), TLS alert, Client hello (1):
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71829] Info: Found bundle for host download.cpdn.org: 0x559f10a67390 [serially]
Fri 15 May 2020 12:38:33 PM EEST | climateprediction.net | [http] [ID#71828] Info: Trying 129.67.193.131...
..........................
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: transient HTTP error
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:42 on download of ic_N216_2002_12_000004.nc.gz
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz: transient HTTP error
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:19 on download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz
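For anyone wanting to see which files are stuck and for how long, the back-off lines above can be pulled out of the event log with a few lines of Python. This is only a rough sketch: the regular expression is based on the two "Backing off" lines pasted here, and the function name `parse_backoffs` is made up for illustration, so other client versions may need a different pattern.

```python
import re
from datetime import timedelta

# Pattern assumed from the log lines in this post:
#   "... | Backing off 00:04:42 on download of <file>"
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def parse_backoffs(log_text):
    """Return {filename: timedelta} for every download back-off line found."""
    backoffs = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, filename = match.groups()
            backoffs[filename] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return backoffs


sample = """\
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:42 on download of ic_N216_2002_12_000004.nc.gz
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:19 on download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz
"""

for name, delay in parse_backoffs(sample).items():
    print(f"{name}: next retry in {int(delay.total_seconds())} s")
```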
23) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 62406)
Posted 7 May 2020 by bernard_ivo
Post:
Can someone make this one a sticky? Is there a way to block all non-compliant PCs? It takes a lot of resources to report misconfigured machines.


https://www.cpdn.org/results.php?hostid=1498245
https://www.cpdn.org/results.php?hostid=1493995 this has been reported in November but hasn't been blocked
https://www.cpdn.org/results.php?hostid=1503343 a new fella created 28 April
https://www.cpdn.org/results.php?hostid=1472002

https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1496283
https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1479116
https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1484003
24) Message boards : Number crunching : New Model Type HadAM4 (Message 62397)
Posted 4 May 2020 by bernard_ivo
Post:
I also have a few WUs with this error; however, they all finished successfully.
https://www.cpdn.org/cpdnboinc/result.php?resultid=21927138
https://www.cpdn.org/cpdnboinc/result.php?resultid=21920061
This computer is heavily used for other tasks, hence only 2 WUs, but I suspect it just can't handle the full load (i7-3520M, 8 GB RAM).
25) Message boards : Number crunching : Server status page shows different numbers for tasks in progress (Message 62336)
Posted 22 Apr 2020 by bernard_ivo
Post:
That would be great. Would it be possible to remove orphaned WUs also?
These are WUs still "In progress" but no longer at the user's system (even after detach/attach)
26) Message boards : Number crunching : No trickles on webpage (Message 62275)
Posted 1 Apr 2020 by bernard_ivo
Post:
Trickles seem to have appeared today. Thanks.
27) Message boards : Number crunching : No trickles on webpage (Message 62270)
Posted 30 Mar 2020 by bernard_ivo
Post:
Hi,
It seems there might be a problem with trickles after 21-22 March.
I have at least 3 N216 tasks that do not have their 3rd and 4th trickles on the web, even though they finished successfully and the upload queues are empty.

Here is an example: https://www.cpdn.org/cpdnboinc/result.php?resultid=21871312
28) Message boards : Number crunching : New work Discussion (Message 62163)
Posted 27 Feb 2020 by bernard_ivo
Post:
I still believe one way to go is to shorten the WU deadline. The output of completed Windows tasks per 24 h is small compared to the number of tasks in progress. Linux boxes, though currently fewer, send back a higher percentage of tasks than Windows boxes relative to tasks in progress. This might suggest that even if a user is not hoarding, tasks may still sit idle because other projects take priority.

Edit: And yes, there are whole model categories, both Linux and Windows, for which no completed tasks have come back recently despite queued tasks in progress (sure, there are ghost WUs as well).
29) Message boards : climateprediction.net Science : Climate change in the News (Message 62131)
Posted 18 Feb 2020 by bernard_ivo
Post:
Isambard 2 at UK Met Office to be largest Arm supercomputer in Europe

The UK Met Office has been awarded £4.1m by EPSRC to build Isambard 2, the largest Arm-based supercomputer in Europe. The powerful new £6.5m facility, to be hosted by the Met Office in Exeter and utilized by the universities of Bath, Bristol, Cardiff and Exeter, will double the size of GW4 Isambard, to 21,504 high performance cores and 336 nodes......

Isambard 2 will continue to support their efforts in developing future systems for weather forecasting and climate predictions...

Details here
https://insidehpc.com/2020/02/isambard-2-at-uk-met-office-to-be-largest-arm-supercomputer-in-europe/

Shall we see CPDN for ARM?
30) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 62048)
Posted 26 Jan 2020 by bernard_ivo
Post:
On my i7-4790 I run only 4 N216s. They checkpoint every 38-40 minutes and run at 30-31 sec/TS (12.5 days to complete). Some WUs reached 39 sec/TS, but at that time I was running 6 or 8 cores with WCG alongside. So for the moment I do not go over 4 real cores. According to the other thread, even with a Ryzen 3600 (6C, 12T), going beyond 4-5 WUs decreases performance a lot. Completion time seems faster though.

On my other machine with an i7-3520M I run one N216 and one WCG task. The N216 ran at 24 sec/TS and completed in 10 days.
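The sec/TS and completion-time figures above hang together reasonably well, which a quick back-of-the-envelope check shows. This is only a sketch: it assumes the task crunches around the clock, and the timestep count is inferred from the laptop's numbers (10 days at 24 sec/TS), not taken from any CPDN documentation. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def implied_timesteps(days, sec_per_ts):
    """Timesteps a task must contain if it ran non-stop for `days` at `sec_per_ts`."""
    return days * SECONDS_PER_DAY / sec_per_ts


def days_to_complete(total_timesteps, sec_per_ts):
    """Wall-clock days needed for `total_timesteps` at a steady `sec_per_ts`."""
    return total_timesteps * sec_per_ts / SECONDS_PER_DAY


# Inferred from the i7-3520M run: 10 days at 24 sec/TS (an assumption, see above).
n216_timesteps = implied_timesteps(10, 24)

for rate in (24, 30.5, 39):
    print(f"{rate} sec/TS -> {days_to_complete(n216_timesteps, rate):.1f} days")
```

Under those assumptions, 30-31 sec/TS comes out a little over 12.5 days, in line with what the i7-4790 shows.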
31) Message boards : Number crunching : Scheduler request too recent (Message 61912)
Posted 5 Jan 2020 by bernard_ivo
Post:


Based on what Les said a few posts ago it seems that the Resource Share option may be one of the root causes of the problem. If BOINC sees one project with jobs that have a one week deadline and another project with jobs that have a one year deadline and it also sees the BOINC client has a full queue for both projects it will try to get as many jobs done as it can before the deadlines in either project. In this case it means that CPDN jobs will sit idle so long that they become useless to the researchers before they are finished crunching unless the user does some very careful micromanagement of the Resource Share settings. Micromanagement of a GUI setting is not just counterintuitive, it usually points to a large underlying problem.


I run WCG and CPDN with resource shares of 12.5% and 75% (plus 12.5% for WUprop), and I rarely have idle CPDN WUs. To be sure, I often set No New Tasks for WCG to ensure a full CPDN load when the hopper is full. I also sometimes suspend and resume WCG when a CPDN task is left idle at 98%, perhaps because of the long deadline. Yes, this is micromanagement, but with shorter CPDN deadlines I may need to do less of it.

I doubt I would invest time to learn to launch multiple BOINC instances on my current 6 machines and micromanage them. One per machine should suffice.
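To make the resource share numbers concrete, here is a minimal sketch of the long-run split they aim for. It assumes the client simply divides CPU time in proportion to the shares, which is only the long-term target: deadlines and empty queues shift the short-term picture. The 8-thread figure is just an example, and `cpu_fractions` is a made-up helper name.

```python
def cpu_fractions(shares):
    """Normalise resource shares into the long-run fraction of CPU time per project."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# Shares from this post; any positive numbers work, only the ratios matter.
shares = {"CPDN": 75.0, "WCG": 12.5, "WUprop": 12.5}

threads = 8  # example machine size, an assumption
for project, fraction in cpu_fractions(shares).items():
    print(f"{project}: {fraction:.1%} of CPU time, about {fraction * threads:.1f} of {threads} threads")
```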
32) Message boards : Number crunching : Scheduler request too recent (Message 61907)
Posted 4 Jan 2020 by bernard_ivo
Post:
... it used to be the case that the argument was: "if CPDN reduces deadlines then CPDN grabs all CPUs at the expense of other projects, which is not being a good BOINC citizen".


Is it still the case? Resource share is a viable option to overcome this.
33) Message boards : Number crunching : Scheduler request too recent (Message 61905)
Posted 4 Jan 2020 by bernard_ivo
Post:
We've discussed deadlines numerous times. Can't we finally get WUs with a deadline of 2-3 months at most? This would accommodate older hardware and clean up the queue significantly.
34) Message boards : Number crunching : UK Met Office HadAM4 at N144 resolution (Message 61595)
Posted 22 Nov 2019 by bernard_ivo
Post:
OK, I will experiment on my i7-4790 at 75%, i.e. 6 cores.
Currently I have two N216 and four N144, so I would not push it to 100%. I will monitor how sec/TS changes.
On 4 cores only, N144 runs for 3d22h at 13 sec/TS, while N216 is ready in 12 days at 30-31 sec/TS.


So the two N216 run differently, as expected:
the old one (started on 4 real cores) still runs at around 30 sec/TS after 3 trickles, might drop for the 3rd
the new one (on 6 HT threads) runs at 39 sec/TS and will finish in 16.4 days (12 on 4 cores)

The four N144 also run differently, as expected:
the two old ones started at 13 sec/TS and are now at 18, so 28% slower
the two new ones have been at 20 sec/TS from the start, so >35% slower

Not sure whether it is worth running HT.
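A note on how the "slower" percentages can be computed, since there are two common ways and they give different numbers. The sketch below works out both for the N144 figures in this post; the function name `slowdown` is just for illustration.

```python
def slowdown(before_sec_per_ts, after_sec_per_ts):
    """Return (extra time per timestep, lost throughput) as fractions.

    extra time      = (after - before) / before
    lost throughput = 1 - before / after
    """
    extra_time = (after_sec_per_ts - before_sec_per_ts) / before_sec_per_ts
    lost_throughput = 1 - before_sec_per_ts / after_sec_per_ts
    return extra_time, lost_throughput


for label, after in (("old N144", 18), ("new N144", 20)):
    extra, lost = slowdown(13, after)
    print(f"{label}: {extra:.0%} more time per timestep, {lost:.0%} less throughput")
```

The 28% and 35% figures correspond to the throughput view; measured as extra time per timestep, the same runs come out at roughly 38% and 54%.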
35) Message boards : Number crunching : UK Met Office HadAM4 at N144 resolution (Message 61572)
Posted 18 Nov 2019 by bernard_ivo
Post:
OK, I will experiment on my i7-4790 at 75%, i.e. 6 cores.
Currently I have two N216 and four N144, so I would not push it to 100%. I will monitor how sec/TS changes.
On 4 cores only, N144 runs for 3d22h at 13 sec/TS, while N216 is ready in 12 days at 30-31 sec/TS.
36) Message boards : Number crunching : UK Met Office HadAM4 at N144 resolution (Message 61546)
Posted 16 Nov 2019 by bernard_ivo
Post:
Would using HT with 848 benefit the output, or should I keep using real cores only?
37) Message boards : Number crunching : Lost tasks: Can they be reactivated? (Message 61535)
Posted 14 Nov 2019 by bernard_ivo
Post:
Hi, try detaching the host from the CPDN project and attaching it again: in BOINC Manager, remove CPDN and then add it again. This usually works, though I'm not sure about your case. I still have one from 2013 with a deadline in 2023 - almost there.
38) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 61532)
Posted 14 Nov 2019 by bernard_ivo
Post:
https://www.cpdn.org/cpdnboinc/results.php?hostid=1395986

https://www.cpdn.org/cpdnboinc/results.php?hostid=1472944

https://www.cpdn.org/cpdnboinc/results.php?hostid=1457047

https://www.cpdn.org/cpdnboinc/results.php?hostid=1493995

Missing 32 bit libraries, crashing everything
39) Message boards : Number crunching : Slow progress rate for HadAM4 at N216 (Message 61409)
Posted 27 Oct 2019 by bernard_ivo
Post:
the HDD write is around 145 GB for 20 hours.

I need to correct this to 14.5 GB (2x7400 MB) for 20 h, which is much better for the HDD.
Checkpoints are around 2.5 h apart; with no UPS, that is too long for my taste.
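For reference, the corrected figure works out as below. This is plain arithmetic on the numbers quoted above (2 x 7400 MB over 20 h); the variable names are only for illustration, and decimal units (1 GB = 1000 MB) are assumed.

```python
MB_PER_GB = 1000  # decimal units, an assumption

write_mb = 7400   # per-item write quoted in this post
count = 2         # the "2x" from this post
hours = 20

total_mb = write_mb * count
mb_per_hour = total_mb / hours
gb_per_day = mb_per_hour * 24 / MB_PER_GB

print(f"total: {total_mb / MB_PER_GB:.1f} GB in {hours} h")
print(f"rate:  {mb_per_hour:.0f} MB/h, about {gb_per_day:.1f} GB/day")
```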
40) Message boards : Number crunching : Slow progress rate for HadAM4 at N216 (Message 61406)
Posted 27 Oct 2019 by bernard_ivo
Post:
I run 4 HadAM4h on my i7-4790 with 16GB RAM.

They all are above 75%, have run >9 days with estimated 3 days remaining.

the HDD write is around 145 GB for 20 hours.



©2022 climateprediction.net