Posts by bernard_ivo

1) Message boards : Number crunching : Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true (Message 65010)
Posted 26 Jan 2022 by bernard_ivo
Post:
In my case, simply installing
linux-tools-common
linux-tools-generic
which should link to the latest kernel tools, did not work.
Running perf pointed to possibly missing tool libraries, so after checking my current kernel version and the available packages
I added
linux-tools-generic-hwe-20.04, which points to the latest kernel.

Then I ran perf as superuser, and it showed this for my Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz:

Performance counter stats for 'system wide':

    12,511,363,439      cache-references                                            
     6,135,922,943      cache-misses              #   49.043 % of all cache refs    

      73.725181985 seconds time elapsed


I run on 1/2 of the cores, i.e. 4 CPDN WUs; RAM is 16 GB.
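
For anyone who wants to compare numbers across machines, here is a minimal sketch that recomputes the miss ratio from the two counters quoted above. It only parses text that has been pasted in; the function names parse_counters and miss_ratio are made up for this illustration and are not part of perf.

```python
import re

# The perf output quoted in the post, pasted in as plain text.
PERF_OUTPUT = """\
 Performance counter stats for 'system wide':

    12,511,363,439      cache-references
     6,135,922,943      cache-misses              #   49.043 % of all cache refs

      73.725181985 seconds time elapsed
"""


def parse_counters(text):
    """Return {event_name: count} for lines shaped like '  12,345  event-name ...'.

    Hypothetical helper: it only understands the simple layout shown above.
    """
    counters = {}
    for line in text.splitlines():
        # A counter line starts with a comma-grouped integer followed by the event name.
        match = re.match(r"\s*([\d,]+)\s+([A-Za-z][\w\-.:]*)", line)
        if match:
            counters[match.group(2)] = int(match.group(1).replace(",", ""))
    return counters


def miss_ratio(counters):
    """Cache misses as a percentage of cache references."""
    return 100.0 * counters["cache-misses"] / counters["cache-references"]


counters = parse_counters(PERF_OUTPUT)
print(f"{miss_ratio(counters):.3f} % of all cache refs")
```

The result should agree with the percentage perf printed itself, so the sketch is mainly useful when only the raw counters are at hand.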
2) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 64971)
Posted 15 Jan 2022 by bernard_ivo
Post:
This one https://www.cpdn.org/results.php?hostid=1510055 has crashed all ~12k WUs and is continuing to do so.

This one https://www.cpdn.org/results.php?hostid=829775 has been crashing every WU since 2020; it had 67 valid results before that.

And this one https://www.cpdn.org/results.php?hostid=1517479 has crashed all ~10k WUs and is continuing to do so.

Can this reporting be automated somehow? The level of micromanagement CPDN requires and the reluctance of staff to adjust some basic things is becoming daunting.
3) Message boards : Number crunching : Lost tasks (Message 64969)
Posted 14 Jan 2022 by bernard_ivo
Post:
I just got a WU from batch 891. On its first attempt it spent a year with No response (Jan 2021 to Jan 2022). 5 attempts are allocated to this batch, so one could expect 5 years in vain in the worst-case scenario. On its second attempt it errored in seconds (I guess missing 64bit libraries). On its 3rd attempt one of my machines got it and it's been computing fine. I'm not waiting for someone to tell me that this batch is closed and I should abort. While climate change is accelerating, keeping the one-year deadline period is a kind of climate change denial. The community here has been asking for a shortened period for ages already. How hard is that to change, and why are we ignored even on the most common-sense suggestions?

I even have a ghost task in progress from Jan 2014 with a deadline in Jul 2023 - so I'm close.
4) Message boards : Number crunching : Tasks by application = hoarding (Message 64465)
Posted 14 Sep 2021 by bernard_ivo
Post:
Is the impression of hoarding created by the very high number of tasks shown as in progress for applications that have not issued tasks for some time and where the active users shows zero?

This, surely, is an historical issue of failed tasks that have not been crossed off the list of tasks outstanding.

Would it be possible to synchronise the number shown as outstanding with the number of tasks that are still being processed?


Yes, there are ghost tasks. I have two out of 8 WUs in progress. One of the ghosts was issued in 2014 and its deadline is 2023. So yeah, I've been running it for 7 years. Several times there have been requests to clean up the ghosts. Not much result. Yes, detaching from and reattaching to the project sometimes works, but not always.

And yes, a shorter deadline of circa 4-6 months is completely reasonable to accommodate older machines that run other projects as well.

Reissuing tasks might be useful for researchers, but I've numerous times crunched batches that were no longer of interest to anyone. Yeah, my machines saved the last 3rd or 5th attempt of the WU after a few years idling on someone's computer. Old batches are not always pulled out.

Sometimes I had to manually abort WUs so as not to waste resources on WUs of no interest. A shorter deadline could fix that as well, but hey, it seems too much to ask every time this pops up.
5) Message boards : Number crunching : Completed tasks not showing on server (Message 64082)
Posted 25 Jun 2021 by bernard_ivo
Post:
Hi folks,

I have this one successfully finished, but it still shows as In progress on the web: https://www.cpdn.org/result.php?resultid=22089752
6) Message boards : Number crunching : batches closed (Message 63588)
Posted 2 Mar 2021 by bernard_ivo
Post:
Hi,
I just got one from batch 837 from 2019 - hadcm3s_qu49_190012_240_837 on its 2nd attempt. Its 1st attempt status is: Didn't need. Should I crunch it? It is on a new machine attached to the project, so I might as well monitor whether it behaves well, but if the WU is not needed, then I should move to a more recent batch.
7) Questions and Answers : Unix/Linux : Missing options on current versions of BOINC (Message 63447)
Posted 1 Feb 2021 by bernard_ivo
Post:
Have you ticked, "Stop running tasks when exiting the BOINC Manager?" If not the client keeps going in the background. If you don't have these options when exiting Manager, go to options>other options and enable Manager exit dialogue and client shutdown dialogue.


No intention to divert the thread topic, but under Ubuntu 20.04 and BOINC 7.16.6 this function does not work (I have two machines like that). No matter whether I select the option, the Manager always closes without the dialogue window mentioned.

And thanks to Andy, will keep posting mis-configured machines.
8) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 63406)
Posted 24 Jan 2021 by bernard_ivo
Post:
These two are crashing 100% of WUs
https://www.cpdn.org/results.php?hostid=1484543
https://www.cpdn.org/results.php?hostid=1497751

Does it make sense to report machines, or am I just wasting my time?
9) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62856)
Posted 5 Nov 2020 by bernard_ivo
Post:
I changed the values following the process suggested and restarted the client. So far all 4 WUs are running ok, 1 zip uploaded. I have one more task on another machine, but since I have relatively high speed internet I will take the risk with this one.
10) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62848)
Posted 4 Nov 2020 by bernard_ivo
Post:
Ok,
I have 4 WUs that have progressed to between 20 and 75 %. So some zips have uploaded. Should I change all instances of
<max_nbytes>150000000.000000</max_nbytes> for each WU or only for the remaining zips?

There are also other files that have <max_nbytes>0.000000</max_nbytes> so I guess I should not alter these?
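
To make the question concrete, here is a rough sketch of the kind of edit I mean. It assumes that only entries currently at 150000000 are raised and that entries at 0.000000 are left alone, which is my guess and not confirmed advice. NEW_LIMIT is a made-up value, the sample entries are invented for illustration, and the sketch works on a pasted string and not on the real client file.

```python
import re

# Hypothetical replacement limit; the real value should come from the advice in the thread.
NEW_LIMIT = 500_000_000

# Invented sample entries, shaped like the lines quoted in the post.
SAMPLE = """\
<file>
    <name>example_upload_1.zip</name>
    <max_nbytes>150000000.000000</max_nbytes>
</file>
<file>
    <name>example_other_file</name>
    <max_nbytes>0.000000</max_nbytes>
</file>
"""


def raise_upload_limits(xml_text, old_limit=150_000_000, new_limit=NEW_LIMIT):
    """Replace max_nbytes values equal to old_limit; leave every other value untouched."""
    pattern = re.compile(r"<max_nbytes>([0-9.]+)</max_nbytes>")

    def swap(match):
        # Only touch entries that currently carry the old limit.
        if float(match.group(1)) == float(old_limit):
            return f"<max_nbytes>{new_limit:.6f}</max_nbytes>"
        return match.group(0)

    return pattern.sub(swap, xml_text)


print(raise_upload_limits(SAMPLE))
```

With the sample above, only the first entry changes and the 0.000000 entry comes back as it was.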
11) Message boards : Number crunching : New work Discussion (Message 62846)
Posted 4 Nov 2020 by bernard_ivo
Post:
#877 and #878 have had all tasks waiting to go out withdrawn as many produce uploads greater than the limit allowed causing some to fail at 100% completion. They will be re-issued shortly.

Edit: Those already sent out will be left to run. Credit will be granted.


I have a few of these; should I let them finish or should I abort?
12) Message boards : Number crunching : New work Discussion (Message 62792)
Posted 25 Oct 2020 by bernard_ivo
Post:
It would be great if someone could post system requirements for OpenIFS once more WUs are tested. It seems my i7-4790 with 16 GB RAM, 21 GB of /var space and 5.6 GB swap may not be able to handle more than two WUs at once.
13) Message boards : Number crunching : Welcome back/checking if everything is working? (Message 62773)
Posted 9 Oct 2020 by bernard_ivo
Post:
There's quite a few batches of those, with only a small number left in each one.
I had a look at a few; some are "stuck", but some have just started running, and are returning trickles.

I'll see what the project thinks about wiping everything.


It would be great if some clean up happens. I have one orphaned Full Resolution Ocean since 2014 in my "In progress" web tab and set to expire in 2023. I'm almost there.
14) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 62772)
Posted 9 Oct 2020 by bernard_ivo
Post:
I also got 5 of the new ones. So with 6 N216 my /var climbed to ~16 GB. With 4 WCG ARP in the queue I almost ran out of space on /var (~20 GB) and BOINC manager crashed. I needed to clean some journals. Luckily no CPDN models crashed due to the low disk issue. Reducing work to the real cores and clearing out the ARPs should get things back to normal.
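
A small sketch of the sort of check that would have warned me earlier. It only uses Python's standard shutil.disk_usage; the function name free_space_report and the 4 GB threshold are arbitrary choices for illustration, not a project recommendation.

```python
import shutil


def free_space_report(path="/var", min_free_gb=4.0):
    """Report total and free space for the filesystem holding `path`.

    Hypothetical helper: `min_free_gb` is an arbitrary warning threshold.
    """
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1e9
    return {
        "path": path,
        "total_gb": usage.total / 1e9,
        "free_gb": free_gb,
        "low": free_gb < min_free_gb,  # True when free space is under the threshold
    }


if __name__ == "__main__":
    report = free_space_report(".")
    print(f"{report['path']}: {report['free_gb']:.1f} GB free of {report['total_gb']:.1f} GB")
    if report["low"]:
        print("Warning: free space is below the chosen threshold")
```

Pointing it at the partition that holds the BOINC data directory (on my machines that is under /var) before fetching more work would be the obvious use.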
15) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 62760)
Posted 7 Oct 2020 by bernard_ivo
Post:
And I have just picked up one from #843 as well. (On its fifth and final attempt.)

I also got another one but from #842. On its second attempt after a whole year with no response. I still think deadlines should be shortened.
16) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 62743)
Posted 3 Oct 2020 by bernard_ivo
Post:
There's been quite a few fails, and several hundred still running, (possibly not for the first time), so if you put your foot down and go for it, you're in with a chance. :)

Good then, I will let it run. Thanks.
17) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 62741)
Posted 1 Oct 2020 by bernard_ivo
Post:
Yay, I got one from batch 843. The task timed out after one year of no response. I wonder if it is of any use except for upping my points.
18) Message boards : Number crunching : Updated BOINC Clients 7.16.11 - Windows 64-bit and Mac OS X (64-bit Intel) (Message 62723)
Posted 16 Sep 2020 by bernard_ivo
Post:
While we wait for WUs, isn't it possible for some improvements to be made?
Shorter deadlines, being able to select tasks, a check for 32-bit libs on Linux, cleaning up ghost WUs, better communication...
19) Message boards : Number crunching : Big models (Message 62657)
Posted 11 Aug 2020 by bernard_ivo
Post:
Well, it looks like no one is against bigger uploads, so the researchers can go ahead with the current model.


What would be the checkpoint interval? I can't recall well, but checkpoint on my i7-4790 was 40-60 mins. Any considerations to reduce it a bit?
20) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 62429)
Posted 18 May 2020 by bernard_ivo
Post:
This is interesting. The 1107 errors are not surprising, but how did he get a valid UK Met Office HadCM3 short v8.34 i686-pc-linux-gnu?
Don't they require the 32-bit libraries too?
https://www.cpdn.org/results.php?hostid=1472944

This one is still crashing around 12 WUs a day, 99.999% (1494 in total)

https://www.cpdn.org/cpdnboinc/results.php?hostid=1499785 New machine 24 WUs in the last two days
https://www.cpdn.org/cpdnboinc/results.php?hostid=1504413 New machine 25 WUs in the last two days
https://www.cpdn.org/cpdnboinc/results.php?hostid=1473091 2018 machine 100% WUs crashed (43 in total)


©2022 climateprediction.net