Posts by Alan K

21) Message boards : Number crunching : OpenIFS Discussion (Message 68251)
Posted 10 Feb 2023 by Profile Alan K
Post:
"Initially the BOINC estimated run time is off likely due to the new app version that BOINC has no data for yet."

For the six tasks I have from batch 990, the estimated run time is 2 days 23 hours, compared with roughly 16 hours for previous batches.

Edit: Actually running at 5.04% per hour. The first one is 73% complete after 14 hours, with the remaining time estimated at 19 hours, so the estimate is adjusting as it goes.
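
As a rough cross-check on the BOINC figure, here is a sketch that extrapolates linearly from the progress so far. This is plain arithmetic, not BOINC's estimator. As I understand it, the client blends a progress-based figure with the task's original estimate, which would explain why its remaining time starts high and comes down as the task runs.

```python
def remaining_hours(fraction_done: float, elapsed_hours: float) -> float:
    """Hours left if progress continues at the average rate seen so far."""
    if not 0.0 < fraction_done <= 1.0 or elapsed_hours <= 0:
        raise ValueError("need 0 < fraction_done <= 1 and elapsed_hours > 0")
    rate = fraction_done / elapsed_hours  # fraction completed per hour
    return (1.0 - fraction_done) / rate


def remaining_hours_at_rate(fraction_done: float, percent_per_hour: float) -> float:
    """Hours left at a fixed progress rate given in percent per hour."""
    return 100.0 * (1.0 - fraction_done) / percent_per_hour


# Figures from the post: 73% done after 14 hours; displayed rate 5.04% per hour.
print(f"{remaining_hours(0.73, 14):.1f} h")            # 5.2 h
print(f"{remaining_hours_at_rate(0.73, 5.04):.1f} h")  # 5.4 h
```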
22) Message boards : Number crunching : no credit awarded? (Message 68229)
Posted 8 Feb 2023 by Profile Alan K
Post:
I found I was getting lots of suspend requests when CPU usage was set to less than 100%. Now I'm using 100% and there are no suspends.
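
If you would rather set this outside the BOINC Manager, here is a minimal sketch that builds a global_prefs_override.xml with both CPU preferences at 100%. It assumes the setting meant here is one of BOINC's two CPU preferences: max_ncpus_pct ("use at most X% of the CPUs") or cpu_usage_limit ("use at most X% of CPU time"). As I understand it, the CPU-time limit is the one that works by suspending and resuming tasks.

```python
import xml.etree.ElementTree as ET


def prefs_override_xml(max_ncpus_pct: float = 100, cpu_usage_limit: float = 100) -> str:
    """Build the text of a global_prefs_override.xml that sets both CPU preferences."""
    root = ET.Element("global_preferences")
    # "Use at most X% of the CPUs": how many cores BOINC may use.
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_ncpus_pct:g}"
    # "Use at most X% of CPU time": below 100 the client suspends and
    # resumes tasks every few seconds (as I understand it).
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:g}"
    return ET.tostring(root, encoding="unicode")


print(prefs_override_xml())
```

Save the output as global_prefs_override.xml in the BOINC data directory. With the Ubuntu package this appears to be /var/lib/boinc-client, judging by the slot path quoted in message 67922. Then run boinccmd --read_global_prefs_override. Note that this overwrites any other overrides already in that file.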
23) Message boards : Number crunching : The uploads are stuck (Message 68058)
Posted 26 Jan 2023 by Profile Alan K
Post:
All my waiting uploads cleared overnight. Have resumed computing.
24) Message boards : Number crunching : OpenIFS Discussion (Message 67922)
Posted 20 Jan 2023 by Profile Alan K
Post:
Not sure if this is the right place for this, but I have had a task fail with a computation error after the last zip file (122) was written. The stderr message is:

<![CDATA[
<message>
Process still present 5 min after writing finish file; aborting</message>
<stderr_txt>
irectory: /var/lib/boinc-client/slots/1/ICMSHhq0f+002316

WU 12189428 task 22274970.
25) Message boards : Number crunching : The uploads are stuck (Message 67876)
Posted 18 Jan 2023 by Profile Alan K
Post:
"If you don't install the 32-bit libraries your tasks will eventually keep crashing, and hence your device will get jailed. "

Unfortunately, I don't think that is the case, though I stand to be corrected.
26) Message boards : Number crunching : OpenIFS Discussion (Message 67106)
Posted 28 Dec 2022 by Profile Alan K
Post:
Copied from the "uploads are stuck" thread:

From Andy

Hi Dave,

Thanks. I have looked at this. This machine keeps losing its SSH port and HTTP port. I reset it and it keeps losing them again. I am going to look into this further tomorrow.

Best wishes,

Andy

and

Update to this: I have made a request to the JASMIN cloud service where this machine resides to look into this.
27) Message boards : Number crunching : OpenIFS Discussion (Message 67029)
Posted 24 Dec 2022 by Profile Alan K
Post:
Getting a transient HTTP error:

Sat 24 Dec 2022 17:54:15 GMT | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0873_1981050100_123_950_12167517_0_r1054526626_61.zip: transient HTTP error
Sat 24 Dec 2022 17:54:15 GMT | climateprediction.net | Backing off 00:02:50 on upload of oifs_43r3_ps_0873_1981050100_123_950_12167517_0_r1054526626_61.zip
Sat 24 Dec 2022 17:54:15 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0873_1981050100_123_950_12167517_0_r1054526626_62.zip
Sat 24 Dec 2022 17:54:16 GMT | | Internet access OK - project servers may be temporarily down.


Guess that's it for the next few days because of the Hols.

Network activity suspended and tasks reduced to 1 until I hear otherwise.

Happy Xmas everyone.
28) Message boards : Number crunching : OpenIFS Discussion (Message 67019)
Posted 23 Dec 2022 by Profile Alan K
Post:
I hadn't seen those task states before. The full list from the source code (in boinc_db.py) is:

PROCESS_UNINITIALIZED = 0
PROCESS_EXECUTING = 1
PROCESS_SUSPENDED = 9
PROCESS_ABORT_PENDING = 5
PROCESS_QUIT_PENDING = 8
PROCESS_COPY_PENDING = 10
PROCESS_EXITED = 2
PROCESS_WAS_SIGNALED = 3
PROCESS_EXIT_UNKNOWN = 4
PROCESS_ABORTED = 6
PROCESS_COULDNT_START = 7
(don't ask me why they're not in numerical order)
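
Here are the same values in numerical order, as a small Python enum (boinc_db.py is itself Python). This makes it easy to decode the bare number the client prints. For example, "task state 8" in the event-log excerpt in message 67015 is PROCESS_QUIT_PENDING, which matches the task_state=QUIT_PENDING line just before it.

```python
from enum import IntEnum


class ProcessState(IntEnum):
    """Task process states, values as listed from boinc_db.py, in numerical order."""
    PROCESS_UNINITIALIZED = 0
    PROCESS_EXECUTING = 1
    PROCESS_EXITED = 2
    PROCESS_WAS_SIGNALED = 3
    PROCESS_EXIT_UNKNOWN = 4
    PROCESS_ABORT_PENDING = 5
    PROCESS_ABORTED = 6
    PROCESS_COULDNT_START = 7
    PROCESS_QUIT_PENDING = 8
    PROCESS_SUSPENDED = 9
    PROCESS_COPY_PENDING = 10


def state_name(number: int) -> str:
    """Map a numeric task state from the event log to its name."""
    try:
        return ProcessState(number).name
    except ValueError:
        return f"unknown state {number}"


print(state_name(8))  # PROCESS_QUIT_PENDING
```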

But it looks as if the same task (_ps_0852_) has exited and then restarted. However, it has jumped from upload file 63 to 77. Have you been having communications problems? Did upload 63 fail earlier and get retried here?


Unfortunately, I don't know. The computer has been "misbehaving": it unexpectedly freezes, so I have to do a hard restart. This has become more frequent recently, but the tasks seem to restart OK. I had to restart earlier this evening, so I lost the event log, and I'm not sure whether the details are recorded elsewhere. I'm not sure whether I am pushing the RAM (24 GB) to its limit by running 3 tasks at once (75% core usage). I'll go back to 2 cores.
29) Message boards : Number crunching : OpenIFS Discussion (Message 67015)
Posted 23 Dec 2022 by Profile Alan K
Post:
Interesting segment from my event log this morning:

Fri 23 Dec 2022 08:32:17 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0852_1981050100_123_950_12167496_0_r1643238008_63.zip
Fri 23 Dec 2022 08:33:05 GMT | climateprediction.net | Finished upload of oifs_43r3_ps_0852_1981050100_123_950_12167496_0_r1643238008_63.zip
Fri 23 Dec 2022 08:33:15 GMT | climateprediction.net | [task] task_state=QUIT_PENDING for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 from request_exit()
Fri 23 Dec 2022 08:33:15 GMT | | request_exit(): PID 5839 has 1 descendants
Fri 23 Dec 2022 08:33:15 GMT | | PID 5842
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] Process for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 exited, status 256, task state 8
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] task_state=UNINITIALIZED for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 from handle_exited_app
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] ACTIVE_TASK::start(): forked process: pid 7134
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] task_state=EXECUTING for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 from start
Fri 23 Dec 2022 08:35:55 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0872_1981050100_123_950_12167516_0_r1281185810_77.zip
Fri 23 Dec 2022 08:36:06 GMT | climateprediction.net | Finished upload of oifs_43r3_ps_0872_1981050100_123_950_12167516_0_r1281185810_77.zip

Running 3 tasks, but a fourth apparently started.
Any comments?
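
Here is a minimal sketch for pulling the task-state transitions and upload numbers out of lines like these, so that each can be grouped by task name. The `timestamp | project | message` layout is taken from the excerpt. Reading the task name off an upload file as everything before `_r<digits>_<index>.zip` is an assumption based only on these file names. On this excerpt, the sketch picks out the QUIT_PENDING, UNINITIALIZED, EXECUTING sequence for the _ps_0852_ task. It also shows that uploads 63 and 77 carry different task names (_0852_ and _0872_).

```python
import re

# Lines from the event-log excerpt above, copied verbatim.
LOG = """\
Fri 23 Dec 2022 08:32:17 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0852_1981050100_123_950_12167496_0_r1643238008_63.zip
Fri 23 Dec 2022 08:33:05 GMT | climateprediction.net | Finished upload of oifs_43r3_ps_0852_1981050100_123_950_12167496_0_r1643238008_63.zip
Fri 23 Dec 2022 08:33:15 GMT | climateprediction.net | [task] task_state=QUIT_PENDING for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 from request_exit()
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] Process for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 exited, status 256, task state 8
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] task_state=UNINITIALIZED for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 from handle_exited_app
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] ACTIVE_TASK::start(): forked process: pid 7134
Fri 23 Dec 2022 08:34:15 GMT | climateprediction.net | [task] task_state=EXECUTING for oifs_43r3_ps_0852_1981050100_123_950_12167496_0 from start
Fri 23 Dec 2022 08:35:55 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0872_1981050100_123_950_12167516_0_r1281185810_77.zip
Fri 23 Dec 2022 08:36:06 GMT | climateprediction.net | Finished upload of oifs_43r3_ps_0872_1981050100_123_950_12167516_0_r1281185810_77.zip
"""

STATE_RE = re.compile(r"task_state=(\w+) for (\S+) from (\S+)")
# Assumption: upload files are named <task name>_r<digits>_<index>.zip,
# inferred from the examples above rather than from any documented format.
UPLOAD_RE = re.compile(r"(Started|Finished) upload of (\S+)_r\d+_(\d+)\.zip")


def parse(log: str):
    """Return (task-state transitions, uploads) found in event-log text."""
    states, uploads = [], []
    for line in log.splitlines():
        parts = line.split(" | ", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue  # lines with an empty project field are skipped
        timestamp, _project, message = parts
        if m := STATE_RE.search(message):
            new_state, task, source = m.groups()
            states.append((timestamp, task, new_state, source))
        elif m := UPLOAD_RE.search(message):
            action, task, index = m.groups()
            uploads.append((timestamp, task, int(index), action))
    return states, uploads


states, uploads = parse(LOG)
for ts, task, state, source in states:
    print(ts, task, state, "from", source)
for ts, task, index, action in uploads:
    print(ts, task, action, "upload", index)
```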
30) Message boards : Number crunching : OpenIFS Discussion (Message 66814)
Posted 7 Dec 2022 by Profile Alan K
Post:
I limit CPUs on the IFS models anyway, as I don't have enough RAM (24 GB for a 4-core CPU).
31) Message boards : Number crunching : OpenIFS Discussion (Message 66813)
Posted 7 Dec 2022 by Profile Alan K
Post:
Suspected temperature problems. I need to clean the case fan inlets and the CPU heatsink. One of the joys of having semi-long-haired cats!
32) Message boards : Number crunching : OpenIFS Discussion (Message 66804)
Posted 7 Dec 2022 by Profile Alan K
Post:
Typo. Should have been "by setting CPU usage to 100%" rather than 90%.
33) Message boards : Number crunching : OpenIFS Discussion (Message 66800)
Posted 6 Dec 2022 by Profile Alan K
Post:
For me, a rather alarming 10 completed out of 22. Two of the failures I can put down to a forced reboot when everything "froze", and some of the early ones to using 4 CPUs rather than 3 for the amount of RAM I have. Some were also negative theta failures. I have eliminated the repeated suspends I was getting by setting CPU usage to 100%.
34) Message boards : Number crunching : OpenIFS Discussion (Message 66626)
Posted 29 Nov 2022 by Profile Alan K
Post:
Got 3. Posting zips about every 10 minutes. Estimated completion in about 17 hours (3.5 GHz i5, 24 GB RAM).


Got another 4 and set CPU usage to 100% (i.e. 4 cores). Getting a message that one task is running or waiting for memory, as expected.
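
For reference, here is how the percentage maps to a core count on this 4-core machine. 100% gives 4 cores and 75% gives 3, which matches the "3 tasks at once (75% core usage)" mentioned in message 67019. The rounding rule in this sketch (truncate, with a minimum of one) is my understanding of the BOINC client. I have not checked it against the client's source.

```python
import math


def usable_cpus(physical_cpus: int, max_ncpus_pct: float) -> int:
    """CPUs BOINC may use for a given "use at most X% of the CPUs" setting.

    Assumption: the client truncates physical_cpus * pct / 100 and
    never goes below one CPU.
    """
    return max(1, math.floor(physical_cpus * max_ncpus_pct / 100))


for pct in (50, 75, 100):
    print(f"{pct}% of 4 CPUs -> {usable_cpus(4, pct)}")
```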
35) Message boards : Number crunching : OpenIFS Discussion (Message 66614)
Posted 28 Nov 2022 by Profile Alan K
Post:
Got 3. Posting zips about every 10 minutes. Estimated completion in about 17 hours (3.5 GHz i5, 24 GB RAM).
36) Message boards : Number crunching : New work discussion - 2 (Message 66585)
Posted 25 Nov 2022 by Profile Alan K
Post:
Showing as in progress
37) Message boards : Number crunching : New work discussion - 2 (Message 66542)
Posted 22 Nov 2022 by Profile Alan K
Post:
Just make it work in Windows please! You'd get so many more hosts....
With all the Linux hosts that are missing the libraries for the Met Office models, there will be a lot more Linux hosts able to run these tasks, hopefully successfully, than with previous tasks.
It amazes me there wasn't an automated way of telling the users those libraries should be installed.

Unfortunately they are the "set it and forget it" brigade who don't look at why they are getting hundreds (or thousands) of "computation error" messages on their tasks. Also I wouldn't mind betting that they don't look at these message boards either.
38) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66387)
Posted 13 Nov 2022 by Profile Alan K
Post:
I'll just try it and see if it works then.


It apparently did work. However, I have just switched my machine back on in the hope of getting some of the latest batch (after an energy-saving pause). When I tried to start BOINC, it absolutely refused to co-operate (Ubuntu 22.04 and BOINC version 7.18.1). I have reverted to 20.04 but got BOINC version 7.16.1 installed via the apt-get method. Fingers crossed for some work.
39) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66326)
Posted 9 Nov 2022 by Profile Alan K
Post:
Try running "apt update" before the command to install BOINC.
40) Message boards : Number crunching : New work discussion - 2 (Message 66262)
Posted 28 Oct 2022 by Profile Alan K
Post:
"BOINC starts multiple OpenIFS tasks because there are free CPU slots, even though the total memory for the tasks exceeds what's available. "

Can this be overcome by limiting the number of cores available to BOINC before downloading any of the IFS models? Although I have a four-core CPU, the box only has 24 GB of RAM.
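
One way to size the limit is sketched below. First, work out how many tasks fit in memory. Then pick the smallest "use at most X% of the CPUs" value that allows that many tasks. The sketch assumes each task uses one CPU and a fixed amount of RAM. The per-task and reserved figures are hypothetical placeholders; replace them with what you actually observe.

```python
import math


def max_tasks_by_memory(total_ram_gb: float, reserved_gb: float, per_task_gb: float) -> int:
    """How many tasks fit in RAM after setting some aside for the OS and other programs."""
    return max(0, math.floor((total_ram_gb - reserved_gb) / per_task_gb))


def cpu_pct_for_tasks(physical_cpus: int, tasks: int) -> int:
    """Smallest whole "% of the CPUs" setting that allows `tasks` CPUs.

    Assumes one single-threaded task per CPU and that the client truncates
    physical_cpus * pct / 100.
    """
    tasks = max(1, min(tasks, physical_cpus))
    return math.ceil(100 * tasks / physical_cpus)


# 24 GB and 4 cores are from the post; the 4 GB reserve and 6 GB per task
# are hypothetical placeholders, not measured OpenIFS figures.
n = max_tasks_by_memory(24, 4, 6)
print(n, cpu_pct_for_tasks(4, n))  # 3 75
```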


©2024 climateprediction.net