climateprediction.net home page
Posts by klepel

1) Message boards : Number crunching : New work discussion - 2 (Message 69552)
Posted 31 Aug 2023 by klepel
Post:
I still think I will buy the 128 GB of RAM. RAM speed did not matter, did it?

Then I will be able to equip a third computer with 64 GB of RAM (4 × 16 GB), a virtual box…

As I remember, the new OpenIFS is supposed to be multithreaded, not single-threaded…
2) Message boards : Number crunching : New work discussion - 2 (Message 69541)
Posted 30 Aug 2023 by klepel
Post:
Glenn,
I would like to ask: do you have any idea how many of the new OpenIFS workunits will have to be processed, and how long they might last, based on your previous experience with the “test batch” (I can't remember, was it last year or this year?)?

I'm asking because the last time I bought 64 GB of RAM for climateprediction.net (which I do not really need for my daily tasks), I got a little upset, as those OpenIFS units lasted only a few months and then you commented that the big OpenIFS units might never make it to BOINC at all.

Now I am looking forward to the new batch of OpenIFS units announced for October, and I am wondering whether I should buy 128 GB of additional RAM for another computer in advance. However, I would really like to avoid repeating last time's experience, where the investment was only worthwhile for a few months, especially as for a similar amount I could buy a graphics card for GPUGRID.

Thanks a lot for your comments
klepel
3) Message boards : Number crunching : New work discussion - 2 (Message 68654)
Posted 15 Apr 2023 by klepel
Post:
I need your help!
I updated Lubuntu to version 22.04 the day before yesterday.
I had already read in the forums that BOINC would no longer work, but I hoped it would.
I know that Gianfranco's PPA is the way to get it running again (BOINC version 7.20.5; my old version is 7.14.2). However, I would like to install boincmgr, boinccmd and boinc in my old folder under /home/…/boinc and not in the standard folder, wherever that is.
So my question: how do I download the PPA packages without installing them into the default folder, or where can I find the program files of the new version after installing from the PPA, so that I can transfer them to the desired location?
Thanks a lot!

Solved: I found the programs, transferred them, and it works, more or less.
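Finding the installed programs can be scripted. A minimal Python sketch, assuming the PPA packages put `boinc`, `boinccmd` and `boincmgr` somewhere on the `PATH` (on Ubuntu usually `/usr/bin`, but check your own system); `find_boinc_binaries` and `copy_binaries` are hypothetical helper names and the target folder is whatever you choose:

```python
import shutil
from pathlib import Path
from typing import Dict, Optional

BOINC_PROGRAMS = ("boinc", "boinccmd", "boincmgr")

def find_boinc_binaries(names=BOINC_PROGRAMS) -> Dict[str, Optional[str]]:
    """Return the full path of each program found on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}

def copy_binaries(found: Dict[str, Optional[str]], target_dir: Path) -> list:
    """Copy every located program into target_dir (created if needed)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name, path in found.items():
        if path is None:
            continue  # not installed, or not on PATH
        dest = target_dir / name
        shutil.copy2(path, dest)  # copy2 also copies permission bits (executable flag)
        copied.append(dest)
    return copied

if __name__ == "__main__":
    for name, path in find_boinc_binaries().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

The copied executables may still depend on shared libraries installed by the packages, so removing the PPA packages afterwards could break them.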
4) Message boards : Number crunching : New work discussion - 2 (Message 68653)
Posted 15 Apr 2023 by klepel
Post:
I need your help!
I updated Lubuntu to version 22.04 the day before yesterday.
I had already read in the forums that BOINC would no longer work, but I hoped it would.
I know that Gianfranco's PPA is the way to get it running again (BOINC version 7.20.5; my old version is 7.14.2). However, I would like to install boincmgr, boinccmd and boinc in my old folder under /home/…/boinc and not in the standard folder, wherever that is.
So my question: how do I download the PPA packages without installing them into the default folder, or where can I find the program files of the new version after installing from the PPA, so that I can transfer them to the desired location?
Thanks a lot!
5) Message boards : Number crunching : OpenIFS Discussion (Message 66851)
Posted 10 Dec 2022 by klepel
Post:
Glenn, will you release the oifs_43r3_bl and oifs_43r3_ps apps in parallel or in sequence? As I am bandwidth-limited, I can only run a maximum of 3 WUs in parallel across all 3 computers assigned to climateprediction.net. My app_config.xml is configured as follows:
<app_config>
 <project_max_concurrent>4</project_max_concurrent>  
[…………………………………….]
   <app>
      <name>oifs_43r3</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
</app_config>

So in total up to 4 WUs of oifs_43r3_bl and oifs_43r3_ps might run in parallel. I hesitated to limit project_max_concurrent further, because some HadSM4 WUs might appear (they do not have the bandwidth problem) and I would happily crunch them in parallel with the OpenIFS ones; also, I might forget to raise the limit again once the OpenIFS WUs are gone.
For the BOINC specialists: if I set one of the two apps in app_config.xml to 0 (zero), for example:
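The "total of 4" reasoning can be checked with a small Python sketch. It assumes, as a simplification, that the per-app `max_concurrent` limits and `project_max_concurrent` just cap each other, so the most tasks that can run at once is the smaller of the project limit and the sum of the per-app limits; the real BOINC client scheduler also considers CPU count, memory and other projects. The elided part of the original file is left out, and `effective_limit` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# The app_config.xml from the post, without the elided section.
APP_CONFIG = """
<app_config>
  <project_max_concurrent>4</project_max_concurrent>
  <app><name>oifs_43r3</name><max_concurrent>2</max_concurrent></app>
  <app><name>oifs_43r3_bl</name><max_concurrent>2</max_concurrent></app>
  <app><name>oifs_43r3_ps</name><max_concurrent>2</max_concurrent></app>
</app_config>
"""

def effective_limit(xml_text: str, apps: list) -> int:
    """Simplified upper bound on how many tasks of `apps` can run at once."""
    root = ET.fromstring(xml_text.strip())
    per_app = {}
    for app in root.findall("app"):
        limit = app.findtext("max_concurrent")
        if limit is not None:
            per_app[app.findtext("name")] = int(limit)
    # Sum of the per-app caps for the apps of interest...
    app_sum = sum(per_app[name] for name in apps)
    # ...further capped by the project-wide limit, if one is set.
    project_cap = root.findtext("project_max_concurrent")
    return min(app_sum, int(project_cap)) if project_cap is not None else app_sum

print(effective_limit(APP_CONFIG, ["oifs_43r3_bl", "oifs_43r3_ps"]))  # 4
```

With only the two new apps active, the per-app limits (2 + 2) and the project limit (4) coincide; with all three apps listed, the project limit of 4 is the one that binds.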
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>0</max_concurrent>
      <report_results_immediately/>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
will WUs of this app no longer be downloaded to the computer, i.e. would this further limit the climateprediction.net WUs on that particular computer?
Thanks!
6) Message boards : Number crunching : OpenIFS Discussion (Message 66743)
Posted 3 Dec 2022 by klepel
Post:
About OpenIFS failure modes:
The one host with errors was the only one on which I suspended all tasks to disk, rebooted the host, and resumed the tasks. I strongly believe that all of these 54 tasks went through this suspend–resume.[…]
The host with errors has reported only successful tasks for a while now, which is another hint that the error episode was just the aftermath of the suspend-resume cycle.
I think I had the same error. The computer was shut down and restarted. Although BOINC View reported progress, no trickle/upload file was created.
https://www.cpdn.org/result.php?resultid=22248984 As you can see, there is a difference between CPU time and execution time.

Then there was a WU with a Code 9 error, https://www.cpdn.org/result.php?resultid=22248970:
9 (0x00000009) Unknown error code

Hope this helps.
7) Message boards : Number crunching : OpenIFS Discussion (Message 66701)
Posted 1 Dec 2022 by klepel
Post:
Update.
After meeting yesterday with CPDN, the disk and memory requirements for these tasks need revising: memory requirement up & disk down. What was not taken into account when setting the memory was the additional amount required by the wrapper code & all the boinc functions it uses (such as zipping). Hopefully this will eliminate some of the memory errors.
The plan is to put out a repeat of the first batch with corrected limits to check how it performs before sending out the rest of this experiment.
Surely this will help!
On trickles, agree these longer (3 month) runs are producing too many trickle files which I'll adjust. However, I looked at the output filesize per output instance and it's reasonable and at the lower limit of what the scientist needs. I am reluctant to change it.
Understood! Hopefully fewer trickles will make uploads smoother.
Question for ADSL people: knowing your bottleneck is network, are you happy just reducing the no. of tasks running concurrently? What's your sustainable data-flow rate you would be happy with (give me a number to work with).
I have no problem reducing the number of tasks running on my computers to fit my ADSL bandwidth. I have to remind myself that I offer the scientists a certain amount of compute power, but they have to accept the offer; there are a lot of other worthy BOINC projects! (Hopefully I will remember this the next time I go shopping for computer parts for climateprediction.net that I do not need for my own daily use!) However, I am still concerned about how many climateprediction.net participants read the forums, and how many users out there have installed BOINC and attached to climateprediction.net but never check their machines. You might end up with a lot of OpenIFS results piling up on computers with slow internet connections, wasting energy and resources without ever helping science. I will send you a PM with my ADSL speed, so you have a number for how many WUs I am likely to contribute each day. It is not much!
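A rough way to turn an upload rate into a number of WUs per day, as a Python sketch. It assumes the 768 kbit/s upstream and the "about 1 GB of trickles per WU" figure mentioned in these posts; both are inputs to replace with your own values, and protocol overhead and other traffic are ignored. The function names are illustrative:

```python
SECONDS_PER_DAY = 86_400

def wus_per_day(upload_kbit_s: float, gb_per_wu: float, usable_fraction: float = 1.0) -> float:
    """WUs' worth of upload data the line can carry per day.

    usable_fraction is the share of the uplink you are willing to give to BOINC.
    Line rates use decimal units: 1 kbit/s = 1000 bit/s, 1 GB = 1e9 bytes.
    """
    bytes_per_second = upload_kbit_s * 1000 / 8
    bytes_per_day = bytes_per_second * SECONDS_PER_DAY * usable_fraction
    return bytes_per_day / (gb_per_wu * 1e9)

def hours_to_upload(upload_kbit_s: float, gb: float) -> float:
    """Hours needed to upload `gb` gigabytes at the full line rate."""
    return gb * 1e9 * 8 / (upload_kbit_s * 1000) / 3600

print(f"{wus_per_day(768, 1.0):.1f} WUs/day using the whole uplink")
print(f"{wus_per_day(768, 1.0, usable_fraction=0.5):.1f} WUs/day using half of it")
print(f"{hours_to_upload(768, 1.0):.1f} h to upload one WU's output")
```

Under these assumptions the line could carry only about 8 WUs per day even if it did nothing else, and about 4 if half of it is kept free for other use.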
8) Message boards : Number crunching : OpenIFS Discussion (Message 66667)
Posted 30 Nov 2022 by klepel
Post:
As I said in my earlier message 66661, which klepel only partially quotes, disk limits can be checked either on the server before the task is issued, or on the client before the task is run. Different checks may be applied at either stage: we need to consider them as separate problems.
Richard, you know better than me how BOINC works. My point is: it seems to me that the OpenIFS model tells BOINC it needs more disk space than it actually does. As several people pointed out, this forces the disk space assigned to BOINC to be quite large. This is the case with WSL2 and with Linux computers as well. Some of my Linux installations are dual-boot on small SSDs, so there is no room for 40 GB for BOINC alone (just checked: one of my Linux computers does not download OpenIFS because it is 7 GB short). As I understand Glenn, he has a lot of WUs to run, so I am trying to unlock more computers for climateprediction.net.
9) Message boards : Number crunching : OpenIFS Discussion (Message 66663)
Posted 30 Nov 2022 by klepel
Post:
Either or both may be related to the XML specification for the workunit, which contains:
<rsc_disk_bound> 40,000,000,000
My calculator makes that 37.25 GB, using 1024x steps.
Glenn, this is the problem: the model asks for too much disk space before it even starts to download an OpenIFS WU. It is not, as you wrote in the other thread, that the model has already downloaded OpenIFS WUs and, after several crashes, run out of allocated disk space in WSL2. BOINC simply sees that not enough disk space is allowed on a particular computer and therefore refuses to download a WU.
I have already checked whether my WSL2 disk is full of crashed models, and there are none! As far as I understand, the researcher has to reduce this value to the size that is really needed:
<rsc_disk_bound> 40,000,000,000
It might need just one digit less (one order of magnitude smaller).
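The 37.25 GB figure comes from dividing the decimal byte count by powers of 1024. A small Python check (the function name is illustrative), which also shows what "one digit less" would mean:

```python
RSC_DISK_BOUND = 40_000_000_000  # bytes, as given in the workunit XML

def to_binary_units(n_bytes: int) -> tuple:
    """Return (MiB, GiB): the byte count divided by 1024**2 and by 1024**3."""
    return n_bytes / 1024**2, n_bytes / 1024**3

mib, gib = to_binary_units(RSC_DISK_BOUND)
print(f"{mib:.2f} MiB = {gib:.2f} GiB")  # 38146.97 MiB = 37.25 GiB
# One digit less, i.e. 4,000,000,000 bytes:
print(f"{to_binary_units(RSC_DISK_BOUND // 10)[1]:.2f} GiB")  # 3.73 GiB
```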

In the meantime, another model crashed. Because there was a “traffic jam” with another BOINC project (SIDOCK produces small WUs with a huge data output and releases them once a day), climateprediction.net trickles were not uploaded in time. OpenIFS reports no trickles; they have been waiting in line…

I have another question: once the multicore models are released, will the trickles be even larger and more frequent, or is there no direct link between multithreading and trickle size/frequency?
10) Message boards : Number crunching : OpenIFS Discussion (Message 66656)
Posted 30 Nov 2022 by klepel
Post:
I bought 64 GB of RAM for one of my Linux computers so that I could process more of the new OpenIFS WUs in parallel. Was it ever mentioned, in the earlier discussion in preparation for these OpenIFS WUs, that each WU will generate about 1 GB of trickles?

ADSL:
ADSL in the US can be very, very badly asymmetric. Not that many years back, I had 25Mbit down, 768kbit up (yes, not even 1Mbit up).
Same here (2 Mbit down, 768 kbit up)! I cannot switch to anything faster, and I also use this line for work! I have 3 working Linux computers behind this ADSL line and intended to start 2 more. But this will not happen: I am not even able to upload the trickles of 2 WUs running in parallel.
That is, I can only put a fraction of my CPUs to OpenIFS in steady state due to my upload bandwidth limitation.
This sums it up nicely! This data volume is not sustainable! Previously it was the 32-bit libs; now it is bandwidth!
Error: " climateprediction.net: Notice from server
OpenIFS 43r3 Perturbed Surface needs 21968.66MB more disk space. You currently have 16178.31 MB available and it needs 38146.97 MB.
Tue 29 Nov 2022 09:56:52 AM CET"
38 GB sounds like a lot; is that normal?
Same problem with my WSL2 installation. This computer would not have a pronounced bandwidth problem, but it does not download any tasks because OpenIFS constantly asks for 38146.97 MB! The climateprediction.net project folder on the Linux computer needs only 8.1 GB!
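The figures in the server notice above are plain arithmetic: the requirement is the workunit's 40,000,000,000-byte disk bound expressed in 1024×1024-byte "MB", and the shortfall is that minus the free space. A Python sketch reproducing them; `disk_shortfall_mib` is a hypothetical name and this is not the BOINC server's actual code:

```python
def disk_shortfall_mib(required_bytes: int, available_mib: float) -> float:
    """MiB still missing before a task with this disk bound can be sent (0 if enough)."""
    required_mib = required_bytes / 1024**2  # 40e9 bytes -> 38146.97 MiB
    return max(0.0, required_mib - available_mib)

print(f"{disk_shortfall_mib(40_000_000_000, 16178.31):.2f}")  # 21968.66
```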

I think fewer trickles of around 100 MB each would be easier for BOINC to handle, but there are more knowledgeable participants in the forum. And yes, the overall size should be reduced to something more manageable, as bandwidth limits the throughput of WUs across all participants.
OpenIFS Errors:
https://www.cpdn.org/result.php?resultid=22245289
https://www.cpdn.org/result.php?resultid=22245630
These happened after I adjusted app_config from a maximum of 4 WUs to 1 WU, and afterwards to 2 WUs.
11) Message boards : Number crunching : New work discussion - 2 (Message 66348)
Posted 11 Nov 2022 by klepel
Post:
With the release of the OpenIFS tasks imminent, would you mind giving the exact app names we have to use in the app_config files to restrict the number of concurrent WUs to some X < number of CPU cores, e.g.:
<app_config>
   <app>
      <name>OpenIFS</name>
      <max_concurrent>1</max_concurrent>
      <report_results_immediately/>
   </app>
</app_config>

Would this be sufficient for all the OpenIFS sub-projects as well? Please confirm.
Regards,
klepel
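Before restarting the client it can help to check that an app_config.xml is well-formed and that every `<name>` matches an app the project actually uses. A Python sketch; the `KNOWN_APPS` set is taken from the app names used in the later OpenIFS post (message 66851) and is an assumption, not an official list:

```python
import xml.etree.ElementTree as ET

# Assumption: app names as used in message 66851; check the project's app list.
KNOWN_APPS = {"oifs_43r3", "oifs_43r3_bl", "oifs_43r3_ps"}

def check_app_config(xml_text: str, known=KNOWN_APPS) -> list:
    """Return a list of problems found; an empty list means nothing was flagged."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")
    for app in root.findall("app"):
        name = (app.findtext("name") or "").strip()
        if name not in known:
            problems.append(f"unknown app name: {name!r}")
    return problems
```

A typo such as a missing angle bracket shows up as a parse error, and a guessed name such as `OpenIFS` shows up as an unknown app name.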
12) Message boards : Number crunching : New work Discussion (Message 65782)
Posted 7 Aug 2022 by klepel
Post:

Hold them back, I still have more than enough work downloaded ^^

https://www.cpdn.org/results.php?hostid=1521318&offset=0&show_names=0&state=1&appid=

One day I looked into the VM, and it was more than full :)

Greets
Felix

Dear Felix,
I know I shouldn't write anything, but you have 227 tasks in progress:
73 N114 tasks and 154 N216 tasks (quick calculation).
Your computer finishes the tasks:
N114 in about 300,000 s: 300,000 s × 73 = 21,900,000 s
and N216 in about 950,000 s: 950,000 s × 154 = 146,300,000 s
Your VM has 4 processors, so it will finish all tasks in about 487 days, well beyond the 365-day deadline: about 122 days late, i.e. roughly a quarter of the work (122 of 487 days) will not finish in time.
Would you mind releasing about a third of your tasks, to leave some margin? Preferably N216 tasks: just abort them! Then other computers (idle for CPDN tasks) can work on them, and the batches are finished in a useful time for the researchers!
Thanks a lot,
klepel
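The estimate in this post as a short Python sketch, using the task counts and per-task run times quoted above and assuming all 4 cores stay busy with CPDN work around the clock. Each task's deadline actually runs from its own issue date, so this is only a rough check; `days_to_finish` is an illustrative name:

```python
SECONDS_PER_DAY = 86_400

def days_to_finish(tasks: dict, cores: int) -> float:
    """tasks maps a task type to (count, seconds per task); returns wall-clock days."""
    total_seconds = sum(count * secs for count, secs in tasks.values())
    return total_seconds / cores / SECONDS_PER_DAY

queue = {"N114": (73, 300_000), "N216": (154, 950_000)}
days = days_to_finish(queue, cores=4)
overrun = days - 365  # deadline as stated in the post
print(f"{days:.0f} days, {overrun:.0f} days late, {overrun / days:.0%} of the work")
# 487 days, 122 days late, 25% of the work
```

The share of the work that misses the deadline is the overrun divided by the total time (122/487, about 25%); dividing by the deadline instead (122/365) gives about 33%.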
13) Message boards : Number crunching : No work for Windows OR Linux?! (Message 65359)
Posted 13 Apr 2022 by klepel
Post:
"I recently figured out that you can view and manage both the Windows and WSL2 clients from the same BOINC Manager (in Windows). That's been really helpful."

I am interested in this part! How do you do it? Step by step, please.
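One prerequisite for managing the WSL2 client from the Windows BOINC Manager is that Windows can reach the WSL2 client's GUI RPC port, 31416 by default. A Python sketch of that reachability check; the host address is a placeholder, and the WSL2 client typically also has to be configured to accept remote GUI RPC connections and you need the password from its gui_rpc_auth.cfg (check the BOINC client configuration documentation for your version):

```python
import socket

GUI_RPC_PORT = 31416  # BOINC's default GUI RPC port

def gui_rpc_reachable(host: str, port: int = GUI_RPC_PORT, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened (says nothing about auth)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    wsl_host = "127.0.0.1"  # placeholder: replace with the address of your WSL2 instance
    print(gui_rpc_reachable(wsl_host))
```

If this prints False from the Windows side, the Manager's "Select computer" dialog will not be able to connect either.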
14) Questions and Answers : Unix/Linux : Run Linux work units with Windows 10 WSL (Message 65142)
Posted 11 Feb 2022 by klepel
Post:
A Windows update just killed the last two hadam4h tasks on this computer (https://www.cpdn.org/results.php?hostid=1517859), approx. 8 days before they would have finished. Sorry about that!
15) Message boards : Number crunching : New work Discussion (Message 64896)
Posted 5 Jan 2022 by klepel
Post:
Am I the only one having problems with the new short tasks (UK Met Office HadCM3 short v8.36)?

They all seem to error out with:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
17:23:42 (240): called boinc_finish(22)

</stderr_txt>
]]>


Sorry for not being precise: this is one of my WSL computers. On the Linux computers I get:
<core_client_version>7.16.5</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f1eb60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7bfcee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f65b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7c43ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f4db60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7c2bee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f63b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7c41ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f16b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7bf4ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7fc8b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7ca6ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
03:08:41 (3927): called boinc_finish(22)

</stderr_txt>
]]>

I know that this
SIGSEGV: segmentation violation
is often associated with RAM overclocking, but these computers handle the long WUs (hadam4h) quite well.
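Logs like these are long and repetitive. A small Python sketch that condenses a CPDN stderr: it counts the "Model crash detected" restarts and tallies the distinct error lines. The marker strings are copied from the two logs above (`forrtl` is the Intel Fortran runtime); other CPDN apps may print different messages, and `summarize_stderr` is an illustrative name:

```python
import re
from collections import Counter

ERROR_PATTERNS = (
    re.compile(r"forrtl: severe \(\d+\): [^,]+"),  # Fortran runtime error, up to the first comma
    re.compile(r"SIGSEGV: segmentation violation"),
)

def summarize_stderr(text: str) -> dict:
    """Count restarts and distinct error messages in a CPDN stderr log."""
    errors = Counter()
    for line in text.splitlines():
        for pattern in ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                errors[match.group(0)] += 1
    return {
        "restarts": text.count("Model crash detected"),
        "gave_up": "too many model crashes" in text,
        "errors": dict(errors),
    }
```

For the first log this would report six restarts, all caused by the same NAMELIST syntax error, which points at the input file rather than at the hardware.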
16) Message boards : Number crunching : New work Discussion (Message 64727)
Posted 28 Oct 2021 by klepel
Post:
I just had two with the file size error... at least I now understand what you were talking about.
And yes, I have a slow internet connection.
17) Message boards : Number crunching : Site problems (Message 64605)
Posted 9 Oct 2021 by klepel
Post:
Let's try with 4 months... I created two ghost WUs by re-attaching to the project on Thursday (https://www.cpdn.org/results.php?hostid=1522605) after I had disconnected the monitor to try it on another computer. These 2 WUs will never be processed on this computer, as all their files were wiped out!
18) Message boards : Number crunching : Credit Question Answered (Message 64546)
Posted 30 Sep 2021 by klepel
Post:
Bill F, send me the right links! Thanks! For my purposes (paying out some GRC), it is no problem if it is not updated every day. It is even better if it is only every week ;-) I do it by hand in spreadsheets...
19) Message boards : Number crunching : Erroneous disk space notices (Message 64545)
Posted 30 Sep 2021 by klepel
Post:
EDIT: I looked in the CP project folder and it's obvious there's many outdated folders. I deleted them and it started playing nice with others again.
Great that it worked! Unfortunately, this is a bit of housekeeping one has to do on climateprediction.net when there is no disk space left.
20) Message boards : Number crunching : Erroneous disk space notices (Message 64544)
Posted 30 Sep 2021 by klepel
Post:
With respect to crashed tasks not cleaning up after themselves, this seems to me much less of a problem than it used to be and it only rarely seems to happen to me now whereas it used to happen frequently. That may be because outside of testing branch, I only get the very occasional crashed task these days.
I would not say so: if Aurum has the disk space problem and, as it seems to me, a lot of crashed models, these crashed models will eat up a lot of space quite fast! Since I got WSL working on two Win10 computers, I have had to clean up crashed WUs by hand every time Win10 decided to restart my computer after the monthly update cycle without my intervention. And I well remember going around my Linux computers with the numbers of the WUs reported as crashed on climateprediction.net written down, and cleaning them up on the hard disk so that new ones could be downloaded again. This is the reason I do not run climateprediction.net on my server.
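The manual clean-up described in these two posts can be partly scripted. A Python sketch, assuming you have the names of the tasks the client still knows about (for example copied from BOINC Manager's task list) and that each task folder in the project directory is named after its workunit, with the task name starting with that folder name (as the hadcm3s_… folder in the crash log above suggests). The helper name is hypothetical; it only lists candidates and deletes nothing, and the project directory may contain other folders that are not task folders, so check the list before removing anything by hand:

```python
from pathlib import Path

def stale_task_dirs(project_dir: Path, active_tasks: set) -> list:
    """List sub-folders that do not belong to any task the client still has.

    Assumption: a task folder is named after its workunit, and the task name
    starts with that workunit name (e.g. a trailing _0 suffix is added).
    """
    stale = []
    for entry in sorted(project_dir.iterdir()):
        if not entry.is_dir():
            continue  # skip executables, data files, etc.
        if not any(task.startswith(entry.name) for task in active_tasks):
            stale.append(entry)
    return stale
```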

©2024 climateprediction.net