Posts by klepel

1) Message boards : Number crunching : OpenIFS Discussion (Message 66851)
Posted 10 Dec 2022 by klepel
Post:
Glenn, will you release the oifs_43r3_bl and oifs_43r3_ps apps in parallel or in sequence? Because I am bandwidth-limited, I can run at most 3 WUs in parallel across all 3 computers assigned to climateprediction.net. My app_config.xml is configured as follows:
<app_config>
 <project_max_concurrent>4</project_max_concurrent>  
[…………………………………….]
   <app>
      <name>oifs_43r3</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
</app_config>

So a total of 4 WUs of oifs_43r3_bl and oifs_43r3_ps might run in parallel. I was hesitant to limit project_max_concurrent further, because some HadSM4 WUs might appear (they do not have the bandwidth problem) and I would happily crunch them in parallel with the OpenIFS ones, and because I might forget to increase it again after the OpenIFS tasks disappear.
For the BOINC specialists: if I set max_concurrent for one of the two apps in app_config.xml to 0 (zero), for example:
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>0</max_concurrent>
      <report_results_immediately/>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>2</max_concurrent>
      <report_results_immediately/>
   </app>
will WUs from this app no longer be downloaded to the computer, which would limit the climateprediction.net WUs on that particular computer further?
Thanks!
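For anyone maintaining the same limits on several computers, a file of this shape can be generated from a small table instead of being edited by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name build_app_config is an assumption made for illustration and is not a BOINC tool. It only writes the elements quoted in the post and makes no claim about how BOINC interprets a value of 0.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_max, per_app_limits):
    """Return app_config.xml text using only the elements quoted in the post.

    build_app_config is a hypothetical helper, not part of BOINC.
    """
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    for name, limit in per_app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(limit)
        ET.SubElement(app, "report_results_immediately")
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root, space="   ")
    return ET.tostring(root, encoding="unicode")


# App names and limits are the ones from the post above.
config_text = build_app_config(
    4, {"oifs_43r3": 2, "oifs_43r3_bl": 2, "oifs_43r3_ps": 2}
)
print(config_text)
```

The printed text would still have to be saved as app_config.xml in the project folder and re-read by the BOINC client.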
2) Message boards : Number crunching : OpenIFS Discussion (Message 66743)
Posted 3 Dec 2022 by klepel
Post:
About OpenIFS failure modes:
The one host with errors was the only one on which I suspended all tasks to disk, rebooted the host, and resumed the tasks. I strongly believe that all of these 54 tasks went through this suspend–resume.[…]
The host with errors has reported only successful tasks for a while now, which is another hint that the error episode was just the aftermath of the suspend-resume cycle.
I think I had the same error. The computer was shut down and restarted. Although BOINC View reported progress, no trickle/upload file was created.
https://www.cpdn.org/result.php?resultid=22248984: as you can see, there is a difference between CPU time and execution time.

Then there was a WU with a code 9 error, https://www.cpdn.org/result.php?resultid=22248970:
9 (0x00000009) Unknown error code

Hope this helps.
3) Message boards : Number crunching : OpenIFS Discussion (Message 66701)
Posted 1 Dec 2022 by klepel
Post:
Update.
After meeting yesterday with CPDN, the disk and memory requirements for these tasks need revising: memory requirement up & disk down. What was not taken into account when setting the memory was the additional amount required by the wrapper code & all the boinc functions it uses (such as zipping). Hopefully this will eliminate some of the memory errors.
The plan is to put out a repeat of the first batch with corrected limits to check how it performs before sending out the rest of this experiment.
I am sure this will help!
On trickles, agree these longer (3 month) runs are producing too many trickle files which I'll adjust. However, I looked at the output filesize per output instance and it's reasonable and at the lower limit of what the scientist needs. I am reluctant to change it.
Understood! I hope fewer trickles will make for smoother uploads.
Question for ADSL people: knowing your bottleneck is network, are you happy just reducing the no. of tasks running concurrently? What's your sustainable data-flow rate you would be happy with (give me a number to work with).
I have no problem reducing the number of tasks running on my computers to fit my ADSL bandwidth. I have to remind myself that I offer the scientists a certain amount of compute power, but they have to accept the offer; there are a lot of other worthy BOINC projects! (Hopefully I will remember this the next time I go shopping for computer parts for climateprediction.net that I do not need for my personal daily computing!) However, I am still concerned about how many climateprediction.net participants read the forums, and how many users have installed BOINC and attached to climateprediction.net but never check their machines. You might end up with a lot of OpenIFS results piling up on computers with slow internet connections, wasting energy and resources without ever helping science. I will send you a PM with my ADSL speed, so that you have a number for the WUs I am likely to contribute each day. It is not much!
4) Message boards : Number crunching : OpenIFS Discussion (Message 66667)
Posted 30 Nov 2022 by klepel
Post:
As I said in my earlier message 66661, which klepel only partially quotes, disk limits can be checked either on the server before the task is issued, or on the client before the task is run. Different checks may be applied at either stage: we need to consider them as separate problems.
Richard, you know better than I do how BOINC works. My point is: it seems to me that the OpenIFS model tells BOINC it needs more disk space than it actually uses. As several people have pointed out, this means the disk space assigned to BOINC has to be quite large. This is the case on WSL2 as well as on Linux computers. Some of my Linux installations are dual-boot on small SSDs, so there is no room to give 40 GB to BOINC alone (just checked: one of my Linux computers does not download OpenIFS because it is 7 GB short). As I understand Glenn, he has a lot of WUs to run, so I am trying to unlock more computers for climateprediction.net.
5) Message boards : Number crunching : OpenIFS Discussion (Message 66663)
Posted 30 Nov 2022 by klepel
Post:
Either or both may be related to the XML specification for the workunit, which contains:
<rsc_disk_bound> 40,000,000,000
My calculator makes that 37.25 GB, using 1024x steps.
Glenn, this is the problem: the model asks for too much disk space before it even starts to download an OpenIFS WU. It is not, as you wrote in the other thread, that the model has already downloaded OpenIFS WUs and has run out of allocated disk space in WSL2 after several crashes. BOINC simply sees that there is not enough space allowed on the hard disk of a particular computer and therefore refuses to start downloading a WU.
I have already checked whether my WSL2 disk is full of crashed models, and there are none! As far as I understand, the researcher has to reduce this to the size that is really needed:
<rsc_disk_bound> 40,000,000,000
It might need to be just one order of magnitude smaller.
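The conversion quoted above is easy to check. The sketch below assumes the rsc_disk_bound value is in bytes and uses 1024-based units, as the quoted calculation does; the tenfold smaller bound is purely hypothetical and only illustrates the suggestion, it is not a value from the project.

```python
RSC_DISK_BOUND_BYTES = 40_000_000_000  # value quoted from the workunit XML

gib = RSC_DISK_BOUND_BYTES / 1024**3
mib = RSC_DISK_BOUND_BYTES / 1024**2
print(f"{gib:.2f} GB (1024-based)")   # 37.25
print(f"{mib:.2f} MB (1024-based)")   # 38146.97

# Hypothetical bound that is one order of magnitude smaller (illustration only).
smaller_gib = RSC_DISK_BOUND_BYTES / 10 / 1024**3
print(f"{smaller_gib:.2f} GB (1024-based)")  # 3.73
```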

In the meantime, another model crashed. Because of a “traffic jam” with another BOINC project (SIDOCK produces small WUs with a huge data output and releases them once a day), the climateprediction.net trickles were not uploaded in time. OpenIFS reports no trickles; they were waiting in line…

I have another question: once the multicore models start to be released, will the trickles be even larger and more frequent, or is there no direct link between multithreading and trickle size/frequency?
6) Message boards : Number crunching : OpenIFS Discussion (Message 66656)
Posted 30 Nov 2022 by klepel
Post:
I bought 64 GB of RAM for one of my Linux computers so that I would be able to process more of the new OpenIFS WUs in parallel. Was it ever mentioned, in the earlier conversations preparing for these OpenIFS WUs, that each WU would generate about 1 GB of trickles?

ADSL:
ADSL in the US can be very, very badly asymmetric. Not that many years back, I had 25Mbit down, 768kbit up (yes, not even 1Mbit up).
Same here (2Mbit down, 768kbit up)! I am not able to change to something faster, as I work with it! I have 3 working Linux computers behind this ADSL line and intended to start 2 more. But this will not happen: I am not even able to upload the trickles of 2 WUs running in parallel.
That is, I can only put a fraction of my CPUs to OpenIFS in steady state due to my upload bandwidth limitation.
This sums it up nicely! This data volume is not sustainable! Now the obstacle is not the 32-bit libs, it is bandwidth!
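To put the quoted figures side by side, here is a rough, idealised estimate of what a 768 kbit/s uplink can carry. It is a sketch under stated assumptions: "about 1 GB" of trickles per WU is read as 10**9 bytes, 1 kbit is 1000 bits, the line is used for nothing else, and protocol overhead is ignored, so real throughput would be lower.

```python
UPLOAD_KBIT_PER_S = 768       # uplink speed quoted in the post
TRICKLE_BYTES_PER_WU = 1e9    # "about 1 GB" per WU, read as 10**9 bytes (assumption)

upload_bytes_per_s = UPLOAD_KBIT_PER_S * 1000 / 8            # 96000 bytes/s
hours_per_wu = TRICKLE_BYTES_PER_WU / upload_bytes_per_s / 3600
gb_per_day = upload_bytes_per_s * 86_400 / 1e9

print(f"upload time per WU: {hours_per_wu:.1f} h")            # 2.9 h
print(f"upload ceiling per day: {gb_per_day:.2f} GB")         # 8.29 GB
```

Even in this best case the line carries the trickles of roughly eight WUs per day, shared across every computer behind it.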
Error: " climateprediction.net: Notice from server
OpenIFS 43r3 Perturbed Surface needs 21968.66MB more disk space. You currently have 16178.31 MB available and it needs 38146.97 MB.
Tue 29 Nov 2022 09:56:52 AM CET"
38 GB sounds like a lot; is that normal?
Same problem with my WSL2 installation: this computer would not have a pronounced bandwidth problem, but it does not download any tasks because OpenIFS constantly asks for 38146.97 MB! The climateprediction.net project folder on the Linux computer needs only 8.1 GB!
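The numbers in the server notice are consistent with each other, as a quick check shows. The comparison with the 8.1 GB project folder assumes 1 GB = 1024 MB, which is an assumption about the units and not something the notice states.

```python
needed_mb = 38146.97      # requested by OpenIFS, from the server notice
available_mb = 16178.31   # available to BOINC, from the server notice
project_folder_gb = 8.1   # size of the project folder reported in the post

shortfall_mb = needed_mb - available_mb
ratio = (needed_mb / 1024) / project_folder_gb  # assumes 1 GB = 1024 MB

print(f"shortfall: {shortfall_mb:.2f} MB")            # 21968.66, as in the notice
print(f"requested vs. folder size: {ratio:.1f}x")     # 4.6x
```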

I think fewer trickles of around 100 MB each would be easier for BOINC to handle, but there are more knowledgeable participants in the forum. And yes, the overall size should be reduced to something more manageable, as bandwidth is limiting the throughput of WUs across all participants.
OpenIFS Errors:
https://www.cpdn.org/result.php?resultid=22245289
https://www.cpdn.org/result.php?resultid=22245630
These occurred after I adjusted app_config from a maximum of 4 WUs to 1 WU and afterwards to 2 WUs.
7) Message boards : Number crunching : New work discussion - 2 (Message 66348)
Posted 11 Nov 2022 by klepel
Post:
With the release of the OpenIFS tasks imminent, would you mind giving the exact app names we have to use in the app_config files to restrict the number of concurrent WUs to some X < number of CPU cores:
<app_config>
   <app>
      <name>OpenIFS</name>
      <max_concurrent>1</max_concurrent>
      <report_results_immediately/>
   </app>
</app_config>

Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm.
Regards,
klepel
8) Message boards : Number crunching : New work Discussion (Message 65782)
Posted 7 Aug 2022 by klepel
Post:

Hold them back, i still have more than enough work downloaded ^^

https://www.cpdn.org/results.php?hostid=1521318&offset=0&show_names=0&state=1&appid=

Some day i looked into the VM, and it was more than full :)

Greets
Felix

Dear Felix,
I know I shouldn’t write anything, but you have 227 tasks in progress:
73 N114 tasks and 154 N216 tasks (quick calculation).
Your computer finishes the tasks:
N114 in about 300000 [s]: 300000 [s]*73=21900000 [s]
and N216 in about 950000 [s]: 950000 [s]*154=146300000 [s]
Your VM has 4 processors, so it will finish all tasks in about 487 days, well beyond the 365-day deadline: about 122 days, or 33%, past the deadline, so part of your tasks will not finish in time.
Would you mind releasing about 33% of your tasks, preferably N216 tasks? Just kill them, so that other computers (idle on CPDN tasks) can work on them and the batches are finished in a useful time for the researchers!
Thanks a lot,
klepel
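The estimate above can be reproduced in a few lines. This is a sketch under the same assumptions as the post: each of the 4 processors runs one task at a time around the clock, and the per-task run times stay constant. The variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
CORES = 4
DEADLINE_DAYS = 365

# task type -> (number of tasks, seconds per task), figures from the post
queue = {"N114": (73, 300_000), "N216": (154, 950_000)}

total_cpu_s = sum(count * secs for count, secs in queue.values())
wall_days = total_cpu_s / CORES / SECONDS_PER_DAY
overrun_days = wall_days - DEADLINE_DAYS

print(f"total CPU time: {total_cpu_s} s")                                  # 168200000 s
print(f"wall time on {CORES} cores: {wall_days:.0f} days")                 # 487 days
print(f"overrun: {overrun_days:.0f} days")                                 # 122 days
print(f"overrun relative to deadline: {overrun_days / DEADLINE_DAYS:.0%}") # 33%
print(f"overrun relative to queue: {overrun_days / wall_days:.0%}")        # 25%
```

Under these assumptions the 122-day overrun is about 33% of the 365-day deadline, which corresponds to about 25% of the queued work; releasing a third of the tasks therefore leaves some margin.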
9) Message boards : Number crunching : No work for Windows OR Linux?! (Message 65359)
Posted 13 Apr 2022 by klepel
Post:
"I recently figured out that you can view and manage both the Windows and WSL2 clients from the same BOINC Manager (in Windows). That's been really helpful."

I am interested in this part! How do you do it? Step by step, please.
10) Questions and Answers : Unix/Linux : Run Linux work units with Windows 10 WSL (Message 65142)
Posted 11 Feb 2022 by klepel
Post:
Windows Update just killed the last two hadam4h tasks on this computer, https://www.cpdn.org/results.php?hostid=1517859, approx. 8 days before they would have finished. Sorry about that!
11) Message boards : Number crunching : New work Discussion (Message 64896)
Posted 5 Jan 2022 by klepel
Post:
Am I the only one with problems with the new short tasks (UK Met Office HadCM3 short v8.36)?

They all seem to fail with:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file /home/roland/projects/climateprediction.net/hadcm3s_1dei_200012_168_926_012128606/jobs/climate.cpdc, line 396, position 20
Image              PC        Routine            Line        Source             
hadcm3s_um_8.36_i  0851D9E5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  085429B6  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0832EC95  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FD206  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  081FED33  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848CCB5  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  0848BE04  Unknown               Unknown  Unknown
hadcm3s_um_8.36_i  08496BAD  Unknown               Unknown  Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=240, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
17:23:42 (240): called boinc_finish(22)

</stderr_txt>
]]>


Sorry for not being precise: this is from one of my WSL computers. On the Linux computers I get:
<core_client_version>7.16.5</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f1eb60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7bfcee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f65b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7c43ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f4db60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7c2bee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f63b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7c41ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f16b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7bf4ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7fc8b60]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/home/kle1boinc/BOINC/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7ca6ee5]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3927, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
03:08:41 (3927): called boinc_finish(22)

</stderr_txt>
]]>

I know that this
SIGSEGV: segmentation violation
is normally associated with RAM overclocking, but these computers handle the long WUs (hadam4h) quite well.
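Long stderr dumps like the two above are easier to compare once they are reduced to a restart count and a list of error signatures. The following is a minimal sketch; the function name summarise and the abridged excerpt are illustrative assumptions, and the two patterns only cover the error lines that appear in this post.

```python
import re
from collections import Counter

# Abridged excerpt in the style of the stderr text quoted above.
stderr_text = """\
forrtl: severe (17): syntax error in NAMELIST input, unit 5
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
"""


def summarise(text):
    """Count restart attempts and error signatures in a stderr dump."""
    signatures = Counter()
    restarts = 0
    for line in text.splitlines():
        if line.startswith("Model crash detected"):
            restarts += 1
        match = re.match(r"(forrtl: severe \(\d+\)|SIGSEGV)", line)
        if match:
            signatures[match.group(1)] += 1
    return restarts, dict(signatures)


restarts, signatures = summarise(stderr_text)
print(restarts, signatures)
```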
12) Message boards : Number crunching : New work Discussion (Message 64727)
Posted 28 Oct 2021 by klepel
Post:
Just had two with the file size error... at least I understand now what you were talking about.
And yes, I have a slow internet connection.
13) Message boards : Number crunching : Site problems (Message 64605)
Posted 9 Oct 2021 by klepel
Post:
Let's try with 4 months... I produced two ghost WUs by re-attaching to the project on Thursday https://www.cpdn.org/results.php?hostid=1522605 after I disconnected the monitor to try it on another computer. The 2 WUs will never be processed on this computer at all, as all files are wiped out!
14) Message boards : Number crunching : Credit Question (Message 64546)
Posted 30 Sep 2021 by klepel
Post:
Bill F, send me the right links! Thanks! For my purposes (pay-out of some GRC), it is no problem if it is not updated every day. It is even better if it is only every week ;-) I do it by hand in spreadsheets...
15) Message boards : Number crunching : Erroneous disk space notices (Message 64545)
Posted 30 Sep 2021 by klepel
Post:
EDIT: I looked in the CP project folder and it's obvious there's many outdated folders. I deleted them and it started playing nice with others again.
Great that it worked! Unfortunately, this is a bit of housekeeping one has to do on climateprediction.net when there is no disk space left.
16) Message boards : Number crunching : Erroneous disk space notices (Message 64544)
Posted 30 Sep 2021 by klepel
Post:
With respect to crashed tasks not cleaning up after themselves, this seems to me much less of a problem than it used to be and it only rarely seems to happen to me now whereas it used to happen frequently. That may be because outside of testing branch, I only get the very occasional crashed task these days.
I would not say so: if Aurum has a problem with disk space and, as it seems to me, a lot of crashed models, these crashed models will eat up a lot of space quite fast! Since I got WSL working on two Win10 computers, I have had to clean up crashed WUs by hand every time Win10 decided to restart my computer, without my intervention, after the monthly update cycle. And I remember well going around my Linux computers with a written list of the WU numbers reported as crashed on climateprediction.net, cleaning them off the hard disk so that new ones could be downloaded again. This is the reason I do not run climateprediction.net on my server.
17) Message boards : Number crunching : Credit Question (Message 64522)
Posted 29 Sep 2021 by klepel
Post:
I have paid out some GRC to climateprediction.net users over the last couple of weeks. I have always wondered how to obtain the user stats XML files directly from climateprediction.net (last update of the user XML: 2021-09-23 12:40:28 UTC (5 days 09:04:14 old)).
I would highly appreciate it if anyone could help or guide me in obtaining the files. Please PM me.
Then I would not have to copy and paste it from boincstats.com and could semi-automate the pay-outs.
Thanks a lot!
klepel
18) Message boards : Number crunching : Erroneous disk space notices (Message 64520)
Posted 28 Sep 2021 by klepel
Post:
134	World Community Grid	9/28/2021 10:15:20 AM	Message from server: OpenPandemics - COVID 19 needs 200.00MB more disk space.  You currently have 0.00 MB available and it needs 200.00 MB.	
135	World Community Grid	9/28/2021 10:15:20 AM	Message from server: Mapping Cancer Markers needs 500.00MB more disk space.  You currently have 0.00 MB available and it needs 500.00 MB.

You do not have enough disk space available. You might reconfigure your BOINC options as indicated by the post above.
And you are right: if a climateprediction.net WU crashes, the zip files will not be cleaned up afterwards, so you will have a lot of worthless information eating up your disk space. As someone mentioned before, there are two solutions:
restart the project
or clean up all the crashed WUs by hand in the corresponding project folder.
Yes, running climateprediction.net is not easy or hassle-free, but in return it is fun and will help future generations!
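Before cleaning up by hand, it helps to see which subfolders of the project folder take up the space. The sketch below only reads sizes and deletes nothing; the function name folder_sizes and the demo file name restart.zip are illustrative assumptions, and the demo runs on a throwaway directory, not on a real BOINC installation.

```python
import tempfile
from pathlib import Path


def folder_sizes(project_dir):
    """Return {subfolder name: total bytes} for each direct subfolder."""
    sizes = {}
    for entry in Path(project_dir).iterdir():
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    return sizes


# Demo on a temporary directory with one made-up leftover task folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "old_task").mkdir()
    (Path(tmp) / "old_task" / "restart.zip").write_bytes(b"x" * 2048)
    demo = folder_sizes(tmp)
    for name, size in sorted(demo.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1024**2:8.2f} MB  {name}")
```

To use it for real, one would pass the path of the climateprediction.net project folder, and compare the folder names against the tasks BOINC still lists before deleting anything.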
19) Message boards : Number crunching : Computation Errors (Message 64349)
Posted 12 Aug 2021 by klepel
Post:
I also limit to 4 WUs on the AMD 3950, 2600 and 1700. This seems to be a good compromise: with fewer WUs the calculation is definitely faster, and with more it is noticeably slower. All other threads are used for TN-grid or SiDock (and they are also affected by the number of climateprediction.net WUs).
This is why I limit the WUs on the AMD 3950, although this chip has more cache than the other two.
On my two WSL computers I limit climateprediction.net to 2 WUs per virtual machine; otherwise the RAM use is too high and the other WUs on Win10 are heavily affected.
20) Message boards : Number crunching : Credit Question (Message 64250)
Posted 1 Aug 2021 by klepel
Post:
Strange Server Status says there's 10613 WUs ready to send. They must be Windoze since my Linux computers are not getting enough. The Applications page shows the work is dominated by Linux. I'm completely baffled by the relevance of Windows apps to whitelisting for Gridcoin which requires daily stats updates. People want to help but it sounds like you don't want any help.

Are you sure you have all the lib32 libraries installed? I do not have any problems downloading WUs on Linux or Win WSL. By the way, it is best not to run climateprediction.net on all cores or threads. But you already know this as a heavy BOINC user.

