Posts by nairb

41) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62836)
Posted 2 Nov 2020 by nairb
Post:
Well, well, well, it's a success. And it was still uploading the last 192 MB file when it reached 100%.

So I will try the new method of stopping the processing of w/u's. I have worked with computers for endless years and have always hated using the on/off switch to solve issues. But power cuts never seem to kill a climate w/u. Just luck, I guess.

Good idea to use top to check if the process really has cleared off.
Let's hope the re-issued w/u's work better when they arrive.
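
For anyone who would rather script the "has it really cleared off" check than eyeball top, here is a minimal Python sketch. It assumes a Linux machine with /proc mounted, and the name fragment you search for (for example "hadam4") is my guess at how the model process shows up, not something confirmed by the project. Note that /proc/<pid>/comm only holds the first 15 characters of the command name.

```python
import os

def find_processes(name_fragment, proc_root="/proc"):
    """Return sorted (pid, command) pairs whose command name contains name_fragment.

    proc_root is a parameter only so the function can be pointed at a
    test directory; on a real machine leave it as /proc.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only the numeric directories are processes
        comm_path = os.path.join(proc_root, entry, "comm")
        try:
            with open(comm_path) as handle:
                command = handle.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name_fragment in command:
            matches.append((int(entry), command))
    return sorted(matches)
```

Calling find_processes("hadam4") after stopping the client should return an empty list if the model really has gone; anything still listed is a process to wait for before restarting.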
42) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62832)
Posted 2 Nov 2020 by nairb
Post:

It may be a stupid question but did you exit the client as well as suspending computation when you made the changes? If you just suspend, the changes get reversed by the running client - I have tried it in the past without stopping the client.


No, no, good question. I didn't think to quit the client. So I just checked, and client_state.xml was back as it was, without the changes.

So I quit the client, made the changes again, and restarted. 2 other w/u's errored straight away and died. They had only (!!) been running a day or so. The w/u that is due to complete soon restarted OK. I rechecked client_state.xml and the changes were still there.

So I will have to redo the changes on 2 other machines again. Every time I suspend a job or restart boinc I seem to lose a w/u or 2.

I now just pull the power cord out.
I will report the outcome of the remaining 877, which has 1 hr 10 mins left to run.

The 2 failed w/u after the restart showed
hadam4h_c0ds_206511_5_878_012030354_0
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process got signal 65</message>
<stderr_txt>
Signal 2 received: Interrupt
Signal 2 received: Illegal instruction - invalid function image
Signal 2 received: Floating point exception
Signal 2 received: Segment violation
Signal 2 received: Software termination signal from kill
Signal 2 received: Abnormal termination triggered by abort call
Signal 2 received, exiting...
20:14:24 (1310): called boinc_finish(193)
Signal 2 received: Interrupt
Signal 2 received: Illegal instruction - invalid function image
Signal 2 received: Floating point exception
Signal 2 received: Segment violation
Signal 2 received: Software termination signal from kill
Signal 2 received: Abnormal termination triggered by abort call
Signal 2 received, exiting...
20:14:25 (983): called boinc_finish(193)

</stderr_txt>

along with loads of messages like
02-Nov-2020 20:32:07 [climateprediction.net] Output file hadam4h_c0ds_206511_5_878_012030354_0_r381159273_4.zip for task hadam4h_c0ds_206511_5_878_012030354_0 absent
43) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62829)
Posted 2 Nov 2020 by nairb
Post:
Well, the fix had been applied to client_state.xml: both max_nbytes and rsc_disk_bound, on all the lines that needed changing. Maybe I made an error, but the next w/u to finish just now also failed with:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
19:06:44 (1926): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam4h_b0ft_200911_5_877_012028747_0_r1809679561_5.zip</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
]]>

There is another 877 due to finish in a couple of hrs, followed by a bunch of 878's. I am beginning to think it is not worth letting any of these w/u's run. Maybe abort the entire lot and wait for a fixed batch. Little point in waiting 20(ish) days to find they fail as well......
44) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62822)
Posted 2 Nov 2020 by nairb
Post:
Using Edit > Find and Replace to change <max_nbytes>150000000.000000</max_nbytes> to <max_nbytes>300000000.000000</max_nbytes> and hitting Replace All should sort it.


It's been done. It does seem that only some w/u's are affected. Better to find out it's not a machine problem, though.

Ta
Nairb
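
For anyone with several machines to patch, the find-and-replace above can be scripted. This is only a sketch in Python, under some assumptions: the path to client_state.xml varies by install (the one in the usage note is just a common location, check your own), and the BOINC client should be fully quit first, since a running client writes the old values back. The function names are mine.

```python
import re

OLD_LIMIT = "150000000.000000"  # value quoted in this thread
NEW_LIMIT = "300000000.000000"  # doubled value quoted in this thread

def raise_max_nbytes(state_text, old=OLD_LIMIT, new=NEW_LIMIT):
    """Replace every <max_nbytes>old</max_nbytes> with the new value.

    Returns (patched_text, number_of_replacements). Other max_nbytes
    values are left alone.
    """
    pattern = re.compile(r"<max_nbytes>" + re.escape(old) + r"</max_nbytes>")
    replacement = "<max_nbytes>" + new + "</max_nbytes>"
    return pattern.subn(lambda match: replacement, state_text)

def patch_file(path):
    """Patch the file in place and return how many limits were changed."""
    with open(path) as handle:
        text = handle.read()
    patched, count = raise_max_nbytes(text)
    if count:
        with open(path, "w") as handle:
            handle.write(patched)
    return count
```

Usage would be something like patch_file("/var/lib/boinc/client_state.xml") with the client stopped, after taking a backup copy of the file. A return value of 0 means nothing matched, which is also a quick way to confirm the change is still in place after a restart.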
45) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62816)
Posted 1 Nov 2020 by nairb
Post:

Edit: Using a text editor that saves the file as plain text without adding any end-of-file characters, I have opened client_state.xml, looked for <rsc_disk_bound> in all the batch 877 and 878 tasks on my machine, and doubled the value from 2000000000.000000 to 4000000000.000000.
My tasks which might be affected have only just started so it will be about a week till I know whether it works or not. In the meantime I will let Andy know that this is a problem.


OK, it's been done on the 2 machines. There are 3 877's to complete in the next 20 hrs. I do hope they are more successful.
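
The rsc_disk_bound edit is fiddlier than a blanket replace because it should only touch the batch 877 and 878 tasks. A rough Python sketch of that is below. It assumes client_state.xml keeps each task in a <workunit> block containing a <name> and an <rsc_disk_bound> line, and that the batch number is one of the underscore-separated fields of the task name (as in hadam4h_b0cw_200811_5_877_012028642). Both are assumptions from looking at these task names, so check against your own file first.

```python
import re

WORKUNIT_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
NAME_RE = re.compile(r"<name>(.*?)</name>")

def double_disk_bound(state_text, batches=("877", "878"),
                      old="2000000000.000000", new="4000000000.000000"):
    """Raise rsc_disk_bound from old to new for workunits in the given batches.

    Returns (patched_text, list_of_changed_workunit_names). Workunits that
    already carry the new value are left alone, so running it twice is safe.
    """
    old_tag = "<rsc_disk_bound>" + old + "</rsc_disk_bound>"
    new_tag = "<rsc_disk_bound>" + new + "</rsc_disk_bound>"
    changed = []

    def patch(match):
        block = match.group(0)
        name = NAME_RE.search(block)
        if name is None:
            return block
        fields = name.group(1).split("_")
        if not any(batch in fields for batch in batches):
            return block  # some other batch or project
        if old_tag not in block:
            return block  # already patched, or a different bound
        changed.append(name.group(1))
        return block.replace(old_tag, new_tag)

    return WORKUNIT_RE.sub(patch, state_text), changed
```

As with any edit to client_state.xml, this only sticks if the client has been quit (not just suspended) before the file is rewritten.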
46) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62814)
Posted 1 Nov 2020 by nairb
Post:
On a separate machine I have just had the same thing happen to
hadam4h_b0cw_200811_5_877_012028642_0

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
22:21:59 (31404): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam4h_b0cw_200811_5_877_012028642_0_r803748994_5.zip</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
]]>

This machine had 11.6 GB free disk space.
The final zip file was 193.11 MB.

I did not get time to check the <rsc_disk_bound> value.
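
A quick bit of arithmetic on why free disk space looks like a red herring here. Error -131 is "file size too big", and the per-file limit quoted elsewhere in this thread is <max_nbytes>150000000.000000</max_nbytes>. The little Python sketch below just compares the zip size against that limit; it assumes 1 MB means 10^6 bytes, and the function name is mine.

```python
def excess_bytes(size_bytes, max_nbytes):
    """Return how far a file is over its upload limit (0 if it fits)."""
    return max(0, size_bytes - max_nbytes)

ZIP_SIZE = 193_110_000   # the 193.11 MB zip, assuming 1 MB = 10**6 bytes
OLD_LIMIT = 150_000_000  # max_nbytes as shipped with the task
NEW_LIMIT = 300_000_000  # max_nbytes after the suggested edit

over_old = excess_bytes(ZIP_SIZE, OLD_LIMIT)  # 43_110_000 bytes over
over_new = excess_bytes(ZIP_SIZE, NEW_LIMIT)  # 0, fits under the new limit
```

So the zip is about 43 MB over the old limit and comfortably under the doubled one, however much free space the disk has.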
47) Questions and Answers : Unix/Linux : computation error at 100% complete (Message 62809)
Posted 1 Nov 2020 by nairb
Post:
So I left the machine with 15 mins to go before completion on task hadam4h_b0tg_201211_5_877_012029238_0.
When I returned it had failed with a computation error.
It says that the w/u returned 5 trickles.
There was a street wide power failure a couple of weeks ago which shut the machine(s) down but all w/u's resumed ok.

I have 3 more due to finish in the next day. It would be a shame if all failed the same way.

I am beginning to suspect a shortage of disk space on this machine. Is this what the stderr_txt is actually trying to say??
Ta
Nairb


<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
16:13:32 (1925): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam4h_b0tg_201211_5_877_012029238_0_r1378671937_2.zip</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam4h_b0tg_201211_5_877_012029238_0_r1378671937_5.zip</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
48) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62784)
Posted 22 Oct 2020 by nairb
Post:
For those who might be trying Fedora 30.... I have finally got round to doing a minimal Fedora 30 install. I usually go for the kde-plasma desktop GUI. We all know how resource hungry some Fedora spins can be. But it's nice to have some of the GUI tools. I checked my 80 GB disk only to find it's getting a bit short of space with climate/einstein etc. installed.

So it was time to get back to basics. A min install + admin tools. And all you get is a command line. I use tigervnc & run boinc thru an xterm. All very simple ..... like it used to be. It now all fits in the corner of a 60gig disk.
49) Message boards : Number crunching : Welcome back/checking if everything is working? (Message 62754)
Posted 5 Oct 2020 by nairb
Post:

Les has contacted the project. Some cleaning up will be done, but probably not before some more work appears, which will be part of the new season MSc programme and should in the next few weeks have work for both Windows and Linux machines. (Not sure about Mac.)



I presume the Linux w/u's will still require some 32-bit libs and not be the full-blown 64-bit jobbies. I had better make sure the Fedora 30 hard disk is plugged in.

Assuming I am "lucky" enough to snare a w/u, that is.
50) Message boards : Number crunching : Updated BOINC Clients 7.16.11 - Windows 64-bit and Mac OS X (64-bit Intel) (Message 62736)
Posted 23 Sep 2020 by nairb
Post:
One thing about these ARP tasks is that they can be suspended/stopped/re-started endless times without having a fit and dying with a computation error.

Very endearing.
51) Message boards : Number crunching : Updated BOINC Clients 7.16.11 - Windows 64-bit and Mac OS X (64-bit Intel) (Message 62730)
Posted 19 Sep 2020 by nairb
Post:

And almost as difficult to get a w/u for. There are a few now and then... and there does not seem to be a status page showing if there are w/u's to be had. Or have I missed it?


I seem to be able to consistently run 6 of these. On a multicore computer the secret is to go to the profile - default, home, work or school the computer is using, click on custom profile and then up the number of concurrent work units for the project from the default of 1.

(Took me a while to work this out!)



Wow, thanks for that info. I had not found those custom/home profiles per device. I now have as many ARPs as I want.

With the demise of seti and sparse w/u here on cpdn, WCG has some interesting projects.

Altho if seti had found a signal ............. still my favorite.
52) Message boards : Number crunching : Updated BOINC Clients 7.16.11 - Windows 64-bit and Mac OS X (64-bit Intel) (Message 62726)
Posted 17 Sep 2020 by nairb
Post:
And while weather rather than climate there is the Africa Rainfall Project with World Community Grid.


And almost as difficult to get a w/u for. There are a few now and then... and there does not seem to be a status page showing if there are w/u's to be had. Or have I missed it?
53) Message boards : Number crunching : Welcome back/checking if everything is working? (Message 62670)
Posted 2 Sep 2020 by nairb
Post:
AMD Ryzen 7 3700X 8-Core Processor [Family 23 Model 113 Stepping 0]

Asus B550-plus 32GB Ram


Out of interest, what's the power consumption of this machine with all threads running? I still have a dual slot 2 Xeon machine... 700 MHz. The lights used to dim when that was fired up. Gone are the days of cheap electricity. I think my i7 machine is about 26 quid a month to run. That's English pounds.
54) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62202)
Posted 6 Mar 2020 by nairb
Post:
Huuurrray! After a month of effort and several fails along the way, the machine has completed a w/u and the system says it's "Outcome Success", although the stderr_txt seems to say Segment violation?? Or was that from a previous w/u that failed? It did make it to 100% before completing.

So it would seem that Fedora 30 does work with the addition of a couple of libs.
And it's not wise to do a kernel update before the w/u completes.
Oh, and if restarting, make sure the slots are OK.

Hopefully the 2 remaining w/u will also play the game.
55) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62176)
Posted 2 Mar 2020 by nairb
Post:
Oh dear... I needed to restart the desktop. All 3 models resumed, then one failed with a computation error. It had only been running for 3 days. But I checked how successful the remaining models had been with other computers. Gulp... not one had been successful.

I needed to restart the desktop machine after a software update which included changes/updates to the fc30 kernel. Maybe this is not a wise thing to do when a model has started. I doubt this is a Fedora 30 problem, though.
56) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62152)
Posted 24 Feb 2020 by nairb
Post:
One model has produced a 137 MB upload file for the first trickle... Better than last time. Dare I hope??
57) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62148)
Posted 21 Feb 2020 by nairb
Post:
Thanks for the info. I was a bit lazy in not doing an ldd on all the executables. It turned out that the hadam4_se_8.52_i686-pc-linux-gnu.so exe was missing the lib libnsl.so.1.
I did notice that there was no data being uploaded during the trickle-up times but since the model did not crash I thought all was working ok.
I aborted the other model at 84%. There was little point in letting it finish and fail.
I have started the 3rd model and will watch and see at 25% if there is data produced to be uploaded.
I suppose I could have followed the instruction on using Ubuntu but I have used Fedora from the beginning.... starting with redhat 5.2 and on to fedora 4...... I still have machines with those o/s on. Fedora seems to be ok with several of the other boinc projects.

Fingers crossed for the next model.............
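
For anyone else setting up a fresh Fedora box, running ldd over the executables before starting a 16-day model can be scripted. A minimal Python sketch, assuming ldd is on the PATH and that it reports missing libraries in its usual "libname => not found" form; the function names are mine, and which files to check is up to you.

```python
import subprocess

def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # lines look like: "\tlibnsl.so.1 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing

def check_executable(path):
    """Run ldd on one file and return its missing libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

Running check_executable() on each file in the project directory and printing anything non-empty would have flagged libnsl.so.1 on day one rather than at 99.81%.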
58) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62142)
Posted 19 Feb 2020 by nairb
Post:
So at 99.81% complete I was getting ready for a celebration... I wander off for a couple of hrs and when I return the blasted w/u had errored. It looks like one of the libs was missing.
The error was that libnsl.so.1 was missing. I wonder how many more are missing as well.

A quick yum install found and installed the lib. It turns out it has no longer been included in Fedora since release 28. Shame I had to wait 16-odd days to find out.

So anybody using Fedora 30 needs to install this lib (libnsl.so.1) as well. Not sure if I should look for a 32-bit version too.

In a couple of days the second w/u should finish... let's hope it finishes as it should.
59) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62085)
Posted 7 Feb 2020 by nairb
Post:
Yup, just after 25% the first model produced a trickle-up. It will be interesting to see how much faster a 4-core i5 CPU will be. It still seems fine for memory. Very little swap space used.
Now it's fingers crossed the model does not crash.
60) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62080)
Posted 5 Feb 2020 by nairb
Post:
So far it is on course to do a w/u in 16 days. I have let 2 models run at once, and some seti w/u's also. It does slow down with all 4 logical cores running. The m/b will take an i5 CPU with 4 cores, so I will upgrade to one of these. And in the meantime just run the 2 climate models.
At 20% there are still no trickle-ups.
But this test was to see if Fedora 30 would run the 32 bit apps. This is the last version of fedora that will support 32 bits. But it should be good for a few years.
Fedora 30 does run seti, einstein & rosetta fine. So far its doing good on climate also.



©2024 climateprediction.net