climateprediction.net home page
Posts by nairb

61) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62071)
Posted 3 Feb 2020 by nairb
Post:
An update on using Fedora Linux.
I finally grabbed some w/u's. I suspended them before they could start and did the following to find out which libs the climate app uses.

Here is the Linux I am using: Fedora 30 Workstation, 64-bit.
uname -mr
5.4.12-100.fc30.x86_64 x86_64
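The kernel above is 64-bit (x86_64), but the app's file name ends in i686, which suggests a 32-bit build; that would fit with ldd resolving its libraries from /lib rather than /lib64. The standard `file` command reports this directly. As a minimal sketch (the function name `elf_class` is purely illustrative), the same check can be made by reading byte 4 of the ELF header (`EI_CLASS`), where 1 means 32-bit and 2 means 64-bit:

```python
import sys

# Values of e_ident[EI_CLASS] (byte 4 of the ELF header), per the ELF spec.
ELF_CLASSES = {1: "32-bit (ELFCLASS32)", 2: "64-bit (ELFCLASS64)"}


def elf_class(path: str) -> str:
    """Report whether the ELF file at `path` is a 32- or 64-bit build."""
    with open(path, "rb") as f:
        ident = f.read(5)  # 4-byte magic number + the EI_CLASS byte
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    return ELF_CLASSES.get(ident[4], f"unknown ELF class {ident[4]}")


if __name__ == "__main__":
    # e.g. python3 elfclass.py hadam4_8.52_i686-pc-linux-gnu
    for name in sys.argv[1:]:
        print(f"{name}: {elf_class(name)}")
```

If the app reports 32-bit, it needs the 32-bit versions of its libraries even on a 64-bit system.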

The following is the output from ldd:
ldd hadam4_8.52_i686-pc-linux-gnu
       linux-gate.so.1 (0xf7efa000)
       libpthread.so.0 => /lib/libpthread.so.0 (0xf7ea7000)
       libdl.so.2 => /lib/libdl.so.2 (0xf7ea1000)
       libstdc++.so.6 => not found
       libm.so.6 => /lib/libm.so.6 (0xf7dcf000)
       libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7db1000)
       libc.so.6 => /lib/libc.so.6 (0xf7c0a000)
       /lib/ld-linux.so.2 (0xf7efb000)

So it looked like libstdc++.so.6 was missing, and I installed it.
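Spotting the `not found` entries can be automated. The sketch below assumes ldd's output looks like the listing above (`name => not found` for a missing library); `missing_libs` and `check_binary` are hypothetical helper names. The ldd man page warns that ldd may execute the program it inspects, so it is best used only on binaries you trust.

```python
from __future__ import annotations

import subprocess


def missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        # A missing dependency looks like: "libstdc++.so.6 => not found"
        if line.endswith("=> not found"):
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list[str]:
    """Run ldd on `path` and return any unresolved libraries."""
    # No check=True: ldd exits non-zero for files it cannot handle.
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)
```

On 64-bit Fedora the 32-bit variant of a package is normally requested with an `.i686` suffix, for example `sudo dnf install libstdc++.i686`.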
Another way of finding (some?) of the libs is readelf. Here is part of the output of readelf -d hadam4_8.52_i686-pc-linux-gnu:

0x00000001 (NEEDED) Shared library: [libpthread.so.0]
0x00000001 (NEEDED) Shared library: [libdl.so.2]
0x00000001 (NEEDED) Shared library: [libstdc++.so.6]
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED) Shared library: [libc.so.6]
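`readelf -d` lists only the direct dependencies (the `DT_NEEDED` entries in the dynamic section), while ldd also follows the dependencies of those libraries, so readelf can show fewer entries than ldd. A rough sketch of pulling out the NEEDED names and looking for each one on disk follows. The search directories are an assumption (/lib and /usr/lib hold the 32-bit libraries on 64-bit Fedora), and the lookup ignores RPATH, `LD_LIBRARY_PATH` and the loader cache, so it is a quick check rather than what the dynamic loader actually does. `needed_libs` and `find_lib` are illustrative names.

```python
from __future__ import annotations

import os
import re

# Matches e.g. " 0x00000001 (NEEDED)   Shared library: [libc.so.6]"
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")

# Assumed 32-bit library directories on 64-bit Fedora; adjust as needed.
DEFAULT_DIRS = ("/lib", "/usr/lib")


def needed_libs(readelf_output: str) -> list[str]:
    """Return the DT_NEEDED library names from `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def find_lib(name: str, search_dirs=DEFAULT_DIRS) -> str | None:
    """Return the first path where `name` exists, or None.

    Simplified: ignores RPATH/RUNPATH, LD_LIBRARY_PATH and ld.so.cache.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

Any name for which `find_lib` returns `None` is a candidate for the "not found" entries that ldd reports.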

It looked like I had all the libs installed (hopefully).

I resumed one of the tasks and... so far it's done almost 1% without falling over.
It's a simple and cheap upgrade to an i5 or even an i7 processor and a bit more RAM. If this test with Fedora works, then a better desktop is the way forward.
62) Questions and Answers : Unix/Linux : fedora 30 64 bit (Message 62063)
Posted 29 Jan 2020 by nairb
Post:
Has anybody tried using Fedora 30 Workstation for climate w/u? Mine is running on an Intel i3 processor. Does downloading new w/u automatically select 64-bit models, or will I need to get some of those 32-bit libs?
That's assuming there are some 64-bit Linux models to be had.

ta
Nairb
63) Message boards : Number crunching : New work Discussion (Message 60014)
Posted 25 Apr 2019 by nairb
Post:
Well, I have just grabbed 4 more w/u. 2 are safra/790. I checked their history and found they have both failed on other computers with "computing error". I'm not too keen to let them run for 3-4 days only to get another "computing error". I would rather let the 2 sam50's have a go. Is it fair game to abort the safra ones??
64) Message boards : Number crunching : New work Discussion (Message 60010)
Posted 25 Apr 2019 by nairb
Post:
Boooo another "Signal 11 received: Segment violation"

on a "wah2_safr50_a0vl_201612_24_790_011752393_0"

I thought these were ok.


Another safra/790 with a segment violation. I'm pleased I have run out of these now.

Are the sam50's OK?
65) Message boards : Number crunching : New work Discussion (Message 59974)
Posted 14 Apr 2019 by nairb
Post:
Boooo another "Signal 11 received: Segment violation"

on a "wah2_safr50_a0vl_201612_24_790_011752393_0"

I thought these were ok.
66) Message boards : Number crunching : Upload failures (Message 59956)
Posted 9 Apr 2019 by nairb
Post:
An email is on its way south.

The magic email seems to have done the job. All those anz w/u zips have now uploaded.
67) Message boards : Number crunching : Upload failures (Message 59947)
Posted 9 Apr 2019 by nairb
Post:
nairb

That's a different matter.
Yours are heading in the opposite direction, to a big data center in Hobart, Australia.

I'll email Andy


I guess the "anz" in anz50 w/u gives it away.
68) Message boards : Number crunching : Upload failures (Message 59942)
Posted 8 Apr 2019 by nairb
Post:
I am getting upload failures as well:

08/04/2019 20:25:10 | climateprediction.net | [error] Error reported by file upload server: Server is out of disk space
08/04/2019 20:25:10 | climateprediction.net | Temporarily failed upload of wah2_anz50_n3qc_201612_20_794_011767222_0_r1766311403_11.zip: transient upload error
08/04/2019 20:25:10 | climateprediction.net | Backing off 00:18:01 on upload of wah2_anz50_n3qc_201612_20_794_011767222_0_r1766311403_11.zip
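The event log lines above follow a `date time | project | message` layout. The sketch below splits such lines and works out when the next upload retry is due. The layout and the day-first date format are inferred from these three lines only (the post date suggests 08/04 means 8 April), so other clients or locale settings may differ; `parse_line`, `backoff` and `next_retry` are illustrative names.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "DD/MM/YYYY HH:MM:SS | project | message"
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def parse_line(line: str):
    """Split one event-log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, TIMESTAMP_FORMAT), project, message


def backoff(message: str):
    """Return (file name, delay) for a 'Backing off' message, else None."""
    match = BACKOFF_RE.search(message)
    if match is None:
        return None
    hours, minutes, seconds, filename = match.groups()
    delay = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    return filename, delay


def next_retry(line: str):
    """Return (file name, retry time) if the line is a back-off notice."""
    when, _project, message = parse_line(line)
    result = backoff(message)
    if result is None:
        return None
    filename, delay = result
    return filename, when + delay
```

For the last line of the log above, this puts the next upload attempt at 20:43:11 (20:25:10 plus 00:18:01).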

Better than a segmentation fault!!
69) Message boards : Number crunching : New work Discussion (Message 59850)
Posted 21 Mar 2019 by nairb
Post:
I thought I would give it another go, and I have had a safr50 791 and a sam50 804 running for a whole 24 hrs without either having a fit (Segment violation). They are at about 13%. I don't have that warm feeling of confidence.
70) Message boards : Number crunching : New work Discussion (Message 59740)
Posted 8 Mar 2019 by nairb
Post:
I was finding that the Intel i5 Win10 laptop was having many, many wireless dropouts while doing the w/u. Now they have all died, the machine seems to be fine again. Maybe a coincidence. It was fine with other climate w/u.
Once, back in the depths of time, I had what was called a "farm" of some 55-60 computers doing several projects. This was before the arrival of the mighty Pentium 4, when a Pentium Pro was still good enough for SETI work.

Rising eleccy prices mean I usually only use 1 laptop nowadays, with other machines joining in occasionally. I have not found many aliens yet.
71) Message boards : Number crunching : New work Discussion (Message 59738)
Posted 8 Mar 2019 by nairb
Post:
It seems that all 3 w/u have failed with "Segment violation".

I have one left to do but may as well abort it. 3 failures out of 3 is not encouraging for the 4th w/u, which is a 789, I think.
72) Message boards : Number crunching : New work Discussion (Message 59734)
Posted 8 Mar 2019 by nairb
Post:
[Nairb wrote]... My question is, are these models restartable. In other words if I get 25 days into a model and there is a power cut...

The model saves intermediate files as it runs - "checkpoint" files - and these files should allow the model to continue after a PC restart. Sometimes the models won't restart from the checkpoint file and will fail, but usually the models are fine.

Right. No worries.
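The reply above describes checkpoint/restart. The climate model's own checkpoint code isn't shown here, so the following is only a generic Python sketch of the pattern: state is saved every few steps to a temporary file that is then swapped in with `os.replace`, so an interrupted write leaves the previous checkpoint in place, and a restarted run resumes from the last saved step. The "model step" is a stand-in and every name is hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` to a temp file, then atomically replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the swap
    os.replace(tmp, path)  # readers see either the old or the new file


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every=10):
    """Run (or resume) a toy model up to `total_steps` steps."""
    state = load_checkpoint(path, {"step": 0, "value": 0})
    while state["step"] < total_steps:
        state["value"] += state["step"]  # stand-in for one model timestep
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

In this sketch, any steps done after the last checkpoint are simply repeated when the run is restarted.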


One of the w/u has crashed with "Signal 11 received: Segment violation".
It did manage 3 days before having a fit, with no restart during that time... I wonder how the others will do!!
Edit: 2nd one failed with "Signal 11 received: Segment violation"... maybe a memory issue??
73) Message boards : Number crunching : New work Discussion (Message 59713)
Posted 4 Mar 2019 by nairb
Post:
Just managed to get 4 new tasks. They are the Wah2_safr50-... ones. It says they have a runtime of approx. 30 days. My question is, are these models restartable? In other words, if I get 25 days into a model and there is a power cut... and I lose the 4 models, it's a waste of effort.

ta
Nairb
74) Message boards : Number crunching : Credits (Message 59632)
Posted 14 Feb 2019 by nairb
Post:
Yup, lots of credits... nice to see.
75) Message boards : Number crunching : Batch 783 (Message 59522)
Posted 27 Jan 2019 by nairb
Post:
Well, success at last. These 783's do indeed work better... no credit for nearly 2 weeks, but the uploads work fine and the model completes, which is progress.
76) Message boards : Number crunching : Batch 783 (Message 59494)
Posted 22 Jan 2019 by nairb
Post:
Yup, I had 5 of these drop through. So just in case, I have suspended 4 of them, and the one that's running is at 16 hrs and still OK. But then the 781's ran fine almost to completion.

RIP 781's
77) Message boards : Number crunching : Error while computing??? (Message 59473)
Posted 20 Jan 2019 by nairb
Post:
Well, it's not a huge success... I have completed 2 w/u since rejoining. Both crashed with
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH
Both at the end of their runs... a total of 10.5 days of processing/science wasted. Not a big deal in the world of climate prediction, but not very encouraging for getting more work. Time to wander off for a while, I think.
78) Message boards : Number crunching : Error while computing??? (Message 59386)
Posted 10 Jan 2019 by nairb
Post:
Thanks for the info. So the zip files are the science bit. So if a w/u fails at some point and the zip files are still waiting to be uploaded, then the science is lost as well?
Are partially completed w/u still of value to the project? It's frustrating seeing 5 days of processing go to waste... I will give it another go when the zip upload issues go away.
79) Message boards : Number crunching : Error while computing??? (Message 59379)
Posted 10 Jan 2019 by nairb
Post:
The w/u ran to 100% and then gave "computing error" with the message "Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH".

The job had 7 trickles waiting to upload... when it reported the end of the job, the trickles were aborted (they disappeared, anyway).

So I guess it's a loss all round. I don't seem to do too well with climate w/u, with almost a 50% failure rate.
80) Message boards : Number crunching : transient HTTP error (Message 59369)
Posted 9 Jan 2019 by nairb
Post:
Nooooooo... still getting to 100%, then failing with "transient HTTP error".



©2024 climateprediction.net