climateprediction.net home page
New work Discussion

nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 59734 - Posted: 8 Mar 2019, 3:22:22 UTC - in response to Message 59726.  
Last modified: 8 Mar 2019, 4:08:16 UTC

[Nairb wrote]... My question is, are these models restartable. In other words if I get 25 days into a model and there is a power cut...

The model saves intermediate files as it runs - "checkpoint" files - and these files should allow the model to continue after a PC restart. Sometimes the models won't restart from the checkpoint file and will fail, but usually the models are fine.
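For anyone curious how checkpoint-and-restart works in general, here is a minimal Python sketch. It is not the climate model's actual mechanism: the JSON file format, the function names and the toy "timestep" are all assumptions made for illustration. It only shows the usual pattern of periodically writing state to disk and resuming from the last good copy.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then swap it into place in one step,
    so an interruption should not leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no usable checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"step": 0, "total": 0}


def run(path, last_step, checkpoint_every=5, stop_after=None):
    """Run a toy 'model' up to last_step, checkpointing every few steps.
    stop_after (hypothetical) simulates a power cut at that step."""
    state = load_checkpoint(path)
    while state["step"] < last_step:
        state["step"] += 1
        state["total"] += state["step"]  # stand-in for one model timestep
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and state["step"] >= stop_after:
            return state  # interrupted: work since the last checkpoint is lost
    save_checkpoint(path, state)
    return state
```

Calling `run("model.ckpt", 20, stop_after=12)` and then `run("model.ckpt", 20)` resumes from step 10, the last checkpoint written, rather than from step 12 or from scratch. A restart can still fail if the checkpoint itself is unusable, which matches the caveat in the quoted reply.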

Right. No worries.


One of the w/u has crashed with
Signal 11 received: Segment violation.
It did manage 3 days before having a fit. No restart during that time.... Wonder how the others will do!!
Edit: 2nd one failed with "Signal 11 received: Segment violation"... maybe a memory issue??
ID: 59734
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59735 - Posted: 8 Mar 2019, 8:06:41 UTC - in response to Message 59734.  

One of the w/u has crashed with Signal 11 received: Segment violation.


There have been batches in the past where all have failed with this. Several other batches have had a small percentage with this error. The other mods and I will keep an eye open to see which this is.

I don't have any of the SAFR tasks running at the moment but 43 of batch 789 have completed successfully so far.
ID: 59735
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59736 - Posted: 8 Mar 2019, 11:40:41 UTC

I also see that both seem to have crashed shortly after uploading the 8th zip file. 43 of batch 789 had finished successfully when I last looked, but about 9% have failed. The project has been informed and will be keeping an eye on this.
ID: 59736
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59737 - Posted: 8 Mar 2019, 14:08:00 UTC - in response to Message 59736.  

The first three of my 789 (as well as the first of my 790) that have run on my new Ryzen 2600 machine have failed.
https://www.cpdn.org/cpdnboinc/results.php?hostid=1480861&offset=0&show_names=0&state=6&appid=

The run times look great for the remaining 789's; about 4 days 8 hours. That is running on 9 cores, with 3 cores free.
ID: 59737
nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 59738 - Posted: 8 Mar 2019, 14:30:57 UTC

It seems that all 3 w/u have failed with: Segment violation.

I have one left to do but may as well abort it. 3 out of 3 fail is not encouraging for the 4th w/u which is a 789 I think.
ID: 59738
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59739 - Posted: 8 Mar 2019, 15:28:06 UTC - in response to Message 59738.  

I would keep it going. Mine failed at 7, 8 or 9 trickles. The others have gone to 10 trickles or more by now, and might make it.
ID: 59739
nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 59740 - Posted: 8 Mar 2019, 16:09:46 UTC

I was finding that the Intel i5 Win10 laptop was having many, many wireless dropouts while doing the w/u. Now that they have all died, the machine seems to be fine again. A coincidence, maybe. It was fine with other climate w/u.
Once, back in the depths of time, I had what was called a "farm" of some 55-60 computers doing several projects. That was before the arrival of the mighty Pentium 4, when a Pentium Pro was still good enough for SETI work.

Rising eleccy prices mean I usually only use 1 laptop nowadays, with other machines joining in occasionally. I have not found many aliens yet.
ID: 59740
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59741 - Posted: 8 Mar 2019, 16:36:12 UTC

My first 793 resulted in a download error. I may never get to find out if they run or not.
ID: 59741
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59742 - Posted: 8 Mar 2019, 17:51:58 UTC

Another message to the project I think. Steven thinks 790's may fail more than 789's just because they are longer at twenty months.
ID: 59742
rbpeake
Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 59743 - Posted: 8 Mar 2019, 19:36:51 UTC

ID: 59743
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1060
Credit: 6,463,915
RAC: 0
Message 59746 - Posted: 9 Mar 2019, 10:26:30 UTC
Last modified: 9 Mar 2019, 22:55:41 UTC

And another slug of Australia and New Zealand (batch #793: 8320 x ANZ at 50 km resolution for 20 months, batch list).

[Edit: As mentioned by geophi above, this is batch #794 not #793.]
ID: 59746
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2117
Credit: 58,047,187
RAC: 22
Message 59749 - Posted: 9 Mar 2019, 21:51:48 UTC
Last modified: 9 Mar 2019, 21:54:12 UTC

I believe Iain meant batch 794 in his last post.

Also, 480, 60-month SAM25 tasks released in batch 795. These are like the long-running ones released in early November of last year. Will likely take 25+ days on the fastest PCs.
ID: 59749
Alan K
Joined: 22 Feb 06
Posts: 442
Credit: 19,654,732
RAC: 5,283
Message 59752 - Posted: 9 Mar 2019, 22:52:51 UTC - in response to Message 59743.  

Also got a failure:

08-Mar-2019 18:22:40 [climateprediction.net] Computation for task wah2_eas50_e2cw_201704_18_787_011733412_0 finished
08-Mar-2019 18:22:40 [climateprediction.net] Output file wah2_eas50_e2cw_201704_18_787_011733412_0_r48652267_16.zip for task wah2_eas50_e2cw_201704_18_787_011733412_0 absent
08-Mar-2019 18:22:40 [climateprediction.net] Output file wah2_eas50_e2cw_201704_18_787_011733412_0_r48652267_17.zip for task wah2_eas50_e2cw_201704_18_787_011733412_0 absent
08-Mar-2019 18:22:40 [climateprediction.net] Output file wah2_eas50_e2cw_201704_18_787_011733412_0_r48652267_18.zip for task wah2_eas50_e2cw_201704_18_787_011733412_0 absent
08-Mar-2019 18:22:40 [climateprediction.net] Output file wah2_eas50_e2cw_201704_18_787_011733412_0_r48652267_restart.zip for task wah2_eas50_e2cw_201704_18_787_011733412_0 absent
08-Mar-2019 18:22:40 [climateprediction.net] Output file wah2_eas50_e2cw_201704_18_787_011733412_0_r48652267_out.zip for task wah2_eas50_e2cw_201704_18_787_011733412_0 absent
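If you want to pull the missing files out of a long event log rather than reading it by eye, something like the following Python sketch could work. The regular expression is an assumption based only on the log lines quoted above, and the function name is made up for illustration; other BOINC versions or projects may word these messages differently.

```python
import re

# Pattern inferred from the quoted log lines above (an assumption, not a
# documented BOINC format): timestamp, [project], "Output file X for task Y absent".
ABSENT_LINE = re.compile(
    r"^(?P<stamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Output file (?P<file>\S+) for task (?P<task>\S+) absent$"
)


def absent_outputs(log_text):
    """Map each task name to the list of output files the log reports as absent."""
    missing = {}
    for line in log_text.splitlines():
        match = ABSENT_LINE.match(line.strip())
        if match:
            missing.setdefault(match["task"], []).append(match["file"])
    return missing
```

Pasting the event log text into `absent_outputs(...)` groups the absent zip files by task, which makes it easier to see at which upload a task stopped producing output.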
ID: 59752
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59753 - Posted: 10 Mar 2019, 0:32:37 UTC - in response to Message 59749.  

Also, 480, 60-month SAM25 tasks released in batch 795. These are like the long-running ones released in early November of last year. Will likely take 25+ days on the fastest PCs.

I presume that many people will try to run them on notebooks. It will be a disaster, or at least a waste of computing time with all the errors. They should allow us to select the ones we want to do.
ID: 59753
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7618
Credit: 24,240,330
RAC: 80
Message 59754 - Posted: 10 Mar 2019, 0:41:59 UTC

I think that a better idea is to be selective about the computers.

The Requirements page is still OK for the older models, but will be hopeless for the new ones. I'm going to start a discussion about it, starting with at least upgrading the minimum requirements.

And I think a new thread here to talk about the monster machines that are starting to show up.
ID: 59754
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59757 - Posted: 10 Mar 2019, 2:38:23 UTC - in response to Message 59754.  

Great idea. I think anything you do to better match the wide range of work to the wide range of computers will help. The crunchers will be happier too.
ID: 59757
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59758 - Posted: 10 Mar 2019, 6:49:17 UTC - in response to Message 59749.  
Last modified: 10 Mar 2019, 14:03:56 UTC

I believe Iain meant batch 794 in his last post.


793 are the actual runs, 794 the Natural, but it is too early and I can't remember exactly what that means.
ID: 59758
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1060
Credit: 6,463,915
RAC: 0
Message 59760 - Posted: 10 Mar 2019, 12:26:44 UTC

Small batch #796 of 38 global models at 25 km resolution for 1 month (batch list).
ID: 59760
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 59763 - Posted: 10 Mar 2019, 14:36:02 UTC - in response to Message 59749.  

With these longer models now coming on stream, I have gone back to the old days and once more started making regular backups of BOINC_Data so that power outages, vacuum cleaner interference et al. don't lose a shed load of invested time.
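A rough Python sketch of that kind of routine backup is below. The function name, the folder naming scheme and the `keep` and `stamp` parameters are hypothetical choices for illustration, not anything provided by BOINC. It simply copies a directory into a timestamped folder and prunes older copies. It is generally safer to stop the BOINC client first so the files are not changing mid-copy.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source, backup_root, keep=3, stamp=None):
    """Copy `source` into a timestamped folder under `backup_root`,
    then delete all but the newest `keep` copies. Returns the new folder."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    source = Path(source)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamps in this format sort chronologically when sorted as text.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # raises if the target already exists

    prefix = f"{source.name}-"
    copies = sorted(
        p for p in backup_root.iterdir() if p.is_dir() and p.name.startswith(prefix)
    )
    for old in copies[:-keep]:
        shutil.rmtree(old)
    return target
```

For example, `backup_directory("/path/to/BOINC_Data", "/path/to/backups", keep=3)` (both paths are placeholders) would leave at most three dated copies. Pruning matters here because the data directory for these long models can be large.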
ID: 59763
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2117
Credit: 58,047,187
RAC: 22
Message 59764 - Posted: 10 Mar 2019, 16:34:24 UTC

Batches 797 and 798 are SAM 25km models of 24 and 13 months respectively. There are 3000+ tasks in each batch.

My i7 grabbed 2 of the batch 797 tasks and they each failed with a Signal 11 error 2 minutes into the run.
ID: 59764
©2022 climateprediction.net