New work Discussion

Alan K
Joined: 22 Feb 06
Posts: 442
Credit: 19,654,732
RAC: 5,283
Message 59815 - Posted: 14 Mar 2019, 19:17:38 UTC - in response to Message 59813.  

I have one that is just over 6% after 1 day on my 3.5 GHz i5. One on my slower i5 failed after 4 minutes with a seg violation!
ID: 59815
JIM
Joined: 31 Dec 07
Posts: 1151
Credit: 21,387,228
RAC: 2,568
Message 59816 - Posted: 14 Mar 2019, 19:33:17 UTC - in response to Message 59814.  
Last modified: 14 Mar 2019, 19:40:23 UTC

Thanks, will try suspending everything else to see if it speeds up. A few hours should show if there is going to be any significant improvement.
ID: 59816
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7618
Credit: 24,240,330
RAC: 80
Message 59817 - Posted: 14 Mar 2019, 19:47:08 UTC

It is thought that processing the vegetation data as well as the usual climate data may be why the models fail just as they try to start the regional model.

This adds a LOT to the hardware requirements, mostly in the memory area, which covers caches, the FPU, and the data channels between everything.

So trying to cram as many model tasks onto a computer as possible may well be what is exacerbating the failures for some people.

As Clint Eastwood's character, Dirty Harry, once said: "A man's got to know his limitations".
ID: 59817
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59821 - Posted: 15 Mar 2019, 9:48:52 UTC - in response to Message 59817.  

So trying to cram as many model tasks onto a computer as possible may well be what is exacerbating the failures for some people.


I have certainly noticed that some tasks on my laptop (N3540 @ 2.16GHz) slow down if all four cores are crunching. When I notice this, I cut my computing down to two or three cores till the affected tasks have cleared. I would certainly say that the minimum memory should be 2GB/core these days. If things go the way of all tasks being so demanding, I will probably end up setting it to only use 75% of available CPUs.

I can however understand that needing to do this might frustrate those for whom credit is more important than it is for myself.
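The rule of thumb above (roughly 2 GB of RAM per running model, and optionally limiting BOINC to a percentage of the CPUs) can be turned into a quick estimate. What follows is only a sketch: the function name, the 2 GB default and the rounding down are assumptions made for illustration, not anything the project or BOINC specifies.

```python
import math


def max_concurrent_tasks(cores, ram_gb, gb_per_task=2.0, cpu_fraction=1.0):
    """Estimate how many model tasks to run at once.

    Hypothetical helper: takes the smaller of the memory-based limit
    (ram_gb / gb_per_task) and the CPU-based limit (cores * cpu_fraction),
    both rounded down. The 2 GB per task default is the rule of thumb
    from the post, not a measured requirement.
    """
    by_memory = math.floor(ram_gb / gb_per_task)
    by_cpu = math.floor(cores * cpu_fraction)
    return max(0, min(by_memory, by_cpu))


if __name__ == "__main__":
    # Four cores, 8 GB of RAM, all CPUs allowed
    print(max_concurrent_tasks(4, 8))
    # Same machine with BOINC limited to 75% of the CPUs
    print(max_concurrent_tasks(4, 8, cpu_fraction=0.75))
```

For a four-core machine with 8 GB this gives 4 tasks with all CPUs in use, or 3 with the 75% limit; with only 4 GB the memory limit would cap it at 2.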
ID: 59821
Jean-David Beyer
Joined: 5 Aug 04
Posts: 656
Credit: 9,983,069
RAC: 277
Message 59822 - Posted: 15 Mar 2019, 13:56:13 UTC - in response to Message 59821.  

I would certainly say that the minimum memory should be 2GB/core these days.


Well, I have four cores and 8 GBytes of RAM. Another 8 GBytes of RAM are on order and should arrive soon. Four 2GByte modules installed and four 2 GByte modules on order. My machine could hold 512 GBytes of RAM if someone else would buy me the modules -- but that would be silly for the way I use my machine these days.

I currently have climateprediction set to "Won't get new tasks" because, although I run Linux most of the time, I am rebooting to Windows to run my income tax program. When that is done, I will be back to running Linux 24/7, and will start accepting climateprediction tasks again.
ID: 59822
Alan K
Joined: 22 Feb 06
Posts: 442
Credit: 19,654,732
RAC: 5,283
Message 59830 - Posted: 16 Mar 2019, 23:22:59 UTC - in response to Message 59815.  

This one has now failed with seg violation at about 9% after 2 trickles and zips.
ID: 59830
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59831 - Posted: 17 Mar 2019, 6:12:30 UTC - in response to Message 59830.  

This one has now failed with seg violation at about 9% after 2 trickles and zips.


Shucks, I had thought my two 797s were safe having both uploaded their first zip. I will carry on crunching with at least one core free to see what happens.
ID: 59831
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1060
Credit: 6,463,915
RAC: 0
Message 59843 - Posted: 19 Mar 2019, 23:47:52 UTC

Three new batches for South America:

batch #802 = 500 x SAM50/13
batch #803 = 800 x SAM50/13
batch #804 = 2200 x SAM50/24

(See batch list.)
ID: 59843
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59844 - Posted: 20 Mar 2019, 1:29:06 UTC - in response to Message 59804.  
Last modified: 20 Mar 2019, 1:30:23 UTC

I am not sure if it is a CPU difference or not, but all seven of the 797's have failed on my two Ryzen 2600's, but three are still going fine (after 4, 5 and 6 zips) on my i7-4771.
https://www.cpdn.org/cpdnboinc/result.php?resultid=21555978
https://www.cpdn.org/cpdnboinc/result.php?resultid=21541267
https://www.cpdn.org/cpdnboinc/result.php?resultid=21555753

For that matter, it could be an OS difference, since the Ryzen 2600's are on Win10 (1809), while the i7-4771 is on Win7. None of them are rebooted much, especially the Ryzens, which are dedicated machines, and they all run 24/7. No other CPU jobs are running either, so the CPDN work is never suspended.
ID: 59844
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59845 - Posted: 20 Mar 2019, 7:09:41 UTC - in response to Message 59843.  

Three new batches for South America:

And another one!

batch #805 = 2100 x SAM50/13
ID: 59845
gchrist
Joined: 17 Jul 05
Posts: 7
Credit: 6,120,263
RAC: 0
Message 59847 - Posted: 20 Mar 2019, 22:57:49 UTC
Last modified: 20 Mar 2019, 23:50:00 UTC

I am happy to see that the new sam50 models do not give the same errors after 3-4 minutes that the sam25s usually do on my Win10 computer. Does anybody know what has been changed within these models?
ID: 59847
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 1,152
Message 59848 - Posted: 20 Mar 2019, 23:08:34 UTC

I'm still waiting to see how grotesque the upload files are before all downloaded tasks are allowed to start ...
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 59848
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7618
Credit: 24,240,330
RAC: 80
Message 59849 - Posted: 20 Mar 2019, 23:13:45 UTC

Only that the resolution of the high res regional area is half that of the previous 25K models, and it's thought that the amount of memory suddenly needed may have something to do with the previous failures.

It hasn't been discussed yet, and it's too early to guess.
There have been 6 failures so far: 5 in 802 and 1 in 803.
ID: 59849
nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 59850 - Posted: 21 Mar 2019, 0:05:17 UTC
Last modified: 21 Mar 2019, 0:18:15 UTC

I thought I would give it another go and have had a safr50 791 and a sam50 804 running for a whole 24 hrs, and they have still not had a fit (segment violation). They are at about 13%. I don't have that warm feeling of confidence.
ID: 59850
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7618
Credit: 24,240,330
RAC: 80
Message 59856 - Posted: 21 Mar 2019, 9:30:36 UTC

There are a few failures, but well below "worrying".

The project coordinator has said that the sam50s have all been run before, so should be OK.
The project people have been rather busy lately, so they haven't done much research on what was wrong with the sam25s. And they won't be run again until this is known.

Testing WILL be done soon to try and find out.
ID: 59856
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59877 - Posted: 23 Mar 2019, 7:27:55 UTC

And three of the 797 batch have completed successfully now. My two are past their fourth and second zips respectively, so both are well past where they got to on their first attempt. Hoping that keeping at least one core free will let them finish without segfaulting.

Of the three that have finished, two are under Win7 and one under Win Server 2012. However, of the first four listed as having completed for #798, three are Win10 and one is Win7, so my initial thoughts about it being a problem with Win10 have, I think, gone out of the proverbial window.
ID: 59877
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59878 - Posted: 23 Mar 2019, 11:44:56 UTC - in response to Message 59877.  
Last modified: 23 Mar 2019, 11:49:06 UTC

I am still holding on to that theory. All seven of my 797's have failed in under four hours on Win10 (on two Ryzen 2600's), but all three of the 797's that I have run on my Win7 machine (i7-4771) are still going after at least seven days.

I think it is the OS rather than the CPU difference, from what I have seen on other machines.
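One way to keep track of this kind of comparison is to tally outcomes per operating system. The sketch below uses only the counts given in this post (seven 797s failed on Win10, three still running on Win7); the `tally` helper and the data layout are hypothetical, chosen just for illustration.

```python
from collections import Counter

# (os, outcome) pairs for batch 797 as reported in the post above:
# seven failures on Win10 (two Ryzen 2600s), three still running on Win7 (i7-4771)
reports = [("Win10", "failed")] * 7 + [("Win7", "running")] * 3


def tally(pairs):
    """Group outcomes by OS; returns {os_name: Counter({outcome: count})}."""
    table = {}
    for os_name, outcome in pairs:
        table.setdefault(os_name, Counter())[outcome] += 1
    return table


if __name__ == "__main__":
    for os_name, counts in sorted(tally(reports).items()):
        print(os_name, dict(counts))
```

Note that in this sample every Win10 task ran on a Ryzen and every Win7 task on the Intel machine, so these numbers alone cannot separate an OS effect from a CPU effect; results from other machines would have to be added to the list to tell them apart.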
ID: 59878
rbpeake
Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 59879 - Posted: 23 Mar 2019, 11:58:35 UTC - in response to Message 59878.  

My Win10 machines have generally been fine.
Regards,
Bob P.
ID: 59879
Jim1348
Joined: 15 Jan 06
Posts: 623
Credit: 26,741,519
RAC: 117
Message 59880 - Posted: 23 Mar 2019, 12:06:17 UTC - in response to Message 59879.  
Last modified: 23 Mar 2019, 12:15:46 UTC

I don't see that you have even run any 797's on them.
(My Win10 machines have been running fine for the most part otherwise too. But 797, 798 and 799 are problematic; maybe others too.)
ID: 59880
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3482
Credit: 10,616,461
RAC: 2,208
Message 59886 - Posted: 23 Mar 2019, 20:40:28 UTC
Last modified: 24 Mar 2019, 8:25:06 UTC

I am still holding on to that theory.


I will have another look when there is a bit more data to go on.

I will also try and see if I can work out a way to identify machines like mine running Linux which pretend to be Windows 10 ;)

Edit: Six completed now; the new ones since yesterday are two XP and one Win7, so still no Win10s.
ID: 59886

©2022 climateprediction.net