climateprediction.net home page
New work Discussion

Message boards : Number crunching : New work Discussion

Thund3rb1rd
Joined: 18 Jun 05
Posts: 24
Credit: 2,500,676
RAC: 0
Message 58525 - Posted: 2 Aug 2018, 23:01:02 UTC

The climate models used here are from the UK Met Office, where they run on supercomputers, so it's unlikely that there are still any bugs after all this time.


At the risk of being thought disagreeable, I respectfully disagree. This situation regarding numerous failing tasks is purely the result of inadequate, nay, POOR CPDN software design, aka bugs... perhaps an entire nest of them!

The entire purpose of BOINC is to enable multiple projects to be run on individual PCs, not supercomputers. Dinking around with the global settings inherent in BOINC to PERHAPS stabilize one project (i.e., CPDN) at the risk of destabilizing other BOINC-related projects (e.g., SETI, LHC, Cosmology, Milky Way, etc.) is NOT a solution and is in fact foolhardy.

The tasks may or may not contain garbage data - if they do, then it is up to the programmers to determine what that bad data may contain and adjust the operating code to compensate, OR to adjust the code creating the tasks to edit the data more courageously.

In any event, comparing the operating system and processing software running on whatever mainframe CPDN uses to the myriad operating systems used by BOINC volunteers, in a vain hope of stabilizing CPDN, is simply useless. To reiterate a comment I made recently on this subject in another thread, NO ONE really understands what the problem is, let alone what a solution may be.

To mis-quote the Bard,

The fault, dear Brutus, is not in our PCs, but in CPDN, for we are underlings.

ID: 58525
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 85
Message 58526 - Posted: 3 Aug 2018, 0:50:23 UTC - in response to Message 58524.  

And batch 742 has been paused while thinking is in progress.

So should I continue crunching my 742s and those in my queue?


Yes, crunch away. I am.
Zips are at about every 8%, 92.6+ MB for each, and about 10 days total crunching.

Sending back lots of data is a good way to help, either to find out what's wrong, or simply to return good data, if that's what the un-failed models are doing.
Only the researchers can search the vast amounts of data to see what's what.

(By "paused", they mean that downloads from this batch have been stopped.)
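
As a rough worked example of the figures quoted above (a zip at about every 8% of progress, 92.6 MB or more per zip), the upload volume for one task can be estimated like this. The function name `zip_estimate` is made up for illustration, any final upload at completion is not counted, and the real number of zips depends on the model type, so treat it as a sketch only.

```shell
#!/bin/sh
# Rough estimate of upload volume for one task.
# Assumptions (taken from the post above): one zip per ~8% of progress,
# ~92.6 MB per zip. zip_estimate is a hypothetical helper, not part of
# BOINC or CPDN.
zip_estimate() {
    interval_pct=$1   # progress between zips, in percent
    mb_per_zip=$2     # size of each zip, in MB
    awk -v i="$interval_pct" -v s="$mb_per_zip" 'BEGIN {
        n = int(100 / i)              # whole zips in a full run
        printf "%d zips, about %.1f MB in total\n", n, n * s
    }'
}

zip_estimate 8 92.6
```

With those inputs it reports 12 zips and about 1111.2 MB, i.e. a little over 1 GB uploaded per completed task.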
ID: 58526
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58527 - Posted: 3 Aug 2018, 9:04:17 UTC - in response to Message 58526.  

And batch 742 has been paused while thinking is in progress.


Thinking has been done and the people at Oxford are certain the tasks that don't crash are producing worthwhile data. The sending out of these tasks has therefore resumed.

The entire purpose of BOINC is to enable multiple projects to be run on individual PCs, not supercomputers. Dinking around with the global settings inherent in BOINC to PERHAPS stabilize one project (i.e., CPDN) at the risk of destabilizing other BOINC-related projects (e.g., SETI, LHC, Cosmology, Milky Way, etc.) is NOT a solution and is in fact foolhardy.


There are many reasons for tinkering with the global settings of BOINC. These mostly relate to how different projects play together, or how other programs running on the computer work alongside BOINC. The settings that reduce the chances of CPDN tasks crashing are likely to reduce the chances of tasks from other projects crashing too. That said, the tasks I have personally had crash on other projects (with the exception of the Android platform, which isn't supported by CPDN) have also crashed on every other computer they ran on, and have not shown the frustrating pattern, or sometimes lack of pattern, that crashes with CPDN show.

As regards the data being useful, I would say past history has shown that a high percentage of crashes with the segmentation violation has never rendered invalid the data from the tasks which do complete.
ID: 58527
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58533 - Posted: 3 Aug 2018, 19:02:11 UTC

Batch 745 is, I think, about 1,000 eu25 13-month tasks. It may be more, with the rest not showing on the server status page yet.
ID: 58533
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 85
Message 58536 - Posted: 4 Aug 2018, 11:01:42 UTC

Front page says batch 745 is 20,000 (wow), but the status page says 10,000, and I think a lot of those are batch 742.

So either tasks are going fast, or 745 hasn't been fully released yet.

And the trickle program has stopped running.
I'll go and see if anyone's home.
ID: 58536
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58537 - Posted: 4 Aug 2018, 13:08:29 UTC - in response to Message 58536.  

Front page says batch 745 is 20,000 (wow), but the status page says 10,000, and I think a lot of those are batch 742.


Now wondering if I misread the numbers when I said 1,000, or if that's all there were when I looked. It wasn't up on the front page then, so any misreading would have been in my looking at the workunits.
ID: 58537
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58553 - Posted: 6 Aug 2018, 8:51:57 UTC - in response to Message 58537.  

Batch 746: EUR25 2010-2016 with 10newPP

8,600 simulations.
(from front page.)
ID: 58553
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 1,098
Message 58558 - Posted: 7 Aug 2018, 11:11:09 UTC - in response to Message 58536.  

[Les Bayliss wrote:] Front page says batch 745 is 20,000 (wow), but the status page says 10,000, and I think a lot of those are batch 742.

So either tasks are going fast, or 745 hasn't been fully released yet.

Split batch is what I'm seeing too: batch list. More to come if the front page total is right.
ID: 58558
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 1,098
Message 58574 - Posted: 8 Aug 2018, 23:27:59 UTC

Some long models just added: batch #747 PNW at 25 km for 121 months (2000) and 61 months (1000) (batch list).
ID: 58574
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58614 - Posted: 16 Aug 2018, 11:36:13 UTC - in response to Message 58574.  
Last modified: 16 Aug 2018, 11:38:12 UTC

Batch 748 is 200 Hadcm3s tasks and batch 749 is 420 Hadcm3s tasks. And

THEY RUN ON LINUX!!!

Well, they start at least; mine are 5 minutes in with no problems so far.

Edit: all four have checkpointed. Will report back when they have been going a bit longer.

And they won't last long!
ID: 58614
Jean-David Beyer
Joined: 5 Aug 04
Posts: 381
Credit: 3,690,501
RAC: 5
Message 58620 - Posted: 16 Aug 2018, 13:55:00 UTC - in response to Message 58614.  

Batch 748 is 200 Hadcm3s tasks and batch 749 is 420 Hadcm3s tasks. And

THEY RUN ON LINUX!!!


Perhaps they do, but since I get so few work units (none lately), my BOINC client now queries the server only once every three days, so unless three days' worth of LINUX-worthy work units turn up, I am unlikely to get any.
ID: 58620
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58621 - Posted: 16 Aug 2018, 14:08:55 UTC - in response to Message 58614.  

And the server status page is showing just one left now, though if, as I suspect, these are Linux only or Linux and Mac only, some will come back in because of people without the 32-bit libs installed. Mine are running at just over 3.5 hours per 1%, currently a bit over 0.6% completed. The danger point, where some batches have failed, is just before completion of the first zip.
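
To put the rate quoted above into days: at 3.5 hours per 1%, a full run is about 350 hours, or roughly 14.6 days. A minimal sketch of the same arithmetic for the remaining time follows. The helper name `hours_left` is invented for this example, and it assumes the rate stays constant for the whole run, which it may well not.

```shell
#!/bin/sh
# Remaining run time from an observed rate of progress.
# hours_left is a hypothetical helper, named for this example only.
# Assumption: the hours-per-percent rate stays constant for the whole run.
hours_left() {
    hours_per_pct=$1   # observed hours per 1% of progress
    pct_done=$2        # progress so far, in percent
    awk -v r="$hours_per_pct" -v d="$pct_done" 'BEGIN {
        h = (100 - d) * r
        printf "%.1f hours (%.1f days)\n", h, h / 24
    }'
}

hours_left 3.5 0.6
```

For the figures in the post (3.5 hours per 1%, 0.6% done) this prints 347.9 hours (14.5 days).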
ID: 58621
rjs5
Joined: 16 Jun 05
Posts: 12
Credit: 8,397,641
RAC: 0
Message 58622 - Posted: 16 Aug 2018, 16:19:11 UTC - in response to Message 58621.  

And the server status page is showing just one left now, though if, as I suspect, these are Linux only or Linux and Mac only, some will come back in because of people without the 32-bit libs installed. Mine are running at just over 3.5 hours per 1%, currently a bit over 0.6% completed. The danger point, where some batches have failed, is just before completion of the first zip.


I got one of the Linux WUs on my 64-bit Linux machine and it seems to be running fine.

There is a lot of "HOW TO run 32-bit dynamic apps on 64-bit Linux" information about making sure that a 64-bit installation has the right 32-bit libraries. A pretty easy way to check that the right 32-bit libraries are installed seems to be writing a small 32-bit test app that needs the same libraries.

The KEY would be the build: compile as 32-bit to "a.out" with a command line that forces the correct libraries to be linked in:

g++ h.cpp -m32 -lpthread -ldl -lstdc++ -lm -lgcc_s -lc -lz -lnsl



Example of any C++ program (Hello World), h.cpp:
cat h.cpp

#include <iostream>
using namespace std;
int main (int argc, char** argv)
{
    cout << "Hello world!" << endl;
    return 0;
}



32-bit libraries I needed for my 32-bit creation to say "Hello World":

ldd a.out
linux-gate.so.1 (0xf7fcd000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7f7d000)
libdl.so.2 => /lib/libdl.so.2 (0xf7f78000)
libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7df3000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7dd6000)
libz.so.1 => /lib/libz.so.1 (0xf7dbd000)
libnsl.so.1 => /lib/libnsl.so.1 (0xf7da2000)
libm.so.6 => /lib/libm.so.6 (0xf7ca8000)
libc.so.6 => /lib/libc.so.6 (0xf7b10000)
/lib/ld-linux.so.2 (0xf7fcf000)

Notice that these are the same 32-bit libraries needed by the CPDN application:

ldd *gnu *gnu.so
hadcm3s_8.34_i686-pc-linux-gnu:
linux-gate.so.1 (0xf7fd1000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7f81000)
libdl.so.2 => /lib/libdl.so.2 (0xf7f7c000)
libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7df7000)
libm.so.6 => /lib/libm.so.6 (0xf7cfd000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7ce0000)
libc.so.6 => /lib/libc.so.6 (0xf7b48000)
/lib/ld-linux.so.2 (0xf7fd3000)
hadcm3s_um_8.34_i686-pc-linux-gnu:
linux-gate.so.1 (0xf7f69000)
libdl.so.2 => /lib/libdl.so.2 (0xf7f33000)
libm.so.6 => /lib/libm.so.6 (0xf7e39000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7e1a000)
libc.so.6 => /lib/libc.so.6 (0xf7c82000)
/lib/ld-linux.so.2 (0xf7f6b000)
hadcm3s_se_8.34_i686-pc-linux-gnu.so:
linux-gate.so.1 (0xf7f2d000)
libz.so.1 => /lib/libz.so.1 (0xf7e59000)
libnsl.so.1 => /lib/libnsl.so.1 (0xf7e3e000)
libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7cb9000)
libm.so.6 => /lib/libm.so.6 (0xf7bbf000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7ba2000)
libc.so.6 => /lib/libc.so.6 (0xf7a0a000)
/lib/ld-linux.so.2 (0xf7f2f000)
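
The listings above come from a machine where everything resolves. On a machine that lacks a 32-bit library, ldd marks that line with "not found" instead of a path, so the check can be automated with a small filter. This is only a sketch: `missing_libs` is a made-up helper name, and the sample input in the demonstration is invented to show the failing case, not taken from a real machine.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd output on stdin; missing_libs is a hypothetical helper name.
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Usage, run from the directory holding the CPDN executables
# (same file pattern as in the post above):
#   ldd *gnu *gnu.so 2>/dev/null | missing_libs

# Self-contained demonstration on invented sample text,
# with two libraries deliberately marked as missing:
missing_libs <<'EOF'
	linux-gate.so.1 (0xf7fd1000)
	libstdc++.so.6 => not found
	libm.so.6 => /lib/libm.so.6 (0xf7cfd000)
	libz.so.1 => not found
EOF
```

The demonstration prints libstdc++.so.6 and libz.so.1, one per line. Empty output means every library was found.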
ID: 58622
pj
Joined: 15 Dec 12
Posts: 8
Credit: 519,068
RAC: 0
Message 58624 - Posted: 16 Aug 2018, 20:43:20 UTC

It's been quite a while, but for the first time since the meltdown, my iMac has gotten three projects to run that will take 2 days and 22.5 hrs.
Keep them coming!
ID: 58624
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58633 - Posted: 18 Aug 2018, 15:41:20 UTC - in response to Message 58614.  

Just noticed: the 748s are only 12 months, while the 749s are 120 months.
ID: 58633
Bill F
Joined: 17 Jan 09
Posts: 74
Credit: 981,964
RAC: 463
Message 58640 - Posted: 25 Aug 2018, 1:16:02 UTC

Interesting. Before the Great Crash (CPDN's, not the stock market's), we had so many active users that getting a WU was a prize. Now we have lost so many active users that WUs are lying in the system begging to be taken.

How times change.

Bill F
Dallas TX
ID: 58640
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 85
Message 58641 - Posted: 25 Aug 2018, 3:13:21 UTC

Get them while you can.
Things are about to change. :)

And something is seriously wrong with your i5-5200U.
Perhaps it's still using the "training wheels" setting?
ID: 58641
Alex Plantema
Joined: 3 Sep 04
Posts: 122
Credit: 26,128,640
RAC: 658
Message 58642 - Posted: 25 Aug 2018, 10:45:27 UTC - in response to Message 58574.  

Some information in the batch list about problems with a batch would be useful, e.g. what to do with it or a link to a message about it.
ID: 58642
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2740
Credit: 3,408,488
RAC: 1,951
Message 58643 - Posted: 25 Aug 2018, 18:37:41 UTC - in response to Message 58642.  

Some information in the batch list about problems with a batch would be useful, e.g. what to do with it or a link to a message about it.


Is there a specific batch you are having problems with? If so, some of us may be able to respond with more information.
ID: 58643
Alan K
Joined: 22 Feb 06
Posts: 335
Credit: 16,789,387
RAC: 1,047
Message 58644 - Posted: 25 Aug 2018, 22:49:39 UTC - in response to Message 58642.  

There have certainly been a lot of computation error failures with batches 738 and 742, if that helps. There is an error thread further down in Number crunching.
ID: 58644

©2020 climateprediction.net