Posts by old_user52163

1) Message boards : Number crunching : Zero status (Message 23011)
Posted 3 Jun 2006 by old_user52163
Post:

Alinator posted this in seti/number crunching:
*****************
So what happened is the science app thought that BOINC had died so it exited (which it is designed to do). When BOINC restarted the app it saw that the status was zero, so it thought the result was done and looked for the output data file. Since in fact it wasn't done, it sent the log message and let the science app continue from the last checkpoint.

HTH,

Alinator
****************
Claude
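
The behaviour Alinator describes can be sketched as a small decision function. This is only an illustration of the quoted explanation, not BOINC's actual code: the names `Action` and `action_after_app_exit` are hypothetical, and the non-zero-status branch is an assumption that the quote does not cover.

```python
from enum import Enum


class Action(Enum):
    REPORT_RESULT = "report result"
    RESUME_FROM_CHECKPOINT = "resume from checkpoint"
    MARK_FAILED = "mark failed"  # assumption: not described in the quoted post


def action_after_app_exit(exit_status: int, output_file_exists: bool) -> Action:
    """Decide what to do after the science app has exited (hypothetical sketch)."""
    if exit_status != 0:
        # Assumed handling of a real error exit; the quote only covers status zero.
        return Action.MARK_FAILED
    if output_file_exists:
        # Status zero and the output file is there: the result really is done.
        return Action.REPORT_RESULT
    # Status zero but no output file: log a message and let the science app
    # continue from its last checkpoint.
    return Action.RESUME_FROM_CHECKPOINT
```

The case in the post is the last one: `action_after_app_exit(0, False)` resumes from the checkpoint.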
2) Message boards : Number crunching : If you are subscribed to BAM and running Linux (Message 22844)
Posted 21 May 2006 by old_user52163
Post:
Message on the BOINCstats home page:

2006-05-21: BOINCstats - URGENT: BAM and Linux hosts
Due to a small difference between AMS requests from Windows and Linux systems, BAM was unable to determine which project a Linux host was attached to. This problem is now solved, but this means that BAM has no record of attached projects for these systems, and BAM will order these hosts to detach from all projects.

To resolve this, you have to log in to BAM, check these hosts, and select the projects they should attach to.
3) Questions and Answers : Unix/Linux : New HADCM3L model (Message 21547)
Posted 24 Mar 2006 by old_user52163
Post:
FYI:

HADCM3L is running, slowly, on a P4 1.6 GHz, MKD 10.2 2005LE system.

I took a chance and re-enabled 'get new work', downloaded a dataset, and BOINC went into 'earliest deadline first' mode. It ran through the cached SETI and Einstein WUs, then started the CP WU.

Downloaded 3/17, with a deadline of 2/27/2007. BoincView initially indicated that it would finish in 373 days. It has been running for 6 days, and the estimate is now 376 days.

BoincView indicates that the Est. Speed = 674 MFLOPS, and the trickle report shows 5.02 s/TS.
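
As a rough check of those figures against the deadline, here is a minimal sketch. It assumes the download year is 2006 (the post is dated 24 Mar 2006) and that BoincView's estimate is "days remaining from now"; the variable names are only for illustration.

```python
from datetime import date, timedelta

downloaded = date(2006, 3, 17)   # "3/17"; year assumed from the post date
deadline = date(2007, 2, 27)     # deadline quoted in the post
elapsed_days = 6                 # "running for 6 days"
remaining_estimate_days = 376    # current BoincView estimate

# Days between download and deadline.
days_available = (deadline - downloaded).days

# Projected finish date if the current estimate holds.
projected_finish = downloaded + timedelta(days=elapsed_days + remaining_estimate_days)

# Positive means the model would finish after the deadline.
overrun_days = (projected_finish - deadline).days

print(days_available, projected_finish, overrun_days)
```

Under these assumptions there are 347 days between download and deadline, so an estimate in the 370s would overrun it unless the machine speeds up or the estimate improves.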

I'm going to resist the temptation to 'micromanage', and just let it run.

:-)

Claude
[Edit for some spelling]
4) Message boards : Number crunching : sulphur model - Linux - Signal 11 (Message 18065)
Posted 11 Dec 2005 by old_user52163
Post:
After 24 hrs of processing a sulphur model, host blnt7 again aborted with a signal 11.

I've detached that host from CPN.

My other Linux host is still doing a slab model, and I will watch it when it finishes, as it probably will get a sulphur model to chew on.

5) Message boards : Number crunching : sulphur model - Linux - Signal 11 (Message 17955)
Posted 9 Dec 2005 by old_user52163
Post:
A while back, [ on host blnt5 ] I had a problem with the cp task not running when its run time came up. So I set the prefs to 'not keep in memory', which 'fixed' this problem. I was at BOINC V4.x at that time. I am now at BOINC V5.2.8.

Following geophi's response, I changed the prefs to 'keep in memory', and monitored things [ on host blnt5 ] for a few task switch cycles. I didn't see any repeat of the 'not running' problem.

So I crossed my fingers, and re-attached host blnt7 to cpn, and have been monitoring things for the past several hours. I haven't seen any sign of the 'not running' problem on either host, and, so far, no sign of a 'sig 11'.

I will post to this thread if I see a repeat of the failure.

[edit] P.S. Both hosts are running MKD V10.2 LE Linux.
6) Message boards : Number crunching : sulphur model - Linux - Signal 11 (Message 17908)
Posted 8 Dec 2005 by old_user52163
Post:
I decided to attach one of my Linux systems to CP and all went well, 'til BOINC did a task switch.

[ Log clip from BoincView ]

Location Host Project Date ID Message
blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:10:36 PM 1364 Requesting 34560 seconds of new work, and reporting 1 results
blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:10:36 PM 1363 Reason: To fetch work
blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:10:36 PM 1362 Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:09:33 PM 1361 Computation for result sulphur_eq2e_000686966_0 finished
blnt7 blnt7 blnt7 --- 12/8/2005 12:09:33 PM 1360 request_reschedule_cpus: process exited
blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:09:33 PM 1359 Unrecoverable error for result sulphur_eq2e_000686966_0 (process got signal 11)
blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:09:30 PM 1358 Pausing result sulphur_eq2e_000686966_0 (removed from memory)
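
For anyone wanting to scan a longer clip like this for the failure, here is a minimal sketch. It assumes each line has the layout seen above (a date, a time with AM/PM, a numeric ID, then the message) and is not an official BoincView export format; `parse_line` and `signal_number` are hypothetical helper names.

```python
import re
from datetime import datetime

# Timestamp, numeric ID and message, as they appear in the clip above.
LINE_RE = re.compile(
    r"(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<msg_id>\d+) (?P<message>.*)$"
)
SIGNAL_RE = re.compile(r"process got signal (\d+)")


def parse_line(line):
    """Return a dict with 'when', 'id' and 'message', or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "when": datetime.strptime(m["stamp"], "%m/%d/%Y %I:%M:%S %p"),
        "id": int(m["msg_id"]),
        "message": m["message"],
    }


def signal_number(message):
    """Return the signal number mentioned in a message, or None."""
    m = SIGNAL_RE.search(message)
    return int(m.group(1)) if m else None


sample = (
    "blnt7 blnt7 blnt7 climateprediction.net 12/8/2005 12:09:33 PM 1359 "
    "Unrecoverable error for result sulphur_eq2e_000686966_0 (process got signal 11)"
)
entry = parse_line(sample)
print(entry["id"], entry["when"], signal_number(entry["message"]))
```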

I've detached this system from CP 'til this problem can be resolved, as I don't want to 'waste' cpu cycles.

:-) :-)

I have the preferences set to 'remove from memory' because, on another Linux box, the task wouldn't continue to run when it was task switched in.

Is this a problem with Linux, the Linux API, or with the sulphur model?

7) Message boards : Number crunching : Cannot connect to upload1.atm.ox.ac.uk (Message 16880)
Posted 30 Oct 2005 by old_user52163
Post:
As of about one hour ago, the .zip files have begun to upload.

So someone must have kicked the box on the 'sweet' spot. :-)

8) Message boards : Number crunching : Cannot connect to upload1.atm.ox.ac.uk (Message 16858)
Posted 29 Oct 2005 by old_user52163
Post:
Boinc V 5.2.5 on Linux.

Finished a slab data set, scheduler responded OK, but the upload of the .zip files is failing.

Recent log snippage:

Location Host Project Date ID Message
blnt5 blnt5 blnt5 climateprediction.net 10/29/2005 8:17:09 AM 712 Backing off 3 hours, 16 minutes, and 13 seconds on upload of file 13sx_200072123_1_2.zip
blnt5 blnt5 blnt5 climateprediction.net 10/29/2005 8:17:09 AM 711 Temporarily failed upload of 13sx_200072123_1_2.zip: system I/O
blnt5 blnt5 blnt5 --- 10/29/2005 8:17:08 AM 710 Couldn't connect to hostname [uploader1.atm.ox.ac.uk]
blnt5 blnt5 blnt5 climateprediction.net 10/29/2005 8:17:08 AM 709 Started upload of 13sx_200072123_1_2.zip

This started yesterday early AM.

I note that Microtoxic reported this on the other CP forum.

9) Message boards : Number crunching : A bad pointer? (Message 14761)
Posted 30 Jul 2005 by old_user52163
Post:
> Downloading 2 work units is a problem that I thought was supposed to be fixed
> but it still seems to crop up.
>
> Due to a need to reallocate credits to where they are supposed to go, some
> workunits were reallocated to the person who has got furthest. Unfortunately
> this gave a problem with merged hosts. The consequence is that results for
> merged hosts have now been transferred to your latest computer. This was
> considered to be far better than not being able to see them at all. Sorry
> about this.
> _______________________________
> Visit BOINC WIKI (http://boinc-doc.net/boinc-wiki/index.php?title=Climateprediction_FAQ) for help
>
> And join BOINC Synergy (http://www.boincsynergy.com/) for all the
> news in one place.
>

OK, not a problem on my end. :-)

I did make an adjustment to the resource shares so that the new machine could have a chance of finishing both data sets before deadline.

However, I recently had to detach/reattach because of corrupted client_state.xml files, before adding the new machine, and some of the data sets listed are no longer on my computers.

2au7_200128448
2plh_300147763
2uzz_300154834
3vit_100202636

should be reassigned to others, if possible.

10) Message boards : Number crunching : A bad pointer? (Message 14750)
Posted 30 Jul 2005 by old_user52163
Post:
I added a new computer to my 'farm' ( 201038 ), which d/led a data set, then d/led another, so it has 2 data sets to work on, one running and one ready to run.

It sent its first trickle last night.

Looking at the computer's info shows that it has '12' data sets, and clicking on that entry presents the list of results for my whole 'farm'.

I think that a pointer got messed up in the master database.




