computer is overcommitted...

Message boards : climateprediction.net Science : computer is overcommitted...

old_user120137

Joined: 28 Nov 05
Posts: 8
Credit: 3,293
RAC: 0
Message 25488 - Posted: 6 Dec 2006, 12:11:58 UTC

What does "overcommitted" mean? What is the cause, and what are the consequences?

12/6/2006 12:59:32 |climateprediction.net|Finished download of 1081_ocean.year.gz
12/6/2006 12:59:32 |climateprediction.net|Throughput 412838 bytes/sec
12/6/2006 12:59:33 ||request_reschedule_cpus: files downloaded
12/6/2006 12:59:33 ||Suspending work fetch because computer is overcommitted.
12/6/2006 12:59:33 ||Using earliest-deadline-first scheduling because computer is overcommitted.
12/6/2006 12:59:33 |climateprediction.net|Starting result hadcm3ohc_110s_05596339_0 using hadcm3 version 515
12/6/2006 13:01:12 ||request_reschedule_cpus: project op

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25491 - Posted: 6 Dec 2006, 14:43:11 UTC

This is the 'official' explanation.

old_user120137
Joined: 28 Nov 05
Posts: 8
Credit: 3,293
RAC: 0
Message 25492 - Posted: 6 Dec 2006, 15:34:28 UTC

I somehow appear to be too stupid. Do I have more than one work unit, and am I trying to fetch an additional one? If so, why am I not told how many I have in excess? Until now it appeared that climateprediction refused to download for lack of free disk space (resolved), and that it didn't start because SETI was currently running (resolved), so I pressed the Update button multiple times. The messages just weren't clear.

12/5/2006 10:40:21 |climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
12/5/2006 10:40:21 |climateprediction.net|Reason: To fetch work
12/5/2006 10:40:21 |climateprediction.net|Requesting 8640 seconds of new work, and reporting 1 results
12/5/2006 10:40:26 |climateprediction.net|Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
12/5/2006 10:40:26 |climateprediction.net|Message from server: No work sent
12/5/2006 10:40:26 |climateprediction.net|Message from server: (reached daily quota of 1 results)
12/5/2006 10:40:26 |climateprediction.net|No work from project

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25494 - Posted: 6 Dec 2006, 19:00:30 UTC

Your models keep crashing, so you should only have one in the Tasks tab.
This is your account page with the list of models that you have been sent over the past year.

The reason that you have been having disk space problems is that when a model crashes, the remnants remain (in folders with the model names) under projects\climateprediction.net
All of these can be deleted, except for any that you still have listed in the Tasks tab.
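
The cleanup described above can be checked before deleting anything by hand. The following is a minimal Python sketch, assuming the leftover folders sit directly under the project directory and that each folder is named after its model. The function name, the example path, and the exact folder-naming scheme are assumptions for illustration, not something BOINC or CPDN provides. It only lists candidates and deletes nothing.

```python
from pathlib import Path


def find_leftover_model_dirs(project_dir, active_models):
    """Return sub-folders of project_dir that match no active model.

    active_models: names of the models still listed in the Tasks tab
    (copied by hand; this sketch does not read them from BOINC).
    """
    active = set(active_models)
    leftovers = []
    for entry in sorted(Path(project_dir).iterdir()):
        # Only folders are candidates; loose files are left alone.
        if entry.is_dir() and entry.name not in active:
            leftovers.append(entry)
    return leftovers


if __name__ == "__main__":
    # Hypothetical location; adjust to the local BOINC data directory.
    project = Path(r"C:\Program Files\BOINC\projects\climateprediction.net")
    if project.is_dir():
        # Assumed folder name, derived from the result name in the log above.
        for folder in find_leftover_model_dirs(project, ["hadcm3ohc_110s_05596339"]):
            print("candidate for manual deletion:", folder)
```

Compare the printed folders against the Tasks tab before removing any of them.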

There are 4 README files here, with hints and tips. Start with the one on Crashes and other problems, which may help you to find out why your computer keeps crashing the models.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 25496 - Posted: 6 Dec 2006, 20:08:11 UTC

In the Running the model README, I recommend reading the top tips. And in the Crashes README mentioned by Les, I suggest items #1 by Les, #5 by Mike and #6 by Thyme Lawn.
Cpdn news

old_user120137
Joined: 28 Nov 05
Posts: 8
Credit: 3,293
RAC: 0
Message 25516 - Posted: 8 Dec 2006, 8:38:42 UTC
Last modified: 8 Dec 2006, 8:55:45 UTC

Thanks Les and mo.v for your explanations. Apparently there is something very wrong. Why wasn't I (and why am I still not) able to find the "my results" page by myself? There are tons of menu items to the left, but none with this vital information. I wasn't able to detect that my computer spent thousands of hours but was never able to return a result. "Client error (unspecified)" is not very helpful. Sorry. I regularly looked at the messages in the Boinc controller; they appeared cryptic but otherwise suggested everything was in order. It was never communicated that the Boinc processes have to be shut down manually. In addition, I almost never shut the machine down. Every evening I hibernate the machine until the next morning, meaning Windows writes the contents of memory to disk and reverses the process when it wakes from hibernation. The only software that really balks at this is USB drivers for peripherals that were unplugged while the machine was hibernated, which is intuitively understandable. The criticism of Windows for instability seen in some of the README texts is not justified. I only reboot when truly forced to by some rogue software that insists, which happens very seldom, on the order of months.
As a side note, I don't run the visual part with the rotating globe. No time to follow it. I'm doing electromagnetic field simulations, and the climate model is allowed to fill the unused CPU time. By the way, there are other ways to have multiple applications communicate than through ports that may already be in use by something else.
Back to the start of this thread. How do I solve or remove these unspecified client errors (some of them: -161)? There is no competing anti-virus software.

Thanks again for your time

Rene

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25517 - Posted: 8 Dec 2006, 9:16:27 UTC

OK, your results page.

There are 3 ways to get there.
In the blue menu to the left of here, 9 down from the top, click on Your account.
This also appears in the vertical left hand menu in the Projects tab of the BOINC manager, when you click on the project name to select it.
Finally, you can click on your name just to the left of your posts.

One will take you to a page where you can just click on Results to get there, and the other 2 will take you to a page where you first have to click on Computers, and then Results.

If you now click on each of the numbers for the models in the Result ID column, you'll be able to see the individual page for each model, and the messages associated with it.

**********

It was never communicated that the boinc processes do have to be shut down manually.

This is no different to saving a document after you write it in Windows Word or Notepad. If you don't, you'll lose it when you shut down Windows.

This requirement is in the FAQ in the blue menu to the left of here, under System Shutdown Procedure

**********

hibernate


I thought that there was advice against doing this in the FAQ as well, but I can't see it now.
However, BOINC doesn't get along too well with the hibernate function, so the program should first of all be exited, just as when shutting down the computer.

***********

Preventing future errors.

Please read the README files as posted previously.

The ports used have been registered with the appropriate people. It's just that Microsoft doesn't care, and uses them anyway. And that part of the software is from California, and is out of the hands of all projects using BOINC.

old_user120137
Joined: 28 Nov 05
Posts: 8
Credit: 3,293
RAC: 0
Message 25519 - Posted: 8 Dec 2006, 12:46:06 UTC

Thanks a lot Les,
Since Boinc starts automatically after power-up, it can be (and is) assumed that it also handles the shutdown, quite unlike Notepad or the like. Windows sends shutdown messages to every running application, and it is assumed that these messages are used to, e.g., write final data to disk. As to hibernation: I'll try shutting Boinc down as a first measure. But from my understanding, hibernation is safe for self-contained applications. I sometimes write multithreaded software too, and found no problems when the application is frozen to disk and revived. Of course, communication with the external world has to be robust, such that it reconnects and retries. The IP address may change on dynamically assigned networks.
As to using ports: the server, in this case Boinc, grabs a free port. It can then write the port number into an accessible config file, or clients (the climate model) can request the port via a message sent to an application name (boinc) or the like. The problem should be solvable with some flexibility.
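
The scheme proposed above can be sketched in a few lines of Python. This is only an illustration of the suggestion in this post, not how BOINC actually works: the server binds to port 0 so that the operating system picks a free port, then writes the chosen number to a file that a client reads. The file name `port.txt` and both function names are hypothetical.

```python
import socket
import tempfile
from pathlib import Path


def start_server(port_file):
    """Bind to an OS-assigned free port and publish its number in port_file."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    Path(port_file).write_text(str(port))
    return server


def connect_client(port_file):
    """Read the published port number and connect to the local server."""
    port = int(Path(port_file).read_text())
    return socket.create_connection(("127.0.0.1", port), timeout=5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        port_file = Path(tmp) / "port.txt"  # hypothetical shared config file
        server = start_server(port_file)
        client = connect_client(port_file)
        conn, _ = server.accept()
        client.sendall(b"hello")
        print(conn.recv(5))
        conn.close()
        client.close()
        server.close()
```

A real implementation would also have to handle a stale port file left behind by a crashed server, which this sketch ignores.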

Never mind. Thanks anyway.

Rene

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 25542 - Posted: 10 Dec 2006, 3:15:45 UTC

Hi again

In his post on 6 Dec, Les posted a link to the 4 README posts. In the post below that, I suggested specific items in the READMEs. It would still be a good idea to go there.
Cpdn news

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 25543 - Posted: 10 Dec 2006, 17:04:00 UTC
Last modified: 10 Dec 2006, 18:29:51 UTC

Hi tschaggelar,

The problem with Windows shutdown is that after a fixed period (20 seconds by default) it kills the processes. If it's a slower PC, and other stuff is also being shut down, then sometimes Boinc hasn't fully written its state files and exited by the time Windows kills it.

The 20s timeout can be modified by editing the system registry.

I'm a volunteer and my views are my own.
News and Announcements and FAQ

old_user120137
Joined: 28 Nov 05
Posts: 8
Credit: 3,293
RAC: 0
Message 25552 - Posted: 11 Dec 2006, 15:09:09 UTC - in response to Message 25542.  

Hi again

In his post on 6 Dec, Les posted a link to the 4 README posts. In the post below that, I suggested specific items in the READMEs. It would still be a good idea to go there.


Thanks mo.v,
Hints do not get more helpful by being referred to indirectly multiple times, nor do vanished references reappear. I apparently had client errors where the upload went wrong. There is no solution, nor a manual upload. IMO, the climate package needs some refining. The upload should work. The package should retry, or tell the user what to do in the Boinc message list. This error -161 is bull****. Give me a button for manual upload if need be.

rene

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25562 - Posted: 11 Dec 2006, 19:49:48 UTC

The 161 error means that there are no files to upload, so a manual upload button would be useless.

One of the reasons for the missing files is that the BOINC 'worker program' (boinc.exe) has for some reason lost contact with the science app program, and then thinks that this is because said program has completed the work unit (WU).

So it sets the "percentage completed" to 100% (visible in the GUI), flags the WU as completed in its work list, then proceeds to the next stage, which is to upload the final zips.
Which don't yet exist. Which results in error 161 (file not found).
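
The sequence described above can be modelled with a toy Python sketch. It is not BOINC code: the function name, the file names, and the way lost contact is represented are all assumptions made for illustration. Only the error number and its meaning (file not found) come from the explanation above.

```python
import os
import tempfile

ERR_FILE_NOT_FOUND = -161  # error number from the thread; meaning: file not found


def finish_work_unit(lost_contact, app_finished, output_files):
    """Toy model: decide what the client does once the science app goes quiet.

    Returns (percent_done, status), where status is 0 on success,
    ERR_FILE_NOT_FOUND if an expected upload file is missing, or
    None if the work unit is still considered to be running.
    """
    if not (lost_contact or app_finished):
        return None, None  # still running, nothing to upload yet
    # Lost contact is mistaken for completion, so progress jumps to 100%.
    percent_done = 100
    for path in output_files:
        if not os.path.exists(path):
            return percent_done, ERR_FILE_NOT_FOUND
    return percent_done, 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        missing_zip = os.path.join(tmp, "final_1.zip")  # hypothetical name, never written
        print(finish_work_unit(True, False, [missing_zip]))
```

Run as a script, this prints `(100, -161)`: the work unit looks complete, yet the upload step fails because the file was never produced.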

However, various other messages, which will hopefully include some about the reason for the actual model crash DO get sent to the server, where you, and others, can read them.

The upload should work. The package should retry ..

It does, for up to 14 days.

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 25563 - Posted: 11 Dec 2006, 20:01:44 UTC

My turn as target.

Automatic start of boinc at system startup is a user-defined option; it isn't a defined default. To follow your logic, the user chose the option, so control of shutdown is also in user's hands. To me, that makes as much sense as your assumption.

Computer programs could always be better, eh? Given what this program does, for how long, for how many OS and computer configurations, and with such a small support staff, it's quite amazing that these little machines have already produced many thousands of successful results using software developed to run on a Cray.

Being designed to support a range of projects, boinc necessarily has defaults. Given that CPDN is so large and uses so many files at a time, for one of those defaults it's on the margins of safety for a "let the machine take care of it" shutdown. By itself, it seems to take care of itself rather well. Usually. With other programs to be shut down as well, it gets "iffy".

Hang in there Rene. We have to do surprisingly little to keep these Models on track. It is, however, a bit more than nothing.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

old_user120137
Joined: 28 Nov 05
Posts: 8
Credit: 3,293
RAC: 0
Message 25652 - Posted: 18 Dec 2006, 17:14:37 UTC - in response to Message 25563.  

Sorry guys for having been too harsh. It was just that the realization of having spent hundreds of hours of CPU time without any result was a bit much.
The climate model appears not to be suited to my computer usage: only daytime, and often just a day a week or perhaps less. That might change soon though, as this computer will be moved.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 25653 - Posted: 18 Dec 2006, 17:56:03 UTC

Results are uploaded throughout the lifetime of the model, since the researchers realise it's hard to keep a model going for the full 160 years. A summary is uploaded every model year (AKA the 'trickle'), and a more detailed summary at each model decade.

Every 40 model years (1960, 2000, and 2040) a full restart dump is uploaded to the servers, which theoretically allows someone else to take over from that point ('theoretically' because the software hasn't been written yet, but once it is then the project can issue partial models, for example 2000-2080).
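
As a rough illustration of the schedule described above, here is a small Python sketch. The start year of 1920 is an assumption, inferred from the 160-year length and the 2080 end year in the example. The function name and the event labels are likewise made up for this sketch.

```python
START_YEAR = 1920  # assumed: 2080 minus the 160 model years mentioned above
END_YEAR = 2080
RESTART_DUMP_YEARS = {1960, 2000, 2040}  # years given in the post


def uploads_due(model_year):
    """Return the uploads due when the model reaches the start of model_year."""
    if not START_YEAR < model_year <= END_YEAR:
        return []
    events = ["trickle"]  # one summary per completed model year
    if model_year % 10 == 0:
        events.append("decade summary")
    if model_year in RESTART_DUMP_YEARS:
        events.append("restart dump")
    return events


if __name__ == "__main__":
    for year in (1921, 1930, 1960, 2080):
        print(year, uploads_due(year))
```

Under these assumptions, a model that crashes partway through has still returned a trickle for every model year it completed.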

The -161 error is very frustrating to all of us, since it hides the real cause of the error and makes it hard to fix problems. The most recent versions of the model (5.15 onwards) do sometimes give more useful information alongside the error code.
I'm a volunteer and my views are my own.
News and Announcements and FAQ

©2024 climateprediction.net