climateprediction.net home page
News and Announcements

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42337 - Posted: 5 Jun 2011, 20:47:40 UTC

Just for completeness in this News thread.

Posted Friday, June 3, by Jonathan, the project's sysadmin:
Bad news here - there has been a water leak in the offices, which means that the electricity is to be turned off for some hours at 1 pm GMT.

So that's it for the day and the week. :(


Backups: Here
ID: 42337 · Report as offensive
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42347 - Posted: 6 Jun 2011, 17:19:51 UTC

CPDN Main Project

Jonathan posted the following update on the phpBB forum:

The firewall alterations mentioned above were made at 11:15 BST today, so downloads from manticore should be taking place now.

On a related issue, in the coming few days we hope to transfer all the downloads to a new server:
http://cpdn-downloads.oerc.ox.ac.uk/
(and yes, we have checked that it is accessible from outside the firewall)

This will be taking place over the next few days, and once the transfers are complete we hope to relieve uploader1, climateprediction.net and manticore of their download responsibilities. These servers (together with climateapps2) will ultimately redirect clients to the above server, which will manage all download requests.

We will update people as to the progress as and when it happens.

Some users might find that downloads continue to fail until they restart BOINC (that was certainly the case for me).
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42347 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42357 - Posted: 7 Jun 2011, 22:00:11 UTC

Two new batches of the hadcm3N were created in the last few days.
One of these was a control set, the other included forcings.
One of the forcings was a bit overly enthusiastic, and has been causing models to fail after a few seconds.

The project people know about this, and there's no need to report these failures.

More thumb twiddling time. :)


Backups: Here
ID: 42357 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42393 - Posted: 10 Jun 2011, 21:06:05 UTC

A new batch of 3,000 RAPID-RAPIT models is currently being created and released.

This is for a short-term project, and the results are needed as soon as possible.
The models are 40 years long (if they complete :) ).
If your computer hasn't run a long climate model before, BOINC may be shocked into 'panic mode'. Don't worry; the time-to-completion will drop fairly fast as BOINC learns.
DON'T abort one of them just because you think that it'll take too long. Just keep calm and carry on. Or, at least, let BOINC carry on. :)

Some that I picked up overnight are saying 970-980 hours. This is about 40 days, which is about right for my machines for these long models.
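As a quick check of that estimate, the hours-to-days conversion can be written out. This is only a sketch of the arithmetic: the function and variable names are illustrative, and the 970 and 980 hour figures are the ones quoted above.

```python
# Convert BOINC's estimated run time from hours to days.
# The 970 and 980 hour figures are the estimates quoted in the post.
HOURS_PER_DAY = 24


def hours_to_days(hours: float) -> float:
    """Return the days of continuous running for a given number of hours."""
    return hours / HOURS_PER_DAY


low_days = hours_to_days(970)
high_days = hours_to_days(980)
print(f"{low_days:.1f} to {high_days:.1f} days of continuous running")
```

This assumes the computer runs the model around the clock; a machine that is switched off for part of the day will take correspondingly longer.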


Backups: Here
ID: 42393 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42394 - Posted: 10 Jun 2011, 21:06:37 UTC
Last modified: 11 Jun 2011, 20:34:12 UTC

A new problem has appeared for the latest batch of hadcm3N models, which may fail at about 13 hours in. :(
More details when details are known.

No need to start posting about it. :)
ID: 42394 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42432 - Posted: 20 Jun 2011, 23:06:24 UTC

The next phase of the RAPIT/RAPID models is now being auto-generated from the completed models from the first phase.

Note 1: They're being grabbed as soon as they appear, and the one hour back off still applies to prevent a few computers from hogging all of the work.
Note 2: Attempts to use the Update button to speed things up will have the opposite effect - the timer will be reset to 1 hour (+/-), and you'll be back to square one.

************************

More of the Regional models will be available Real Soon Now, for people who prefer shorter models.



Backups: Here
ID: 42432 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42460 - Posted: 24 Jun 2011, 22:00:48 UTC

To make it a bit more "official", I'm re-posting part of one of my posts from another thread:

It must also be kept in mind, as was probably posted a long time ago now, that there were only about 3,000 models issued for the spinup part of the hadcm3n series. Each subsequent batch will be the next phase, automatically generated from those in the first batch that completed.
Some of them will have been aborted, some abandoned, and some will still be running on computers with low resources for this project.
So the number issued for the 2nd batch will be less than 3,000. The project's front page says that there are currently 34,812 active computers. That is down a fair bit from the 40,000+ of a few weeks ago, but it still means a less than 1 in 10 chance of getting a hadcm3n.
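The "less than 1 in 10" figure can be checked with a rough calculation. This is a sketch under simplifying assumptions that are not from the project: every active computer is taken to be equally likely to get work, and no computer gets more than one model. Since the 2nd batch will be smaller than 3,000, the result is an upper bound.

```python
# Rough upper bound on the chance of one computer getting a hadcm3n model.
# Assumptions (for illustration only): every active computer is equally
# likely to get work, and no computer gets more than one model.
models_issued = 3_000        # first-batch size; the 2nd batch will be smaller
active_computers = 34_812    # figure quoted from the project's front page

share = models_issued / active_computers
one_in = active_computers / models_issued
print(f"chance per computer: at most {share:.1%} (about 1 in {one_in:.0f})")
```

In practice faster or always-on computers ask for work more often, so the real odds will differ from machine to machine.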

There'll be a better chance when the next lot of regional models get released, but that won't happen until after the ongoing problems with the RAPIT lot are sorted out.

Basically, there's little to no work from this project at the moment.


Backups: Here
ID: 42460 · Report as offensive
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42497 - Posted: 29 Jun 2011, 20:09:45 UTC
Last modified: 29 Jun 2011, 20:21:22 UTC

CPDN Main Project

After a number of false starts with faulty greenhouse gas (GHG) forcing parameters, the second phase of the RAPIT project has now started in earnest.

HadCM3N tasks have names in the format hadcm3n_{umid}_{start year}_40_* (where 40 is the number of model years the task runs for).

{umid} is a 4 character universal model identity, with the first character being the main indicator of the type of model being run as follows:

  • 'o', 'p' and 'q' are control models with no GHG forcing. These should continue through the resubmission processes at 1940 and 1980 (and beyond if the scientists decide that's required) with no problems.

  • at the first resubmission (hadcm3n_{umid}_1940_40_*) each successful control should, in addition, spawn a new series of workunits with a range of GHG forcing parameters. This has been the problem area, with the first character of the {umid} set as follows:

    • 'r' - a large batch of workunits which all fail at the end of the first model year before a trickle is generated. Most of these have been completed but if you have a task with the name format hadcm3n_rXXX_1940_40_ it should be aborted.

    • 's' - another large batch, this time failing at the end of the 10th model year, just before the trickle and upload file are generated. If you have a task with the name format hadcm3n_sXXX_1940_40_ it should be aborted. The workunits for these tasks have been cancelled on the server to prevent reissues.

    • 'b' to 'i' - small test batches which will fail at the end of the first or 10th model year. These should be aborted.

    • 'j' - a small test batch with GHG forcing brought forward from 1950 to 1941. These tasks should complete as long as the climate doesn't go wild.

    • 'k' - a small stress test batch with highly variable GHG forcing. These tasks should also complete as long as the climate doesn't go wild.

    • 't' - this is the large batch of work currently being generated from the 'o' series controls which have completed the first phase. Tasks with the name format hadcm3n_tXXX_1940_40_ should complete as long as the climate doesn't go wild.


A small number of hadcm3n_tXXX_1980_40_ workunits have been generated from 'o' series tasks which have already completed the second phase. This shouldn't have happened (the _tXXX_1980_ batch should be a continuation of the _tXXX_1940_ batch). These have been cancelled on the server to prevent reissues and should be aborted.

NOTE: tasks with the name format hadcm3n_tXXX_1980_40_ will start appearing again in 2 or 3 weeks and will be from genuine resubmission workunits. These should not be aborted.
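
For anyone with a long task list, the naming rules above can be sketched as a small lookup. This is an illustrative sketch only, not a project tool: the function name `advice` and its return values are made up for the example, and the rules are just the ones listed in this post. The _tXXX_1980_ case is reported as "check" because the name alone cannot tell a cancelled stray workunit from a genuine resubmission.

```python
import re

# Pattern for hadcm3n_{umid}_{start year}_{model years}_*; the umid is 4 characters.
TASK_NAME = re.compile(r"^hadcm3n_([^_]{4})_(\d{4})_(\d+)_")

CONTROLS = set("opq")             # control models with no GHG forcing
FAILING_1940 = set("rsbcdefghi")  # fail at the end of model year 1 or 10
GOOD_1940 = set("jkt")            # expected to complete


def advice(task_name: str) -> str:
    """Return 'keep', 'abort', 'check' or 'unknown' for a task name."""
    match = TASK_NAME.match(task_name)
    if match is None:
        return "unknown"  # not a hadcm3n task name
    umid, start_year = match.group(1), match.group(2)
    kind = umid[0]  # first character indicates the type of model
    if kind in CONTROLS:
        return "keep"
    if start_year == "1940":
        if kind in FAILING_1940:
            return "abort"
        if kind in GOOD_1940:
            return "keep"
    if kind == "t" and start_year == "1980":
        # Stray 1980 tasks were cancelled; genuine ones are due later.
        return "check"
    return "unknown"
```

For example, `advice("hadcm3n_rXXX_1940_40_")` returns "abort" and `advice("hadcm3n_tXXX_1940_40_")` returns "keep". Anything reported as "check" or "unknown" is best confirmed on the board before aborting.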


"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42497 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42508 - Posted: 30 Jun 2011, 18:44:28 UTC

Lots of regional models (EU at present), now available, with more to come.


Backups: Here
ID: 42508 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42539 - Posted: 3 Jul 2011, 21:17:14 UTC

The recently released regional models were of two types:

Some new ones, which are failing for reasons as yet unknown, and
Some auto-regen models, which are apparently running OK. This is confirmed by a few reports.


Backups: Here
ID: 42539 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42555 - Posted: 4 Jul 2011, 19:41:36 UTC - in response to Message 42539.  

A new batch of regional models was created over the weekend, all of them regens from previously completed work, so they should be OK.

Backups: Here
ID: 42555 · Report as offensive
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42583 - Posted: 5 Jul 2011, 17:02:13 UTC

Mac users on CPDN Main and Beta Projects

We have evidence from a number of users that all CPDN applications can start failing immediately after an upgrade to BOINC 6.12.26. This appears to be related to a permissions change which makes it impossible for the controller process to launch the worker. Resetting the project (or detaching and reattaching) should fix the problem.

See here for further details.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42583 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42621 - Posted: 15 Jul 2011, 21:04:08 UTC

Climateapps2 has been (mostly) restored, with an upgrade to both the OS and the BOINC server software. It was complicated by the server being located in a room of a different department, and needing the IT person from there to do the work.

Hopefully things will be more stable for a while. :)


Backups: Here
ID: 42621 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42653 - Posted: 22 Jul 2011, 22:00:22 UTC

climateapps2 (which hosts this board, members' pages, etc.) is still experiencing some problems, hopefully caused only by excessive load from computers wanting to chat.
There is a message about the recent problems on our other (phpBB) board, so I'm posting this link to it.


Backups: Here
ID: 42653 · Report as offensive
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 42657 - Posted: 23 Jul 2011, 19:58:54 UTC
Last modified: 23 Jul 2011, 22:17:25 UTC

The BOINC server IP seems to have changed. If you still experience problems connecting to the scheduler, you might need to restart your BOINC client.


edit :

This fixes only the "Couldn't connect to server" error; server-side errors cannot be fixed that easily.
ID: 42657 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42734 - Posted: 2 Aug 2011, 10:42:07 UTC
Last modified: 2 Aug 2011, 10:46:49 UTC

The 2 most recent batches of regional models (pnw and then eu) have a problem returning the final file, zip13.
This is the restart file, and is intended to go to climateapps1.oucs. This is one of the servers that was moved from the oerc department to the oucs department. Redirects were put in place for the present, but the one to climateapps1 isn't working.

This is resulting in HTTP errors when attempting to upload zip13 files.

To fix this, an edit of client_state.xml is necessary.
========================

Suspend BOINC in the manager, and then exit from both the manager and the client parts.
With a plain text editor, e.g. Notepad, open client_state.xml.
Locate the uploader section of the pnw/eu model.
Keep "locating" until you reach the one for zip13.

It's in the <file_info> section for each model, just after <upload_when_present/>.
DON'T touch the second one! It's in a signed (security) section!

Change the 4 characters in the string uploader.oerc.ox.ac.uk from oerc to oucs

Do this for all pnw/eu models THAT ARE RUNNING.

Save the file
Restart BOINC.
The files should now upload. (Been there, done that, as they say. )
========================
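
For those who would rather script the edit than do it by hand, the steps above can be sketched as follows. Treat this strictly as an illustration: the function names and the `name_hint` parameter are made up for the example, it assumes that `<file_info>` blocks in client_state.xml are not nested, and it has not been tried against a real client_state.xml. A reply later in this thread reports that a DNS redirection makes the change unnecessary.

```python
import re

OLD_HOST = "uploader.oerc.ox.ac.uk"
NEW_HOST = "uploader.oucs.ox.ac.uk"   # same string with oerc changed to oucs
MARKER = "<upload_when_present/>"


def patch_block(block: str) -> str:
    """Change only the first OLD_HOST that follows MARKER in one block.

    Later occurrences (the signed section) are left untouched.
    """
    marker_at = block.find(MARKER)
    if marker_at == -1:
        return block
    host_at = block.find(OLD_HOST, marker_at)
    if host_at == -1:
        return block
    return block[:host_at] + NEW_HOST + block[host_at + len(OLD_HOST):]


def patch_client_state(text: str, name_hint: str) -> str:
    """Patch every <file_info> block whose text contains name_hint.

    name_hint is whatever identifies the zip13 file of a running model
    (an assumption for this sketch; check your own file for the names).
    Assumes <file_info> blocks are not nested.
    """
    pattern = re.compile(r"<file_info>.*?</file_info>", re.DOTALL)

    def fix(match: re.Match) -> str:
        block = match.group(0)
        return patch_block(block) if name_hint in block else block

    return pattern.sub(fix, text)
```

As with the manual edit, BOINC (manager and client) must be fully stopped first, and it is sensible to keep a copy of the original client_state.xml before writing the patched text back.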

Don't bother with pnw models that haven't started yet. They have another problem, and should be aborted. (They're going to be regenerated.)
(The project may use a Killer trickle on these unstarted models.)

Also abort them if they've been started, but you haven't applied the fpops fix to them.

This is getting complicated, so feel free to post. The big problem is the people who don't read either of the boards.

Backups: Here
ID: 42734 · Report as offensive
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42749 - Posted: 3 Aug 2011, 14:12:58 UTC - in response to Message 42734.  

Change the 4 characters in the string uploader.oerc.ox.ac.uk from oerc to oucs

A DNS redirection is now in place and the final upload for HadAM3P regional models (the *_13.zip file) should now work without this change.

If you continue to have problems uploading the final upload file please let us know.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42749 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42756 - Posted: 5 Aug 2011, 7:12:58 UTC

I've just finished uploading the last file (zip13), for an eu model, so the DNS redirect is working.


Backups: Here
ID: 42756 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42828 - Posted: 28 Aug 2011, 8:20:29 UTC

Posts are starting to appear asking about the next lot of work, so this may be best answered as a news item.

At present there are only 2 groups of researchers, the RAPIT group, and the dual resolution (regional models) group.
The original models for both groups were created and sent out ages ago, and, as they get returned, the next step in the series is automatically created by an 'auto-regen' program.
Only those models that make it all the way to the end will be continued to the next stage. Those that become unstable and fail, and those that are abandoned or aborted won't go any further.

Occasionally, the researchers may ask for more new series to be created, if it looks like there won't be enough making it to form a good sample.

With only a few thousand models, and 35,000+ computers connected, you just need to be patient if you're after work.


Backups: Here
ID: 42828 · Report as offensive
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 43107 - Posted: 30 Sep 2011, 11:25:15 UTC

Some model files have recently failed to upload when the CPDN upload server rejected them because of an invalid signature. This has happened since BOINC changed the procedure for the way files are (or are not) signed. CPDN complied with the requested change but did not realise at first that the change applies to every upload server.

All files from all model types should now be accepted by all the upload servers.


Cpdn news
ID: 43107 · Report as offensive
©2024 climateprediction.net