climateprediction.net home page
News and Announcements

Message boards : Number crunching : News and Announcements

Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45686 - Posted: 22 Mar 2013, 18:44:04 UTC
Last modified: 24 Mar 2013, 0:02:57 UTC

CPDN main project


I am afraid we have been forced to take the independent climateprediction.net message board (the phpBB forum) offline for investigation and maintenance. On the evening of Wednesday 20th March a hidden iframe redirect was found on a number of pages on that message board. We are currently looking into this security issue. The main portion of the CPDN website is hosted on the same server, so it is also offline. We hope to resolve this issue soon and restore normal services.

This problem does not affect the availability or download of climate models and the upload servers are available as usual.

The CPDN Team
Cpdn news
ID: 45686 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45711 - Posted: 24 Mar 2013, 0:16:50 UTC

CPDN main project

The problem explained in the last post, which made it necessary to shut down some climateprediction.net server programs, has caused an additional problem which has just come to light.

It is possible for new members to join the project, but at the moment it is not possible to attach computers to climateprediction.net. If you cannot attach your computer, you cannot download new climate models for the time being.

If you are affected by this problem you can attach to other projects to keep your computer busy. In BOINC Manager in the Tools menu select Attach to project and then choose a project.
Cpdn news
ID: 45711 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45731 - Posted: 26 Mar 2013, 2:14:23 UTC

CPDN main project

It is again possible to attach computers to the project and download work when it is available. Thank you, Jonathan!

Reminder: you can subscribe to this thread by pressing the button at the top and receive an email whenever a new notification is posted.
Cpdn news
ID: 45731 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 45864 - Posted: 9 Apr 2013, 22:17:35 UTC

The recent batch of hadcm3n models (issued around 5 April) has a problem which causes FORTRAN errors, as discussed in a Windows thread.
This is not the same as the FORTRAN errors that have occurred intermittently over the years. This one is fatal.

On Macs and Linux, these models will self-abort shortly after starting, but on Windows systems they just sit there not running, and will continue to do so until they time out.

This means that they'll never return a trickle_up file, so the server can't tell them apart from models that are running successfully, and so can't target only the bad ones with a "Killer trickle". For example, I have 2 models from a December batch which are around 85% complete, and other people will also have some that are OK.

That means that on Windows, it has to be DIY. :)

There are a few ways to tell good models from bad:
1) No trickle_up files have been returned. But this also depends on the mix of projects being run.
2) The date near the top of the model's page, and also near the top of the workunit page.
3) No progress, either in the BOINC Manager window or in the Show graphics window.
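
Check 3 above can be sketched in a few lines of Python. This is only an illustration, assuming you note each model's progress fraction by hand from BOINC Manager at two different times. The function name, the task names and the numbers are all hypothetical; nothing here is provided by CPDN or BOINC.

```python
def find_stalled(earlier, later, min_gain=0.0001):
    """Return names of tasks whose progress did not advance between two readings.

    earlier, later: dicts mapping task name -> fraction done (0.0 to 1.0),
    noted by hand at two different times (a hypothetical input format).
    """
    stalled = []
    for name, before in earlier.items():
        after = later.get(name)
        if after is None:
            continue  # task finished or was removed between the readings
        if after - before < min_gain:
            stalled.append(name)
    return sorted(stalled)


# Hypothetical readings taken a day apart.
monday = {"hadcm3n_a": 0.850, "hadcm3n_b": 0.000}
tuesday = {"hadcm3n_a": 0.862, "hadcm3n_b": 0.000}
print(find_stalled(monday, tuesday))
```

As with check 1, a model that was simply waiting its turn behind other projects would also show no progress, so treat the result as a hint and confirm with the issue date before aborting anything.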

The project apologises for the problem.


ID: 45864 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45940 - Posted: 16 Apr 2013, 0:50:41 UTC

Upload server uploader1.atm is down but will probably be up again soon. Very few model files upload to this server at the moment, so not many CPDN members will be affected by the outage.
Cpdn news
ID: 45940 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45951 - Posted: 16 Apr 2013, 23:16:52 UTC

All the servers are now running.
Cpdn news
ID: 45951 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46518 - Posted: 25 Jun 2013, 23:09:23 UTC

As part of the ongoing problems with attaching here, it's becoming obvious that there's a very important file on everyone's computer.

This is: account_climateprediction.net.xml
It has the information that the project needs to identify you.

Please make a copy (or several), and keep it in a safe place for when hardware problems occur.

Keeping the equivalent for ALL of your projects is a good idea for the same reason.
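
One possible way to automate the copy, as a minimal Python sketch. It assumes that the equivalent files for other projects follow the same account_*.xml naming pattern in the BOINC data directory; the function name and the example paths in the comment are assumptions, so adjust them for your own installation.

```python
import shutil
from pathlib import Path


def backup_account_files(data_dir, backup_dir):
    """Copy every account_*.xml file from data_dir into backup_dir.

    Returns the list of file names that were copied.
    """
    source = Path(data_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.glob("account_*.xml")):
        shutil.copy2(path, target / path.name)  # copy2 also keeps timestamps
        copied.append(path.name)
    return copied


# Example call (both paths are assumptions, not taken from this thread):
# backup_account_files("/var/lib/boinc-client", Path.home() / "boinc-account-backup")
```

Keeping the backup on a different disk or a USB stick is the point of the exercise, since the copy is meant to survive a hardware failure.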


ID: 46518 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46636 - Posted: 18 Jul 2013, 20:17:47 UTC

The latest unplanned outage was caused by a hard disk failure in climateapps2, which is the BOINC server as well as an upload server.

This disk was replaced and the RAID system slowly rebuilt itself.
Then the second disk was replaced, which needed another rebuild.

The server is still trying to re-sync itself, but is being hammered by all of the computers pushing and shoving in their attempts to get their data back to the project, and to try for more work.

It also seems that there are one or more new problems, resulting in lines of error messages appearing on several (possibly all) pages.
This has been reported.


ID: 46636 · Report as offensive
Profile old_user651284

Joined: 28 Mar 11
Posts: 35
Credit: 82,588
RAC: 0
Message 46696 - Posted: 24 Jul 2013, 14:46:18 UTC

Planned downtime: 2 - 5 August 2013

The server room in which CPDN resides is due to be shut down for electrical testing on 2 August 2013. The testing will take place over the weekend, and CPDN servers will be brought back online on 5 August 2013 (assuming everything goes to plan).

There will be NO CPDN service during this time from any Oxford machines:

ClimatePrediction.net
Climateapps2.oerc.ox.ac.uk [this forum]
cpdn-upload2.oerc.ox.ac.uk
charybdis.oerc.ox.ac.uk
cpdntrickle.oerc.ox.ac.uk
uploader.oerc.ox.ac.uk
uploader1.atm.ox.ac.uk
cpdnbeta.oerc.ox.ac.uk
trillionthtonne.org
seacourt.oerc.ox.ac.uk
glaaki.oerc.ox.ac.uk
kraken.oerc.ox.ac.uk
wyrm.oerc.ox.ac.uk

ID: 46696 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46729 - Posted: 29 Jul 2013, 21:52:32 UTC

Climateapps2, the BOINC server, is still having problems.
It's very old, and is having difficulty remembering to keep all of its volumes mounted.

Plans are under way to retire it, but in the meantime, it will help to keep it up and running if people minimise their poking and prodding, e.g. constantly checking the server status for work or repeatedly trying to upload data. If possible, keep the Network set to off.
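
For machines without BOINC Manager, the Network setting can also be changed with BOINC's command-line tool, boinccmd, through its --set_network_mode option (always, auto or never). The Python wrapper below is just a sketch: the function names are hypothetical, and it assumes boinccmd is on the PATH and is allowed to talk to the local client.

```python
import subprocess

VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode):
    """Build the boinccmd argument list for setting the network mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    return ["boinccmd", "--set_network_mode", mode]


def set_network_mode(mode):
    """Run boinccmd (assumed to be on PATH) to apply the network mode."""
    subprocess.run(network_mode_command(mode), check=True)


# set_network_mode("never")  # suspend network activity
# set_network_mode("auto")   # later: go back to following your preferences
```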

ID: 46729 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 46735 - Posted: 12 Aug 2013, 18:59:16 UTC

Problems are larger than initially thought. Currently, Climateapps2 is processing neither Trickles nor requests to check Server Status.

Andy is working on it.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 46735 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46759 - Posted: 13 Aug 2013, 22:31:03 UTC

And Andy knows about the "key file" problem as well.

ID: 46759 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 46781 - Posted: 15 Aug 2013, 18:32:32 UTC

1) Climateapps2 is up and running, and known major bugs were squashed. (Andy has been busy -- he's watching the store alone these days because Jonathan is away.)

2) Andy wrote that he has requests from three scientists to generate more work. He'll get to it as soon as he can but we don't have a timeline yet. (He's busy cleaning up after a long stint in 'firefighting' mode.)

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 46781 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 46846 - Posted: 22 Aug 2013, 14:52:08 UTC

The main database server's power supply has failed. Attempts to Trickle-up, for example, return: "Server error: feeder not running".

Jonathan is working on it.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 46846 · Report as offensive
Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 46955 - Posted: 3 Sep 2013, 10:10:11 UTC
Last modified: 3 Sep 2013, 10:17:25 UTC



The credit generation system is still not working after the climateapps2 server rebuild. The administrators are aware and have been investigating for quite some time. Once it is resolved, everyone will get the outstanding credit for work done since the original server failed (30th July).

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 46955 · Report as offensive
Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47088 - Posted: 18 Sep 2013, 6:51:20 UTC
Last modified: 18 Sep 2013, 15:37:23 UTC


The credit system is running now: models are being marked with credit based on trickles, and all work since the old server crashed looks like it has been credited. The export process will send this to external statistics sites within the next day or so.


  • There is an anomaly with Beta-project credits, which is being looked at (EDIT - appears to be resolved now).
  • Duplicated trickles are appearing for a few models. This is being looked at.


I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 47088 · Report as offensive
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2168
Credit: 64,548,452
RAC: 6,748
Message 47126 - Posted: 19 Sep 2013, 20:45:30 UTC

You may or may not have noticed that the number of RAPID (hadcm3n) tasks available from the download server decreased considerably today. Andy B gave the following as a reason for this decrease:

"In case there is a query on the boards: I have been asked to pause the current workunits in the queue and put out another batch of workunits. The scientists want this other batch of workunits computed before the current workunits in the queue, so you will shortly see a drop of the queue to 2200."
ID: 47126 · Report as offensive
Profile Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 47154 - Posted: 23 Sep 2013, 10:03:05 UTC

The CPDN BOINC webpages are back up and trickles are being accepted again.

Jonathan says the 403 access forbidden problem was caused by a failure to mount the project's NFS partitions after an unexpected reboot at 0300 UTC yesterday (Sunday 22 September):

The servers running on our VM infrastructure seem to have rebooted at 4 am BST on 22 Sept.

The main webserver failed to re-mount its NFS partitions, upon which the website resides.

This is now fixed, and I am making investigations.
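
A failure like this one can be caught with a simple check run after a reboot. The following is only a sketch of the idea, not what the CPDN servers actually run: it uses Python's os.path.ismount to confirm that each required directory is a mount point, and the paths shown are hypothetical because the real NFS mount points are not given in Jonathan's message.

```python
import os


def missing_mounts(mount_points):
    """Return the paths from mount_points that are not currently mount points."""
    return [path for path in mount_points if not os.path.ismount(path)]


# Hypothetical paths standing in for the project's NFS partitions.
required = ["/srv/project/html", "/srv/project/data"]
problems = missing_mounts(required)
if problems:
    print("Not mounted:", ", ".join(problems))
else:
    print("All required partitions are mounted.")
```

A web server start-up script could refuse to start, or raise an alert, whenever the returned list is not empty, instead of serving 403 errors from an empty directory.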

"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 47154 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47497 - Posted: 7 Nov 2013, 19:37:27 UTC

The zips for hadcm3n (RAPID) models upload to a server external to Oxford Uni.
This is currently out of space.
The relevant people have been asked to urgently increase storage space.


ID: 47497 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 47722 - Posted: 4 Dec 2013, 17:18:02 UTC

Message from staff:

Hi All,

The CPDN website is down at the moment because the machine that hosts it is 100% full. I am currently running a query to work out where the space is being taken up. However, this query is fairly slow, so I don't expect the website to be up until tomorrow.

Andy
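
Andy's message does not say what kind of query he is running. For readers curious how this sort of hunt can look at the file system level, here is a generic Python sketch (not the project's own tooling) that reports how full a disk is and which immediate subdirectories are largest. The function names are hypothetical.

```python
import os
import shutil


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(path, top=5):
    """Return (size, name) for the largest immediate subdirectories of path."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    print(f"{usage.used / usage.total:.0%} of the disk holding '.' is in use")
    for size, name in largest_subdirs("."):
        print(f"{size / 1e6:10.1f} MB  {name}")
```

Walking every file is slow on a large, nearly full disk, which fits with the query taking the best part of a day.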

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 47722 · Report as offensive

©2024 climateprediction.net