climateprediction.net home page
Server out of space

darkpella
Joined: 11 Sep 05
Posts: 5
Credit: 880,340
RAC: 0
Message 37105 - Posted: 8 Jun 2009, 12:56:34 UTC

Hi,

For the past few hours I have been unable to upload the latest result my host finished crunching; it looks like there is not enough space on the server's hard disk.

Below is a transcript of the BOINC messages.

Bye

darkpella
08/06/2009 14.53.51 climateprediction.net [file_xfer] Started upload of file hadam3p_n7yu_1997_2_006158816_4_2.zip
08/06/2009 14.53.53 climateprediction.net [error] Error on file upload: Server is out of disk space
08/06/2009 14.53.53 climateprediction.net [file_xfer] Temporarily failed upload of hadam3p_n7yu_1997_2_006158816_4_2.zip: transient upload error
08/06/2009 14.53.53 climateprediction.net Backing off 16 min 22 sec on upload of file hadam3p_n7yu_1997_2_006158816_4_2.zip
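
For anyone who wants to keep an eye on these retries, the back-off delay can be pulled out of a message line with a short script. This is only a rough sketch: `backoff_seconds` and the regular expression are hypothetical helpers, not part of BOINC, and they assume the "N min M sec" wording seen in the transcript above.

```python
import re
from typing import Optional

# Assumption: the delay is always written as "<N> min <M> sec" or "<M> sec",
# as in the transcript above. Other wordings are not handled.
BACKOFF_RE = re.compile(r"Backing off (?:(\d+) min )?(\d+) sec")


def backoff_seconds(line: str) -> Optional[int]:
    """Return the back-off delay in seconds, or None if the line has none."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2))
    return minutes * 60 + seconds


line = (
    "08/06/2009 14.53.53 climateprediction.net Backing off 16 min 22 sec "
    "on upload of file hadam3p_n7yu_1997_2_006158816_4_2.zip"
)
print(backoff_seconds(line))
```

Lines without a back-off message simply return `None`, so the helper can be run over a whole message log.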

ID: 37105

old_user294426
Joined: 20 Feb 06
Posts: 158
Credit: 1,251,176
RAC: 0
Message 37107 - Posted: 8 Jun 2009, 14:49:57 UTC

See "Unable to upload 2 zip files" for details
ID: 37107

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37108 - Posted: 8 Jun 2009, 15:03:58 UTC

Or read the News and Announcements thread at the top of Number Crunching.
This, like all threads, can be subscribed to, but you do need to make sure that you have email messages ON.


ID: 37108

Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,065,510
RAC: 2,268
Message 37110 - Posted: 8 Jun 2009, 17:02:38 UTC
Last modified: 8 Jun 2009, 17:04:49 UTC

Maybe it's time for the project to archive? I have tasks from 2004 from long defunct (or at least transmogrified) computers still showing.

If, in fact, ALL previous work is still on the servers, then perhaps a judicious moving of tasks to offline storage might be in order?

Because this is obviously NOT a new, brilliant idea, I'm curious why it hasn't been done (if it hasn't already been done) :-)
ID: 37110

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 37111 - Posted: 8 Jun 2009, 17:26:49 UTC - in response to Message 37110.  

> ...I'm curious why it hasn't been done (if it hasn't already been done) :-)
... it may be because the data is online for climate scientists at the CPDN data portal. Cunningly, there is only one copy of the data: as I understand it, the portal is really an index to the data uploaded from our models. However, that does mean that old data is kept.
ID: 37111

Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,065,510
RAC: 2,268
Message 37112 - Posted: 8 Jun 2009, 18:10:13 UTC - in response to Message 37111.  
Last modified: 8 Jun 2009, 18:19:22 UTC

Thanks Iain,

</choir preaching on> There's no doubt the researchers need access to the data, but does it have to be on the dynamic servers? I assume their access rate is quite a bit lower than that of the processors (us). OK, money, people etc. figure into this, but Tolu just said he's buying a new server to handle the dynamic load. Perhaps the server might be better used as a researcher access server? Move the old stuff and voila, she works again! </choir preaching off>

I don't pretend to know the funding and politics of the project, but I did work for the US Government for 30 years so I DO understand bureaucracy :-) All this just seems a bit strange to me. Of course I haven't had to tread the academic footsteps that Tolu et al. (if there are any) have to follow. My simple engineer's mind tells me that just adding servers will eventually fail to scale up to the load.
ID: 37112

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 37113 - Posted: 8 Jun 2009, 18:23:39 UTC - in response to Message 37112.  
Last modified: 8 Jun 2009, 18:24:30 UTC

> There's no doubt the researchers need access to the data, but does it have to be on the dynamic servers? I assume their access rate is quite a bit less than the processors (us)...
... indeed, there is another problem that hits us from time to time, which is that when some researcher develops an interest in the data, 'our' servers get well and truly thrashed.

Not ideal, but if things were ideal then climate research would be the backwater it once was ...
ID: 37113

Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,065,510
RAC: 2,268
Message 37114 - Posted: 8 Jun 2009, 19:48:38 UTC
Last modified: 8 Jun 2009, 19:55:07 UTC

Thanks again Iain, but...

Yes, I agree, but can we make it better? It's worth a try. Tolu...Milo? How off base am I?

Rick
ID: 37114

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37115 - Posted: 8 Jun 2009, 20:00:50 UTC

Strangely enough, the project people do know about these ideas. They've been discussed privately with some of the moderators.

If only they could get rid of the alligators, so they could concentrate on getting rid of the malaria-carrying mosquitoes. Then they would be able to ..., erm, ..., what was it again? It's been so long that I forget what the original job was.

ID: 37115

Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,065,510
RAC: 2,268
Message 37116 - Posted: 8 Jun 2009, 20:14:30 UTC

Aye, yup. I understand. I knew I wasn't asking anything new (but perhaps revisiting old thoughts). I just thought that I'd prime the engine again.

Sometimes it's worth tickling the process (for whatever that may be worth).

Rick
ID: 37116

Milo Thurston
Volunteer moderator
Volunteer developer
Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 37133 - Posted: 10 Jun 2009, 9:17:15 UTC - in response to Message 37114.  

> Yes, I agree, but can we make it better? It's worth a try. Tolu...Milo? How off base am I?


Well, at present we have a total of around 22.5 TB of data and this is increasing quite rapidly. I don't know the current rate but should be able to calculate it within a week or two. All of these data are online so researchers can get to them; basically this means that the big RAID arrays upon which the data sit are set up so that they may be accessed via HTTPS.

There are a few options I've considered to deal with this:
1. Buy more data servers. I'm doing this at the moment but I have to deal with much bureaucracy so it is quite slow, and it takes an actual lack of space to swing things into action. More space was requested when we started sending out hadam3p jobs but it has taken until now.
2. Archive data on tape. This has all the problems associated with (1) plus the additional one that the general view of tapes around here is that they are a very expensive nuisance and that it would be better to buy more RAID arrays. There is a central university tape backup system but they refuse to handle our data as the volume is too large and moving files around (as I'm having to do now) causes them enormous problems.
3. Persuade someone else to host the data. We are keen to have collaborating universities host upload servers, which means that the bureaucracy problems are simply off-loaded on to them.
4. Delete some of the data. This would be an easy option but no-one wants to do it because it is thought that any of our data could be potentially useful.

Carrying on with plan 1 means that there is at least some scalability, even though it's not a perfect solution.
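
To put a rough number on how long plan 1 buys, a back-of-the-envelope estimate might look like the sketch below. Only the 22.5 TB figure comes from this post; the total capacity and the growth rate are made-up placeholders, since the real rate has not been calculated yet, and `days_until_full` is a hypothetical helper.

```python
def days_until_full(capacity_tb: float, used_tb: float, growth_tb_per_day: float) -> float:
    """Estimate how many days remain before the storage fills up.

    Assumes linear growth, which is a simplification.
    """
    if growth_tb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    free_tb = max(capacity_tb - used_tb, 0.0)
    return free_tb / growth_tb_per_day


# used_tb is the figure quoted above; the other two values are placeholders.
estimate = days_until_full(capacity_tb=30.0, used_tb=22.5, growth_tb_per_day=0.05)
print(f"roughly {estimate:.0f} days of headroom")
```

Re-running it with a measured growth rate would show how far ahead a purchase request has to be raised to allow for the bureaucracy.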
ID: 37133

tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 37136 - Posted: 10 Jun 2009, 12:27:28 UTC

What about buying some external hard disks connected via Ethernet, USB or Firewire? Here is what I read on The Register:
Hard disks
Tullio
ID: 37136

Milo Thurston
Volunteer moderator
Volunteer developer
Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 37138 - Posted: 10 Jun 2009, 13:44:06 UTC - in response to Message 37136.  

> What about buying some external hard disks connected via Ethernet, USB or Firewire? Here is what I read on The Register:
> Hard disks
> Tullio


IIRC they don't have any RAID capability other than 0 or 1. The former cannot be countenanced and the latter reduces the space too much. There are also difficulties (although not insurmountable) in incorporating them in the results portal.
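
The space trade-off can be sketched with the standard usable-capacity rules for each RAID level. The helper below is hypothetical and the disk counts and sizes are illustrative only. RAID 5 and 6 are included purely for comparison and are not a statement about what the project's arrays actually use.

```python
def usable_capacity_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of identical disks for common RAID levels."""
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 0:
        return disks * disk_tb        # striping: all space, no redundancy
    if level == 1:
        return disk_tb                # mirroring: every disk holds the same data
    if level == 5:
        return (disks - 1) * disk_tb  # one disk's worth of parity
    return (disks - 2) * disk_tb      # RAID 6: two disks' worth of parity


# Illustrative figures only: two 1 TB disks in an external enclosure.
for level in (0, 1):
    print(f"RAID {level}: {usable_capacity_tb(level, disks=2, disk_tb=1.0)} TB usable")
```

With two disks, RAID 0 exposes all the space but loses everything if either disk fails, while RAID 1 survives a failure at the cost of half the space.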
ID: 37138

Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,065,510
RAC: 2,268
Message 37141 - Posted: 10 Jun 2009, 17:22:39 UTC
Last modified: 10 Jun 2009, 17:42:21 UTC

Thanks for the update Milo. You're in the trenches trying to make all this work. I really appreciate that. You have been quite successful so far.

Obviously, this is not an easy problem to solve. Is there ANY way the higher-ups can look at the architecture of CPDN and plan for the future? I think I know the answer to this question though...sigh.
ID: 37141

Milo Thurston
Volunteer moderator
Volunteer developer
Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 37152 - Posted: 11 Jun 2009, 10:08:17 UTC - in response to Message 37141.  


> Obviously, this is not an easy problem to solve. Is there ANY way the higher-ups can look at the architecture of CPDN and plan for the future? I think I know the answer to this question though...sigh.


Thanks.
There is some planning going on, one example of which is that we might get a new - and considerably more powerful - database server if all goes well (still working on this).
However, on the science side of CPDN there are lots of physicists doing lots of different projects, all concerned with their own research and each with a separate equipment grant, so it will tend to become a bit disjointed.
ID: 37152

old_user560538
Joined: 15 Mar 09
Posts: 5
Credit: 187,665
RAC: 0
Message 37157 - Posted: 11 Jun 2009, 21:02:45 UTC - in response to Message 37152.  

Good June!

This present technical problem with the CPDN system irritated me slightly in the beginning: "Why don't these guys take the necessary steps in advance...?"

However, I soon calmed down as I remembered an old story from my country. There was a certain extremely high-ranking military officer, Marshal of Finland C. G. E. Mannerheim, who was the Commander-in-Chief in all Finnish wars of the last century. It seems that sometime during WWII, when he was doing some travelling, his car broke down. The discussion went something like this.
"Driver, why didn't you take this vehicle to the workshop in time?"
"Sir, do you take your watch to the watchmaker before it stops?"
A long pause ensued..., then, in a different tone...
"No, actually, I don't..."

Regards

Pasi Karonen
Finland
ID: 37157

old_user565985
Joined: 28 Apr 09
Posts: 18
Credit: 575,431
RAC: 0
Message 37159 - Posted: 11 Jun 2009, 23:16:05 UTC

Does anyone know the current status of the uploader1.atm server? The power supply was supposed to be installed on Tuesday, but it is now Friday in the UK and nothing has happened so far as I am aware. I now have 17 zip files waiting to go and 6 finished WUs which I would like to clear out. Any fresh info would be helpful.
Cheers
Bill
ID: 37159

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 37160 - Posted: 11 Jun 2009, 23:19:18 UTC

CPDN usually hasn't got as much funding as it needs for hardware or to employ enough programmers. There have only been two programmers since Carl left in June 2007. I once asked Carl whether in an emergency he could ask the Oxford Uni computing staff to help with server problems. He just laughed and said distributed computing is so different from their normal work that they completely refuse to help. This does not surprise me.

Tolu, who develops the new models almost single-handed, once told me 'I prefer not to work at the weekend.' This does not surprise me either!

Milo is doing everything he can to find solutions not just for the immediate crisis but for the longer term.

The moderators have been concerned for a long time about how BOINC handles files that temporarily cannot be uploaded from Transfers. This BOINC Trac ticket was opened two years ago by one of our moderators, MikeMarsUK.


ID: 37160

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 37161 - Posted: 12 Jun 2009, 1:27:52 UTC

Hi Bill

I expect you have HadAM3P files to upload. The files from each of these models upload to three different servers. Strange but true.

These servers are the three that are disabled:

* uploader.oerc (disk space)
* cpdn-upload1.comlab (disk space)
* uploader1.atm (power supply)

On Thursday the power supply had not been delivered to the university store in Oxford. I do not think this server can be up and running before Monday.

But in any case we cannot upload our HadAM3P files until these three servers are all running. Milo has been moving large quantities of data to make this possible and as quickly as possible.
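
The "all three or nothing" rule can be sketched in a few lines. The host names are the ones listed above; the up/down flags and the helper names are illustrative assumptions, not a real status feed from the project.

```python
from typing import Dict, List

# Host names from the list above; the flags are illustrative, not live status.
server_up: Dict[str, bool] = {
    "uploader.oerc": False,        # disk space
    "cpdn-upload1.comlab": False,  # disk space
    "uploader1.atm": False,        # power supply
}


def blocking_servers(status: Dict[str, bool]) -> List[str]:
    """Return the servers that are still down."""
    return [name for name, up in status.items() if not up]


def can_upload(status: Dict[str, bool]) -> bool:
    """HadAM3P uploads can finish only when every server is running."""
    return not blocking_servers(status)


server_up["uploader.oerc"] = True  # e.g. once it is enabled for uploads again
print(can_upload(server_up), blocking_servers(server_up))
```

Even with one server back, the check stays false until the other two are running as well.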

He said late on Friday that he has uploader.oerc working again, but I don't know when he will enable it for uploads.

He knows about the BOINC 2-week file upload deadline.

ID: 37161

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,100,600
RAC: 2,970
Message 37162 - Posted: 12 Jun 2009, 2:01:56 UTC

I have a question about the server problem. I just downloaded one of the Mid-Holocene models. I have suspended network activity because I understand that trickles are not being accepted. Will there be a problem if the servers are still down when the HM model reaches the end of phase 1? Does it need to upload a .zip file containing the results of the phase at this point, and will it give up after a certain amount of time if it can't?

ID: 37162

©2024 climateprediction.net