Upload problem
Author | Message |
---|---|
Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
We have got some data on phkup and a few uploads still go there. I just logged in and there seems to be a problem with the server not having mounted our home directory; either that or they've somehow deleted all our data. I have contacted the admins for that server and hopefully they will resolve the issue. |
Joined: 3 Oct 04 Posts: 39 Credit: 13,172,838 RAC: 0 |
"The project no longer uses any of the servers in Berne. This outsourcing ended after the 2005 incident." Les, this is intriguing. I think maybe I was away doing other projects in 2005 and not reading the CPDN boards, or else I'm just too old to remember. If you have the time/inclination to tell us, whatever happened in 2005? John. |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"If you have the time/inclination to tell us, whatever happened in 2005?"
There was a serious failure of the server's RAID array, closely followed by a major flooding incident in Bern. The server was inaccessible for about 3 weeks and the full story is here.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Les Bayliss wrote: "The project no longer uses any of the servers in Berne. This outsourcing ended after the 2005 incident."
Back in 2005, there was talk about replacing the server phkup in Switzerland with one in England to make physical access to it easier. Apparently this didn't happen after all, and that server is still physically in Switzerland. |
Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
It is indeed. I've just heard that it's up again and I'm reconfiguring it. Interestingly, the link to the 2005 problems refers to both power supplies on uploader1.atm failing. This is the same machine that failed recently, only more severely. I think it's about time to retire that hardware and any future machine appearing on that url will be new. |
Joined: 31 Aug 04 Posts: 42 Credit: 15,303,437 RAC: 130,044 |
Reading that historical account of the 'Bern Floods' I thought I picked up that y'all are using Dell stuff. Living about 10 miles from Dell you'd be surprised how many Dell server parts are lying around within a 50m radius of here. Next time you need a PSU post up the model... we might be able to locate one in a matter of hours. I'd call Steve first... about 3 miles up the hwy from me... unless he's 'cleaned up' he has an entire room in his house full of Dell stuff... mostly server stuff... I don't ask. LOL.
- da shu @ HeliOS, "Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer" |
Joined: 17 Jan 09 Posts: 2 Credit: 43,535 RAC: 0 |
Got a failure-to-upload problem, which seems to be de rigueur in order to crunch on this project... :p Here's the feedback:
13-Jul-09 2:31:48 AM climateprediction.net Started upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
13-Jul-09 2:31:50 AM climateprediction.net [error] Error reported by file upload server: Server is out of disk space
13-Jul-09 2:31:50 AM climateprediction.net Temporarily failed upload of hadam3p_n0q3_1995_2_006079789_2_2.zip: transient upload error
13-Jul-09 2:31:50 AM climateprediction.net Backing off 27 min 9 sec on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
(and later)
13-Jul-09 2:59:00 AM climateprediction.net Backing off 3 hr 10 min 33 sec on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
This has happened several times... there appeared to be 2 accompanying files, hadam3p_n0q3_1995_2_006079789_2_1.zip and hadam3p_n0q3_1995_2_006079789_2_3.zip, and they uploaded after a couple of aborts. However, it looks like the big file doesn't want to or can't be uploaded.
What is a transient upload error, btw? Transient (<L transiens, -ntis transiting, temporarily visiting, going across, pres. part. of transire, to move across) implies temporary in nature. I do hope so :)
Also, I read that there is a logjam of WU's waiting to be processed and such; how long should I expect to wait? N.B. the message about your server being out of space... what's up? Did someone go on leave and let everything kind of pile up?
Last, am I in any danger of losing this WU, of it just giving up like another guy's WU did? If so, please let me know any steps I might take to avoid this. And my CPU put in 228h 2m 41s on this baby... I do hope my credit isn't in jeopardy, as I am competitive and would like to see the points :)
Thanks... to a better knowledge of our climate... :)
The pretty lady you see around my profile is Hayley Westenra, an angelic singer from New Zealand |
Joined: 26 Apr 09 Posts: 6 Credit: 514,253 RAC: 0 |
Same error here since last night:
13/07/2009 09:00:08 Backing off 1 hr 5 min 5 sec on upload of hadam3p_mal2_1987_2_006115552_1_2.zip
13/07/2009 09:00:08 Temporarily failed upload of hadam3p_mal2_1987_2_006115552_1_2.zip: transient upload error
13/07/2009 09:00:08 [error] Error reported by file upload server: Server is out of disk space
According to Server Status all lights are green. Is there a disk space problem with the upload server (should be this one according to client_state.xml: cpdn-upload1.comlab.ox.ac.uk)? |
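For anyone wanting to keep an eye on how long the client is waiting between attempts, here is a minimal Python sketch that pulls the "Backing off ..." entries out of pasted message-log text. It is only a rough illustration based on the wording of the messages quoted in this thread; the function name `parse_backoffs` is made up for this example, and other BOINC versions may word the message differently.

```python
import re

# Pattern modelled on the messages quoted in this thread, e.g.
# "Backing off 3 hr 10 min 33 sec on upload of some_file.zip".
# Each of the hr/min/sec parts is treated as optional (an assumption).
BACKOFF_RE = re.compile(
    r"Backing off\s+"
    r"(?:(?P<hr>\d+)\s*hr\s+)?"
    r"(?:(?P<min>\d+)\s*min\s+)?"
    r"(?:(?P<sec>\d+)\s*sec\s+)?"
    r"on upload of\s+(?P<file>\S+)"
)


def parse_backoffs(log_text):
    """Return a list of (file_name, backoff_in_seconds) found in log_text."""
    results = []
    for match in BACKOFF_RE.finditer(log_text):
        seconds = (
            int(match.group("hr") or 0) * 3600
            + int(match.group("min") or 0) * 60
            + int(match.group("sec") or 0)
        )
        results.append((match.group("file"), seconds))
    return results


# Two of the log lines posted above, used as sample input.
SAMPLE = (
    "13-Jul-09 2:31:50 AM climateprediction.net Backing off 27 min 9 sec "
    "on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip\n"
    "13/07/2009 09:00:08 Backing off 1 hr 5 min 5 sec "
    "on upload of hadam3p_mal2_1987_2_006115552_1_2.zip\n"
)

for file_name, seconds in parse_backoffs(SAMPLE):
    print(file_name, seconds)
```

The pattern ignores the timestamp and project name, so it copes with both date formats that appear in the posts above.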
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
As was posted in the News on the 18th of June:
"I've now got a temporary server running as uploader1.atm.ox.ac.uk. It's not very good hardware and has limited space, so there may be delays in uploading."
With around 3 terabytes of science data being stored, things do slow down occasionally. The project people will be back at work in a couple of hours.
*******************
Cesium_133
The zips can go to different servers. You need to check in client_state.xml to see the relevant upload server.
Transient: passing / momentary.
"logjam of WU's"
Check the date of those posts; that problem was 3 weeks ago.
"Did someone go on leave and let everything kind of pile up?"
It's just a quaint old custom that they have in England, called The Week End.
Credits are allocated per trickle, all the way through the creation of a model. And you don't lose them, even for models that fail.
"Last, am I in any danger of losing this WU, of it just giving up like another guy's WU did?"
That too, was 3 weeks ago. And there is 14 days for BOINC to keep retrying. So, as long as you don't do anything silly, such as aborting transfers or models, it will all get there in the end.
The README files linked from my sig are full of useful hints and tips. |
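For anyone who would rather not read client_state.xml by eye, here is a rough Python sketch of the check described above. It assumes that each file entry in client_state.xml is an element with a `name` child and one or more `upload_url` children; that tag name is an assumption and may differ between BOINC client versions, so look at your own file and pass a different `url_tag` if needed. The function name `upload_hosts`, the sample XML and the handler path in it are made up for illustration; only the host name comes from this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def upload_hosts(xml_text, url_tag="upload_url"):
    """Map each file name to the host names of its upload URLs.

    Assumption: a file entry is any element that has a <name> child
    and at least one child whose tag equals url_tag.
    """
    root = ET.fromstring(xml_text)
    hosts = {}
    for elem in root.iter():
        name = elem.findtext("name")
        urls = [u.text.strip() for u in elem.findall(url_tag) if u.text]
        if name and urls:
            hosts[name.strip()] = [urlparse(u).hostname for u in urls]
    return hosts


# Hypothetical fragment, NOT copied from a real client_state.xml.
SAMPLE_XML = """
<client_state>
  <file_info>
    <name>hadam3p_mal2_1987_2_006115552_1_2.zip</name>
    <upload_url>http://cpdn-upload1.comlab.ox.ac.uk/hypothetical/handler</upload_url>
  </file_info>
</client_state>
"""

for file_name, host_list in upload_hosts(SAMPLE_XML).items():
    print(file_name, host_list)
```

To run it against a real file, read the file's contents into a string first and pass that in; the script only reads, so it should not disturb the client.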
Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
Sorry about this, it's more HadAM3P data coming in rapidly whilst every server I try to get is either delayed (the new one is 60% built), off-line or otherwise unavailable. I will do what I can to move data somewhere as soon as possible. |
Joined: 26 Apr 09 Posts: 6 Credit: 514,253 RAC: 0 |
"As was posted in the News on the 18th of June:"
My apologies - I thought the problem had been resolved in the end.
"It's just a quaint old custom that they have in England, called The Week End."
... Hey, Moriarty, they still have the "Week End" in good ol' Blighty! All the best.... |
Joined: 17 Jan 09 Posts: 2 Credit: 43,535 RAC: 0 |
"With around 3 terabytes of science data being stored, things do slow down occasionally."
So it appears. Things are still in limbo... the last message complement reads:
14-Jul-09 2:32:14 AM climateprediction.net Started upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
14-Jul-09 2:32:16 AM Project communication failed: attempting access to reference site
14-Jul-09 2:32:16 AM climateprediction.net Temporarily failed upload of hadam3p_n0q3_1995_2_006079789_2_2.zip: connect() failed
14-Jul-09 2:32:16 AM climateprediction.net Backing off 3 hr 7 min 25 sec on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
14-Jul-09 2:32:17 AM Internet access OK - project servers may be temporarily down.
Also, you say you have space for 3 Tb of data. I am going to pretend I know what I'm talking about :p ... but that equates to 14 of my computers' worth of memory and space (I only have 1, lol). For a project that's the largest climatological research job in the world using DC technology, that sounds like a real paucity of space. Maybe I should contribute £ to the effort so you all can get a spare PC or something for events such as this :) Seriously... is that the answer, more space?
"Credits are allocated per trickle, all the way through the creation of a model. And you don't lose them, even for models that fail."
I looked up trickles, and I get how they work... kudos to the BOINC Wiki. Does that mean I already have the credit, that it's reflected in the rankings I look up for myself? Or is that credit latent, credited tentatively but somehow undisplayed pending a confirmation? If I have it already, do I need to worry if the data -ever- gets uploaded?...
...(from above rhetorical question) FYI, yes, I care very much about it being uploaded. I'm crunching for credit, sure, but I do so for our common knowledge as we try to save this planet we've already f----- up enough. I've read the business about the 14-day deadline thing, and how to extend it if need be.
I don't care if I see the points already posted to my name... I'm here to help you all get that WU of mine, along with the 2 others I'm now doing, to you in good order. I'm not going to let that data go down the loo-throne. If I need help finding or working with those files to extend time, I will come a-calling. Our Earth and my personal contributions toward understanding it and predicting future climatic change mean enough to me to make a serious pest of myself getting help with what I can't figure out by RTFM'ing.
As for weekends here, they do exist. I actually don't obligate myself to work much on Saturdays or at all on Sundays, though I generally do. And my computer knoweth not a Sabbath... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The 3 terabytes is the approximate amount of returned model data stored. Actual server space is more. The new server that Milo mentioned a few posts back is the one that will replace the temporary server cpdn-upload1.comlab. The new one will be a 20 terabyte machine. Apparently this temporary machine has had 1.4 terabytes of data uploaded to it since it was used to replace the original. (Which suffered a power supply failure, followed by a RAID HD failure.)
The City of Oxford has many buildings associated with various Colleges of the University of Oxford, as well as other buildings housing offices of people such as this project's people. Scattered all through this are many computer server rooms, and some of these house servers used by this project. Most of these rooms are not accessible to the project people all of the time, so when something goes wrong there, it's necessary to wait until an IT person from that area becomes available. This is currently the case with one of the machines used for temporary storage.
*****************
As often posted on these boards, credit is re-created daily by a program that runs once per day, just before midnight UK time. If you upload a trickle just after this program runs, it will be 24 hours before you see that credit. And Pending credit is a BOINC mechanism that isn't used here. Nor is Validation.
*****************
Also note that cpdn-upload1.comlab is currently disabled due to ongoing problems, as per the News thread. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Cesium 133,
The majority or, in the case of some model types, all the data required by the scientists is contained in the zip file uploads, not in the trickles. The trickles tell the server that the model's still crunching and needs more credit for the extra progress made. But of course completed models that upload all their zip files are far more valuable for the research than unfinished ones.
You will already have the credit for all or most of the trickles you've uploaded. Your most recent credits may not appear in your account yet because the various credit scripts don't run continuously. Each day our credit total is 'exported' to several stats sites like BoincStats where you can search for yourself and other members of CPDN and other BOINC projects. The stats sites are invaluable for providing all sorts of comparative data not directly available on project web pages.
At the moment nobody needs to be even thinking about extending the two-week period BOINC allows files to remain in the Transfers tab after the first failed upload. The current situation is a nuisance but not nearly as serious as the server crisis in June.
Cpdn news |
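To put a number on how much of that two-week period is left, here is a small Python sketch. It takes the 14-day figure as stated in this thread rather than reading it from BOINC, and it uses the 13-Jul-09 2:31:50 AM failure quoted earlier purely as an illustrative starting point, since the thread does not say exactly when the first failure happened. The function names are made up for this example, and the timestamp format is the one shown in the pasted log lines (parsing the month abbreviation assumes an English locale).

```python
from datetime import datetime, timedelta

# Two-week retry period as quoted in this thread (not queried from BOINC).
RETRY_WINDOW = timedelta(days=14)

# Timestamp layout of the log lines pasted earlier, e.g. "13-Jul-09 2:31:50 AM".
LOG_TIME_FORMAT = "%d-%b-%y %I:%M:%S %p"


def retry_deadline(first_failure):
    """Return the moment the retry period would end for a given first failure."""
    return datetime.strptime(first_failure, LOG_TIME_FORMAT) + RETRY_WINDOW


def time_left(first_failure, now):
    """Return how much of the retry period remains at the moment 'now'."""
    return retry_deadline(first_failure) - now


# Illustrative values taken from the log excerpts in this thread.
first_failure = "13-Jul-09 2:31:50 AM"
later_attempt = datetime.strptime("14-Jul-09 2:32:16 AM", LOG_TIME_FORMAT)

print(retry_deadline(first_failure))
print(time_left(first_failure, later_attempt))
```

With these sample values almost thirteen days remain, which fits the advice above that nobody needs to think about extending anything yet.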
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
I feel your pain, Cesium. I'll have 11 or 12 zip files to upload by tomorrow plus dozens of trickles. Thankfully, I suspended all BOINC activity before the models have finished. I got 4 models scheduled to finish by tomorrow. Bad timing, I guess. |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
"I feel your pain, Cesium. I'll have 11 or 12 zip files to upload by tomorrow plus dozens of trickles. Thankfully, I suspended all BOINC activity before the models have finished. I got 4 models scheduled to finish by tomorrow. Bad timing, I guess."
The trickles should upload as soon as you re-enable network activity. Parts of the .zip files, that go to different servers, will upload too. From what I can see, only the *_2.zip files go to the stuffed (and disabled) server. BOINC will retry those uploads for several days, so hopefully they will find their way either to the new server or to the freed-up space on the current temporary server.
So hopefully nothing will get lost (having 20 models waiting or about to be finished myself, I sure hope that this will go well). Just do not abort anything and leave the upload "retry" button alone unless the server status page shows green, as (to my knowledge) at least some BOINC versions have a limited number of upload attempts.
p.s.: The simple trickle reports during a phase are included in a scheduler contact. They are not really uploads, they are just progress reports in XML format. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,131,270 RAC: 2,086 |
Hi, Ananas
You are right about the trickles returned by most of the model types being nothing but simple progress reports, but the trickles in the CM models contain real data. Each trickle contains the results for the model year just finished. There is also a sort of super-trickle every 10 years that contains all the trickles for the previous 10 years and a similar mega-trickle every 40 model years. |
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
"So hopefully nothing will get lost (having 20 models waiting or about to be finished myself, I sure hope that this will go well), just do not abort anything and leave the upload "retry" button alone unless the server status page shows green, as (to my knowledge) at least some BOINC versions have a limited number of upload attempts."
All 12 uploads were successful tonight. :) |
Joined: 7 Mar 06 Posts: 5 Credit: 4,085,123 RAC: 0 |
Upload server uploader.oerc down again! Planned or not? Hopefully the first but I fear the worst. |
Joined: 1 Jan 07 Posts: 943 Credit: 34,376,975 RAC: 4,703 |
"Upload server uploader.oerc down again! Planned or not? Hopefully the first but I fear the worst."
Congratulations! Only 29 hours after the matching news thread announcement was made. That must be some sort of record. |