Message boards : Number crunching : Upload Failure
Author | Message |
---|---|
Send message Joined: 14 Sep 10 Posts: 11 Credit: 1,812,972 RAC: 0 |
The upload handler string does not appear to be misspelled on the problem systems. It reads "http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler". The problem is that the upload shouldn't go to "cpdn-restarts.oerc..." at all. It should go to "uploader.oerc...". I read this at http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=5447&nowrap=true#44076 , fixed the client_state.xml files, and everything's uploading fine now. Please update the server-status page to indicate which files go where, so that users can fix this problem more easily. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
"Please update the server-status page to indicate which files go where, so that users can fix this problem more easily." What you say makes perfect sense to me. However, I am sure that there are more than a few who would make a total hash of this if the information were that easily available. Making people ask questions here probably weeds out some of those who would produce the proverbial dog's dinner. I also wonder why it was going to the wrong server. Is this to do with the recent fixing of the server problems, in particular the interim fix before it was properly fixed? If so, problems from that source have in the past been fixed at the CPDN end and the uploads have eventually gone through. Dave |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
And despite what I just said, it would be nice to see what was meant to go where, then if a server was down we would be able to see whether or not that was the cause of our problems as opposed to just assuming. Dave |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The mods do have a list, but it's a bit dynamic. The real problem is that during the recent server failure, redirects were put in place for all servers, so that in the event of future problems, the data could be quickly sent to another server somewhere else in Oxford. It appears that this needs to be checked again, against the current working server list. The upload server being used for every upload is listed in client_state.xml, where it's used by BOINC to address its mail. The labels on the servers are (mostly) the physical location of the server, which makes it easier to know where to go when the server needs a physical reboot, as against an ssh wakeup. I've just informed the project, so it'll get looked at in half a day. But there may be more data sets that are affected, in which case, it shouldn't be a problem for too long. Backups: Here |
Send message Joined: 14 Sep 10 Posts: 11 Credit: 1,812,972 RAC: 0 |
"I've just informed the project, so it'll get looked at in half a day. But there may be more data sets that are affected, in which case, it shouldn't be a problem for too long." Thanks. It does seem very likely that others are also experiencing this problem, perhaps without noticing it. |
Send message Joined: 19 Dec 09 Posts: 4 Credit: 256,301 RAC: 0 |
5/28/2012 4:47:56 AM | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_pnw_azhc_1967_1_007885059_1_13.zip: No such file or directory |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's a power failure in one of the buildings. Edit: which now seems to be fixed. 2 of my zip 13s are currently uploading. Backups: Here |
Send message Joined: 19 Dec 09 Posts: 4 Credit: 256,301 RAC: 0 |
sure they upload ... the client says so but once it gets to 100% I still get the same error. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Servers keep going up and down. My guess is they are rebuilding the RAID array if it went off during a write, but that is no more than a guess. Dave |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Zip1 just gone through without an error message afterwards, waiting to see about the 13s |
Send message Joined: 26 Aug 04 Posts: 84 Credit: 351,331 RAC: 0 |
Well, that cpdnupload2.oerc server keeps going up and down like a yo-yo, so my WUs never complete the upload. |
Send message Joined: 7 Aug 04 Posts: 83 Credit: 410,895 RAC: 0 |
All upload servers are shown to be operating, yet I still cannot complete the 13.zip upload of the PNW unit. It keeps reaching 100% and then fails. -ChinookFöhn |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
As Les helpfully reminded me in this thread, http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7378 , zip13s go to the restart server to generate the next data set. Following the power outage in the server room, the server has been going on and off a lot today, and I would guess that is something to do with the work they are doing to restore it to full functionality. In the last 10 minutes it has gone from down to up again on the server status page, but I think I will keep network activity suspended till tomorrow and try again then. Dave |
Send message Joined: 19 Dec 09 Posts: 4 Credit: 256,301 RAC: 0 |
Every time I check the server status page it shows all servers "running", I do not see a server named "restart", and my final upload is not being accepted. It would be nice if they turned the uploads off entirely so I don't have to continue re-uploading the file only to have it fail in the end. Does the project staff typically post when these types of problems occur / get corrected? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This problem was being discussed in the adjacent thread, had3pam_eu models not uploading, before you started to post. Look there for news. Sometimes one of the two project people post, but that's in the News and Announcements thread. Most of the time they leave it to the moderators to pass on news. In both directions. You'll only see the cpdn-restarts server name mentioned in the client_state.xml file, because all of the servers have aliases to make it simpler and faster to redirect data to an alternative server when one fails for any reason. Backups: Here |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
My guess is that turning the uploads off completely might interfere with something else, or it may just be that, under-resourced as the project is, they have more important things to do, such as working on resolving the problem. If you look in your client_state.xml file you will see entries such as these, which give hints as to which server is being used. <file> <name>hadam3p_pnw_c3rx_1988_1_007937852_0_13.zip</name> <nbytes>0.000000</nbytes> <max_nbytes>150000000.000000</max_nbytes> <status>0</status> <upload_url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler</upload_url> </file> Which I think refers to uploader.oerc, but it could be upload2.oerc. It was suggested in another thread that it may be a drive mounting problem or a daemon problem causing the uploads to fail at 100%. The best thing is to turn off network access for BOINC till things are resolved, unless you are running multiple projects, in which case there is a way to do it, but as I only run CPDN I have never learned it. Beat me to it, Les; feel free to contradict anything I may have got wrong on the technical side. |
Send message Joined: 19 Dec 09 Posts: 4 Credit: 256,301 RAC: 0 |
"This problem was being discussed in the adjacent thread, had3pam_eu models not uploading, before you started to post." I get that you are busy and covering lots of ground, but your tone kind of sounds like I'm at fault for not reading a thread about EU models when I'm talking about finals for PNW. Perhaps redirecting me to that thread first would have worked better for all. Guess it really doesn't matter much, as there are still reports saying it's still not working. While I understand short budgets and time constraints as an IT professional myself, this has turned into just one more typical alienating BOINC forum experience ... love the science, frustrated with the communication. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,884,997 RAC: 4,577 |
... While I understand short budgets and time constraints as an IT professional myself ... The moderators are volunteers and have no budgets; their time is as constrained as their real life and motivation allow. On the other (PHP) board there was recently an exchange in which a user posted a question and got what seemed to me to be a very complete answer from another user, which provided not only the direct information required but also the context in which to interpret the answer. The first user then posted back to complain about "long-winded waffle". Provide a direct answer, as Les did here, and that will offend someone else for being abrupt. Essentially, you can't win. If your complaint is about the lack of communication from project staff, then I'm with you ++ - but please give other volunteers a break. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Both time and health constraints. It was getting cold, and past time to go to bed where it's warmer. People discuss matters in lots of threads whose title doesn't always reflect a current problem. It's always a good idea to look at the most recent thread or two when their time/date stamp shows that they're recent. As you don't seem familiar with this board, perhaps I should say that the message "Warning: Essential maintenance on our storage infrastructure will require our main uploads servers to go out of service at the following time:" was posted to the following places: (1) News and Announcements (at the top of the Number crunching section); (2) News and Announcements (near the top of our alternative PHP board); (3) News (linked from the front page of this project). The 1st of these can be subscribed to, and you'll get an email every time something is posted there. BUT you MUST have email messages activated. The 2nd location, on the PHP board, requires a separate login, as it's the original board from back before BOINC started, and has nothing to do with this board. Any of the threads there can also be subscribed to. The 3rd location has an RSS feed. Backups: Here |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Good to see that my zip13s have now gone through. |
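
For anyone wanting to check which server each pending upload is addressed to without reading client_state.xml by eye, here is a minimal read-only sketch in Python. It assumes the `<file>` entries look like the one quoted earlier in this thread and that the file parses as well-formed XML; the `<client_state>` wrapper in the sample and the `upload_hosts` helper name are illustrative assumptions, not anything confirmed by the project. It changes nothing, it only groups file names by the host found in `<upload_url>`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

# Sample built from the <file> entry quoted in this thread.
# The <client_state> wrapper element is an assumption for illustration.
SAMPLE = """<client_state>
<file>
    <name>hadam3p_pnw_c3rx_1988_1_007937852_0_13.zip</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>150000000.000000</max_nbytes>
    <status>0</status>
    <upload_url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler</upload_url>
</file>
</client_state>"""


def upload_hosts(xml_text):
    """Map each upload host name to the list of file names addressed to it."""
    root = ET.fromstring(xml_text)
    hosts = defaultdict(list)
    for entry in root.iter("file"):
        name = entry.findtext("name")
        url = entry.findtext("upload_url")
        # Skip entries that have no upload URL (for example, downloaded files).
        if name and url:
            hosts[urlparse(url.strip()).hostname].append(name.strip())
    return dict(hosts)


if __name__ == "__main__":
    for host, files in upload_hosts(SAMPLE).items():
        print(host, len(files))
        for file_name in files:
            print("   ", file_name)
```

To try it on a real machine, it is probably safest to read a copy of client_state.xml rather than the live file, and pass its text to `upload_hosts`. If the file is not strictly well-formed XML, `ET.fromstring` raises `xml.etree.ElementTree.ParseError`, and a plain text search for `<upload_url>` would be the fallback.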