VANISHING WUs
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
The primary reason is the server upgrade. That's no small task, given all the unique code added for this project, which uses the server somewhat differently from other projects. My machines heat my house, so the electricity does double duty (otherwise the electric ceiling radiant heating would have to be used). CPDN doesn't benefit right now, but Einstein and WCG get a small boost. What's the Russian general's line in "War and Peace"? Patience and time... time and patience.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Joined: 5 Aug 04 Posts: 108 Credit: 19,548,566 RAC: 33,196
"Every night at midnight your time you'll still have your work quota put back to one model per core per day."
Uhm, in older BOINC server code it was midnight server time, not user time, so for CPDN this would be midnight GMT in the winter. Because having all quota-limited computers connect in the hour after server midnight caused an extra spike in server load, more recent server code instead assigns the "midnight" randomly to individual computers. This means someone with multiple computers can have one computer getting a new quota at 01:23:45, another at 12:33:44, a third at 05:43:21, and so on. I'm not sure whether CPDN runs recent-enough code to have this functionality or still uses the older midnight-server-time code...
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
You are right; it's midnight server time, not user time. Sorry for the mistake. I believe that after server midnight, computers are allowed to request new work at a random number of minutes during the next hour. I'm not sure how this affects CPDN, as our computers can only request work once per hour anyway. I imagine it means the work quota is reset at the computer's first server contact after server midnight.
Joined: 15 May 09 Posts: 4352 Credit: 16,590,792 RAC: 6,226
Not sure whether this one is valid or not:
Name hadcm3n_7jcv_1980_40_008436370_3
Workunit 8587226
It doesn't seem to be marked "no resubmission", but another task in that workunit is, and it does have the 2023 deadline. Mind you, it will be a few days until one of the other tasks I am crunching finishes anyway.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,115,532 RAC: 2,545
I hate to be the bearer of bad tidings, but the 2023 deadline is a giveaway. It is almost certainly a bad WU and I would abort it.
Joined: 19 Sep 04 Posts: 92 Credit: 1,937,829 RAC: 183
And having "No Resubmission" at the top of the workunit info is also not a good sign (http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8587226).
Professor Desty Nova
Researching Karma the Hard Way
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The project is in the final stages of the major upgrade. There's still a fair bit to do, but it's getting there. It wasn't helped by the recent major failure of part of the university computer network where our servers are located.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,115,532 RAC: 2,545
Judging from the "Tasks in Progress" count on the "Server Status" page, I think we may have had a small release of new WUs overnight, probably only 2,000 or 3,000. Hopefully these are good ones. With all the hungry computers out there, they didn't last long. Unfortunately, I didn't get one.
Joined: 22 Feb 06 Posts: 487 Credit: 29,682,098 RAC: 5,087
I have recently picked up hadcm3n_7x8g_1980_40_008454355, but it has a completion date of 1st May 2014 rather than 2023. Is this one of the rogue batch to be aborted, or should I let it run anyway?
Joined: 15 May 09 Posts: 4352 Credit: 16,590,792 RAC: 6,226
A completion date of May 2014 suggests it's not one of the rogue batch. Also, if you look at the work unit link, http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_reply.php?thread=7671&post=48137&no_quote=1#input it isn't marked "No Resubmission", so you should be all right with this one.
Joined: 7 Aug 04 Posts: 2169 Credit: 64,555,907 RAC: 5,858
"I have recently picked up hadcm3n_7x8g_1980_40_008454355, but it has a completion date of 1st May 2014 rather than 2023. Is this one of the rogue batch to be aborted, or should I let it run anyway?"
No, the first task from that work unit was issued in September, so it is not one of the bad batch.
Joined: 22 Feb 06 Posts: 487 Credit: 29,682,098 RAC: 5,087
Thanks. Just checking.
Joined: 20 May 10 Posts: 13 Credit: 55,033 RAC: 0
This has been a very informative thread for me, and it makes me wish I had started reading it much earlier. I have aborted a lot of my earlier jobs because they still had many hours to run when the due date came, and I probably should have checked with someone first to see whether I should have let them keep running. I am approaching a deadline on a job now: the deadline is the 13th at 3:38 am, and the task will not complete before then, since it's physically impossible to run 180+ hours in less than 48. The task is had3mcn_022u_2020_40_008398650_3. Another odd thing is that the time remaining keeps growing: a few days ago it showed 172 hours left to run, and now it is over 180, so I really don't know how long the job will take to finish. So, do I let the job keep running, or abort it as I have done with almost every previous job for this project? It seems such a shame to run a job for so many hours and then abort it because time has run out...
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
There's no deadline. The number that's reported as one is just there because the BOINC software requires one. For this project, a computer can take a couple of years if necessary. The only problem then is that the model in question will get a red message in the "status" field on its page on the server, and the server software will re-issue it to someone else.
The gradual increase in the completion time would probably be caused by the computer running it not running enough hours a day, or being very slow, or by the model being swapped out for work from other projects (which is effectively the same as the first reason).
Just keep plodding on.
Joined: 20 May 10 Posts: 13 Credit: 55,033 RAC: 0
Thanks for the info, Les. I'm not sure about your explanations for the added hours. The task has been running continuously at high priority and hasn't stopped except when I rebooted my computer a couple of times earlier today. I had just over 180 hours remaining yesterday and now it has 194, so I don't understand how it could add about 14 hours in one day under the explanation you gave. It just doesn't sound right to me. Anyway, I am letting it keep running. If it continues to gain hours at this rate, I will report the further increases, as something strange seems to be happening here.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Well, there is another possibility: the model may be stuck in an endless loop. You'll need to look at the numbers in the bottom left corner of the "Show graphics" page and watch them or write them down. Check every now and then to see whether they go back to earlier numbers and then repeat.
Joined: 20 May 10 Posts: 13 Credit: 55,033 RAC: 0
OK, here are 3 readings that I've taken from the screen as you suggested:
2/13 noon: 582,301 of 1,039,392, 56.03%, hours of computing 469.27.28
2/13 7:38 PM: 590,551, 56.82%, 475.38.04
2/14 6:20 am: 601,153, 57.84%, 463.22.58
It does appear to be progressing in sequence within the task itself; however, it's still adding time to the job, so maybe this is normal and the 14 hours yesterday was a fluke. The job now has 197 hours remaining to completion and 811 completed, so it has added another 3 hours since my last message to you... but that's fewer than the 14 from the previous day! I'll keep watching and see what happens from here unless you have any other thoughts.
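A quick arithmetic check on these readings, assuming (the thread does not confirm it) that the percentage on screen is simply the current timestep divided by the total of 1,039,392. The helper name `percent_complete` is hypothetical, not part of BOINC or the CPDN client.

```python
# Assumption: displayed percentage = current timestep / total timesteps * 100.
TOTAL_STEPS = 1_039_392  # total shown in the first reading ("582301 of 1039392")


def percent_complete(step: int, total: int = TOTAL_STEPS) -> float:
    """Hypothetical helper: percentage of the model's timesteps completed."""
    return 100.0 * step / total


# The three timestep readings posted above (2/13 noon, 2/13 7:38 PM, 2/14 6:20 am).
for step in (582_301, 590_551, 601_153):
    print(f"{step:,} -> {percent_complete(step):.2f}%")
```

Under that assumption the second and third readings come out at 56.82% and 57.84%, matching the screen, and the first at 56.02%, within 0.01 point of the 56.03% shown, so the timestep counter itself does appear to be moving forward.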
Joined: 31 Dec 07 Posts: 1152 Credit: 22,115,532 RAC: 2,545
Keep running the WU (24/7 if possible) and keep watching the model dates. They should be constantly progressing. What you are looking for is a sudden jump backwards by several years. That would indicate it was stuck in a loop, and it can go round and round in these loops forever. This happens to these models sometimes and there is nothing you can do to fix it. Good luck.
P.S. This is one reason they have to fix the graphics on the new 7.22 Hadam3p_pnw models. Without visible dates it is extremely hard to know whether a model is progressing normally.
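The loop check described in these replies (note the counter or model date every so often and look for it going backwards) can be sketched in a few lines of Python. `first_regression` is a hypothetical helper, not part of BOINC or CPDN, and it assumes the readings were written down in the order they were taken. It flags any backward step; a real loop would typically show up as a jump back of several model years.

```python
from typing import Optional, Sequence


def first_regression(readings: Sequence[int]) -> Optional[int]:
    """Return the index of the first reading that is lower than the one
    before it (a sign the model may have looped back), or None if the
    readings never go backwards."""
    for i in range(1, len(readings)):
        if readings[i] < readings[i - 1]:
            return i
    return None


# Timestep readings from the 2/13 and 2/14 post: steadily increasing.
print(first_regression([582_301, 590_551, 601_153]))  # None

# Hypothetical model years noted by hand, jumping back from 2005 to 1992.
print(first_regression([1990, 1998, 2005, 1992]))  # 3
```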
Joined: 31 Dec 07 Posts: 1152 Credit: 22,115,532 RAC: 2,545
The "Tasks in progress" count is down to 38,331. That is the lowest I have ever seen it. In 3 hours I will have an empty core to fill. Hopefully more are on the way soon.
Joined: 31 Dec 09 Posts: 12 Credit: 17,214 RAC: 0
Hi there! I've finally got hold of a CPDN work unit (hadcm3n_7zue_1980_40_008457737), but having read this thread I'm a bit concerned now. My WU is one of the hadcm3n_7 series, and there seem to be problems with those. However, it has a deadline of May 2014, it isn't tagged with "no resubmission", and it says it was originally submitted in September 2013. Am I right to assume that it's OK to run? I don't want to waste any computer time and energy.
©2024 climateprediction.net