No credits in last 3 days???
Author | Message |
---|---|
Send message Joined: 20 Feb 06 Posts: 158 Credit: 1,251,176 RAC: 0 |
No credits given in 3 days now??? Yet server status seems OK. Keith |
Send message Joined: 20 Feb 06 Posts: 158 Credit: 1,251,176 RAC: 0 |
What is happening? (Or rather NOT HAPPENING?) No activity on Number Crunching Forum!!!!! And I have had an upload start but not finish:---
Sat 11 Dec 00:55:32 2010 climateprediction.net Scheduler request completed
Sat 11 Dec 01:00:00 2010 Suspending network activity - time of day
Sat 11 Dec 09:43:46 2010 climateprediction.net Computation for task famous_v8sq_1599_200_006697168_1 finished
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_4.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_5.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_6.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_7.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_8.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_9.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_10.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_11.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_12.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_13.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_14.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_15.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_16.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_17.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_18.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_19.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Output file famous_v8sq_1599_200_006697168_1_20.zip for task famous_v8sq_1599_200_006697168_1 absent
Sat 11 Dec 09:43:46 2010 climateprediction.net Starting famous_v8nt_799_200_006696991_5
Sat 11 Dec 09:43:46 2010 climateprediction.net Starting task famous_v8nt_799_200_006696991_5 using famous version 611
Sat 11 Dec 12:45:13 2010 Resuming network activity
Sat 11 Dec 12:45:13 2010 climateprediction.net Sending scheduler request: To send trickle-up message.
Sat 11 Dec 12:45:13 2010 climateprediction.net Reporting 1 completed tasks, not requesting new tasks
Sat 11 Dec 12:45:15 2010 climateprediction.net Started upload of famous_v9a7_1599_200_006697797_1_8.zip
Sat 11 Dec 12:45:15 2010 climateprediction.net Started upload of famous_v8sq_1599_200_006697168_1_3.zip
Sat 11 Dec 12:45:17 2010 climateprediction.net Scheduler request completed
Sat 11 Dec 12:46:07 2010 climateprediction.net Finished upload of famous_v9a7_1599_200_006697797_1_8.zip
CAN SOMEONE PLEASE EXPLAIN? Keith |
Send message Joined: 28 Oct 04 Posts: 64 Credit: 34,444,555 RAC: 0 |
I've got the same problem - no updates for three days - and this is not the first time! Two weeks ago (approx), same thing. My refresh requests simply show the same data even though the ten systems are running. Please fix asap! BillN |
Send message Joined: 20 Feb 06 Posts: 158 Credit: 1,251,176 RAC: 0 |
Bill As absolutely nothing has been happening on the Number Crunching Forum EXCEPT YOUR REPLY, BILL!!!!!! and my 2 messages, I have sent a private message to Les Bayliss and am awaiting a reply hopefully. I would add that I have no doubt that he would reply with his usual diligence, but doubt whether he will receive my querying message!!!! In the meantime, I am going to try a reset of BOINC to see if that has any effect. Keith |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Keith and Bill I've only been crunching CPDN recently, not other projects. My page at BoincStats shows that although on some days recently no CPDN credits were exported, my total is correct. As at last night UK time my total at BoincStats was up-to-date. So it must be in our CPDN accounts that the totals aren't updating? Sorry I can't check this myself because I don't usually watch the credit figures there. Milo took the deliberate decision before he left recently not to add up the credit totals every single day because the database server is having a hard time coping with its workload, particularly during the UK night when the data and credits scripts are scheduled to run. I will report this as soon as it's been confirmed that you do mean in our CPDN accounts. The sooner Milo's successor starts at CPDN and gets to grips with the server database problems the better. Interviews were held a few days ago though I'm not sure whether they were for Milo's post or Tolu's or both. Cpdn news |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Yes, apologies for the late replies, Keith. Les Bayliss has been ill with an infection for about 10 days. He's at home, taking the medications and will get over this illness but at the moment he's very unwell. You shouldn't need to reset CPDN or anything like that. I'm going to look now at what happened to the crashed model and should soon know whether you need to do anything. Cpdn news |
Send message Joined: 16 Jun 05 Posts: 10 Credit: 20,676,311 RAC: 0 |
Hi! I have the same problem! I have approx. 100 hosts running CPDN and none of them has had credit in the last 3 days! Mike |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Here are your models, Keith. I see you have a Mac, which is significant. The Boinc manager messages you quoted above with all the lines about missing files mean that the model in question crashed before it produced file number 4. As FAMOUS uploads 20 files when it's successful, files 4 - 20 were missing ie never produced by your model. All those lines about missing files are nothing to worry about. You've had 3 recent FAMOUS crashes. That's the 2nd, 3rd and 4th tasks in your list. These FAMOUS models aren't 100% successful. Quite a few crash. That's because the chief researcher, Hiro Yamazaki, is trying out parameter values that are more extreme than usual to see what works and what doesn't. The idea isn't crazy. At a climate conference I went to last year in London a researcher from a completely different university, not Oxford, said we need more models with a wider range of parameter values. But some of these models crash as a result. In addition, the Mac compilation for FAMOUS makes this model type run faster than on Windows or Linux. This produces a higher proportion of model crashes on Mac. On each model's web page we can see extra messages by clicking on stderr + (after the model's completed). famous_v9ob_599_200_006698305_6 had exit code 22 and the message NEGATIVE PRESSURE VALUE CREATED. This is typical of a crash caused by extreme parameter values. famous_vajr_1599_200_006699437_5 crashed with code 22 and the message ATM_DYN : INVALID THETA DETECTED. This is the other type of crash caused by parameter values. famous_v1lv_1999_200_006729611_1 crashed with INVALID THETA. The model whose Boinc manager messages you quoted in your opening post is near the top of your second page of models. It crashed with INVALID THETA. Again, this means your computer's behaving properly but the model didn't. Hiro's interested in the models that crash, not just the ones that reach the end so your computer time hasn't been wasted. 
In fact on the CPDN Beta project he asked us to run a batch of models even though he knew in advance that they would probably all crash with these same messages. If I had FAMOUS crashes whose messages didn't include INVALID THETA or NEGATIVE PRESSURE I'd wonder what was wrong with the computer. So seeing these particular messages reassures me... By the way, I noticed that the model you mentioned in your opening post hasn't received any of its credits yet. Cpdn news |
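For anyone who wants to check their own Boinc manager messages the same way, here is a minimal sketch in Python. It assumes the messages have been saved as plain text in the wording quoted earlier in this thread ("Output file ... for task ... absent"). The function name `first_missing_file` is made up for illustration and is not part of BOINC or CPDN. It reports, for each task, the lowest-numbered output file listed as absent, which is the file the model crashed before producing.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Output file <task>_<n>.zip for task <task> absent
# The back-reference (?P=task) makes sure the file belongs to the named task.
ABSENT = re.compile(
    r"Output file (?P<task>\S+)_(?P<n>\d+)\.zip\s+for\s+task\s+(?P=task)\s+absent"
)


def first_missing_file(log_text):
    """Return {task name: lowest output file number reported absent}."""
    missing = defaultdict(list)
    for match in ABSENT.finditer(log_text):
        missing[match.group("task")].append(int(match.group("n")))
    return {task: min(numbers) for task, numbers in missing.items()}


if __name__ == "__main__":
    sample = (
        "Sat 11 Dec 09:43:46 2010 climateprediction.net Output file "
        "famous_v8sq_1599_200_006697168_1_4.zip for task "
        "famous_v8sq_1599_200_006697168_1 absent\n"
        "Sat 11 Dec 09:43:46 2010 climateprediction.net Output file "
        "famous_v8sq_1599_200_006697168_1_5.zip for task "
        "famous_v8sq_1599_200_006697168_1 absent\n"
    )
    for task, number in first_missing_file(sample).items():
        print(f"{task}: crashed before producing file {number}")
```

A task that does not appear in the result had no files reported absent in the text you supplied.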
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Thanks, Mike. I can see that your models haven't been receiving their credits either. I shall report this now. Cpdn news |
Send message Joined: 7 Oct 07 Posts: 5 Credit: 196,852 RAC: 0 |
I have the same problem. Nothing for three days. Both my current tasks are FAMOUS. ADDENDUM: Amazing! Just moments after I sent this message, my credits updated. Surely a coincidence. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Not a coincidence! At the beginning of December Milo started in a new computing post at Oxford University but not in climate modelling, which isn't his own specialist research area. But he's working in the same building (perhaps even in the same room some of the time) and he still very kindly keeps an eye on urgent things for CPDN. When I reported in the forum for the moderators and admins that credits hadn't been updated for 3 days he reacted immediately even though it's Saturday. He said he'd run the script. But he isn't doing non-urgent CPDN jobs. This is a good idea because otherwise he'd probably have no time for his new post and his new department would start to complain about us. There's a general server problem with access to the database. The database server is having a hard time. I think CPDN needs new hardware to solve this problem. The person who replaces Milo is going to be busy. Very busy. Cpdn news |
Send message Joined: 7 Oct 07 Posts: 5 Credit: 196,852 RAC: 0 |
Thank you both for taking time on a Saturday to look into all this. |
Send message Joined: 20 Feb 06 Posts: 158 Credit: 1,251,176 RAC: 0 |
Yes Mo My Statistics graph, too, is now back on track again, to the original line (CPDN FAMOUS only). I knew that the FAMOUS tasks earned at a significantly higher rate, and am only being allocated FAMOUS tasks now. I also appreciated that many were deliberately made to fail to test the data. But this time the credits were being delayed 2nd Dec, 4th & 5th Dec, 8th to 10th Dec. I was concerned that the underlying data was being lost, but now realize that all is OK now. And sometimes I got messages that the database was off line. I might add that I was allocated an EU task which I was not supposed to have been sent. But this apparently was successfully completed, and I have received no more since. Keith |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
Thanks for your assistance, Mo. The problem at the moment is there is a lot of demand on the database for connections. We have only one database and so when it reaches the maximum number any other script is refused. Unfortunately, that has often been the credit script recently, which means that I have to try running it manually. New staff have been interviewed so we hope to get some people in soon who might be able to deal with this. |
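To illustrate the kind of failure described above, here is a rough sketch in Python of a script that retries when the database refuses a connection because it is at its limit. This is not the actual CPDN credit script, whose internals are not described in this thread. The exception `TooManyConnections`, the function `run_with_retry` and its parameters are all hypothetical names chosen for the example.

```python
import time


class TooManyConnections(Exception):
    """Hypothetical error: the database is at its connection limit."""


def run_with_retry(connect, job, attempts=5, base_delay=60.0, sleep=time.sleep):
    """Try to connect, waiting longer after each refusal, then run the job.

    connect: callable returning a connection, or raising TooManyConnections
    job:     callable that takes the connection and does the work
    """
    for attempt in range(attempts):
        try:
            connection = connect()
        except TooManyConnections:
            if attempt == attempts - 1:
                raise  # give up; someone has to run the job by hand
            sleep(base_delay * 2 ** attempt)  # 60 s, 120 s, 240 s, ...
            continue
        return job(connection)
```

The waiting time doubles after each refusal so that the retries themselves add as little load as possible to a server that is already struggling. Whether anything like this would suit the real scripts is a question for the project staff.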
Send message Joined: 1 Oct 05 Posts: 12 Credit: 10,041,430 RAC: 0 |
Looking at the BOINC Stats page I see there is a bit of an issue for all users with credit uploads. Here are the graphs I noticed: [images not preserved] It seems the progress occasionally stalls but then picks up with one day of "exaggerated" credit, making up for the days that were reporting none. I'll keep an eye on my uploads to make sure they don't get stuck in that phase, but hopefully everything will be running smoothly again! |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Milo says that the database, which is enormous because so many members generate so much data, will need to be replicated onto new hardware. That will mean a new server. That means money, and the new person who replaces Milo a) needs to start working for CPDN and b) needs to get used to both the hardware and the software before doing this big job. So it could be months before this job is done. During these months the same thing could happen again ie the credits script could fail to run automatically because there are too many simultaneous connections for the old server. But Milo's successor will, I hope, notice quickly. The two new programmers will also need to try to understand the new Boinc credit system to see whether CPDN should change to it. Oxford University probably didn't show them that web page at their job interviews, otherwise nobody would have accepted the posts. If there is even one CPDN member who can understand it and explain it to the rest of us could he or she please start a new thread... Cpdn news |
Send message Joined: 20 Feb 06 Posts: 158 Credit: 1,251,176 RAC: 0 |
Mo It now seems the ?database is falling apart with 2 successive failures of FAMOUS tasks:-
Mon 13 Dec 01:03:40 2010 climateprediction.net Output file famous_v8nt_799_200_006696991_5_20.zip for task famous_v8nt_799_200_006696991_5 absent
Mon 13 Dec 01:03:40 2010 climateprediction.net Starting famous_v7fd_999_200_006695391_2
Mon 13 Dec 01:03:40 2010 climateprediction.net Starting task famous_v7fd_999_200_006695391_2 using famous version 611
Mon 13 Dec 08:59:32 2010 climateprediction.net Computation for task famous_v7fd_999_200_006695391_2 finished
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_1.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_2.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_3.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_4.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_5.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_6.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_7.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_8.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_9.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_10.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_11.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_12.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_13.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_14.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_15.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_16.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_17.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_18.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_19.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Output file famous_v7fd_999_200_006695391_2_20.zip for task famous_v7fd_999_200_006695391_2 absent
Mon 13 Dec 08:59:32 2010 climateprediction.net Starting famous_v7cx_1399_200_006695303_1
Mon 13 Dec 08:59:32 2010 climateprediction.net Starting task famous_v7cx_1399_200_006695303_1 using famous version 611
Mon 13 Dec 09:55:14 2010 Resuming network activity
Mon 13 Dec 09:55:14 2010 climateprediction.net Sending scheduler request: To send trickle-up message.
Mon 13 Dec 09:55:14 2010 climateprediction.net Reporting 2 completed tasks, requesting new tasks
Mon 13 Dec 09:55:15 2010 climateprediction.net Started upload of famous_v7zd_1999_200_006696111_3_4.zip
Mon 13 Dec 09:56:08 2010 climateprediction.net Finished upload of famous_v7zd_1999_200_006696111_3_4.zip
Mon 13 Dec 09:56:49 2010 climateprediction.net Scheduler request failed: HTTP gateway timeout
Mon 13 Dec 09:57:50 2010 climateprediction.net Sending scheduler request: To send trickle-up message.
Mon 13 Dec 09:57:50 2010 climateprediction.net Reporting 2 completed tasks, requesting new tasks
Mon 13 Dec 09:58:07 2010 climateprediction.net Scheduler request completed: got 1 new tasks
Mon 13 Dec 09:58:07 2010 climateprediction.net Message from server: Completed result famous_v8nt_799_200_006696991_5 refused: result already reported as error
Mon 13 Dec 09:58:09 2010 climateprediction.net Started download of famous_v6t2_1999_200_006694588.zip
I think it is time to stop crunching and to come back in 6 months time??? Keith |
Send message Joined: 20 Feb 06 Posts: 158 Credit: 1,251,176 RAC: 0 |
Credits are again failing, no doubt to attend to more important matters as before. |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
Keith said: Credits are again failing, no doubt to attend to more important matters as before. The script is in fact running at the moment. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Keith said: It now seems the ?database is falling apart with 2 successive failures of FAMOUS tasks:- Keith, your two FAMOUS that have crashed within the last 12 hours are famous_v7zd_1999_200_006696111_3 and famous_v8nt_799_200_006696991_5. Both crashed with 6 INVALID THETA messages in stderr +. As I explained above, this message indicates extreme parameter values which can cause models to crash on some combinations of CPU type + operating system. This is completely unconnected to the database access problem which should not cause any models to crash, though it is causing some downloads to fail. The database access problem is causing some trickle uploads to be delayed but this will not cause model crashes. Cpdn news |
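The rule of thumb used in this thread for reading a crashed FAMOUS model's stderr can be written down as a small Python sketch. It only looks for the two messages quoted above, INVALID THETA and NEGATIVE PRESSURE. The function name `classify_crash` and the two labels it returns are invented for illustration, and any other crash cause simply falls into the "unexplained" group.

```python
# Messages that, according to the posts above, point to extreme parameter
# values rather than to a problem with the computer.
PARAMETER_CRASH_MARKERS = ("INVALID THETA", "NEGATIVE PRESSURE")


def classify_crash(stderr_text):
    """Return (label, markers found) for the stderr text of a crashed model."""
    found = [marker for marker in PARAMETER_CRASH_MARKERS if marker in stderr_text]
    if found:
        return "parameter-related", found
    return "unexplained", []


if __name__ == "__main__":
    print(classify_crash("ATM_DYN : INVALID THETA DETECTED"))
    print(classify_crash("NEGATIVE PRESSURE VALUE CREATED"))
```

An "unexplained" result does not prove the computer is at fault. It only means the stderr text is worth a closer look.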