Posts by old_user733

1) Message boards : Number crunching : Upload problem (Message 40325)
Posted 6 Aug 2010 by Profile old_user733
Post:
Most of the time the upload gets up to 1.09 MB before it gets stuck, so smaller zip files upload without problems.


I don't know if this will help, but I had a similar problem a year or two ago. It turned out to be caused by a device called a Riverbed, a network caching server, which needed to be reset. It had been installed to cut down on interoffice network traffic (a lot of it repetitive), but it was also inadvertently set to cache internet traffic, and it started causing trouble with CPDN uploads. If you have one of these at your location, it could be the problem.
2) Message boards : Number crunching : 159,333 FAMOUS models cant download any ! (Message 40246)
Posted 27 Jul 2010 by Profile old_user733
Post:
DON'T abort extra work that you get! That (those) models count against your "daily" quota.

Question: When these models crash (as they are wont to do), does that also count against your daily quota?
3) Message boards : Number crunching : How many hours are in a day? (Message 40165)
Posted 17 Jul 2010 by Profile old_user733
Post:
Whoa. Suddenly got some new tasks for machine 1070830. 5 in all, so now all 8 cores are busy with these making up the debt, and the Beta tasks are in waiting. From looking at the task list, only these 5 have been added, no new phantoms. So, problem solved for the time being, at least for this machine. Thanks all.

4) Message boards : Number crunching : How many hours are in a day? (Message 40161)
Posted 16 Jul 2010 by Profile old_user733
Post:
Hi Mo,

Yes, I read that thread, and was lamenting that so many people wouldn't notice that their machines were erroring out everything they were sent. I also noticed your posts alluding to phantom tasks on Jan 27 and again on Mar 12, which is what I think is happening here. These tasks never showed up on my system, which, AFAIK, does not have any networking issues. If they had shown up and then gotten errors, I would have taken appropriate action.

BTW, when your quota is "minussed", how does it show in the message from the server? Does it actually say -1?

-Gene
5) Message boards : Number crunching : How many hours are in a day? (Message 40159)
Posted 16 Jul 2010 by Profile old_user733
Post:
Well, nothing new here. I've been running BOINC 6.10.56 since it came out. The problem started happening several weeks ago, and on both of my machines, though somehow the other one managed to get a couple of tasks, even though it was also getting the quota message.

Getting the quota message apparently tells BOINC to back off on communications, including trickles, so I now find myself checking every so often and doing them manually. If BOINC then asks for tasks again, it again gets the quota message from the server. Just did it again:


7/16/2010 1:09:38 PM climateprediction.net update requested by user
7/16/2010 1:09:41 PM climateprediction.net Sending scheduler request: Requested by user.
7/16/2010 1:09:41 PM climateprediction.net Requesting new tasks
7/16/2010 1:09:43 PM climateprediction.net Scheduler request completed: got 0 new tasks
7/16/2010 1:09:43 PM climateprediction.net Message from server: No work sent
7/16/2010 1:09:43 PM climateprediction.net Message from server: (reached daily quota of 4 tasks)

And again, BOINC has set a communications backoff for 6-1/2 hours this time. Mind you, I haven't had any new work from the project since June 29th, so I don't know where this quota is getting triggered.

What I was thinking is that it might have to do with the 20 tasks the server thinks that I have (which apparently were lost in the transactions), but even then, I haven't gotten anything recently, so why the "daily quota of 4"?

I guess I can wait until the work I have is finished, then detach and reattach to the project. That may fix it for my system, at least temporarily, but won't fix the real problem, which has got to be on the server side.
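
For anyone who wants to check a saved copy of the message log for these quota replies rather than reading it by eye, here is a minimal sketch. It assumes the log lines look exactly like the ones quoted above (timestamp, project name, message text); the function name `find_quota_messages` and the `SAMPLE_LOG` list are made up for illustration and are not part of BOINC.

```python
import re
from datetime import datetime

# Quota reply text as it appears in the log lines quoted above.
QUOTA_RE = re.compile(r"\(reached daily quota of (\d+) tasks\)")
# Timestamp style used in the quoted log, e.g. "7/16/2010 1:09:43 PM".
STAMP_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$")


def find_quota_messages(lines):
    """Return (timestamp, quota) pairs for every quota reply in the log lines."""
    hits = []
    for line in lines:
        stamp_match = STAMP_RE.match(line.strip())
        if not stamp_match:
            continue  # not a timestamped log line
        quota_match = QUOTA_RE.search(stamp_match.group(2))
        if quota_match:
            when = datetime.strptime(stamp_match.group(1), "%m/%d/%Y %I:%M:%S %p")
            hits.append((when, int(quota_match.group(1))))
    return hits


# Three lines copied from the log above (hypothetical variable name).
SAMPLE_LOG = [
    "7/16/2010 1:09:43 PM climateprediction.net Scheduler request completed: got 0 new tasks",
    "7/16/2010 1:09:43 PM climateprediction.net Message from server: No work sent",
    "7/16/2010 1:09:43 PM climateprediction.net Message from server: (reached daily quota of 4 tasks)",
]

for when, quota in find_quota_messages(SAMPLE_LOG):
    print(when.isoformat(), "quota reported:", quota)
```

Run over a longer log, this would list each time the server sent the quota message and the quota it reported, which makes it easier to see whether the number changes between requests.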
6) Message boards : Number crunching : How many hours are in a day? (Message 40155)
Posted 16 Jul 2010 by Profile old_user733
Post:
Well, I wasn't actually saying that, but yes, I tried it and that is exactly what is happening. Here's the message log from the attempt:


7/16/2010 8:48:45 AM CPDN Beta work fetch suspended by user
7/16/2010 8:48:49 AM CPDN Beta suspended by user
7/16/2010 8:49:39 AM climateprediction.net update requested by user
7/16/2010 8:49:42 AM climateprediction.net Sending scheduler request: Requested by user.
7/16/2010 8:49:42 AM climateprediction.net Requesting new tasks
7/16/2010 8:49:44 AM climateprediction.net Scheduler request completed: got 0 new tasks
7/16/2010 8:49:44 AM climateprediction.net Message from server: No work sent
7/16/2010 8:49:44 AM climateprediction.net Message from server: (reached daily quota of 4 tasks)

In the meantime, I still have the three tasks running, and 5 cores idle, but I will un-suspend Beta and they will get busy.

Also, every time I force CPDN to update and it gets this quota message, it backs off communication for a long time, in this case 11 hours. I then have to force it again when it's waiting to do a trickle.
7) Message boards : Number crunching : How many hours are in a day? (Message 40149)
Posted 16 Jul 2010 by Profile old_user733
Post:
I'm not Jim, but I have a similar problem, and perhaps a clue to the answer. Take a look at this computer's tasks. Seems like it has a lot to work on, no? But the only valid tasks are the four issued on June 27-30, one of which, 11045389, just finished. The other three 11041787, 11040358, and 11040062 are still running, due to finish in the next few days. These are all like the last slab models issued before they were discontinued. There are 20 other tasks listed as in progress which never made it to my machine. Evidently they figure in the reluctance of the server to issue it more work. Thank goodness the Beta site doesn't have these issues, looks like it will take over as these slabs finish and I get no more work from the main site.

-Gene
8) Message boards : Number crunching : How many hours are in a day? (Message 40131)
Posted 12 Jul 2010 by Profile old_user733
Post:
I've been getting these messages for about a week now on one machine, and now on this one too.


7/11/2010 11:42:16 PM climateprediction.net Sending scheduler request: To fetch work.
7/11/2010 11:42:16 PM climateprediction.net Requesting new tasks
7/11/2010 11:42:17 PM climateprediction.net Scheduler request completed: got 0 new tasks
7/11/2010 11:42:17 PM climateprediction.net Message from server: No work sent
7/11/2010 11:42:17 PM climateprediction.net Message from server: (reached daily quota of 3 tasks)

Both machines (1036631 and 1070830) are shared 50-50 with CPDN and CPDN Beta, so they need only work for two and four cores, respectively, and at the moment they still have it. In a few days, the current work will start to finish, so we'll see if any new work is forthcoming. In the past (more than a week ago), they would fill up as if all cores were working 100% for each project, something that still happens with Beta. I have BOINC set to "connect every 0.1 days" and "Additional work buffer" of 0 days, so I would expect them to ask for new work only as they are nearing completion of the current work.
9) Message boards : Number crunching : FAMOUS SUCCESS/FAILURE RATIO (Message 40052)
Posted 29 Jun 2010 by Profile old_user733
Post:
Invalid Theta on this task: famous_r100_799_200_006666899_1.

So far, five completions, and one other Invalid Theta. All on Win7_x64.
10) Message boards : Number crunching : Iceworld Appeal (Message 39779)
Posted 27 May 2010 by Profile old_user733
Post:
This task, hadsm3dhet2_k2m8_006613042_8, turned into an iceworld on a Q6600 running Win7_x64, so I aborted it. Sorry, no backup so I can't do any recording. It did actually finish, though, on two other machines, one running Linux and the other a Mac.
11) Message boards : Number crunching : trickle uploads stopping at 7.3 MB. Is it just me? (Message 32462)
Posted 5 Feb 2008 by Profile old_user733
Post:
It was the Riverbed. With that bypassed, the transfers went smoothly.
12) Message boards : Number crunching : trickle uploads stopping at 7.3 MB. Is it just me? (Message 32456)
Posted 5 Feb 2008 by Profile old_user733
Post:
Sorry I had to go, my ride was there. We have a thing at our location which is called a "Riverbed"...

http://www.riverbed.com/products/appliances/

which gets in the middle of all transfers, like a cache. I am becoming suspicious that it may be at fault here. Will explore this more tomorrow.
13) Message boards : Number crunching : trickle uploads stopping at 7.3 MB. Is it just me? (Message 32455)
Posted 4 Feb 2008 by Profile old_user733
Post:
I ran Wireshark on one of the machines with the problem, and at the point where the stoppage occurs, I see a "TCP ZeroWindow" message, looks like it's coming back from climateapps3.oucs.ox.ac.uk (163.1.13.134). This is on the Ack, and I can see a steadily decreasing window size on the preceding acks.

Got to go, more on this later...
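
To make the pattern described above concrete, here is a rough sketch of the check: given the advertised window sizes read off a run of acks, find the first zero window and confirm the sizes were shrinking on the way there. The window values below are invented for illustration, not taken from the actual capture, and the function names are hypothetical. In Wireshark itself, the display filter `tcp.analysis.zero_window` should pick out the ZeroWindow packets directly.

```python
def find_zero_window(windows):
    """Return the index of the first ack advertising a zero window, or None."""
    for index, size in enumerate(windows):
        if size == 0:
            return index
    return None


def shrinking_before(windows, index):
    """True if every window size up to and including `index` is smaller than the one before it."""
    lead = windows[: index + 1]
    return all(earlier > later for earlier, later in zip(lead, lead[1:]))


# Hypothetical window sizes (bytes) from consecutive acks; NOT real capture data.
example_windows = [65535, 49152, 32768, 16384, 4096, 0]

stall_at = find_zero_window(example_windows)
if stall_at is not None:
    print("zero window at ack", stall_at,
          "steadily shrinking:", shrinking_before(example_windows, stall_at))
```

A window that shrinks steadily to zero means the receiving side (or something in between acting as the receiver) has stopped draining its buffer, so the sender has to pause.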
14) Message boards : Number crunching : trickle uploads stopping at 7.3 MB. Is it just me? (Message 32451)
Posted 3 Feb 2008 by Profile old_user733
Post:
Hmmm, well you've answered at least part of the question. It does seem to be just me. To answer the two questions implied in your response, no, we don't use a proxy server, and, I've been running CPDN for years from this site, we never had this issue before.

It's a good bet that somebody in the IT department has been fiddling with something, then. I'll have to complain about it tomorrow.

Thanks for the help.
15) Message boards : Number crunching : trickle uploads stopping at 7.3 MB. Is it just me? (Message 32448)
Posted 3 Feb 2008 by Profile old_user733
Post:
OK, so it must be something on my end. Just want to make sure before I start complaining to our IT department. Here's an example from the message log of what's happening...

2/3/2008 9:34:31 AM|climateprediction.net|Started upload of hadcm3iozn_cq15_2000_80_75899506_6_5.zip
2/3/2008 9:38:51 AM||Project communication failed: attempting access to reference site
2/3/2008 9:38:51 AM|climateprediction.net|Temporarily failed upload of hadcm3iozn_cq15_2000_80_75899506_6_5.zip: http error
2/3/2008 9:38:51 AM|climateprediction.net|Backing off 2 hr 8 min 39 sec on upload of hadcm3iozn_cq15_2000_80_75899506_6_5.zip
2/3/2008 9:38:52 AM||Access to reference site succeeded - project servers may be temporarily down.

The zip file in question is 15.83MB, and the transfer started and went very quickly up to 7.30MB, then abruptly halted. After a while, timeout ensued. The same thing is also happening on another machine which has 3 of these files trying to upload.
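
As a quick check on the numbers above, the following sketch parses the pipe-delimited log lines and works out how long the attempt ran before failing, plus how far through the file it got. It assumes the `timestamp|project|message` layout shown in the log; the helper names are made up for illustration, and the two sizes are the ones quoted in this post.

```python
from datetime import datetime


def parse_line(line):
    """Split one 'timestamp|project|message' log line into its three fields."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def upload_attempt_seconds(lines):
    """Seconds between 'Started upload' and the next 'Temporarily failed upload'."""
    started = None
    for line in lines:
        when, _project, message = parse_line(line)
        if message.startswith("Started upload of"):
            started = when
        elif message.startswith("Temporarily failed upload of") and started is not None:
            return (when - started).total_seconds()
    return None


# Lines copied from the message log above.
LOG = [
    "2/3/2008 9:34:31 AM|climateprediction.net|Started upload of hadcm3iozn_cq15_2000_80_75899506_6_5.zip",
    "2/3/2008 9:38:51 AM||Project communication failed: attempting access to reference site",
    "2/3/2008 9:38:51 AM|climateprediction.net|Temporarily failed upload of hadcm3iozn_cq15_2000_80_75899506_6_5.zip: http error",
]

elapsed = upload_attempt_seconds(LOG)
fraction = 7.30 / 15.83  # MB transferred before the stall / total file size in MB

print(f"attempt lasted {elapsed:.0f} s, stalled at {fraction:.0%} of the file")
```

For this log the attempt lasted 4 minutes 20 seconds, and the stall came at roughly 46% of the file.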
16) Message boards : Number crunching : trickle uploads stopping at 7.3 MB. Is it just me? (Message 32440)
Posted 3 Feb 2008 by Profile old_user733
Post:
I just noticed that all my machines are stuck trying to upload trickles and/or results. They start out uploading just fine, then the upload freezes at around 7.3MB, or just under halfway. The upload times out, then the process repeats. Is there a server problem there, or is it on my end?
17) Message boards : Number crunching : Server State Over, but wu is in progress! (Message 20290)
Posted 15 Feb 2006 by Profile old_user733
Post:
Don't abort! The CC is stupid in that CPDN does not care that the deadline has been passed, but with other projects, it is a big deal. The original bunch of work units were sent out with too short of a deadline. Later sulphurs had a deadline of about a year. Congratulations on getting so far with your P3 1 GHz and good luck on finishing!


OK, OK!! I'll take my finger off the button! ;-) Thanks for the quick reply.

Seriously, though, should I detach this machine from CPDN in the future? It looks like the new models will, if anything, require even more powerful CPU's. This machine is at the low end of my collection, so I will still be running CPDN with other, faster machines in any case. Is there a minimum recommended machine?

Thanks.
-Gene
18) Message boards : Number crunching : Server State Over, but wu is in progress! (Message 20287)
Posted 15 Feb 2006 by Profile old_user733
Post:
On this note, I have a machine which is running a sulphur cycle 4.19 model, this is the WU:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=711985

It is now overdue by 6 days and the BOINC CC is telling me I should abort it, but it is on phase 5. The deadline seems a bit short for this WU: I got it on Sept. 12, 2005, and I thought these had about a year to finish.

Anyway, the WU is listed as "over" with "too many total results" as the error. Should I abort? I'm really reluctant to do so since it is on phase 5, unless the result is worthless. It is continuing to run normally otherwise.

The machine is a 1GHz P3, it has no other project running at the moment, BOINC has been in EDF mode essentially since I got this WU. Now running BOINC CC 5.2.15.
19) Message boards : Number crunching : Announcement: Database residual problem - misallocated WUs (Message 12902)
Posted 26 May 2005 by Profile old_user733
Post:
I just got some WU's on a couple of machines, but I don't see them listed in my "results" page (yet). Is there a delay before they show up?
-----
Actually, two results showed up for one of my machines, but they are different than the ones I got. The other machine's WU's did not show up. I'm thinking I will have to abort them all and try again.
-----
These are the problem WU's:
Host: 6415 - Result ID: 880949 - Name: 3y7o_100206157 (not on machine)
Host: 6415 - Result ID: 880941 - Name: 3y7g_100206149_0 (not on machine)
Host: 6415 - Name: 3zvo_100208339_0 (not in results page) - aborting
Host: 6415 - Name: 3zx7_100208394_0 (not in results page) - aborting
Host: 1113 - Name: 3zx1_100208388_0 (not in results page) - aborting
Host: 1113 - Name: 3xv6_100205703_0 (not in results page) - aborting
-----
Interesting note: Results 880941 and 880949 are listed as being sent to me at an earlier time than I got the other four units.
-----
Update: After aborting the 4 unlisted WU's, I got 1 WU per machine that DID show up on my results page. Whew!
20) Message boards : Number crunching : Boinc 4.36 dev version. dwnlded 3 cpdn, what to do (Message 12519)
Posted 11 May 2005 by Profile old_user733
Post:
I've got one machine which dl'ed an extra WU which it won't start for maybe a month at the going rate. I know it'll finish in time, but doesn't CPDN assume you trashed it if it doesn't start trickling?



