Posts by BarryAZ

21) Message boards : Number crunching : still don't get credits since last breakdown (Message 46948)
Posted 2 Sep 2013 by BarryAZ
Post:
It seems to me that the issue is pretty clearly something at the project and that no efforts by individuals will have any effect. I don't know that anyone has seen credit increments in many weeks.

Personally, it isn't that big of a deal for me. When the project migrated to 'broken' condition last month, I simply suspended all Climate processing after a few days -- there are other projects.

I am sure that when (and I'm assuming when rather than if) the project folks resolve this there will be an updated news blast and we can move forward.

I note that this is actually a set of assumptions:

1) That the key people back at the project are aware of the credit freeze.

2) They are working toward resolving the issue.

3) They will be successful in resolving the issue.

4) Once successful, the project folks will pass on the update so it gets reported by the good folks who, as volunteers, share messages here with the general public.

22) Message boards : Number crunching : Stats script not running (Message 46129)
Posted 30 Apr 2013 by BarryAZ
Post:
Perhaps this one slipped by -- but the stats updates are not processing (for the past three days).

23) Message boards : Number crunching : Server is full (Message 45912)
Posted 12 Apr 2013 by BarryAZ
Post:
One of the two files uploaded overnight - that cleared one computer -- so I've 'unsuspended' Climate there.

What I get on the failed upload:

4/12/2013 9:28:56 AM | climateprediction.net | Started upload of hadcm3n_4juh_1980_40_008323966_1_3.zip
4/12/2013 9:28:58 AM | climateprediction.net | [error] Error reported by file upload server: [hadcm3n_4juh_1980_40_008323966_1_3.zip] locked by file_upload_handler PID=27612
4/12/2013 9:28:58 AM | climateprediction.net | Temporarily failed upload of hadcm3n_4juh_1980_40_008323966_1_3.zip: transient upload error
4/12/2013 9:28:58 AM | climateprediction.net | Backing off 4 hr 49 min 30 sec on upload of hadcm3n_4juh_1980_40_008323966_1_3.zip
24) Message boards : Number crunching : Server is full (Message 45902)
Posted 12 Apr 2013 by BarryAZ
Post:
These are hadcm work units -- I'm getting transient upload errors after 100%.

For now I figure to back off and suspend Climate processing on those systems until the uploads clear. After all, there are more CPU cycles than Climate work units so for me, simply shifting off for several hours or even several days and processing other projects isn't a large pain.



Barry, are your Hadcm uploads OK? I haven't got any atm to check myself. It could be that after the upload outage a lot of files are waiting and the backlog takes a while to clear.

25) Message boards : Number crunching : Server is full (Message 45897)
Posted 11 Apr 2013 by BarryAZ
Post:
Les, it seems that there still are some upload issues -- not sure if this is simply leftovers from the full up condition or if there are still available space problems...


Les, you haven't made it worse at all. I realize you were not 'having a go at me'.

And I should note that I am appreciative of the quick resolution of this - uploads are now going through and that is very much a good thing.

26) Message boards : Number crunching : Server is full (Message 45894)
Posted 11 Apr 2013 by BarryAZ
Post:
Les, you haven't made it worse at all. I realize you were not 'having a go at me'.

And I should note that I am appreciative of the quick resolution of this - uploads are now going through and that is very much a good thing.
27) Message boards : Number crunching : Server is full (Message 45887)
Posted 11 Apr 2013 by BarryAZ
Post:
Les, I KNOW the communication/action lag is not on your end <rueful smile, yet again>.

I realize it is a matter of priorities and complexities. Once I noticed the upload issue on my systems, I suspended processing on Climate to give other projects some cycles and not add to the Climate backlog.

There is no truth to the rumour that climateprediction.net issues are the cause of the slowdown in global temperature rises over the past few years.



The "paperwork" was completed about 12 hours ago, and apparently "the other people" are onto it.
(We're getting better :) ).



28) Message boards : Number crunching : Server is full (Message 45885)
Posted 11 Apr 2013 by BarryAZ
Post:
Given the past response sequence -- Les is aware and will have already passed the surprise information back to the admins. The admins will likely let Les know they have received his message by sometime on Thursday or Friday and the problem will be temporarily resolved sometime next week <rueful smile of been there done that>.
29) Message boards : Number crunching : "Project has no tasks available" (Message 45078)
Posted 14 Oct 2012 by BarryAZ
Post:
My sense is that Les may be unhappy that he has no more responsive answers regarding work availability than he has been able to provide. This is similar to the lack of timing information he has to provide when there are disk storage problems.

Seemingly that sort of information is considered confidential by the folks who actually work on the project. Perhaps it is a case of the project folks seeking, by holding project status information back, to address the problem of too many processors for the project to handle by 'guiding people' away from this project to other projects which tend to be more responsive to status and information requests.

In this scenario, Les is caught in the middle -- his last post in this thread suggests to me a bit of discomfort with that role.
30) Message boards : Number crunching : Upload Server Out Of Disk Space (Message 44916)
Posted 27 Sep 2012 by BarryAZ
Post:
Les, it seems that when the drives run out of disk space, either the watch bot has failed or the watchers were perhaps watching something else. I can understand it, as these servers, it seems, are one of the admins' 'oh, by the way, would you take care of the Climate folks' servers in addition to your full-time job' sort of things.

It isn't that big a deal for me -- when the 'out of disk space' message first popped for me on a Climate upload, on the Saturday, I readily suspended all Climate processing that I have going. I figured, based on history here for the project, it might take days to perhaps a week or more before it got properly sorted out.
31) Message boards : Number crunching : Uploads not working (Message 44900)
Posted 25 Sep 2012 by BarryAZ
Post:
Les, I understand -- happens often enough over here - as soon as I spotted the upload problem, I suspended my Climate apps and let other applications cycle along. I long ago learned that one should have two or three applications running on a workstation for each of the CPU apps and the GPU apps.

32) Message boards : Number crunching : Upload Server Out Of Disk Space (Message 44896)
Posted 25 Sep 2012 by BarryAZ
Post:
Perhaps someone back at the ranch might consider deploying some status routines to notify them automatically in advance of this. I'm sure there are status routines available. This issue has popped up frequently enough to suggest a bit of proactive handling.
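
For what it's worth, the sort of status routine I have in mind could be as small as the sketch below. This is only an illustration, not anything the project actually runs: the function names, the 10% threshold and the path being checked are all my own assumptions, and a real setup would send mail or page someone rather than print.

```python
import shutil

# Hypothetical threshold: warn when less than 10% of the volume is free.
DEFAULT_THRESHOLD = 0.10


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_warning(free: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True when the free fraction has dropped below the warning threshold."""
    return free < threshold


if __name__ == "__main__":
    # "." stands in for wherever the upload directory lives (an assumption).
    frac = free_fraction(".")
    if needs_warning(frac):
        print(f"WARNING: only {frac:.1%} free, time to clear some space")
    else:
        print(f"OK: {frac:.1%} free")
```

Run from a scheduler every hour or so, something like this would raise the flag well before the uploads start failing.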
33) Message boards : Number crunching : Upload Failure (Message 44290)
Posted 4 Jun 2012 by BarryAZ
Post:
Yup - it is the hadam3p. It seems, over the years, that the server used to support the hadam application uploads/downloads is the most fragile of the servers. Other uploads have not had problems this week for me.

Perhaps we should redirect by avoiding that set (and server) down the road to reduce the load on that server...
34) Message boards : Number crunching : Upload Failure (Message 44284)
Posted 2 Jun 2012 by BarryAZ
Post:
Upload problems here -- so you can now note that upload problems by clients have been reported -- since late Friday night here (west coast US time).
35) Message boards : Number crunching : Inconsistent credit numbers on different web pages (Message 43018)
Posted 26 Sep 2011 by BarryAZ
Post:
Ah, I was wondering if there was any notice of that situation. After a long hiatus, over the summer, the scripts had been running regularly until last week....

More likely something to do with the various credit scripts not running for the last couple of days.


36) Message boards : Number crunching : another upload problem ? (Message 42156)
Posted 11 May 2011 by BarryAZ
Post:
Looks like things have been stuck for a couple of days. Certainly trickles haven't posted since Sunday.

I suspect we will see some pronouncement regarding this within the next few days...
37) Message boards : Number crunching : Can't upload for 12 days (Message 41731)
Posted 8 Mar 2011 by BarryAZ
Post:
Les, the numbers I was referring to were not the stats site numbers. I was comparing numbers on the Climate user account. The user account numbers for me from late February are about 7,500 HIGHER than the user account numbers today.
38) Message boards : Number crunching : Can't upload for 12 days (Message 41725)
Posted 7 Mar 2011 by BarryAZ
Post:
Right, I saw that information posted back in December, but as you did post here, you ran into the 'no good deed goes unpunished' loop. <smile>.




Sorry about that - I haven't officially worked for them since 1/12/10, and when I have I've had other things to deal with, e.g. the complete change of IP addresses for the department.

39) Message boards : Number crunching : Can't upload for 12 days (Message 41724)
Posted 7 Mar 2011 by BarryAZ
Post:
For what it's worth, when I checked back toward the end of February - I had 9,281,477. The current status reports me at 9,273,965. So I've 'lost' about 7,500 credits (perhaps more, as there may have been a few trickles before shutdown).
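
The arithmetic behind that 'about 7,500' can be checked with a quick sketch. The two totals are the ones quoted above; the function name is just something made up for illustration.

```python
def credit_drop(earlier: int, current: int) -> int:
    """Return how many credits the account total fell between two readings."""
    return earlier - current


# Totals as quoted in the post: late February versus the current status page.
lost = credit_drop(9_281_477, 9_273_965)
print(f"{lost:,}")  # prints 7,512
```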



Thanks.
Of the few we've seen so far, most are the same or very close (a few points either way). Very few so far seem to show discrepancies this large.
This probably means everything will have to be switched on and discrepancies pursued later.

40) Message boards : Number crunching : Can't upload for 12 days (Message 41720)
Posted 7 Mar 2011 by BarryAZ
Post:
Milo, thanks for the explanations. These past few months have been a bit troublesome for the Climate project (at least from this user's perspective). The frequent fully offline status, along with what seemed to be extensive scheduled daily maintenance/backup runs, has been at least somewhat frustrating to work with and observe.

It sounds (from your message and Les' message) like the light should be at the end of the tunnel -- but that (from what Les posted) it might be a very long tunnel yet.

I sort of wonder if there is some sort of 'size wall' that BOINC projects run up against when the combination of user count and length of the project conspire with the available database and storage technologies to present so large a volume of data that all sorts of difficulties arise. It does seem that dealing with these difficulties presents something of a (to use an American phrase) 'whackamole' scenario, where in addressing one problem another (related or not) pops up.



Also, I for one am hoping that a large chunk of the massive data store you


No model data are ever backed up - the uploaded data sit on the upload servers (until moved) or archive servers, trusting to RAID 5 or 6, and that's it. There's over 80TB now and there's nowhere to back that up to.
However, the main project database does need to be backed up, and this is what has been causing problems; the MySQL database was taking up just under 500GB and was very slow to dump. Some of those data (handled messages from host) have been cleaned up and some old results archived so once everything is turned back on it should run much more smoothly, particularly when the new staff install a database slave.

The delay at the moment is that the credit calculation procedures are refusing to add new credit and I presume that that will be the top priority for the CPDN staff today (I expect to be seeing them shortly). Credit is re-calculated daily in its entirety from results/trickles.


