Posts by old_user651284


21) Questions and Answers : Getting started : CPDN in workplace (Message 44277)
Posted 1 Jun 2012 by Profile old_user651284
Post:
Hello

    Our bandwidth is limited, and somewhat expensive. If we were to deploy
    across around 50 clients, what bandwidth utilization could we expect?



Every Weather at Home (hadam3p) work unit will return between 100 and 250 MB of data.
Every RAPID (hadcm3n) work unit will return ~ 250 MB of data.

In addition, each client will need to download the executable from us, together with ancillary files.
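As a rough illustration of the figures above, here is a minimal sketch of the upload volume for 50 clients. It assumes one Weather at Home work unit per client and 1 GB = 1000 MB. How long a work unit takes to complete is not stated here, so the total is not converted into a rate; the function name is made up for this example.

```python
# Rough upload-volume estimate. Only the per-work-unit sizes and the
# client count come from the post; everything else is an assumption.
CLIENTS = 50                               # from the question
MB_PER_WU_LOW, MB_PER_WU_HIGH = 100, 250   # Weather at Home (hadam3p) range


def total_upload_gb(clients, mb_per_work_unit, work_units_per_client=1):
    """Total upload in GB (1 GB = 1000 MB) for the given number of work units."""
    return clients * work_units_per_client * mb_per_work_unit / 1000


low = total_upload_gb(CLIENTS, MB_PER_WU_LOW)
high = total_upload_gb(CLIENTS, MB_PER_WU_HIGH)
print(f"One work unit per client: {low:.1f} to {high:.1f} GB uploaded in total")
```

Dividing that total by the number of days a work unit takes on your hardware gives an average daily figure; the executable and ancillary downloads come on top of it.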


    Our client hardware is moderately up to date (mostly newer than three years old). Can we safely deploy CPDN without adversely affecting users' work?



...in my opinion, yes. You can set BOINC to run only when the computer is not being used for anything else, or between set times of day.
I would suggest that the best way to satisfy yourself as to its performance would be to install it yourself and see what you think.
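For anyone who wants to script those settings rather than click through the BOINC Manager, here is a minimal sketch that builds a preferences override as XML. The tag names are the ones used, as far as I know, in the BOINC client's global_prefs_override.xml; please check them against the documentation for your client version. The function name and the 18:00 to 08:00 window are just assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(run_if_user_active=False, idle_minutes=3.0,
                         start_hour=0.0, end_hour=0.0):
    """Return a <global_preferences> XML string (hypothetical helper).

    Tag names follow my understanding of BOINC's global_prefs_override.xml;
    verify them for your client version before deploying.
    """
    root = ET.Element("global_preferences")
    # 0 = only compute when the user is not active
    ET.SubElement(root, "run_if_user_active").text = "1" if run_if_user_active else "0"
    # minutes of inactivity before the machine counts as idle
    ET.SubElement(root, "idle_time_to_run").text = f"{idle_minutes:.6f}"
    # daily window in which computing is allowed
    ET.SubElement(root, "start_hour").text = f"{start_hour:.6f}"
    ET.SubElement(root, "end_hour").text = f"{end_hour:.6f}"
    return ET.tostring(root, encoding="unicode")


# Example: compute only when idle, and only between 18:00 and 08:00.
xml_text = build_prefs_override(start_hour=18.0, end_hour=8.0)
print(xml_text)
```

Trying the result on one test machine first fits the suggestion above to install it yourself and see what you think.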

    Are any tools available for centralized deployment? For instance, can default project/BOINC preferences be applied to teams?



...not sure, I don't belong to any teams, and haven't done any large deployments :-(

I did find this...

http://boinc.berkeley.edu/dev/forum_thread.php?id=4401
which gives hints about what is possible.


I hope that helps

Jonathan
22) Message boards : Number crunching : Upload Failure (Message 44075)
Posted 23 Apr 2012 by Profile old_user651284
Post:
Hi everyone,

We are currently suffering two server failures - both serious hard disk issues, so I am configuring another to take over their roles before I get around to sorting out those problems.

I will let you know how things proceed, but it will be at least 24 hours before we can consider ourselves back online.

Please accept my apologies.

Jonathan

CPDN Sys-Admin
23) Message boards : Number crunching : had3pam_eu models not uploading (Message 43905)
Posted 2 Mar 2012 by Profile old_user651284
Post:
Hi everyone,

We have had a number of problems with our upload servers over the last few weeks.
I believe I have addressed all of the operational issues that prevented file uploads, and the machines are now working to clear the considerable backlog.

I apologise for the inconvenience that this has caused.


Jonathan

CPDN Sys-Admin
24) Message boards : Number crunching : HADCM3N Tasks not showing on tasks list (Message 43468)
Posted 23 Nov 2011 by Profile old_user651284
Post:
Hi there,

The issue of phantom work units is down to a discrepancy between the primary BOINC database for CPDN, and the slave (backup) database.

Pages on this site are (or were until I worked out what was wrong) served from the slave database in order to take load off the primary database.

Some information about a subset of work units has not been transferred to the slave properly, so although they are received and processed, they don't show up on these pages.

I have reverted to using the primary database for this site, and will schedule some downtime in order to get the two databases back in sync.
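To illustrate what "out of sync" means here, a minimal sketch of the kind of check involved: compare the work unit ids known to the primary with those known to the slave. The ids and the function name are invented for the example; this is not the actual CPDN tooling.

```python
def missing_from_replica(primary_ids, replica_ids):
    """Return ids that exist on the primary but never reached the replica."""
    return sorted(set(primary_ids) - set(replica_ids))


# Invented ids, purely for illustration.
primary = [101, 102, 103, 104, 105]
replica = [101, 102, 105]

phantom = missing_from_replica(primary, replica)
print(phantom)  # these work units exist, but pages served from the replica cannot show them
```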

Please accept my apologies for the inconvenience.

Jonathan
CPDN SysAdmin
25) Message boards : Number crunching : Set no gpu option (Message 43181)
Posted 10 Oct 2011 by Profile old_user651284
Post:
We are currently using the BOINC server_stable version, and we don't plan to upgrade again for some months (not before 2012).

As was mentioned, we have to compile in extra options when building the software, so we need to test that it will work before we deploy it on the live site. The last upgrade was done after a server crash; I am somewhat greyer than I was, and less than keen to repeat the experience :-)

However, we do have a roadmap of work that needs to be done to our system, and updating BOINC is definitely on there for next year.

Jonathan

26) Message boards : Number crunching : Crunching Nonexistent Task (Message 43072)
Posted 28 Sep 2011 by Profile old_user651284
Post:
Sorry about this, it is due to our backup database having got out of sync with the main database.

The backup database has been handling read-only requests from this website, to take the strain off the main database. Unfortunately it seems that this has caused it to be unable to cope with keeping up to date :-(

I have now reverted to using the main database for this site, so things should be back to normal.

Jonathan
27) Message boards : climateprediction.net Science : Server name change (Message 42917)
Posted 16 Sep 2011 by Profile old_user651284
Post:
Hi,
As part of an ongoing move of CPDN servers to a new machine room I have deprecated the old domain name of this server.

Requests for climateapps2.OUCS.ox.ac.uk are now being handled by the server's new name of climateapps2.OERC.ox.ac.uk.


One thing that may not work is cached passwords for this forum. You may be required to log in to the forum under its .OERC address because your browser will treat it as a separate forum from the old version under the .OUCS address.

This has been necessitated by University bureaucracy.
If there are issues that arise then it may be possible to postpone this change, but I hope that everything will work as expected.

Jonathan
CPDN SysAdmin
28) Message boards : Number crunching : Is the redirect from OERC to OUCS still in place? (Message 42805)
Posted 24 Aug 2011 by Profile old_user651284
Post:
The server was out of space, so she could not accept the incoming files.
This was due to us having to re-jig the storage location of 25 TB of data - we have to play musical chairs with the storage space, and unfortunately climateapps1 was not fast enough in sitting down :-)

It should be resolved now - let me know if not.

Jonathan
29) Message boards : Number crunching : hadam3p_eu crash 45 seconds in. (Message 42547)
Posted 4 Jul 2011 by Profile old_user651284
Post:
We have been investigating the problem with the Hadam3p work units. It appears that the crash is caused by the combination of two perfectly normal forcing files.

The SST and SI files were altered in the previous suspect run. If either of these files is replaced with its previous version, the model runs perfectly well. The crash only occurs when both files are specified as inputs to the same work unit.

We are conducting tests on the Met Office UM in order to try to find out why this should be the case.

In the meantime, the current release of Hadam3p work units consists of resubmissions of proven work units. These should be fully functional, since we are extending the duration of previous experimental runs.

Jonathan
30) Message boards : Number crunching : hadam3p_eu crash 45 seconds in. (Message 42538)
Posted 3 Jul 2011 by Profile old_user651284
Post:
    Why never test these loser WUs before testing the volunteers bandwidth?
    A few hundred thousand times 100 or so MB -- what's that to a volunteer?
    These obviously never tested WU -- EU4 -- yeah, you can figure it out later, after wasting my time and bandwidth.
    Why never test before sending a gazillion to us?
    Huh?

    Don't you newb clowns test anything before you send a few bazillion WUs out?

    Forgive me, I've been volunteering my machine's time for a decade --
    Did you try even one of these loser models at home before you sent a few hundred thou out to us? Don't think so.. It's obvious.

    Please -- don't abuse the volunteers.

    Do some minimal testing before you send a totally wasteful broken model times 300,000 to us crunchers. OK?

    Actually, I'm really annoyed by this last batch of broken s*** that I download, it breaks, -----

    Do you do ANY testing before sending this stuff?

    No, obviously not.

    And yes, I, and a few others, are annoyed.

    If you dare, apologize.

    Eric


Dear Mr Redd,

I certainly do dare to apologise for the inconvenience and annoyance that the current round of problems has caused you, and all other supporters of CPDN.

The recent Hadam3p release was, as far as I am aware, produced in exactly the same manner as previous hadam3p releases (which have been running successfully).

I apologise for the wasted bandwidth that we have caused. We have been made fully aware how much this has upset people, and we will, in future, take steps to minimise these problems.

I will be posting an apology on the news items on this site on Monday.

Finally, I would ask that, annoyed as you obviously are, you do not abuse the project staff.

Jonathan Miller
CPDN System Administrator
31) Message boards : Number crunching : HadCM3 Full Resolution model low credits (Message 42096)
Posted 3 May 2011 by Profile old_user651284
Post:
I don't know why this is happening, so I will take a look, and maybe come up with an 'Official Permanent Fix' (sounds ominous :-( ).


32) Message boards : Number crunching : HadAM3P-PNW disappeared? (Message 42044)
Posted 27 Apr 2011 by Profile old_user651284
Post:
DIDN'T NEED flag

Hi Everyone,

Some of the work units that get processed contain particular parameters that are of interest to the CPDN project. The BOINC system has a method for allowing us to gather more info on certain parameter sets by resubmitting a work unit to the pool of available work units.

The DIDN'T NEED flag means that the CPDN project did/do not need to resubmit the work unit for additional processing.

The flag can mean a number of things, and is combined with other flags in the database to determine exactly why we don't need to reprocess it. One of the common reasons is that the current run gives us exactly the info we need.

It is unfortunate that the flag gives the impression that we are not interested in the work unit - we certainly are interested.
We are looking into how we can make this clearer on the work unit info pages.

Please accept our apologies for any confusion or consternation this may have caused.

EDIT: We have now altered this flag to read "No Resubmission" which is a more accurate reflection of the status of the work unit.

Jonathan
CPDN SysAdmin
33) Message boards : Number crunching : Upload problems (Message 41988)
Posted 21 Apr 2011 by Profile old_user651284
Post:
I agree that it is not merely a PNW issue.
I have found references to malformed URLs on more than one upload server.

I think the best strategy would be for the CPDN sysadmins to search the Apache logs on our servers to find out which models are failing, and then use those entries to pull out information about the client types that are failing.
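A minimal sketch of that kind of log search, assuming the servers write Apache's "combined" log format. The sample lines, the /upload path, and the client names are all made up for illustration; real entries would come from the servers' own access logs.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def failed_requests_by_agent(lines, failure_statuses=("400", "404")):
    """Count failed requests per user-agent string (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") in failure_statuses:
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines; not taken from any real server log.
sample = [
    '192.0.2.1 - - [21/Apr/2011:10:00:00 +0100] "POST /upload HTTP/1.1" 200 512 "-" "example-client/1.0 (windows)"',
    '192.0.2.2 - - [21/Apr/2011:10:00:05 +0100] "POST /upload?? HTTP/1.1" 400 226 "-" "example-client/1.0 (linux)"',
    '192.0.2.3 - - [21/Apr/2011:10:00:09 +0100] "POST /upload?? HTTP/1.1" 400 226 "-" "example-client/1.0 (linux)"',
]

counts = failed_requests_by_agent(sample)
for agent, n in counts.most_common():
    print(n, agent)
```

Grouping on the request path instead of the user agent would give the same kind of count per model file, if the file names appear in the URLs.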

I do agree that it would be good to find out whether this happens with other BOINC projects.

...so I have another item on my to-do list!

Jonathan
CPDN SysAdmin
34) Message boards : Number crunching : HadCM3 Full Resolution model low credits (Message 41972)
Posted 12 Apr 2011 by Profile old_user651284
Post:
I will reaffirm previous statements that the various credit issues that have arisen recently will be addressed very shortly during the next phase of work on the CPDN databases.

This work will be carried out within the next 7 days.

The database contains all the information from which user credits are calculated, so it is imperative that the database is running as it should.

During the maintenance I will make the appropriate corrections to credit rates for the HadCM3 Full Resolution model, and when the user credits are re-calculated those who have run these models will be rewarded appropriately.

INTERESTING CPDN FACTOID: The database that holds all the records of user-contributed data is about 0.5 TB in size, and allows us to keep track of the 80 TB of data that CPDN supporters have produced.

The forthcoming work on the database will involve adding a second database server to take some of the load off the main server, allowing us to manage the huge amount of data that the community so kindly provides for us.

The staff at CPDN do appreciate all the work that is put in by the community, and I hope you can see that we do respond to feedback from our users (though I admit in recent months the response time has been slow due to lack of staff).

Please hang in there, chaps.

Jonathan
Interim CPDN IT Support
35) Message boards : Number crunching : HadCM3 Full Resolution model low credits (Message 41967)
Posted 11 Apr 2011 by Profile old_user651284
Post:
Hi,
I'd like to second what Neil has said.
Correcting the problems with the credits is on our list of things to do.
The low credit score for a given model is something that can be corrected in a straightforward manner.

The trickles issue is 'known' and we have a solution.

Please accept my apologies for the annoyance this causes you - I will be correcting the issues over the course of the coming week, when I will be doing another round of maintenance on the project's database servers.

Hang in there, chaps, the project relies upon your efforts.

Jonathan
interim CPDN IT support

