Posts by Milo Thurston

1) Message boards : Number crunching : Account Updating! (Message 42695)
Posted 28 Jul 2011 by Milo Thurston
Post:
Now that the trickles seem to be working, they'll have to run the big stats script. When that happens, it's likely to punish the database quite severely.
2) Message boards : Number crunching : Why haven't I recieved credit in a long time? (Message 42679)
Posted 26 Jul 2011 by Milo Thurston
Post:
If you've been keeping up with the posts on the News thread at the top of this section, you'll know that one of the main server's disks failed recently.
All of the bespoke scripts to do with trickles and credits were on it, so they have to be re-compiled and reinstalled on the new disks.
This is taking time, as this project uses a heavily customised version of the BOINC server code.


There was, of course, a backup of this server on the university backup service, but recovery from backup turned out to be somewhat less than easy. A lot of things had to be re-compiled for the new system anyway.
3) Message boards : Number crunching : NO WORK! (Message 42297)
Posted 31 May 2011 by Milo Thurston
Post:
Quote: "There's some irony in the fact that CPDN uses thousands and thousands of computers around the world to run climate models, but grinds to a halt because the central server has to be used to calculate credits. Why can't that work be farmed out?"


The credit calculation process requires write access to the main BOINC database. Because that database is MySQL configured with one master and one slave, all writes have to go to the master, and I am sure the CPDN staff would prefer the master to be accessed only from inside the site firewall. It may be possible to move the stats dump from the master to the slave, which would help enormously.

On my current project I've switched from MySQL to MongoDB, which has some very nice capabilities, such as sharding (see http://www.mongodb.org/display/DOCS/Sharding+Introduction).
4) Questions and Answers : Getting started : Thread to report problems accessing the climateprediction.net forum (Message 42042)
Posted 27 Apr 2011 by Milo Thurston
Post:
The blank screen has been sorted out; Jonathan pointed me to the offending code.
It seems to have been caused by trying to make the new anti-spam measures work with Tapatalk.
5) Message boards : Number crunching : Can't upload for 12 days (Message 41723)
Posted 7 Mar 2011 by Milo Thurston
Post:

Quote: "In that case I should mention that according to BOINCStats I have credit of 7,508,804, whereas Climate Prediction credits me with only 7,477,627. In the last few months BOINCStats has usually been a day or two behind CPDN in this regard."


Thanks.
Of the few reports we've seen so far, most show credit that is the same or very close (a few points either way); very few show a discrepancy this large.
This probably means everything will have to be switched back on and any discrepancies pursued later.
6) Message boards : Number crunching : Can't upload for 12 days (Message 41721)
Posted 7 Mar 2011 by Milo Thurston
Post:
Quote: "Milo, thanks for the explanations. These past few months have been a bit troublesome for the Climate project (at least from this user's perspective). The frequent periods of being fully offline, along with what seemed to be extensive daily scheduled maintenance/backup runs, have been at least somewhat frustrating to work with and observe."


Sorry about that. I haven't officially worked for them since 1/12/10, and when I have worked for them I've had other things to deal with, such as the complete change of IP addresses for the department.
By the way, the new temporary sysadmin has finished the database upgrade and is waiting for users to report that their credits are OK before he turns all the daemons back on.
7) Message boards : Number crunching : Can't upload for 12 days (Message 41718)
Posted 7 Mar 2011 by Milo Thurston
Post:

Quote: "Also, I for one am hoping that a large chunk of the massive data store you folks have was moved to near-line storage so that it doesn't have to be backed up by your backup routines."


No model data are ever backed up: the uploaded data sit on the upload servers (until moved) or on the archive servers, relying on RAID 5 or 6, and that's it. There's over 80TB now, and there's nowhere to back that up to.
However, the main project database does need to be backed up, and this is what has been causing problems: the MySQL database was taking up just under 500GB and was very slow to dump. Some of those data (handled messages from hosts) have been cleaned up and some old results archived, so once everything is turned back on it should run much more smoothly, particularly once the new staff install a database slave.

The current delay is because the credit calculation procedures are refusing to add new credit, and I presume fixing that will be the CPDN staff's top priority today (I expect to see them shortly). Credit is re-calculated daily, in its entirety, from the results and trickles.
8) Message boards : Number crunching : Any glimmer of hope for any Mac tasks? (Message 41686)
Posted 4 Mar 2011 by Milo Thurston
Post:
A new programmer has been hired to fill Tolu's role. He has only just started and is currently working 50% time, so nothing has been done yet. I presume that it will get done eventually, as he is being supplied with a Mac for development.

My replacement is only temporary and he's involved with a dreadful database task at the moment.
9) Message boards : Number crunching : News and Announcements (Message 41655)
Posted 21 Feb 2011 by Milo Thurston
Post:
Here's a message from Pardeep:

-----
Dear Seasonal Attribution Project participants,
The results of our work on attribution of the UK autumn 2000 floods have finally been published (http://www.nature.com/nature/journal/v470/n7334/full/nature09762.html), and have generated a fair amount of media interest!
I just wanted to say one final big thank you to all the participants who crunched simulations for us – we couldn’t have done the study without you! And I similarly want to say thanks for the enjoyable discussions I had back on the project’s old message boards.
It's sure taken a while to get this work finished, but now it’s great to see the follow-up weatherathome.net project is well underway, and I hope we’ll see more interesting results come out of that.
Happy crunching,
Pardeep
10) Message boards : Number crunching : Anyone else still not uploading! (Message 41642)
Posted 15 Feb 2011 by Milo Thurston
Post:
Quote: "The problem is not Kraken; it's Uploader1.atm"


I've set uploader1.atm to accept files for the moment, but it isn't processing them for inclusion in the results database because an enormous data transfer (18TB) is going on.
11) Message boards : Number crunching : Anyone else still not uploading! (Message 41626)
Posted 12 Feb 2011 by Milo Thurston
Post:
If you're trying to upload to uploader1.atm, please note that it's still down while it transfers data, a process that was interrupted yesterday and which I re-started today.
12) Message boards : Number crunching : TRICKLE CANNOT UPLOAD, BUT, SERVER SHOWS IT ALREADY HAS (Message 41531)
Posted 24 Jan 2011 by Milo Thurston
Post:

Quote: "Thanks again, Milo, for keeping things working in your own time (or your new job's time) when you could have abandoned the project."


You're welcome.
Kraken is now fixed but, as expected, heavily overloaded, so don't be surprised if it appears to be down when you try to connect. Some people are getting through, as files are arriving.
13) Message boards : Number crunching : TRICKLE CANNOT UPLOAD, BUT, SERVER SHOWS IT ALREADY HAS (Message 41523)
Posted 22 Jan 2011 by Milo Thurston
Post:

Quote: "...and apparently no one else around at Oxford Uni who knows how to fix the problem."


I suspect that they do know how to fix it, but the IP addresses were switched earlier than I expected, so various machines are now not connected to the network, and the only way to fix them is to go into the OeRC machine room. No current CPDN staff have access to this room, though the new staff will.

Meanwhile, I'll have to do it next week.
14) Message boards : Number crunching : Servers (Message 41501)
Posted 18 Jan 2011 by Milo Thurston
Post:
I've posted something about this on the main site's news thread; unfortunately, it is beyond my control.
15) Message boards : Number crunching : CPDN not sync whith BOINCstats (Message 41499)
Posted 18 Jan 2011 by Milo Thurston
Post:
Quote: "Milo, how much of that 70TB is active? Perhaps a large piece of the data could be handled as archived data (near-line storage rather than live storage), thus simplifying and shortening the backup process and perhaps improving overall performance."


The problem is not really the size of the data files (although that causes problems in other ways) but the number of database records referring to them. We can archive the older records, and have done so twice, but then they become inaccessible to users who want to look at past results. Neither Tolu nor I ever had time to write code that would let users examine archived data.

I used to have a database slave that could be used for backups (the hardware has since been retired), and I have suggested to the CPDN staff that they set up a new one.
16) Message boards : Number crunching : CPDN not sync whith BOINCstats (Message 41480)
Posted 14 Jan 2011 by Milo Thurston
Post:
Les's point about data is a good one: CPDN now has about 70TB of stored data files, each with multiple database records (file, parameters, workunit details, &c.).

It seems that the staff are willing to order another database server, which could be configured as a slave to the main BOINC database, and also to consider archiving more results. Meanwhile, I have at least got the stats dump to run, so hopefully BOINCstats will pick it up.
17) Message boards : Number crunching : CPDN not sync whith BOINCstats (Message 41457)
Posted 9 Jan 2011 by Milo Thurston
Post:
Although the main stats script has been running nicely, the boinc_stats script failed because it was unable to connect to the database.
Luckily I spotted this, and running the script manually is not really any trouble.
18) Message boards : Number crunching : Can't upload for 12 days (Message 41444)
Posted 6 Jan 2011 by Milo Thurston
Post:
Sorry, but uploader.oerc is still not ready; it is taking much longer than I expected.
I will, of course, bring it back on line as soon as I am able.
19) Message boards : Number crunching : Can't upload for 12 days (Message 41438)
Posted 5 Jan 2011 by Milo Thurston
Post:
I've almost got uploader.oerc back up again; hopefully it will not take much more than another day.
20) Message boards : Number crunching : Credits accruing but no trickles sent???? (Message 41418)
Posted 2 Jan 2011 by Milo Thurston
Post:
Kraken has been up, but is being hammered by a large quantity of uploads, hence the erratic response.

uploader.oerc has had ~2TB of its ~4.5TB removed so far and will come back up when that is finished. The copy is proceeding in a slow but orderly manner.

climateapps1 is proving more problematic, but I have another plan to try this afternoon.



©2020 climateprediction.net