Posts by Pete B
1) Message boards : Number crunching : New small batches of long runs --> with problems (Message 55984)
Posted 30 Mar 2017 by Profile Pete B
Running this type of model on a slow PC would really take forever.


Interesting. Looking back at my first CPDN PC in 2006 (Pentium 4 CPU 3.06GHz) it had a floating point speed of 1420 million ops/sec. My latest creation (i7-6900K CPU @ 3.20GHz) is 4870 million ops/sec.

Now although that is a factor of about 3.4 difference, it is not as large a difference as I was expecting given it is 11 years on. Although agreed, it would still be taking forever :-(


Reminiscing about old runs and machines, I remember running one of the first HADCM3 Spinups back in 2005/6 in advance of the BBC Expt on a Pentium 4 @ 3.2GHz and 1GB RAM. This ran at around 2sec/TS on that machine. I recently found I still had an old backup of the spinup files in an as downloaded but not started state.

I made all the adjustments to the files to get the spinup to run on my current i7 6700 with 16GB RAM in 2017. I ran it for a day and checking the zip files it was returning 0.45sec/TS, about 4.5 times faster than on the 12 year old Pentium 4 machine.

The old Pentium 4 machine was single core (2 if running hyperthreading). This is 4 core (8 if running hyperthreading). Taking both into account, some difference :) I'm only actually running 2 models together on it though just now.
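
As a rough check of the figures quoted in this post, the two ratios can be worked out as below. This is only a sketch: the variable names are made up for illustration and the numbers are simply the ones quoted above, not new measurements.

```python
# Figures quoted in the post above.
p4_mflops = 1420.0      # Pentium 4 3.06GHz, floating point benchmark (million ops/sec)
i7_mflops = 4870.0      # i7-6900K 3.20GHz, floating point benchmark (million ops/sec)

p4_sec_per_ts = 2.0     # HADCM3 spinup on the Pentium 4 @ 3.2GHz (sec per timestep)
i7_sec_per_ts = 0.45    # same spinup on the i7 6700 (sec per timestep)

# Ratio of the benchmark figures (higher is faster).
benchmark_ratio = i7_mflops / p4_mflops

# Ratio of time per timestep for a single running model (lower time is faster).
model_speedup = p4_sec_per_ts / i7_sec_per_ts

print(f"Benchmark ratio: {benchmark_ratio:.2f}")
print(f"Model speedup:   {model_speedup:.2f}")
```

That gives roughly 3.4 for the benchmark and 4.4 for the model itself, both for a single model on a single core, so the extra cores come on top of that.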
2) Questions and Answers : Windows : New Build Running Windows 10 (Ver 1607) (Message 55443)
Posted 7 Jan 2017 by Profile Pete B
From my observations, the server status page only seems to update about every 4 hours. I haven't worked out yet whether it is regular or based on some pseudo-random number generator.


The other thing I note now is that if you run an update/reset request to the project from the BOINC Manager, it will auto repeat the request every hour or so for a limited time, then stop if nothing is downloaded.

IIRC, older versions of BOINC Manager appeared to auto repeat every hour or so until they downloaded some WU's. The CPDN Project is very quiet at present compared to previous times I've participated and there virtually always seemed to be WU's on at least one of the many experiments so maybe that's the reason why it always seemed to soon download something in the past.

I can't see any setting in the Manager to continue hourly checks until something is downloaded so I'll just have to issue a manual update occasionally as well. If I'd done that yesterday morning, I may have caught something before the server was empty yesterday afternoon.

So, I'm now running WU's from one of my backup projects for the time being until a new set of WU's appears on CPDN.
3) Questions and Answers : Windows : New Build Running Windows 10 (Ver 1607) (Message 55437)
Posted 6 Jan 2017 by Profile Pete B
Update:

OK, it looks like all WU's had actually gone at the point I tried at about 15:30 even though some were shown on the then latest server update. The latest server update for 16:10 shows it's now empty.
4) Questions and Answers : Windows : New Build Running Windows 10 (Ver 1607) (Message 55436)
Posted 6 Jan 2017 by Profile Pete B
Hi, I've recently returned to the project, initially on an old Win 7 machine but now upgraded to a new Win 10 (1607) Intel I7 6700 machine with BOINC 7.6.33. I soon completed the WU's that were originally downloaded on the old PC on the new PC but now seem to be unable to download the latest WAH2 WU even though over 1000 are shown on the server.

I've checked all the project settings in my account, made sure all model types are ticked and reset the project in BOINC but still no downloads.

I've looked through the message pages but nothing to indicate that Windows 10 is a problem downloading work other than the old Windows 10 thread in this section.

Are some model types not compatible with Windows 10 or is there something I've missed? The PC did actually download one workunit on 24 Dec and nothing since, but I thought nothing of it as the server showed no work available until now.

Thanks.
5) Questions and Answers : Unix/Linux : Computation errors. (Message 50695)
Posted 31 Oct 2014 by Profile Pete B
I'm having the same problem after upgrading Ubuntu recently from an old 9.04 (which was running CPDN perfectly with the relevant libs) through various iterations to 14.04 now. It seems to be the libstdc++6 (lib32stdc++6?) that is my problem; all the others are now installed. I have now located the libstdc file, though I still have to install it, but I haven't found lib32stdc++6. I was thinking of downloading a 32 bit version of 14.04 and running it as a liveCD to see if the file is there, then moving it over to somewhere I can access it and installing it from there.
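
One way to see whether a 32 bit copy of the library is already on the disc is to look at the ELF header of each libstdc++.so.6 file. The following is only a sketch: the directory list is an assumption about where a 64 bit Ubuntu might keep such files, and the function names are made up for illustration.

```python
import glob
import os

def elf_class(path):
    """Return '32-bit', '64-bit', or None if the file is not a readable ELF binary."""
    try:
        with open(path, "rb") as f:
            header = f.read(5)
    except OSError:
        return None  # e.g. a broken symlink or a permissions problem
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 of the ELF header (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    return {1: "32-bit", 2: "64-bit"}.get(header[4])

def find_libstdcpp(search_dirs):
    """List (path, class) for every libstdc++.so.6* file found in search_dirs."""
    found = []
    for directory in search_dirs:
        for path in sorted(glob.glob(os.path.join(directory, "libstdc++.so.6*"))):
            found.append((path, elf_class(path)))
    return found

if __name__ == "__main__":
    # Assumed locations only; adjust for your own system.
    dirs = ["/usr/lib32", "/usr/lib/i386-linux-gnu",
            "/usr/lib/x86_64-linux-gnu", "/usr/lib"]
    for path, cls in find_libstdcpp(dirs):
        print(path, cls)
```

If nothing is reported as 32-bit, the 32 bit library is most likely not installed in any of the directories searched.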
6) Message boards : Number crunching : Update on HadCM3 'Short' WU crashes with shutdown in Windows (Message 50666)
Posted 29 Oct 2014 by Profile Pete B
There are a large number of reports across the thread about the HadCM3 'Short' WU's crashing in Windows if BOINC is stopped. I don't know about earlier WU's but with the current batch of WU's, I am not experiencing this.

Last week, I (absentmindedly as I thought immediately afterwards) suspended BOINC and shut my PC down for a reboot after installing a driver for some unrelated software. I fully expected a crash of the 3 running 'Short' WU's on restarting BOINC but they didn't, they carried on running to completion as if nothing had happened.

I, intentionally this time, repeated the exercise of BOINC suspension and PC reboot for some Windows updates yesterday and the 2 'Short' WU's, together with an EU AM3 all restarted without a problem.

I'm running Windows 7 incl Service Pack 1, BOINC version 7.2.42 and the method I use, and always have, is to first suspend the running project via the Activities dropdown which suspends all running WU's simultaneously. I then wait about 30 secs to give a chance for any disc writing to complete, then exit from BOINC. I then shut the PC down.

On restarting, I wait until everything has started, then start the BOINC manager. I then restart the project via the 'Activities' window. No crash with 'Short' WU's yet.

I haven't tried a drastic BOINC process stop by shutting down the PC with BOINC still running, maybe that would crash the WU's?
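
For anyone who prefers a script to the Manager menus, the same suspend, wait, exit sequence could be sketched roughly as follows. This is not what I actually do (I use the Manager), and it assumes the boinccmd command line tool is installed and on the PATH; the function names are made up and the script defaults to a dry run that only prints the steps.

```python
import subprocess
import time

def shutdown_steps(wait_secs=30):
    """Return the ordered steps as (description, command or None) pairs."""
    return [
        ("suspend all computation", ["boinccmd", "--set_run_mode", "never"]),
        (f"wait {wait_secs}s for disc writes to finish", None),
        ("ask the BOINC client to exit", ["boinccmd", "--quit"]),
    ]

def run_steps(steps, wait_secs=30, dry_run=True):
    """Print each step; only execute commands and waits when dry_run is False."""
    for description, cmd in steps:
        print(description)
        if dry_run:
            continue
        if cmd is None:
            time.sleep(wait_secs)              # the pause between suspend and exit
        else:
            subprocess.run(cmd, check=True)    # assumes boinccmd is on the PATH

if __name__ == "__main__":
    run_steps(shutdown_steps(), dry_run=True)
```

After the reboot, computation would need to be re-enabled again (for example with the run mode set back to auto), in the same way as restarting the project from the Manager.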
7) Message boards : Number crunching : hadcm3n Full Res Ocean out of memory error (Message 50635)
Posted 27 Oct 2014 by Profile Pete B
All mine have so far failed, mainly 'r's but a couple of 's's too. On my system, they all seem to go just before the first checkpoint at the end of the sixth model day, all with the 'Invalid Theta Detected' error. They have all crashed on other machines too so any I get now that are re-issues after previous fails, I'll probably abort.

I'll just see what happens with any WU's I get that are not previous failure re-issues before the fifth re-issue cutoff point. There can't be many more first issue WU's left.
8) Message boards : Number crunching : hadcm3n Full Res Ocean out of memory error (Message 50597)
Posted 23 Oct 2014 by Profile Pete B
All 3 of mine, PC 827263, 2 'r' and 1 's' have gone with the "Invalid Theta". WU's 9221758, 9224048 and 9222404. It's interesting that the third one had already failed twice on other PC's before mine, one with 'Invalid Theta' and one with 'out of memory'. Because I suspended all the various other WU's in the queue to force the CM3n's to run first, the computer will not now run any more till I can reset them later.
9) Questions and Answers : Windows : WINDOWS 10 (Message 50526)
Posted 13 Oct 2014 by Profile Pete B
Why not install it alongside your existing Windows 7, either as a dual boot system or even in a virtual box? With the latter, you can run them both together side by side. If I had enough time at present, I'd give the latter method a go as I already run Win7/Linux this way.
10) Message boards : climateprediction.net Science : Reporting misconfigured computers (Message 50323)
Posted 26 Sep 2014 by Profile Pete B
This one seems to have crashed everything over the last 3 months or so. Last completed one seems to be mid June:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1331421
11) Message boards : Number crunching : HadCM3 short - errors galore (Message 50322)
Posted 26 Sep 2014 by Profile Pete B
It's this one: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=9164996

I guess I've been lucky with my computers. Most of my crashes are because of power failures (when the batches are not faulty). I now make backups when climateprediction have longer models, that are more susceptible to sudden shut down.


One of the other PCs seems to have crashed just about everything (wrongly set up) and should probably be flagged up as a misconfigured PC.

The other PC seems to have crashed CM3S's, even during periods the batches generally run well, but has completed other WU's. From what I've read on here and looking at the crash details, it's probably not being run continuously and this WU doesn't seem to like being stopped. Each one has crashed after different run times so the actual PC setup is probably OK.
12) Message boards : Number crunching : HadCM3 short - errors galore (Message 50319)
Posted 26 Sep 2014 by Profile Pete B
Strangely, I picked up a WU that had crashed with lots of "Model crashed: ATM_DYN : INVALID THETA DETECTED." on another computer, and mine completed it without problem. Luck?


In the days when we used to run 200 year long HADCM3 models that could take a month or more to get through, it was usual to make backups fairly regularly so if a run crashed, it could be re-run from a short time before the crash. Often, it would crash again with the same fault, usually the one quoted above, indicating close to the edge parameter sets.

I used to be running both Intel and AMD CPU machines and there was more than one occasion where repeated crashes on one CPU could be got through by transferring the model (the complete BOINC backup) to the other CPU machine, running it past the crash point successfully and either continuing on that machine or transferring it back to the other and finishing successfully.

As Les says, hopefully, that kind of situation would have given the researchers some valuable data about the state of that particular model and parameter sets used.

I tried to look at your models to pick out the one you were referring to and see what it had crashed on before and what you were running but your computer/s is/are hidden so it couldn't be done.
13) Message boards : Number crunching : Latest HADCM3S WU's Crashing (Message 50203)
Posted 16 Sep 2014 by Profile Pete B
My latest CM3S, the first one I downloaded from the smaller (older?) set on the server late last night, presumably after the troublesome ones had been removed for correction, is holding up and running well.
14) Message boards : Number crunching : Latest HADCM3S WU's Crashing (Message 50172)
Posted 15 Sep 2014 by Profile Pete B
Hi Jonathan

Thanks for the update, that explains it.

My system successfully completed the 6th from the current batch that it downloaded on Friday following 5 crashes over Thu/Fri. It has then crashed another 10 since, all with the same error. I have now deselected the HADCM3S WU's for the time being until everything is OK. I'll just wait to see if I'll be lucky and find one of the odd ones occasionally returned from the other WU sets.
15) Message boards : Number crunching : Latest HADCM3S WU's Crashing (Message 50157)
Posted 13 Sep 2014 by Profile Pete B
Welcome back, Pete! Good to see you here again.

My machines, all various Intel quads with various Windoze OS, have been among the "lucky" ones. They completed hundreds of HadCM3s tasks with occasional crashes (after ~8 seconds), single digit percentage. Apparently, part of the release was misconfigured by the scientist(s). Luck of the draw for some of us.

In addition to the ongoing 32-bit library problem with 64-bit Linux installations, there was something about boinc 'service' installations in Windows. Not sure whether the latter was a problem for HadCM3s tasks. (Dodgy old memory...)


Hi Jim, good to be back and see some interesting real recent situation analyses being done now. I just found over the last 3 years with other things going on and a long daily work commute that began back then, I no longer had the time to play a part in the model testing and analysis and didn't want to crunch just for credits sake.

The Windows v Linux issue with these CM3S's is an interesting one. The ones that failed on mine seemed to be failing on every other machine also but they were all Windows OS with one exception. The Linux exception did not look representative though as it had crashed many other WU's also.

Now, after one CM3S crash yesterday, I then got a WU that is holding and is now more than 50% complete so there seems to be more than just a Windows OS issue with them.
16) Message boards : Number crunching : Latest HADCM3S WU's Crashing (Message 50128)
Posted 11 Sep 2014 by Profile Pete B
The last 2 wus i received said "computation error". I don't think they even started running, it seemed like a problem with the download. If someone wants to tell me how to get the details of the error i will try and post them here. Go easy on me though, i'm not a computer wiz, explain step by step.


Yours has the same error. If you click on the relevant Task ID details on your PC Task list, then click on the '+' box after 'Stderr', it will open the Stderr list and show the details. Here is one of yours.

HTH
17) Message boards : Number crunching : Latest HADCM3S WU's Crashing (Message 50127)
Posted 11 Sep 2014 by Profile Pete B
Yes, I'd noticed that issue when searching through for info and if it hadn't been that mine had successfully completed some of these HADCM3S's in the batch loaded onto the server at the end of August, I would have put it down to that problem and loaded up the Linux Virtualbox on the same machine to see what happened.

The same 4 WU's which failed on mine today had multi failures on other users machines also and all but one of those are Windows systems. One is a Linux system here but that seems to have an issue as many other WU's had failed on it too.
18) Message boards : Number crunching : Latest HADCM3S WU's Crashing (Message 50123)
Posted 11 Sep 2014 by Profile Pete B
I've just returned to the Project about a month ago after a 3 year absence from it. Using, by today's standards, a relatively outdated PC till I get round to building another, I've run a few regionals for PNW and am now running some ANZ's. I also completed without problem some HADCM3S's a week or so back but from the latest batch of those WU's put on the server yesterday, my machine has rapidly crashed 4. Checking the history of the WU's, all 4 seem to have suffered crashes on multiple machines, some of which have run many other types to completion. My failure mode is shown:

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
The device does not recognize the command.
(0x16) - exit code 22 (0x16)
</message>
<stderr_txt>

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch
Sorry, too many model crashes! :-(
09:25:21 (3020): called boinc_finish

</stderr_txt>
]]>

I'm not aware of the development history of the HADCM3S WU's but is there a set of bad parameters still in some of these?
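
When comparing a lot of failed tasks it can help to count the crash messages rather than read each Stderr by eye. A minimal sketch, assuming the Stderr text has been copied into a string; the function name is made up and the sample is just the output shown above.

```python
import re
from collections import Counter

def summarise_crashes(stderr_text):
    """Count each distinct 'Model crashed:' message in a Stderr dump."""
    pattern = re.compile(r"^Model crashed:\s*(.+?)\s*$", re.MULTILINE)
    return Counter(pattern.findall(stderr_text))

# Sample copied from the Stderr output quoted in this post.
sample = """
Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch

Model crashed: INITTIME: Atmosphere basis time mismatch
Sorry, too many model crashes! :-(
09:25:21 (3020): called boinc_finish
"""

counts = summarise_crashes(sample)
for message, n in counts.most_common():
    print(n, message)
```

Run over the Stderr of several tasks, the same function would show whether they all fail on the same message or on a mix of them.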
19) Message boards : Number crunching : No globe in graphics (Message 38375)
Posted 23 Nov 2009 by Profile Pete B
If I manually switch models and the other CPDN slab is waiting to run I can see its graphics with the globe and progress stopped but viewable.

When I manually switch models again the globe that didn't show before in the graphics now displays. So it was just a temporary glitch for some unknown reason.

Bizarre.


Hi Mo

Had you by any chance recently restarted BOINC so that this model hadn't yet run in the current session? If so, it will be blank as no current session graphic data will be stored in memory.

Once a model has run for a bit though in the current session, then gone into that state for another one to run for a period, the graphics will show the data at the point the model stopped.

Otherwise I just don't know.
20) Questions and Answers : Unix/Linux : Ubuntu 8.10 running in a Sun VirtualBox VM (Message 36866)
Posted 6 May 2009 by Profile Pete B
I'm not sure if this could be the problem, but under VMware the guest CPU clock calibration can get messed up by frequency scaling of the host. Turn off the host's Cool 'n Quiet in BIOS, and turn off all power management programs on the host and guests.

Good luck.


I'm certain you are correct in your deduction of the cause of the problem. Since I made the OP, I've done various tests and found effectively what you are saying. I never had the C&Q or any of the other CPU power management stuff enabled in the MoBo BIOS or the Windows host OS but nevertheless, the frequency scaling can become a problem and stop the normal time scaling of a VM operation.

If it happens, and it can sometimes happen during running of both host & virtual machines, say when a run completes, uploads and a new one starts on either machine, then the only way of getting back to correct timing is to shut down BOINC on both the real machine and the VM, shut down & reboot the VM, then start the VM BOINC Client before restarting the real BOINC Client. Strangely though, it doesn't always happen under these circumstances so it must depend on exactly what any running processes (BOINC or otherwise) are doing on either machine at critical times. Messing around with the CPU allocations for either VM or real Clients in Windows 'Task Manager' has no effect on stopping the mistiming once it's happened.

Therefore, I see it as a VirtualBox issue rather than a Linux (the guest OS) one and one that may get resolved in a subsequent release, especially if & when VBox becomes capable of utilising more than one core of a multi core CPU.

Incidentally, I see that Windows 7 release will have µSoft's VirtualPC integrated into it for running XP legacy stuff - another lesson from Mac maybe ;-)




Copyright © 2017 climateprediction.net