Posts by MarkJ

21) Message boards : Number crunching : Linux/Mac/Windows segmentation (Message 51849)
Posted 18 Apr 2015 by MarkJ
Post:
How does that work Mark? I and I suspect a lot of other crunchers have never used virtual boxes. What is the performance hit if any to crunching and other work particularly if running on minimum spec for memory?

You install VirtualBox on your machine. You can do that either via the BOINC+Vbox installer or by installing each individually. I prefer individually, as there have been reports of the BOINC+Vbox installer failing on the Vbox part. Also, you can usually get a more up-to-date Vbox directly from the source (virtualbox.org). Recent versions of BOINC will detect the presence of Vbox once it's installed.

The project sends Vbox virtual disk images as work units. The disk image has whatever OS (usually a flavour of Linux) and everything else to run the task.

Overheads-wise they use more memory, of course, due to the Vbox memory footprint. The CPU usage would depend on the task, but CPDN tasks are single-threaded so I'd expect 1 CPU thread per work unit running.
22) Message boards : Number crunching : Linux/Mac/Windows segmentation (Message 51844)
Posted 17 Apr 2015 by MarkJ
Post:
I wonder if it would be better for the project to create the work units as Vbox images, then they wouldn't need to target a specific OS and could have a base image that runs their preferred OS with the necessary data files and programs included.
23) Message boards : Number crunching : HadAM3P HadRM3P PNW Visual Fortran failures (Message 50833)
Posted 15 Nov 2014 by MarkJ
Post:
Thanks Iain. Unfortunately if I do service mode install I would lose the ability to use the iGPU for crunching, even though it wasn't doing any at that time.

From what I gather in the other Visual Fortran thread, it's to do with the graphics app not working with the Intel iGPU under Windows. Is this correct?
24) Message boards : Number crunching : HadAM3P HadRM3P PNW Visual Fortran failures (Message 50831)
Posted 15 Nov 2014 by MarkJ
Post:
I picked up a bunch of these this morning. 15 of them failed after 4 seconds of CPU time with Visual Fortran errors. This is across 3 separate machines (all the same config, dedicated BOINC crunchers). Looking at them, the wingman has also failed after 4 seconds, so I don't think it's just me.

I left some of them going for 10 hours (elapsed time) and they show up in BOINCtasks as zero CPU time, no checkpoint and using 48-52 MB of memory. The ones that work have non-zero CPU time, do checkpoints and are using around 148-162 MB of memory. I decided I needed to access the machines after they didn't appear to progress, and sure enough the Visual Fortran popups were there.

I have 5 more that seem to be running across the 3 machines.

Links to some of them:
No 1
No 2
No 3

Edit
Looking through the Visual Fortran thread, which was for different models, Windows and Intel iGPUs seem to be a common denominator. These machines have (but weren't using) Intel HD Graphics 4000.

I don't use the BOINC screensaver or look at the model's graphics.
25) Message boards : Number crunching : Project keeps resetting - any explanations? (Message 49387)
Posted 19 Jun 2014 by MarkJ
Post:
An easier way to clean up the project directory would be to delete the CPDN project and then add it back. For BOINC v6 that would be remove and then add the project. That is, go to the Projects tab in BOINCmgr, select CPDN and then click on the delete (or remove) button. To add it back, click on the Tools menu -> Add project and then select CPDN from the list.

That should clean out your project folder and is much easier than trying to work out what's needed and what is not.
26) Message boards : Number crunching : ANZ model upload problems. (Message 48866)
Posted 22 Apr 2014 by MarkJ
Post:
Problem solved. It seems to have released the file and I have managed to upload it.

Been away for 4 days and came back to a bunch of stuck uploads. I have managed to clear most of them except this one which seems to have got itself locked.

22-04-2014 05:34 PM [error] Error reported by file upload server: [hadam3p_anz_p8wg_2012_1_008642207_2_13.zip] locked by file_upload_handler PID=13801

Can I get Andy or one of the other admin guys to unlock it so I can complete the upload please.

27) Message boards : Number crunching : ANZ model upload problems. (Message 48864)
Posted 22 Apr 2014 by MarkJ
Post:
Been away for 4 days and came back to a bunch of stuck uploads. I have managed to clear most of them except this one which seems to have got itself locked.

22-04-2014 05:34 PM [error] Error reported by file upload server: [hadam3p_anz_p8wg_2012_1_008642207_2_13.zip] locked by file_upload_handler PID=13801

Can I get Andy or one of the other admin guys to unlock it so I can complete the upload please.
28) Questions and Answers : Windows : Disk usage - can I delete old files? (Message 48810)
Posted 16 Apr 2014 by MarkJ
Post:
The simplest and safest would be to detach and reattach (or as they say in newer BOINC versions remove and add) the project. That gets BOINC to delete the project folder and anything under it as well as clearing out the client_state entries. The drawbacks are that you have to finish off any work in progress and it will want to download the executables again.

The same applies to other projects such as Einstein which can leave data files around in case they'll be needed later.
29) Message boards : Number crunching : NZ Application "not in DB" (Message 48643)
Posted 31 Mar 2014 by MarkJ
Post:
I get the same issue when viewing tasks. Log in to your account. Click on the view tasks link. The rightmost column, labelled Application, shows "not in DB" for the anz tasks.
30) Message boards : Number crunching : ANZ model upload problems. (Message 48611)
Posted 29 Mar 2014 by MarkJ
Post:
After file locks disappeared still had transient upload failures. I had to dig out the old proxy server and hook it up to the dial up to clear them. Personally I think it's an issue with my ISP and they don't have a clue. Anyway all files cleared as of 2 hours ago. Work units progressing, and I added 2 more machines to help out.
31) Message boards : Number crunching : ANZ model upload problems. (Message 48564)
Posted 27 Mar 2014 by MarkJ
Post:
I also have 2 stuck with the same error getting reported from different machines:

526 climateprediction.net 27-03-2014 04:39 PM [error] Error reported by file upload server: [hadam3p_anz_n7dq_2012_1_008583254_0_1.zip] locked by file_upload_handler PID=-1

This one got a "transient upload error" at 05:17 (UTC + 11 hours) and then has been getting this since 05:30

and

604 climateprediction.net 27-03-2014 07:30 AM [error] Error reported by file upload server: [hadam3p_anz_n7ot_2012_1_008583653_1_1.zip] locked by file_upload_handler PID=-1

This one appeared after a "transient upload error" at 07:26 (UTC + 11) and from 07:30 it's been getting the locked file error.

It looks like a common theme: it gets an upload error and then the file isn't getting released. Running BOINC 7.2.42 on one and 7.3.11 on the other.
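
For anyone wanting to check their own event log for the same symptom, here is a rough Python sketch that pulls the file name and PID out of lines like the two quoted above. The pattern is based only on those two messages, and the function name find_locked_uploads is made up for illustration, so treat it as an assumption about the log format rather than anything official.

```python
import re

# Pattern modelled on the two "locked by file_upload_handler" lines quoted
# in this post. Other BOINC versions may word the message differently.
LOCKED_RE = re.compile(
    r"\[error\] Error reported by file upload server: "
    r"\[(?P<file>[^\]]+)\] locked by file_upload_handler PID=(?P<pid>-?\d+)"
)


def find_locked_uploads(log_lines):
    """Return a list of (file_name, pid) for each locked-upload error line."""
    results = []
    for line in log_lines:
        match = LOCKED_RE.search(line)
        if match:
            results.append((match.group("file"), int(match.group("pid"))))
    return results
```

Feeding it the saved event log line by line gives one entry per locked-file error, which makes it easy to see whether the same file keeps coming up.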
32) Message boards : Number crunching : NEW BOINC VERSION (Message 47703)
Posted 2 Dec 2013 by MarkJ
Post:
The main difference between .28 and .33 was suppressing of image display within notices. One particular project had issues that caused the client to crash.
33) Message boards : Number crunching : Persistent upload problems (Message 47621)
Posted 21 Nov 2013 by MarkJ
Post:
This is the ADSL one...

Tracing route to rapid-watch.badc.rl.ac.uk [130.246.191.84]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms my router
2 15 ms 15 ms 14 ms 10.20.21.168
3 16 ms 15 ms 15 ms syd-nxg-men-csw1-tengi-4-2.tpgi.com.au [202.7.173.1]
4 15 ms 15 ms 14 ms syd-nxg-men-crt1-ge-7-1-0.tpgi.com.au [202.7.162.105]
5 169 ms 168 ms 168 ms te8-3.ccr01.sjc05.atlas.cogentco.com [38.122.92.41]
6 169 ms 170 ms 169 ms te0-3-0-7.ccr22.sjc01.atlas.cogentco.com [154.54.6.69]
7 169 ms 169 ms 169 ms be2047.ccr21.sjc03.atlas.cogentco.com [154.54.5.114]
8 169 ms 168 ms 169 ms tiscali.sjc03.atlas.cogentco.com [154.54.10.214]
9 316 ms 316 ms 315 ms xe-10-0-0.lon21.ip4.tinet.net [89.149.180.110]
10 310 ms 368 ms 313 ms 141.136.103.210
11 * * 310 ms ae29.londpg-sbr1.ja.net [146.97.33.2]
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * * Request timed out.
19 * * * Request timed out.
20 * * * Request timed out.
21 * * * Request timed out.
22 * * * Request timed out.
23 * * * Request timed out.
24 * * * Request timed out.
25 * * * Request timed out.
26 * * * Request timed out.
27 * * * Request timed out.
28 * * * Request timed out.
29 * * * Request timed out.
30 * * * Request timed out.

Trace complete.

As you can see, after hop 11 it just disappears. So it's looking like a routing issue.
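
If you want to compare several traces, a small Python sketch like this can report the last hop that answered. It assumes the Windows tracert layout shown above (hop number first, then three probe results), and the function name last_responding_hop is just something made up for this example.

```python
import re

# A hop line starts with the hop number, followed by the probe results.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# A probe that answered is reported as a time such as "15 ms" or "<1 ms".
TIME_RE = re.compile(r"\d+ ms\b")


def last_responding_hop(tracert_lines):
    """Return the highest hop number with at least one reply, or None."""
    last = None
    for line in tracert_lines:
        match = HOP_RE.match(line)
        if not match:
            continue  # header, blank line or "Trace complete."
        hop, rest = int(match.group(1)), match.group(2)
        if TIME_RE.search(rest):
            last = hop
    return last
```

For the trace above this should come back with 11, since hop 11 had one reply out of three probes and everything after it timed out.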
34) Message boards : Number crunching : Persistent upload problems (Message 47619)
Posted 21 Nov 2013 by MarkJ
Post:
My ISP has started asking questions after I complained. One of the things they asked for was a tracert. I did one. It was very slow after the first 2 hops (which are the ISP) up to about hop 13, and then all the remaining hops just timed out. It was (for some reason) going via Japan, which is where it seems to die.
35) Message boards : Number crunching : Persistent upload problems (Message 47593)
Posted 17 Nov 2013 by MarkJ
Post:
Final machine cleared off. All tasks reported.
36) Message boards : Number crunching : Persistent upload problems (Message 47590)
Posted 17 Nov 2013 by MarkJ
Post:
Went away for a few days and left the 4th machine to clear itself off. It's done and the 5th (last) machine is down to two files left to upload.
37) Message boards : Number crunching : NEW BOINC VERSION (Message 47589)
Posted 17 Nov 2013 by MarkJ
Post:
What does 'not able to go back mean'? thanks


The client_state files have different elements in them between version 6 and version 7. If you're already running a v7 client you'll be fine; it's just the older ones that have different elements. You can actually go back, it's just that you'll lose any work the v7 client has in progress if you revert to v6. If the projects have "resend lost tasks" enabled you might get them back that way.

I personally never bothered and just upgraded old to new and then familiarised myself with it. Things like work fetch are a bit different between the v6 and v7 clients so you might want to adjust them once you've upgraded. A common complaint is also about the messages tab. The v7 client has an "event log" which opens in a separate window and shows the same stuff the messages tab used to show.
38) Message boards : Number crunching : Persistent upload problems (Message 47517)
Posted 10 Nov 2013 by MarkJ
Post:
Started on the 4th machine but now the server is out of space, so the 56k is having a rest :)
39) Message boards : Number crunching : NEW BOINC VERSION (Message 47516)
Posted 10 Nov 2013 by MarkJ
Post:
7.2.28 is out now as the current version.

If you're using a 6.x or earlier version please heed the warning about not being able to go back (unless you back up your client_state files).
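
As a rough illustration of that backup, here is a minimal Python sketch that copies the client_state files to another folder. The location of the BOINC data directory varies by OS and install, so it is passed in rather than guessed, and the function name backup_client_state and the client_state*.xml pattern are assumptions made for this example. Stop the BOINC client before copying so the files aren't being rewritten mid-copy.

```python
import shutil
from pathlib import Path


def backup_client_state(data_dir, backup_dir):
    """Copy client_state*.xml from data_dir into backup_dir.

    Returns the names of the files copied, in sorted order.
    """
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # Assumed pattern: picks up client_state.xml and client_state_prev.xml.
    for src in sorted(data_dir.glob("client_state*.xml")):
        shutil.copy2(src, backup_dir / src.name)  # copy2 keeps timestamps
        copied.append(src.name)
    return copied
```

To go back, you would stop the client, reinstall the older version and copy the saved files back into the data directory, bearing in mind that any work started since the backup would be lost.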
40) Message boards : Number crunching : Persistent upload problems (Message 47483)
Posted 6 Nov 2013 by MarkJ
Post:
3rd machine now cleared. 2 more to go. I managed to get a blazing fast 3.5k upload speed.

I let the wife use the phone all day Monday, so I'm good to use the phone line until Friday :)

