Win2K Pro - only downloading one WU on dual CPU machine

Dave Peachey
Joined: 5 Aug 04
Posts: 11
Credit: 2,356,953
RAC: 0
Message 1386 - Posted: 21 Aug 2004, 12:52:08 UTC
Last modified: 21 Aug 2004, 12:58:42 UTC

I've been running CPDN very happily on a dual P4 Xeon machine running WinXP Pro (SP2) which has four virtual CPUs, three of which are in use for CPDN (as I wanted!) and have now migrated a second box across - this one is a dual Athlon MP2800-based machine running Win2K Pro (SP4).

Having installed the software on this box without a problem and noted in the computer's profile that it is recognised as having two CPUs, I am concerned that the scheduler is only downloading a single WU for this machine. I am getting no obvious error messages and my preferences are set up in such a way that - as far as I can tell - I really shouldn't have this problem (max CPUs=4; max disk space=5GB; no. days work held=40).

I have tried (repeatedly) to update the system and, although the scheduler responds very nicely - I get no error messages, no indications of lack of disk space or imminent scheduler starvation - it gives me no more work. In desperation, I have also detached from, and reattached to, the CPDN project. The only effect of this was to trash the original workload (BTW, how can you remove a WU from your account which you're never going to complete? In case someone feels like resolving this, it's WU ID 8398) and download another single WU.

Fortunately (?!) I have a spare CPDN "classic" WU which I had set aside to get my dual Athlon machine running under BOINC (it was to be completed later on the dual Xeon box when its current CPDN "classic" WU completed) so I've reinstalled that in order to use up the CPU cycles on the second processor. I would, however, prefer to be running the BOINC client on both CPUs of this new machine.

Is this a bug with Win2K Pro, a (not so obvious) scheduler problem or something else? Any thoughts?

Cheers
Dave
ID: 1386
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 1413 - Posted: 21 Aug 2004, 20:13:26 UTC
Last modified: 21 Aug 2004, 20:19:37 UTC

Hi Dave,

Your dual Athlon computer shows up as having 2 processors, so it certainly should have downloaded 2 work units. The only thing I can think of that might prevent a second one being downloaded is if the resource share in your climateprediction.net preferences is not greater than 50.

I'm not aware of any way you can remove a unit from your account, but I notice that both your jobs had been recirculated from ones that had failed 11 days earlier, which has got to be a good thing.

Carl, Dave's Work unit ID 8398 is still marked as in progress. Is there any way you can invalidate all in progress allocated work units when somebody detaches from the project? I imagine detaching is going to happen a fair bit after the public launch, and keeping detached ones active could pretty quickly lead to work unit starvation (unless you've got a massive pile of them stacked up) ...

ID: 1413
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1416 - Posted: 21 Aug 2004, 20:56:51 UTC - in response to Message 1413.  

It looks like the detected disk space may be low? It shows up as 1.8GB, so I think that is cutting it close: at 600MB per workunit, even with 95% allowed usage, that may not leave enough room for the scheduler to send another workunit?
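
For anyone who wants to check the figures, here is a rough Python sketch of that arithmetic. It assumes 1.8GB means 1800MB and that the limit is simply the detected space times the allowed percentage; the function name is made up for illustration, and the real scheduler check is not described in this thread.

```python
# Back-of-the-envelope disk check. This is NOT the BOINC scheduler's logic,
# only the simple model assumed above: usable = detected * allowed percentage.
def workunits_that_fit(detected_mb, allowed_percent, mb_per_workunit):
    """Return (whole workunits that fit, leftover MB) under the simple model."""
    usable_mb = detected_mb * allowed_percent // 100  # whole MB, rounded down
    return usable_mb // mb_per_workunit, usable_mb % mb_per_workunit


# Figures quoted in this post: 1.8GB detected, 95% allowed, 600MB per workunit.
count, leftover = workunits_that_fit(
    detected_mb=1800, allowed_percent=95, mb_per_workunit=600
)
print(f"workunits that fit: {count}, leftover: {leftover}MB")
```

Under those assumptions a second workunit would fit only if little else is using the allowance, which is why it looks close.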

ID: 1416
Dave Peachey
Joined: 5 Aug 04
Posts: 11
Credit: 2,356,953
RAC: 0
Message 1421 - Posted: 21 Aug 2004, 21:40:40 UTC
Last modified: 21 Aug 2004, 21:58:07 UTC

Carl/Thyme,

Thanks for the comments/thoughts.

My resource sharing for all my machines is set to 100% CPDN so I don't believe that this is the source of the problem.

I've also considered the available/allocated disk space issue and, given that the available partition space (for the CPDN/other DC client partition) on both the dual Xeon and dual Athlon machines is currently set to 3GB, it would be somewhat of a surprise to find that the dual Athlon is suffering as a result of insufficient available disk space - especially given that the dual Xeon currently has less than 2GB of available space for three BOINC CPDN clients plus an instance of the non-BOINC client (actually, that's where the 1.8GB of available disk space comes in) whereas the dual Athlon has almost 2.5GB of available disk space for one instance of each client!

My inclination, therefore, is also to discount this as the root cause of the problem - based, also, on the lack of any of the "insufficient disk space" messages I remember getting when I set up the preferences when starting the dual Xeon machine - as I've had no equivalent error log messages to this effect for the dual Athlon machine.

Thyme's added comment re the ability to invalidate/release WUs arising from detachment from the project (especially in the early days of the public BOINC launch from the end of next week) is a good point and reinforces my request that some means of user/admin-driven manual release of known invalid/defunct WUs be implemented.

Any more ideas?

Cheers
Dave
ID: 1421
John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 1438 - Posted: 22 Aug 2004, 1:24:16 UTC

There is a bug in the WU download scheduler design. If any project has less than 50% of the resources allocated, that project will get at most one WU at a time on a dual CPU machine. This needs to be escalated to the BOINC development team.

I have verified this with a different project that hands out WUs that take a much shorter time than CPDN.

jm7
ID: 1438
Dave Peachey
Joined: 5 Aug 04
Posts: 11
Credit: 2,356,953
RAC: 0
Message 1506 - Posted: 22 Aug 2004, 19:04:54 UTC
Last modified: 22 Aug 2004, 19:11:15 UTC

OK, try this one ...

I decided to have a dicker about with Carl's optimised client on my recently installed dual Athlon MP2800 computer - this initially caused a client failure and, because I forgot to do a backup before updating it, in the process took out another partially-completed WU (although it does, at least, seem to have registered this one as an incomplete/error client run so this one's not going to hang about). Needless to say, I wasn't happy, but put it down to my own stupidity, downloaded another WU and started the system working again - although with the original executable.

Note that, at this point, I hadn't made any changes to my preferences but have (on a periodic basis throughout yesterday and today) been running the "update" process to try to download another WU for that machine. Then, for some reason, having bollixed things up (just for a change!) I decided to go to my personal preferences and, for something to do, created a supplementary set of "Home" preferences which mirrored my "Default" preferences.

So the situation is ... having initially got this machine restarted with just the one new WU and _then_ having created the additional "Home" preferences, when I stopped/restarted the client, for reasons I can't explain, it now recognised the second CPU, gave me an "imminent scheduler starvation" message and downloaded a second WU for that machine.

My question is, therefore, why would it (apparently) not play ball in respect of the two processors on my dual Athlon machine with the "Default" preferences only (although it was quite happy to do so with my dual Xeon machine two weeks ago) but, once I'd created some supplementary, albeit exactly equivalent, "Home" preferences, it realised it needed more work and did something about it?

Cheers (from a marginally more confused than usual)
Dave

PS: Can we also have some way of doing HTML tagging in this forum as it's a pain not to be able to emphasise text in bold and/or italics?
ID: 1506
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1507 - Posted: 22 Aug 2004, 19:08:42 UTC - in response to Message 1506.  

> could someone _please_ find a way to release trashed WUs which will never
> complete on a particular machine?

Well, I guess you could always detach from the project? Or, for a less elegant solution, if there's one workunit you want to quit on a two-CPU run (i.e. trash one workunit but not the other), erasing the files in that workunit's dataout/ directory and restarting would probably force a crash and an upload (finish) for that result.
ID: 1507
Dave Peachey
Joined: 5 Aug 04
Posts: 11
Credit: 2,356,953
RAC: 0
Message 1509 - Posted: 22 Aug 2004, 19:12:36 UTC
Last modified: 22 Aug 2004, 19:17:36 UTC

Carl,

Dang, you caught me between message edits - it looks as though today's trashed WU has uploaded an error result after all so this one won't hang around pretending to be unfinished!

However, I'm still confused about the apparent difference in the scheduler's response that I have encountered just by creating a set of "Home" preferences.

Cheers
Dave
ID: 1509
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1510 - Posted: 22 Aug 2004, 19:22:01 UTC - in response to Message 1509.  
Last modified: 22 Aug 2004, 19:22:19 UTC

Oh, you know what, there was a major bug in the BOINC scheduler software that the guys at Berkeley just fixed. Well, really in the "feeder" that sends out workunits to people. Perhaps your problem had something to do with that? Apparently the "feeder" was potentially mixing up some workunits/results. The problem is fixed now, but it's possible you were sent a workunit with a wrong result ID and somewhere along the line that was messing things up?

ID: 1510
old_user2697
Joined: 29 Aug 04
Posts: 11
Credit: 1,281,270
RAC: 0
Message 2462 - Posted: 1 Sep 2004, 6:32:19 UTC - in response to Message 1413.  

Hi there,

I have a single HTT CPU (2 virtual CPUs). BOINC version 3.2 with Seti recognized those two CPUs.
I subscribed to all three projects: Predictor, Seti and ClimatePrediction. My resource shares are set to 10%, 100% and 100% respectively. I was also able to download only 1 CPDN WU. Fortunately I later received some Seti WUs, so my second CPU is also working now. But I'm not able to get that second CPDN WU (not even after forced updates).

Patrick
ID: 2462
John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 2536 - Posted: 1 Sep 2004, 15:43:39 UTC - in response to Message 2462.  

> Hi there,
>
> I have a single HTT CPU (2 virtual CPUs). BOINC version 3.2 with Seti
> recognized those two CPUs.
> I subscribed to all three projects: Predictor, Seti and ClimatePrediction.
> My resource shares are set to 10%, 100% and 100% respectively. I was also
> able to download only 1 CPDN WU. Fortunately I later received some Seti WUs,
> so my second CPU is also working now. But I'm not able to get that second
> CPDN WU (not even after forced updates).
>
> Patrick
>
See my post earlier in this thread.

jm7
ID: 2536
old_user2697
Joined: 29 Aug 04
Posts: 11
Credit: 1,281,270
RAC: 0
Message 2672 - Posted: 2 Sep 2004, 9:59:37 UTC - in response to Message 2536.  

Hi John,

I have resource shares of 10%, 100% and 100% for P@H, S@H and CPDN respectively, so the client allocates shares of 4.76%, 47.62% and 47.62%. If I understand correctly, the bug is triggered because 47.62% is below 50%. So if I configure the shares so that the client allocates more than 50% to CPDN, will it work?
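
To show where those percentages come from, here is a minimal Python sketch. It assumes each project's allocation is just its resource share divided by the sum of all shares, which is consistent with the figures above; the function name is made up for illustration and this is not BOINC code.

```python
# Each project's allocation is assumed to be its share divided by the total.
def normalized_shares(resource_shares):
    """Map each project to its fraction of the total resource share."""
    total = sum(resource_shares.values())
    return {project: share / total for project, share in resource_shares.items()}


# The settings from this post: 10 / 100 / 100.
shares = normalized_shares({"P@H": 10, "S@H": 100, "CPDN": 100})
for project, fraction in shares.items():
    print(f"{project}: {fraction:.2%}")

# CPDN only passes 50% once its share exceeds the other shares combined (110).
raised = normalized_shares({"P@H": 10, "S@H": 100, "CPDN": 111})
print(f"CPDN with a share of 111: {raised['CPDN']:.2%}")
```

So with these three projects, CPDN would need a share above 110 (or the others would need lowering) before its allocation goes over the 50% mark mentioned earlier in the thread.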

Greetz, Patrick
ID: 2672
old_user1771
Joined: 26 Aug 04
Posts: 14
Credit: 160,168
RAC: 0
Message 2724 - Posted: 2 Sep 2004, 17:01:23 UTC

I have a dual HT-enabled Xeon box (CPU ID 13384) that shows as having 4 CPUs in the profile. My preferences are set to 100% CPDN - no SETI, or other BOINC'd projects, and MaxCPU is set to 4.

I still only get 2 WUs downloaded.

I'm also concerned that the measured integer benchmark on this 2.8GHz Xeon box is dreadfully low - about 750MIPS, as compared to a 2.0GHz P4 at 2583 MIPS.

Are the Xeons really that bad, or is the benchmark hosed somehow?
ID: 2724