climateprediction.net home page
Shared Memory, other thread locked and pinned

Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33251 - Posted: 7 Apr 2008, 23:01:02 UTC

Well, I have a new Mac Pro with 8 CPUs and tried to run CPDN, and the models die. I do have the "fix" in place to up the shared memory.

I have 12G Main memory, 2.73 TB free disk space (though it is on a HW RAID 5 array ...)

Memory says it is healthy (I did find one stick reporting correctable ECC errors - it is out now) ...

Not sure what else to test. Any thoughts? I can post reports if you tell me what you want to see ...
ID: 33251 · Report as offensive     Reply Quote
Profile Iain Inglis

Send message
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33252 - Posted: 7 Apr 2008, 23:24:20 UTC
Last modified: 7 Apr 2008, 23:54:32 UTC

Paul,

Absent a proper solution, you may find that the coupled models run correctly (based on adempster). These are selectable in the climateprediction.net set of preferences in your account.

Iain

[Edit: following your post, I've passed on a comment noting that this seems to be a general Mac problem with slabs.]
ID: 33252 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33255 - Posted: 8 Apr 2008, 17:00:13 UTC

It took me longer than it should... but, it does look like you are correct. The other models may have all been Slabs (or I did not have the "fix" in place).

But, I have a HadCM3 model with an hour on the clock. Which is a step ahead because the other models never got started. Of course, it also says 1089:29:47 to go ...

We will see how it goes.
ID: 33255 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33291 - Posted: 11 Apr 2008, 17:07:35 UTC

Just as a follow-on note, it does appear that all the other models were "Slabs" and I now have a coupled model on the Mac Pro and it has 67 hours of runtime on its clock ... so, that looks decent.

A side observation, not sure what it MIGHT mean, and not sure you want me to "blow through" a bunch of models to test (but will if you ask): my memory usage is LOW. I mean, only a couple of gig out of the 16 are being used ... my recollection was that HUGE chunks of memory were allocated and locked on earlier runs.

I only had 8 G then, but I recall it ALL being consumed. So, the "shared memory" may be some OTHER memory allocation issue on the Mac Pro (Intel) machines with the slab models.

Again, it is YOUR models ... hate to blow them up just for laughs...

One OTHER note: though I did have ECC errors, they were all corrected, so I am not sure if that is relevant to the discussion or not, but, for completeness, thought I would mention it. It just seems to me, if you want Mac machines to participate and run the slab models, in that Iain points out that this seems to be a "standard" problem ...

And I do have the 16M shared memory allocation ... should you wish to check my numbers I will be glad to post them too ...
ID: 33291 · Report as offensive     Reply Quote
Profile MikeMarsUK
Volunteer moderator

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33301 - Posted: 12 Apr 2008, 8:44:23 UTC

Hi Paul,

I don't know anything about Macs, so this is just a 'generic' suggestion (and probably won't help):

If you increase the shared memory segments further, does that make any difference? i.e., perhaps 32MB shared memory and 64 segments?
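For anyone who wants to work out the numbers for a larger setting, here is a rough sketch. It assumes the limits in question are the System V shared memory sysctl keys that Mac OS X exposes (kern.sysv.shmmax, shmmin, shmmni, shmseg, shmall), that shmall is counted in 4096-byte pages, and that "64 segments" applies to both shmmni and shmseg. The helper names are made up for illustration, and the output is only a candidate set of lines to compare against whatever the existing fix uses, not a confirmed recommendation.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; shmall is assumed to be counted in pages


def sysv_shm_settings(shm_megabytes, segments):
    """Return candidate sysctl values for a given shared memory size and segment count.

    Hypothetical helper: mapping "segments" to both shmmni and shmseg is an assumption.
    """
    shmmax = shm_megabytes * 1024 * 1024  # largest single segment, in bytes
    return {
        "kern.sysv.shmmax": shmmax,
        "kern.sysv.shmmin": 1,
        "kern.sysv.shmmni": segments,
        "kern.sysv.shmseg": segments,
        "kern.sysv.shmall": shmmax // PAGE_SIZE,  # total shared memory, in pages
    }


def format_sysctl_lines(settings):
    """Render the settings as key=value lines, one per setting."""
    return [f"{key}={value}" for key, value in settings.items()]


if __name__ == "__main__":
    # The 32 MB / 64 segments case suggested above.
    for line in format_sysctl_lines(sysv_shm_settings(32, 64)):
        print(line)
```

Keeping the arithmetic in one place makes it easy to try 16, 32 or 64 MB and see that shmall stays consistent with shmmax.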

Iain had a browse through some other Macs on the project, and he found that the earlier operating system (Darwin 9.1) looked OK with slabs, but many of the machines running Leopard (9.2) seemed to be burning through models. So it might be that Leopard and the Slab models are incompatible?




I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 33301 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33303 - Posted: 12 Apr 2008, 9:09:17 UTC

Not sure I know anything about Macs either ... :)

It could be ... I know a number of things are different in Leopard and that is what I am running on both Macs... of course, there is no model for the G5 ... so, I will have to do other things over there ... :)

I am in the middle of a days-long test so, can't stop to reboot ... but I could try that ... increase the numbers again ... with 16G of RAM it is not like I will run out of it soon ... I am only using a little bit of it at the moment ...
ID: 33303 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 33304 - Posted: 12 Apr 2008, 9:52:12 UTC


I wonder if the 'debug' file, cc_config, might help.

Apart from the essential lines, there are two that may be of interest:
<app_msg_receive>
Shared-memory messages received from applications.
<app_msg_send>
Shared-memory messages sent to applications.
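As a rough illustration of what such a file might contain, the sketch below builds a minimal cc_config with just these two flags switched on. It assumes the usual layout of a cc_config root element holding a log_flags section, with each flag set to 1; the helper name is made up for illustration, and the "essential lines" mentioned above are not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a minimal cc_config document with the given log flags enabled.

    Hypothetical helper: assumes flags live under <cc_config><log_flags>.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is enabled by giving it the value 1.
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(["app_msg_receive", "app_msg_send"]))
```

The printed text would be saved as cc_config.xml in the BOINC data directory, and the client would need to re-read its config file (or be restarted) before the extra messages show up.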


ID: 33304 · Report as offensive     Reply Quote
Profile Thyme Lawn
Volunteer moderator

Send message
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 33309 - Posted: 12 Apr 2008, 18:45:29 UTC - in response to Message 33304.  

<app_msg_receive>
Shared-memory messages received from applications.
<app_msg_send>
Shared-memory messages sent to applications.

That just shows the messages passing between the CPDN controller process and the BOINC core client. <app_msg_receive> displays one every second for running tasks (sent to update the progress displayed by BOINC Manager).
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 33309 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33310 - Posted: 12 Apr 2008, 18:55:17 UTC

Well, you guys have some time to figure it out... :)

I have one model running and I think I want to wait till it is done before I do much of anything ... :)

I would hate to blow up a perfectly good running model.
ID: 33310 · Report as offensive     Reply Quote
Profile MikeMarsUK
Volunteer moderator

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33320 - Posted: 13 Apr 2008, 7:35:51 UTC


Tolu is going to have a look at the slabs-on-Leopard issue next week to see if he can find out what is going wrong there.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 33320 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33321 - Posted: 13 Apr 2008, 7:44:37 UTC - in response to Message 33320.  


Tolu is going to have a look at the slabs-on-Leopard issue next week to see if he can find out what is going wrong there.

cool!

It could be as simple as permissions/security is different ...

I THINK there are also some subtle differences in disk layout ... though I cannot be sure.

I *DO* know that even the change in the version of Mac Pro I have, for example, broke the TechTools 4.6.1 toolkit so it will not boot off of the CD/DVD ... they asked me about the memory speed ... not sure what that might have to do with it ... but ...

Of course, I did leave the debris of half a dozen models in my account for my Mac Pro so he can look at the log there ...

I am still in the middle of a long disk test so if it requires a re-boot I can't help for a couple days ... but, should he want to test something easy I can try it ...

He can PM me, or if he is a packrat like me he may even have my e-mail address from days of old (it has not changed) ... or, if he posts here ... I am "watching" this thread ...
ID: 33321 · Report as offensive     Reply Quote
old_user147092

Send message
Joined: 5 Jan 06
Posts: 4
Credit: 7,655,256
RAC: 0
Message 33323 - Posted: 13 Apr 2008, 18:49:15 UTC

Hi, I got the same problems on a new Mac Pro (8 cores, 4GB RAM).
Slab models don't start; they fail with the message "Insufficient Memory/Stack Space Available!"

The models are:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7403801
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7403823
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7403827

HadCM3 models are running fine.

What I've done after seeing the models crash:
-Stopped all projects on this computer (Einstein@Home and Sudoku@Home as well).
-Quit BOINC.
-Restarted BOINC.
-Resumed climateprediction.
-Waited until three HadCM3 models were present.
-Set climateprediction to "no new work".
-Resumed the other projects.
-Set the preferences in my account to prefer HadCM3 models. (Maybe it's better to change the preferences before resuming climateprediction.)
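For anyone who prefers the command line to BOINC Manager, the climateprediction part of the steps above could be sketched roughly as follows. This assumes a client that ships the boinccmd tool with its "--project URL operation" form, and the project URL shown is a placeholder assumption; use whatever URL your client lists for the project. The helper names are made up for illustration. The script only prints the commands and does not run anything.

```python
PROJECT_URL = "http://climateprediction.net/"  # assumed; use the URL your client shows

# Project operations this sketch allows (a subset of what boinccmd accepts).
ALLOWED_OPERATIONS = {"suspend", "resume", "nomorework", "allowmorework", "update"}


def boinccmd_args(project_url, operation):
    """Return the argument list for one boinccmd project operation."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--project", project_url, operation]


def workaround_plan(project_url):
    """Commands for the two CPDN steps: resume the project, then stop new work.

    The wait for HadCM3 models between the two commands is left to the user.
    """
    return [
        boinccmd_args(project_url, "resume"),
        boinccmd_args(project_url, "nomorework"),
    ]


if __name__ == "__main__":
    for args in workaround_plan(PROJECT_URL):
        print(" ".join(args))  # dry run: print only, nothing is executed
```

The preference change to favour HadCM3 models still has to be made in the account preferences on the website.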

Hope the Slab bug will be fixed soon.


ID: 33323 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33324 - Posted: 13 Apr 2008, 19:08:18 UTC

Hmmm, mine don't even have that much in the std error listed ...

To *ME* this looks more and more like a permissions problem ...

In your case it may be something with the file system. Mine, who knows as there are no messages in the returned data file. Are you running Tiger or Leopard?

My Mac Pro is:

Model Name: Mac Pro
Model Identifier: MacPro3,1
Processor Name: Quad-Core Intel Xeon
Processor Speed: 3.2 GHz
Number Of Processors: 2
Total Number Of Cores: 8
L2 Cache (per processor): 12 MB
Memory: 16 GB
Bus Speed: 1.6 GHz
Boot ROM Version: MP31.006C.B05
SMC Version: 1.25f4

The memory sticks are all from the same vendor and look to be pretty close in batch # so should be close running mates ... no errors reported so running well ...

I am running Leopard with all the latest patches ...
ID: 33324 · Report as offensive     Reply Quote
old_user147092

Send message
Joined: 5 Jan 06
Posts: 4
Credit: 7,655,256
RAC: 0
Message 33325 - Posted: 13 Apr 2008, 19:27:49 UTC

Mine is a:

Model Name: Mac Pro
Model Identifier: MacPro3,1
Processor Name: Quad-Core Intel Xeon
Processor Speed: 2.8 GHz
Number Of Processors: 2
Total Number Of Cores: 8
L2 Cache (per processor): 12 MB
Memory: 4 GB
Bus Speed: 1.6 GHz
Boot ROM Version: MP31.006C.B05
SMC Version: 1.25f4

Leopard(Mac OS X 10.5.2 (9C7010)), Darwin 9.2.2

The Activity Monitor shows 2.30 GB used (1.66 GB free). But the other projects sometimes have the status "Waiting for shared memory".

So maybe *@home applications are not able to deal with more than 2 GB of memory, or Mac OS X has a bug.

hardy
ID: 33325 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33327 - Posted: 13 Apr 2008, 19:44:41 UTC - in response to Message 33325.  

maybe *@home applications are not able to deal with more than 2 GB of memory, or Mac OS X has a bug.



Now there is a happy thought ...

It could be the size though ...

I am running TechTool Pro 4.6.1 on a 1.5 TB disk and it is currently counting NEGATIVE blocks at the far end ... someone used an unsigned long to store the block count (it will always be 0 or more) and read it out as a signed int ...

One of the reasons I do not like C as a programming language. Loose typing is what gets you bugs like this ...

With stronger typing the compiler tells you about errors like this, so you have to explicitly use a type cast to convert from one type to another, and then it is on the programmer's head if he makes the wrong choices ... but, when you have to do it explicitly and cannot get away with it ... well, thinking about changing from one type to another always made me think of the boundary conditions ...
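To make the mechanism concrete, here is a small sketch of how a large unsigned count turns negative when the same bits are read back as signed. It assumes a 32-bit counter and 512-byte blocks, neither of which is confirmed for the tool in question, and the function name is made up for illustration.

```python
import struct

BLOCK_SIZE = 512  # assumed block size in bytes


def read_as_signed32(unsigned_value):
    """Reinterpret the bits of an unsigned 32-bit value as a signed 32-bit value."""
    packed = struct.pack("<I", unsigned_value)  # store as unsigned 32-bit
    return struct.unpack("<i", packed)[0]       # read the same bytes back as signed


if __name__ == "__main__":
    total_blocks = 1_500_000_000_000 // BLOCK_SIZE  # blocks on a 1.5 TB disk
    print("last block, unsigned:", total_blocks)
    print("last block, read as signed:", read_as_signed32(total_blocks))
    # Anything below 2**31 survives the round trip; larger values come out negative.
    print("2**31 - 1 ->", read_as_signed32(2**31 - 1))
    print("2**31     ->", read_as_signed32(2**31))
```

Under these assumptions every block past the first 2**31 (about 1.1 TB in) shows up with a negative number, which fits seeing it only at the far end of a 1.5 TB disk.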
ID: 33327 · Report as offensive     Reply Quote
old_user147092

Send message
Joined: 5 Jan 06
Posts: 4
Credit: 7,655,256
RAC: 0
Message 33329 - Posted: 13 Apr 2008, 20:17:48 UTC

What led me to suspect a Mac OS X limit is "Toast". That program always crashed on Tiger on my Mac mini when it reached a virtual memory size of 2 GB while creating DL-DVDs (directly or via an image).
I reported the bug but got no useful response.

So I suspect a similar problem with the Slab models.

hardy

ID: 33329 · Report as offensive     Reply Quote
Profile MikeMarsUK
Volunteer moderator

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33331 - Posted: 13 Apr 2008, 21:49:16 UTC - in response to Message 33325.  

...
The Activity Monitor shows 2.30 GB used (1.66 GB free). But the other projects sometimes have the status "Waiting for shared memory".
...


The shared memory situation can be improved by using the spyhill patch (see the error-code-six sticky at the top of this forum).

But it appears that it probably won't resolve the slab model problem which you are experiencing; all it will do is increase the number of BOINC tasks that you can run simultaneously.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 33331 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33332 - Posted: 13 Apr 2008, 21:54:46 UTC - in response to Message 33331.  

The shared memory situation can be improved by using the spyhill patch (see the error-code-six sticky at the top of this forum).

But it appears that it probably won't resolve the slab model problem which you are experiencing; all it will do is increase the number of BOINC tasks that you can run simultaneously.


I did not have an issue with any other project running on the 8+ G memory before the patch, or after, ...

I *DO* recall a massive allocation of memory though I am loath to try it again and to "burn through" models just for the heck of it ... :)

So, we wait in patience ... :)

Hey, my disk drive is now testing block number -2,009,510,784 and counting down to -1,364,672,512
ID: 33332 · Report as offensive     Reply Quote
Profile Iain Inglis

Send message
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33339 - Posted: 14 Apr 2008, 13:10:25 UTC

Paul,

This has now been fixed with the new release of the Mac OS X Intel applications, version 5.05.

I assume if you now try to download a slab for your Mac, you'll get the new application version.

Iain
ID: 33339 · Report as offensive     Reply Quote
Profile old_user5994

Send message
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33340 - Posted: 14 Apr 2008, 14:35:22 UTC
Last modified: 14 Apr 2008, 15:14:18 UTC

Iain,

Well, even with a resource share of 5,000 I could not induce it to pull another model ...

On an 8 CPU system ...

I don't know how to fiddle the parameters to get it to pull another model ...

Maybe if I suspend all other projects?

{edit}Dag nab it .. pulled TWO !!!!!{/edit}

{edit 2}It looks like the two of them are running ... both have 30 min on their clocks ... {/edit 2}
ID: 33340 · Report as offensive     Reply Quote

©2024 climateprediction.net