climateprediction.net home page
New work Discussion

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59078 - Posted: 24 Nov 2018, 6:36:58 UTC - in response to Message 59076.  


Finally hadcm3s tasks are finishing at 100% without errors, but after that they can't complete and error out.

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073740791 (0xc0000409)</message>
<stderr_txt>


That's a Windows error message. It's been around for years.
Googling it doesn't give anything that looks useful for end users.
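
For anyone trying to look the code up: the two numbers in the message are the same 32-bit value, shown once as a signed integer and once as unsigned hex. A minimal Python sketch of the conversion follows (the helper name `to_ntstatus_hex` is made up for illustration and is not part of BOINC or Windows):

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their two's-complement
    # unsigned equivalent; non-negative values pass through unchanged.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_ntstatus_hex(-1073740791))  # 0xc0000409
```

In Microsoft's NTSTATUS tables, 0xC0000409 is listed as STATUS_STACK_BUFFER_OVERRUN. That says how the process died, but not why the model triggered it.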
ID: 59078
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59104 - Posted: 27 Nov 2018, 10:19:46 UTC

Batch 774 SAFR50 region. These are using the restart files from batch 741.

The Server Status page is showing about 6,800 tasks at the moment, but I'm not sure whether that is all of them or whether the page updated while the hopper was still being filled.
ID: 59104
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 59106 - Posted: 27 Nov 2018, 11:28:55 UTC - in response to Message 59104.  

Batch 774 SAFR50 region. These are using the restart files from batch 741.

If you are running any tasks from this batch please check them because I have 2 batch 774 tasks stuck in the initialisation of the regional model (project team notified).

BOINC Manager shows the elapsed time and progress increasing as expected, but if you open the task properties dialog box the CPU time isn't changing from "---". If checkpoint or task debug is enabled BOINC's event log shows that no checkpoints are being made. When BOINC is restarted the elapsed time and progress revert to 0.
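
The symptom described above (elapsed time rising while CPU time stays flat) can be written down as a simple check. This is only a sketch: the `Sample` type, the `looks_stalled` helper, and the 1-second threshold are hypothetical, and the readings would have to be taken by hand from the task properties dialog or a process viewer, not from any BOINC API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    elapsed_s: float  # wall-clock time reported for the task
    cpu_s: float      # CPU time reported for the task (0.0 if shown as "---")


def looks_stalled(samples: Sequence[Sample], min_cpu_gain_s: float = 1.0) -> bool:
    """Return True if elapsed time advanced but CPU time barely moved."""
    if len(samples) < 2:
        return False  # not enough readings to compare
    first, last = samples[0], samples[-1]
    elapsed_gain = last.elapsed_s - first.elapsed_s
    cpu_gain = last.cpu_s - first.cpu_s
    return elapsed_gain > 0 and cpu_gain < min_cpu_gain_s


# Two readings ten minutes apart: elapsed time moved, CPU time did not.
print(looks_stalled([Sample(600, 0.0), Sample(1200, 0.05)]))      # True
# A healthy task accumulates CPU time roughly in step with elapsed time.
print(looks_stalled([Sample(600, 590.0), Sample(1200, 1185.0)]))  # False
```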
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 59106
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,488,986
RAC: 868
Message 59107 - Posted: 27 Nov 2018, 13:23:57 UTC
Last modified: 27 Nov 2018, 13:24:14 UTC

Thyme Lawn -

I had two 774 tasks waiting in the wings, so I started them to see what would happen. They both failed within seconds with some kind of "Error in Namelist".

Make sure you don't have a hidden window (Alt-Tab) with a Fortran runtime error.
ID: 59107
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 59108 - Posted: 27 Nov 2018, 13:47:37 UTC - in response to Message 59107.  

@WB8ILI I run BOINC as a service. Yes, it does mean I can't run GPU applications on other projects, but it also means that tasks which fail with a Windows runtime error don't generate the pop-up dialog box you can get in a non-service install.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 59108
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59109 - Posted: 27 Nov 2018, 13:48:34 UTC

I had two 774 tasks waiting in the wings, so I started them to see what would happen. They both failed within seconds with some kind of "Error in Namelist".

Make sure you don't have a hidden window (Alt-Tab) with a Fortran runtime error.


I have told the project about this to add to what Thyme and others have told them.

BOINC Manager shows the elapsed time and progress increasing as expected, but if you open the task properties dialog box the CPU time isn't changing from "---". If checkpoint or task debug is enabled BOINC's event log shows that no checkpoints are being made. When BOINC is restarted the elapsed time and progress revert to 0.


Do they still use a whole core of CPU time when doing this?
ID: 59109
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,332,106
RAC: 6,095
Message 59111 - Posted: 27 Nov 2018, 15:33:24 UTC

I received 4 of the batch 774 WUs overnight. Started them as a test. All failed in less than 2 minutes with a Fortran runtime error. Win 10 on an Intel 2.66 GHz processor with 8 GB of RAM.
ID: 59111
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 59115 - Posted: 27 Nov 2018, 22:41:49 UTC - in response to Message 59109.  

BOINC Manager shows the elapsed time and progress increasing as expected, but if you open the task properties dialog box the CPU time isn't changing from "---". If checkpoint or task debug is enabled BOINC's event log shows that no checkpoints are being made. When BOINC is restarted the elapsed time and progress revert to 0.


Do they still use a whole core of CPU time when doing this?

No. SysInternals Process Explorer was showing 2 idle cores of the 8 on my i7 with less than 0.1 seconds of CPU time for both of the batch 774 models' controller + global + regional processes.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 59115
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,332,106
RAC: 6,095
Message 59128 - Posted: 3 Dec 2018, 22:07:52 UTC

Any sign of new work in the pipeline? I’m starting to feel like Old Mother Hubbard. The cupboard is bare and my poor dogs are getting none.
ID: 59128
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59129 - Posted: 4 Dec 2018, 8:09:49 UTC - in response to Message 59128.  

No idea at all. Not even any hints of the withdrawn batch coming back with correct stash files, or with whatever was wrong fixed. Sometimes information appears in advance on the message boards where moderators can report issues with batches; sometimes the batch doesn't go up there until it is released. So no information doesn't always mean nothing is on the way.

I have relatively slow machines, so they are still busy, but I would guess from the dropping number of tasks in progress that there are a lot of machines with no work by now.
ID: 59129
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59136 - Posted: 6 Dec 2018, 19:54:18 UTC

10,000 10-month models in 2 batches.

Yum :)
ID: 59136
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59138 - Posted: 6 Dec 2018, 21:51:55 UTC

5% guess is about 50 hours on my fast machines.
ID: 59138
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59162 - Posted: 13 Dec 2018, 16:05:40 UTC
Last modified: 13 Dec 2018, 16:44:38 UTC

Any sign of new work in the pipeline?


Batch 777, SAFR50, 16-month tasks.
About 8,000 at a guess, but there may be more if they are still filling the hopper.

Edit: Looks like that is all, and they are going fast.
ID: 59162
Alan K
Joined: 22 Feb 06
Posts: 488
Credit: 30,544,650
RAC: 6,213
Message 59163 - Posted: 13 Dec 2018, 22:59:21 UTC - in response to Message 59162.  

Are these the previous batch 774 ones that kept failing?
ID: 59163
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59164 - Posted: 13 Dec 2018, 23:48:57 UTC

Same research project, but hopefully with the "Oops" removed.
ID: 59164
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59207 - Posted: 20 Dec 2018, 10:27:57 UTC

New batch 778, SAS50. I haven't looked at the total number yet, but with only 784 ready to send on the status page, I think it is quite a small batch.
ID: 59207
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,684,326
RAC: 4,454
Message 59209 - Posted: 20 Dec 2018, 13:14:40 UTC - in response to Message 59207.  

New batch 778, SAS50. I haven't looked at the total number yet, but with only 784 ready to send on the status page, I think it is quite a small batch.

Yes, 1000 x 22-month South Asia at 50 km (batch list).
ID: 59209
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,684,326
RAC: 4,454
Message 59210 - Posted: 20 Dec 2018, 14:24:22 UTC

... and 3172 x 16-month Southern Africa at 50 km (batch list).
ID: 59210
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59211 - Posted: 20 Dec 2018, 14:30:18 UTC - in response to Message 59210.  

... and 3172 x 16-month Southern Africa at 50 km (batch list).


The goal is to study the October 2015 - March 2016 heat wave and drought in Southern Africa.
ID: 59211
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4475
Credit: 18,448,326
RAC: 22,385
Message 59215 - Posted: 21 Dec 2018, 9:50:46 UTC - in response to Message 59211.  
Last modified: 21 Dec 2018, 9:57:40 UTC

778s have big downloads: 124.76 MB! 34 minutes and only halfway through uploading its first zip.
779 and 780 have also been released. They are part of the same project as 777.

Edit: Batches seem to be behaving like buses.
ID: 59215