Computation error, newly added project

theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61817 - Posted: 25 Dec 2019, 18:45:23 UTC
Last modified: 25 Dec 2019, 18:51:06 UTC

All tasks are ending immediately with a computation error. This is on both Windows and Linux hosts. The Windows host uses TThrottle and the Linux host is at 50% CPU time. My next step is to disable TThrottle and set CPU time to 100%. The delay before I can update the project is so long that I'm having a hard time troubleshooting. Any other ideas?

12/25/2019 12:27:20 PM | climateprediction.net | Starting task hadam4_a1tw_209810_6_856_011964087_1
12/25/2019 12:27:21 PM | climateprediction.net | Computation for task hadam4_a1tw_209810_6_856_011964087_1 finished
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_1.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_2.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_3.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_4.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_5.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_6.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_restart.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_out.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
ID: 61817
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61818 - Posted: 25 Dec 2019, 20:11:14 UTC - in response to Message 61817.  
Last modified: 25 Dec 2019, 20:21:25 UTC

And now I've been put in time-out, making troubleshooting even harder. One host gets this; another gets no message whatsoever, just zero tasks.

12/25/2019 1:59:12 PM | climateprediction.net | Requesting new tasks for CPU
12/25/2019 1:59:19 PM | climateprediction.net | Scheduler request completed: got 0 new tasks
12/25/2019 1:59:19 PM | climateprediction.net | No tasks sent
12/25/2019 1:59:19 PM | climateprediction.net | This computer has finished a daily quota of 1 tasks
ID: 61818
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 61821 - Posted: 25 Dec 2019, 20:58:21 UTC - in response to Message 61817.  
Last modified: 25 Dec 2019, 21:04:10 UTC

On your 2600X, you don't have the 32-bit libraries installed that climateprediction.net needs. See the sticky at the top of the Linux forum.

https://www.cpdn.org/forum_thread.php?id=8008&postid=59939

Since you're using 19.2, which is based on Ubuntu 18.04, use the Ubuntu 18.04 command in the sticky to get the needed libraries.
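A quick sanity check before and after installing the packages, as a rough sketch only: on x86 Linux the 32-bit dynamic loader normally lives at /lib/ld-linux.so.2, so if that path is missing, 32-bit programs cannot start at all. The function name is made up for illustration, and the loader being present does not prove that every library the models need is installed; the package list in the sticky is still the thing to follow.

```python
import os
import platform


def has_32bit_loader(path: str = "/lib/ld-linux.so.2") -> bool:
    """Return True if the usual 32-bit x86 loader path exists (symlinks are followed)."""
    return os.path.exists(path)


if platform.system() == "Linux":
    # Assumption: a missing loader means no 32-bit support is installed at all.
    print("32-bit loader present:", has_32bit_loader())
else:
    print("This check only applies to Linux hosts.")
```

If this prints False on the Linux host, the 32-bit support packages are almost certainly missing.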

Edit: also, you only have 8 GB of RAM on the 2600X, and BOINC sees 12 cores. That is a problem for memory usage if you run on all cores, so I would suggest limiting the number of CPUs BOINC uses to at most 6 for the hadam4 N144 models, or at most 4 if you get the hadam4h N216 models.
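To put rough numbers on that suggestion, here is a minimal sketch. It assumes memory is split evenly between tasks and ignores what the operating system itself uses, so the per-task figures are only what the suggested limits imply, not measured model footprints. The function names are hypothetical.

```python
def per_task_budget_gb(total_ram_gb: float, concurrent_tasks: int) -> float:
    """RAM available to each task if memory is split evenly (ignores OS overhead)."""
    if concurrent_tasks < 1:
        raise ValueError("need at least one task")
    return total_ram_gb / concurrent_tasks


def cpu_percent_for(tasks: int, cores_seen: int) -> int:
    """Percentage for BOINC's 'use at most N % of the CPUs' preference.

    Rounded down, so the resulting core count never exceeds the target.
    """
    return (100 * tasks) // cores_seen


# 8 GB of RAM and 12 cores visible to BOINC, as described in the post.
for label, tasks in (("hadam4 N144", 6), ("hadam4h N216", 4)):
    budget = per_task_budget_gb(8, tasks)
    percent = cpu_percent_for(tasks, 12)
    print(f"{label}: {tasks} tasks, {budget:.2f} GB each, {percent}% of CPUs")
```

Running on all 12 cores would leave well under 1 GB per task by the same arithmetic, which is why the lower limits are suggested.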
ID: 61821
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61823 - Posted: 25 Dec 2019, 21:12:40 UTC - in response to Message 61817.  

Also, the "error messages" in your first post aren't errors. They're just BOINC saying that it can't find the output files to upload, which is to be expected if the model crashed before they were created.

The real error message(s) will be before that in the list.
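If the event log is long, one way to find those earlier lines is to filter out the "Output file ... absent" entries for a given task. This is a small sketch that assumes the "time | project | message" layout shown in the logs in this thread; the function name is made up for illustration.

```python
def real_messages(log_text: str, task: str) -> list[str]:
    """Return event-log messages for `task`, minus the 'Output file ... absent' noise."""
    kept = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a "time | project | message" line
        message = parts[2]
        if task not in message:
            continue  # belongs to another task or to the client itself
        if message.startswith("Output file") and message.endswith("absent"):
            continue  # a consequence of the crash, not its cause
        kept.append(f"{parts[0]} | {message}")
    return kept


sample = """\
12/25/2019 12:27:20 PM | climateprediction.net | Starting task hadam4_a1tw_209810_6_856_011964087_1
12/25/2019 12:27:21 PM | climateprediction.net | Computation for task hadam4_a1tw_209810_6_856_011964087_1 finished
12/25/2019 12:27:21 PM | climateprediction.net | Output file hadam4_a1tw_209810_6_856_011964087_1_r489071904_1.zip for task hadam4_a1tw_209810_6_856_011964087_1 absent
"""

for entry in real_messages(sample, "hadam4_a1tw_209810_6_856_011964087_1"):
    print(entry)
```

For the sample above only the "Starting task" and "Computation ... finished" lines remain, which shows that this particular log excerpt holds no further detail.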
ID: 61823
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61824 - Posted: 25 Dec 2019, 22:12:50 UTC

Added the Linux packages. I assume I have to wait 24 hours to expect an update.

Any ideas on the Windows host? There are no other errors in the logs.

12/25/2019 11:11:10 AM |  | Using account manager BOINCstatsBAM!
12/25/2019 11:11:10 AM |  | Setting up GUI RPC socket
12/25/2019 11:11:10 AM |  | Checking presence of 356 project files
12/25/2019 11:11:10 AM |  | Suspending GPU computation - computer is in use
12/25/2019 11:11:11 AM | climateprediction.net | Sending scheduler request: To fetch work.
12/25/2019 11:11:11 AM | climateprediction.net | Requesting new tasks for CPU
12/25/2019 11:11:13 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
12/25/2019 11:11:13 AM | climateprediction.net | Not sending work - last request too recent: 703 sec
12/25/2019 11:11:21 AM |  | Suspending computation - CPU is busy
12/25/2019 11:11:31 AM |  | Resuming computation
12/25/2019 11:12:51 AM |  | Suspending computation - CPU is busy
12/25/2019 11:13:01 AM |  | Resuming computation
12/25/2019 11:13:23 AM | climateprediction.net | Computation for task wah2_anz50_a0jx_200612_31_860_011978332_0 finished
12/25/2019 11:13:23 AM | climateprediction.net | Output file wah2_anz50_a0jx_200612_31_860_011978332_0_r671891042_1.zip for task wah2_anz50_a0jx_200612_31_860_011978332_0 absent
12/25/2019 11:13:23 AM | climateprediction.net | Output file wah2_anz50_a0jx_200612_31_860_011978332_0_r671891042_2.zip for task wah2_anz50_a0jx_200612_31_860_011978332_0 absent
...
ID: 61824
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 61825 - Posted: 25 Dec 2019, 22:25:00 UTC - in response to Message 61824.  

It's possible that the "day" will reset at 00 GMT, so it may ask for work after that.

The Windows errors are odd, with the "cannot find the device/drive specified" problem. I've had these types of errors occasionally, but not in bunches like that. It may have to do with the system trying to do too much disk reading and writing simultaneously, but that's just a hunch. If those types of errors continue frequently, I'm not sure what the solution would be.
ID: 61825
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61826 - Posted: 25 Dec 2019, 22:47:26 UTC

Perhaps the errors are something to do with Windows' VirtualBox.

Also, the models are taking 13 hours to clock up 40 minutes of model time.
ID: 61826
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61829 - Posted: 25 Dec 2019, 23:53:22 UTC
Last modified: 26 Dec 2019, 0:01:07 UTC

geophi, "cannot find the device/drive specified" - what are you referring to?

Les, where are you seeing that result? And CPDN uses VirtualBox? I don't see a VM created by it. If you're seeing something on the back end, I did try attaching a Linux VM to reproduce the problem fresh, but I'm still not receiving any tasks.

I do see that presently the only work available is for one of the Linux applications, so I'm suspending the Windows host.
ID: 61829
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 61830 - Posted: 26 Dec 2019, 0:19:52 UTC - in response to Message 61829.  

geophi, "cannot find the device/drive specified" - what are you referring to?

Your Windows computer's tasks are here https://www.cpdn.org/results.php?hostid=1496463

If you click on the individual task number, you'll see a section labeled stderr where some additional errors are written. The problem up near the top of that listing is what resulted in the task error "The system cannot find the drive specified."
ID: 61830
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61831 - Posted: 26 Dec 2019, 0:28:15 UTC - in response to Message 61829.  
Last modified: 26 Dec 2019, 0:28:49 UTC

About VirtualBox: I'm referring to the info about your computer, listed under Virtualization.

CPDN DOESN'T use VirtualBox, but you have it installed and enabled.
ID: 61831
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61832 - Posted: 26 Dec 2019, 0:32:34 UTC

Ok, thanks. I've suspended all but the 2600X Linux host. It's limited to 1 core, 100% CPU time, and run-always. It's 0030 GMT and there are 30 minutes left until my next update. Cue suspense.
ID: 61832
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61834 - Posted: 26 Dec 2019, 0:59:28 UTC

bah humbug.

12/25/2019 6:58:12 PM | climateprediction.net | This computer has finished a daily quota of 1 tasks
ID: 61834
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61835 - Posted: 26 Dec 2019, 1:12:30 UTC - in response to Message 61834.  

:(
I've always thought that the "midnight" was where that project's server was.
Perhaps it's where the person's computer is.
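One way to compare the two guesses, as a sketch only: work out how long remains until the next midnight on each clock. The UTC-6 offset for the host is an assumption, inferred from the 6:58:12 PM log line being posted at about 00:59 UTC. The function name is hypothetical, and this says nothing about which clock the server really uses.

```python
from datetime import datetime, timedelta, timezone


def until_next_midnight(now: datetime, tz: timezone) -> timedelta:
    """Time remaining until the next 00:00 on the clock given by `tz`."""
    local = now.astimezone(tz)
    next_midnight = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight - local


# The moment of the "daily quota" log line, expressed in UTC (assumes the host is UTC-6).
now = datetime(2019, 12, 26, 0, 58, 12, tzinfo=timezone.utc)
host_tz = timezone(timedelta(hours=-6))  # assumed host offset, not confirmed in the thread

print("Until midnight UTC: ", until_next_midnight(now, timezone.utc))
print("Until midnight host:", until_next_midnight(now, host_tz))
```

If the quota still applies well after 00:00 UTC, as it did here, that points away from a UTC reset, but it does not distinguish between host time, server time, or a rolling 24-hour window.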
ID: 61835
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 61836 - Posted: 26 Dec 2019, 5:10:30 UTC - in response to Message 61835.  

I run some of my Linux boxes on UTC/GMT time, so that may be why I remember that.
ID: 61836
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61838 - Posted: 26 Dec 2019, 16:26:06 UTC
Last modified: 26 Dec 2019, 17:03:17 UTC

Got a task on the Linux host! So it was the libraries - thanks! I'm only letting it crunch the one task for now as a test, and I expect to see that run time come down. Plus this rig is going down tomorrow for hardware changes, as it's a project in progress (literally building a winter space heater).

Still no work available for Windows, so I have to wait to sort out that drive error. It's an M.2 NVMe drive and scores >1k on AS SSD, so I don't think it's a speed issue. But I've limited it to one core, so when a task does come down I can maybe rule that out.
ID: 61838
theapc

Joined: 27 Jul 16
Posts: 10
Credit: 55,923
RAC: 0
Message 61848 - Posted: 27 Dec 2019, 3:12:20 UTC

Windows host got a task! Available work for weather ran out though, so I still need to run multiple CPDN tasks. And BOINC is reporting a 10-day run time...
ID: 61848


©2024 climateprediction.net