Batch 1005 WAH2 NZ region

Message boards : Number crunching : Batch 1005 WAH2 NZ region

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 70199 - Posted: 25 Jan 2024, 11:25:32 UTC

Can someone who has one or more of these tasks let me know if zips are going through all right? The ones for that region on the testing site are stuck.

cetus
Joined: 7 Aug 04
Posts: 9
Credit: 139,753,972
RAC: 19,927
Message 70201 - Posted: 25 Jan 2024, 14:03:29 UTC - in response to Message 70199.  

I have two of these running. Both seem to be uploading OK.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 70202 - Posted: 25 Jan 2024, 14:22:35 UTC - in response to Message 70201.  

Thanks. Obviously they are going to a different server, then.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1063
Credit: 16,546,621
RAC: 2,321
Message 70204 - Posted: 26 Jan 2024, 19:14:14 UTC - in response to Message 70199.  

Can someone who has one or more of these tasks let me know if zips are going through all right? The ones for that region on the testing site are stuck.

I just got two of them. One has uploaded its first trickle. This is on my pipsqueak Windows 10 machine.

Task 22387098
Name 	wah2_nz25_n31e_201205_25_1005_012258096_0
Workunit 	12258096
Created 	23 Jan 2024, 10:48:31 UTC
Sent 	25 Jan 2024, 19:27:25 UTC
Report deadline 	24 May 2024, 19:27:25 UTC


Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 70206 - Posted: 26 Jan 2024, 21:20:33 UTC

Thanks, both. The testing task was sending data to the wrong server. Also, the test was only meant to make sure the task ran with a corrected ancillary file, so the zips were not needed. In view of that, the task has now been aborted.

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 24,535,832
RAC: 2,045
Message 70316 - Posted: 5 Feb 2024, 16:19:03 UTC

Hi, I've got this error on several of the batch 1005 WUs I'm running:

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
Disk usage limit exceeded</message>
<stderr_txt>
CPDN Monitor - Abort request from BOINC...
00:52:22 (7412): called boinc_finish(10)

</stderr_txt>
]]>

Richard Haselgrove
Posts: 943
Credit: 34,318,120
RAC: 11,354
Message 70317 - Posted: 5 Feb 2024, 16:45:53 UTC - in response to Message 70316.  

BOINC reports that CPDN is using 1.71 GB for one eas25 task and one nz25 task (probably including some residual program files from older runs).

The nz25 task itself (at a little over 40% done) has a working set size of 263.69 MB.

Check those figures against the amount of space remaining on your BOINC data drive, and check what proportion of the available space BOINC is allowed to use.
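A minimal Python sketch of the check Richard describes, assuming a Linux install with the BOINC data directory at /var/lib/boinc (on Windows the default is usually C:\ProgramData\BOINC). The helper names are hypothetical, and allowed_fraction is a number you supply yourself as a stand-in for a "use no more than X% of total" disk preference; nothing here reads BOINC's own settings.

```python
import os
import shutil


# Hypothetical helper (not part of BOINC): total size of files under a directory.
def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished mid-scan (e.g. a finished upload); ignore it
    return total


# Hypothetical helper: compare BOINC's usage with free space and a chosen share.
def disk_report(data_dir: str, allowed_fraction: float) -> dict:
    """allowed_fraction is user-supplied, not read from BOINC's configuration."""
    drive = shutil.disk_usage(data_dir)
    return {
        "boinc_used_gb": dir_size_bytes(data_dir) / 1e9,
        "drive_free_gb": drive.free / 1e9,
        "allowed_gb": drive.total * allowed_fraction / 1e9,
    }


if __name__ == "__main__":
    data_dir = "/var/lib/boinc"  # assumed Linux location; edit for your install
    if os.path.isdir(data_dir):
        print(disk_report(data_dir, allowed_fraction=0.5))
    else:
        print(f"{data_dir} not found; edit data_dir for your install")
```

If boinc_used_gb is far below both drive_free_gb and allowed_gb, the overall disk preferences are probably not what is stopping the task.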

Glenn Carver
Joined: 29 Oct 17
Posts: 809
Credit: 13,604,352
RAC: 5,068
Message 70318 - Posted: 5 Feb 2024, 17:05:45 UTC - in response to Message 70317.  
Last modified: 5 Feb 2024, 17:07:49 UTC

That error can happen when the uploads are not going through and gradually eat up the allowed task space. Are your uploads working?

Also, check the /var/lib/boinc/projects/climateprediction.net directory. Sometimes the tasks do not manage to tidy up after failures, so you may have some old task directories in there; check that they do not belong to running tasks before deleting them.
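Before deleting anything, it can help to see which subdirectories are taking the space. This Python sketch (hypothetical helper names) lists them largest first and skips any whose name contains the name of a task you know is running. That matching rule is an assumption about how the directories are named, so treat the output as candidates to inspect by hand; the code itself deletes nothing.

```python
import os


# Hypothetical helper: total bytes of the files under a directory tree.
def tree_size(path: str) -> int:
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file removed or unreadable during the scan
    return total


# Hypothetical helper: subdirectories not matched to any running task.
def leftover_candidates(project_dir: str, running_names: set) -> list:
    """Return (directory name, size in bytes), largest first.

    Assumption: a directory belonging to a running task has that task's
    name somewhere in its own name. Verify by hand before deleting.
    """
    found = []
    with os.scandir(project_dir) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # loose files (e.g. zips) are not task directories
            if any(task in entry.name for task in running_names):
                continue
            found.append((entry.name, tree_size(entry.path)))
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

Typical use would be leftover_candidates("/var/lib/boinc/projects/climateprediction.net", running), with running filled in from the Tasks tab in BOINC Manager.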

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 24,535,832
RAC: 2,045
Message 70320 - Posted: 6 Feb 2024, 8:00:02 UTC - in response to Message 70318.  
Last modified: 6 Feb 2024, 8:02:08 UTC

There were only two ghost WUs occupying 0.5 GB of space, and there is more than 170 GB available.

It seems, though, that the server abort only partially worked. I have zips from batch 1005 pending transfer with no corresponding WU in Tasks. There are 12 zips, so I will abort them. Additionally, I have several 1003, 1004 and 1005 tasks that are still running. I guess I should cancel them manually. Or should the 1005s be left computing?

Glenn Carver
Joined: 29 Oct 17
Posts: 809
Credit: 13,604,352
RAC: 5,068
Message 70321 - Posted: 6 Feb 2024, 8:12:45 UTC - in response to Message 70320.  

Only cancel any 1002, 1003 & 1004. Any 1001 & 1005 should be left running/uploading.
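The task names in this thread suggest the batch number is the sixth underscore-separated field (1005 in wah2_nz25_n31e_201205_25_1005_012258096_0, 1001 in wah2_eas25_h1pu_201312_24_1001_012232192). Assuming that holds, a small Python helper (hypothetical names) can sort a list of task names by Glenn's rule. The pattern is inferred from these examples only, so anything that does not parse goes into an "unsure" list instead of being guessed at.

```python
from typing import Iterable, Optional

# Glenn's advice: cancel batches 1002-1004, keep 1001 and 1005.
CANCEL_BATCHES = {1002, 1003, 1004}
KEEP_BATCHES = {1001, 1005}


# Hypothetical helper: batch number from a task name, or None if it doesn't fit.
def batch_of(task_name: str) -> Optional[int]:
    """Assumes the batch is the sixth underscore-separated field."""
    fields = task_name.split("_")
    if len(fields) < 6 or not fields[5].isdigit():
        return None
    return int(fields[5])


# Hypothetical helper: sort task names into cancel / keep / unsure lists.
def triage(task_names: Iterable[str]) -> dict:
    groups = {"cancel": [], "keep": [], "unsure": []}
    for name in task_names:
        batch = batch_of(name)
        if batch in CANCEL_BATCHES:
            groups["cancel"].append(name)
        elif batch in KEEP_BATCHES:
            groups["keep"].append(name)
        else:
            groups["unsure"].append(name)
    return groups
```

The actual aborting is still best done by hand in BOINC Manager once the lists look right.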

bernard_ivo
Posts: 438
Credit: 24,535,832
RAC: 2,045
Message 70322 - Posted: 6 Feb 2024, 12:10:36 UTC - in response to Message 70321.  

Only cancel any 1002, 1003 & 1004. Any 1001 & 1005 should be left running/uploading.


Thanks. However, the upload11 server won't let me upload the 1005 zips I have:

06/02/2024 14:07:25 | climateprediction.net | [http] [ID#33153] Info:  processing: http://upload11.cpdn.org/cgi-bin/file_upload_handler
06/02/2024 14:07:25 | climateprediction.net | [http] [ID#33154] Info:  processing: http://upload11.cpdn.org/cgi-bin/file_upload_handler
06/02/2024 14:07:25 | climateprediction.net | [http] [ID#33154] Info:  Found bundle for host: 0x1f293a28eb0 [serially]
06/02/2024 14:07:25 | climateprediction.net | [http] [ID#33154] Info:  Connection #6232 is still name resolving, can't reuse
06/02/2024 14:07:26 | climateprediction.net | [http] [ID#33153] Info:    Trying 192.171.169.187:80...
06/02/2024 14:07:26 | climateprediction.net | [http] [ID#33154] Info:  Hostname 'upload11.cpdn.org' was found in DNS cache
06/02/2024 14:07:26 | climateprediction.net | [http] [ID#33154] Info:    Trying 192.171.169.187:80...
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33153] Info:  connect to 192.171.169.187 port 80 failed: Timed out
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33153] Info:  Failed to connect to upload11.cpdn.org port 80 after 21552 ms: Couldn't connect to server
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33153] Info:  Closing connection
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33154] Info:  connect to 192.171.169.187 port 80 failed: Timed out
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33154] Info:  Failed to connect to upload11.cpdn.org port 80 after 21552 ms: Couldn't connect to server
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33154] Info:  Closing connection
06/02/2024 14:07:47 | climateprediction.net | [http] HTTP error: Timeout was reached
06/02/2024 14:07:47 | climateprediction.net | [http] HTTP error: Timeout was reached
06/02/2024 14:07:47 | climateprediction.net | Backing off 04:43:25 on upload of wah2_nz25_n0kd_199005_25_1005_012254891_0_r710857397_1.zip
06/02/2024 14:07:47 | climateprediction.net | Backing off 05:07:36 on upload of wah2_nz25_n0kd_199005_25_1005_012254891_0_r710857397_2.zip
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33155] Info:  processing: http://upload11.cpdn.org/cgi-bin/file_upload_handler
06/02/2024 14:07:47 | climateprediction.net | [http] [ID#33156] Info:  processing: http://upload11.cpdn.org/cgi-bin/file_upload_handler
06/02/2024 14:07:48 | climateprediction.net | [http] [ID#33155] Info:  Hostname upload11.cpdn.org was found in DNS cache
06/02/2024 14:07:48 | climateprediction.net | [http] [ID#33155] Info:    Trying 192.171.169.187:80...
06/02/2024 14:07:48 | climateprediction.net | [http] [ID#33156] Info:  Found bundle for host: 0x1f293a28660 [serially]
06/02/2024 14:07:48 | climateprediction.net | [http] [ID#33156] Info:  Connection #6234 is still name resolving, can't reuse
06/02/2024 14:07:48 | climateprediction.net | [http] [ID#33156] Info:  Hostname upload11.cpdn.org was found in DNS cache
06/02/2024 14:07:48 | climateprediction.net | [http] [ID#33156] Info:    Trying 192.171.169.187:80...
06/02/2024 14:07:48 |  | Project communication failed: attempting access to reference site
06/02/2024 14:07:48 |  | [http] HTTP_OP::init_get(): https://www.google.com/
06/02/2024 14:07:48 |  | [http] [ID#0] Info:  processing: https://www.google.com/
06/02/2024 14:07:49 |  | [http] [ID#0] Info:    Trying 172.217.20.68:443...
06/02/2024 14:07:49 |  | [http] [ID#0] Info:  Connected to www.google.com (172.217.20.68) port 443
06/02/2024 14:07:49 |  | [http] [ID#0] Info:  schannel: disabled automatic use of client certificate
06/02/2024 14:07:49 |  | [http] [ID#0] Info:  ALPN: offers http/1.1
06/02/2024 14:07:49 |  | [http] [ID#0] Info:  ALPN: server accepted http/1.1
06/02/2024 14:07:49 |  | [http] [ID#0] Info:  using HTTP/1.1
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: GET / HTTP/1.1
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: Host: www.google.com
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.24.1)
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: Accept: */*
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: Accept-Language: en_GB
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: roject_name>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <name>wah2_nz25_n0kd_199005_25_1005_012254891_0_r710857397_1.zip</name>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <nbytes>90258709.000000</nbytes>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <max_nbytes>150000000.000000</max_nbytes>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <status>1</status>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <persistent_file_xfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <num_retries>15</num_retries>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <first_request_time>1706963052.704195</first_request_time>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <next_request_time>1707238272.680814</next_request_time>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <time_so_far>337.586294</time_so_far>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <last_bytes_xferred>0.000000</last_bytes_xferred>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <is_upload>1</is_upload>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     </persistent_file_xfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: </file_transfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: <file_transfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <project_url>https://climateprediction.net/</project_url>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <project_name>climateprediction.net</project_name>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <name>wah2_nz25_n0kd_199005_25_1005_012254891_0_r710857397_2.zip</name>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <nbytes>90517431.000000</nbytes>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <max_nbytes>150000000.000000</max_nbytes>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <status>1</status>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <persistent_file_xfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <num_retries>12</num_retries>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <first_request_time>1707005482.908285</first_request_time>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <next_request_time>1707239724.238784</next_request_time>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <time_so_far>271.271139</time_so_far>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <last_bytes_xferred>0.000000</last_bytes_xferred>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <is_upload>1</is_upload>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     </persistent_file_xfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: </file_transfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server: <file_transfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <project_url>https://climateprediction.net/</project_url>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <project_name>climateprediction.net</project_name>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <name>wah2_nz25_n0kd_199005_25_1005_012254891_0_r710857397_3.zip</name>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <nbytes>90382194.000000</nbytes>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <max_nbytes>150000000.000000</max_nbytes>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <status>1</status>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:     <persistent_file_xfer>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <num_retries>11</num_retries>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <first_request_time>1707047627.895128</first_request_time>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <next_request_time>1707217908.505416</next_request_time>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <time_so_far>248.874489</time_so_far>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:         <last_bytes_xferred>0.000000</last_bytes_xferred>
06/02/2024 14:07:49 |  | [http] [ID#0] Sent header to server:
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: HTTP/1.1 200 OK
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Date: Tue, 06 Feb 2024 12:07:49 GMT
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Expires: -1
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Cache-Control: private, max-age=0
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Content-Type: text/html; charset=ISO-8859-1
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Content-Security-Policy-Report-Only: object-src 'none';base-uri 'self';script-src 'nonce-VWQJlPawV74E_3zTcnJdYw' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/other-hp
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Content-Encoding: gzip
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Server: gws
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: X-XSS-Protection: 0
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: X-Frame-Options: SAMEORIGIN
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Set-Cookie: SOCS=CAAaBgiA7YWuBg; expires=Fri, 07-Mar-2025 12:07:49 GMT; path=/; domain=.google.com; Secure; SameSite=lax
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Set-Cookie: AEC=Ae3NU9NKF883ga8MHuHKh1r1SY9atTfhu4wRiWDXd6J6WUbvjQr9pgLqlA; expires=Sun, 04-Aug-2024 12:07:49 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Set-Cookie: __Secure-ENID=17.SE=mfM3qEBAhbiflfe_MhQU9awPmCSF3l85SGOT8J4-x1W6KIHkohvrYc8PwebTl6eeB_Z1RvZY2yunws1VeUBKG7vSf93m2q8hEyFtJjp-0QGGo4WXU-uDXLcyCKrnNnYst5McT1TwYuXxwl2DOIT-uK0CXzbAIxZ7iHNuX-OgFM0-ojBq0vQ; expires=Sat, 08-Mar-2025 04:26:07 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Set-Cookie: CONSENT=PENDING+273; expires=Thu, 05-Feb-2026 12:07:49 GMT; path=/; domain=.google.com; Secure
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: Transfer-Encoding: chunked
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server:
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: 00000001
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: 
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: 00000001
06/02/2024 14:07:49 |  | 
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: 00000001
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: 
06/02/2024 14:07:49 |  | [http] [ID#0] Received header from server: 00000001
06/02/2024 14:07:49 |  | [http] [ID#0] Info:  Connection #6236 to host www.google.com left intact
06/02/2024 14:07:49 |  | Internet access OK - project servers may be temporarily down.
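The <persistent_file_xfer> blocks in that log carry Unix timestamps that show how long the zips had been waiting. As I read BOINC's field names, first_request_time is when the upload was first requested and next_request_time is when the next retry is due. This Python sketch converts them to UTC; the one-hour-offset reasoning in the comment is an inference from the Google Date header, not something BOINC reports.

```python
from datetime import datetime, timezone

# Timestamps copied from the <persistent_file_xfer> blocks in the log above,
# keyed by the suffix of each wah2_nz25_n0kd_..._r710857397_N.zip file name.
XFERS = {
    "_1.zip": (1706963052.704195, 1707238272.680814),
    "_2.zip": (1707005482.908285, 1707239724.238784),
    "_3.zip": (1707047627.895128, 1707217908.505416),
}

# The log is in local time; Google's Date header (12:07:49 GMT against a local
# stamp of 14:07:49) implies UTC+2, so the 14:07:47 failure is 12:07:47 UTC.
FAILURE_UTC = datetime(2024, 2, 6, 12, 7, 47, tzinfo=timezone.utc)


def to_utc(unix_seconds: float) -> datetime:
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def summarize() -> None:
    for suffix, (first, nxt) in XFERS.items():
        first_dt, next_dt = to_utc(first), to_utc(nxt)
        print(
            f"{suffix}: first requested {first_dt:%Y-%m-%d %H:%M:%S} UTC, "
            f"pending for {FAILURE_UTC - first_dt}, "
            f"next retry {next_dt:%Y-%m-%d %H:%M:%S} UTC "
            f"({next_dt - FAILURE_UTC} after the failure)"
        )


if __name__ == "__main__":
    summarize()
```

For _1.zip this gives a first request at 2024-02-03 12:24:12 UTC, so the file had been waiting for about three days, and a next retry at 16:51:12 UTC, roughly 4 h 43 min after the failure, which matches the "Backing off 04:43:25" line. The _3.zip retry time falls before the failure, so its printed difference comes out negative.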

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 70323 - Posted: 6 Feb 2024, 12:42:42 UTC

traceroute gets as far as 146.97.41.34, which is still in ja.net. I suspect these should be going to the Hobart server, or somewhere in NZ.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 943
Credit: 34,318,120
RAC: 11,354
Message 70324 - Posted: 6 Feb 2024, 13:34:01 UTC

Putting the upload11 URL into a browser just gives you the Apache test page:

This page is used to test the proper operation of the Apache HTTP server after it has been installed. If you can read this page it means that this site is working properly. This server is powered by CentOS.
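Bernard's log shows the TCP connection itself timing out on port 80, while Richard's browser reaches an Apache page. A plain TCP connect test is a quick way to tell a silent timeout from an outright refusal on your own connection. A minimal sketch using Python's socket module (probe is a hypothetical helper name); the 10-second default timeout is an arbitrary choice, and a "connected" result says nothing about whether the upload handler behind the port actually works.

```python
import socket


# Hypothetical helper: try a TCP connection and classify the outcome.
def probe(host: str, port: int = 80, timeout: float = 10.0) -> str:
    """Return 'connected', 'timed out', 'refused' or an error description.

    'timed out' means no answer within the timeout, as in the BOINC log;
    'refused' means the host answered but nothing is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timed out"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # DNS failure, unreachable network, etc.
        return f"error: {exc}"
```

Calling print(probe("upload11.cpdn.org")) from the affected machine and from another network would show whether the problem is specific to one route.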

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 24,535,832
RAC: 2,045
Message 70443 - Posted: 19 Feb 2024, 16:22:52 UTC

I still have problems uploading to upload11. I have two WUs, one at zip 22, and these can't get through. I also had two more WUs fail with:
<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
Disk usage limit exceeded</message>
<stderr_txt>
CPDN Monitor - Abort request from BOINC...
22:22:29 (9908): called boinc_finish(10)

</stderr_txt>
]]>


And I have almost 200 GB allocated to BOINC, so there is plenty of disk space.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,553,422
RAC: 6,017
Message 70444 - Posted: 19 Feb 2024, 16:49:45 UTC - in response to Message 70443.  

Bernard, just out of curiosity, how much disk space is the entire BOINC data directory using?

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,553,422
RAC: 6,017
Message 70445 - Posted: 19 Feb 2024, 17:08:59 UTC
Last modified: 19 Feb 2024, 17:10:17 UTC

We had errors like this before, when a bunch of upload files couldn't be uploaded for a long time and built up in the directory. It's not actually exceeding the total BOINC disk space allocated; it's exceeding the rsc_disk_bound value set for that work unit in client_state.xml.

<rsc_disk_bound> </rsc_disk_bound>
Is the maximum amount of disk space your application should take up while running any given task. Includes all input, temporary and output files. Is set in bytes.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=15160&postid=108374

We've seen this before, especially when a bunch of upload files can't be uploaded because of server problems. Maybe someone with a better memory and/or a better understanding of BOINC could expound on this.
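To see the rsc_disk_bound each task was given, the values can be read straight out of client_state.xml without stopping the client. A minimal sketch with Python's standard xml.etree.ElementTree (disk_bounds is a hypothetical helper name), assuming the file parses as well-formed XML and that each task's limit sits in a <workunit> element next to its <name>. It only reads; it changes nothing.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper: map workunit name -> rsc_disk_bound in bytes.
def disk_bounds(xml_data) -> dict:
    """xml_data may be a str or the raw bytes of client_state.xml."""
    root = ET.fromstring(xml_data)
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name and bound:
            bounds[name] = float(bound)
    return bounds


if __name__ == "__main__":
    # Small sample built from values quoted in this thread.
    sample = """<client_state>
  <workunit>
    <name>wah2_eas25_h1pu_201312_24_1001_012232192</name>
    <app_name>wah2</app_name>
    <rsc_disk_bound>2000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""
    for name, bound in disk_bounds(sample).items():
        print(f"{name}: {bound / 1e9:.2f} GB")
```

For the real file, passing open("/var/lib/boinc/client_state.xml", "rb").read() (path assumed for a Linux install) avoids any text-encoding guesswork.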

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 70446 - Posted: 19 Feb 2024, 17:28:08 UTC
Last modified: 19 Feb 2024, 17:31:09 UTC

I suspect that this is the task going above the space allocated in one of the config files downloaded when the model starts. In client_state.xml you will find something like this for each running task:

    <name>wah2_eas25_h1pu_201312_24_1001_012232192</name>
    <app_name>wah2</app_name>
    <version_num>824</version_num>
    <rsc_fpops_est>3801388153458730.000000</rsc_fpops_est>
    <rsc_fpops_bound>38013881534587296.000000</rsc_fpops_bound>
    <rsc_memory_bound>364000000.000000</rsc_memory_bound>
    <rsc_disk_bound>2000000000.000000</rsc_disk_bound>


If you have a few zips stuck, it is not difficult for an individual task to go above the limit. In the past, I have increased this limit by editing the file, but as this requires halting the client, there is a risk of crashing the tasks. So when it was a known problem on a batch, I did it before starting computation.
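Some rough arithmetic shows why a handful of stuck zips is enough. The three nz25 zips in Bernard's log are about 90.4 MB each (nbytes 90258709, 90517431 and 90382194), and the rsc_disk_bound above is 2,000,000,000 bytes. That bound was quoted for an eas25 task, so applying it to nz25 is an assumption, as is the hypothetical 200 MB of other task files in the second calculation.

```python
# nbytes of three nz25 upload zips, copied from the upload log in this thread.
ZIP_SIZES = [90_258_709, 90_517_431, 90_382_194]

# rsc_disk_bound from the eas25 example above; assumed (not confirmed) for nz25.
RSC_DISK_BOUND = 2_000_000_000


# Hypothetical helper: how many zips fit under the bound.
def max_stuck_zips(bound: float, zip_bytes: float, other_bytes: float = 0.0) -> int:
    """Number of zips of zip_bytes that fit under bound when other_bytes are already used."""
    return int((bound - other_bytes) // zip_bytes)


if __name__ == "__main__":
    avg_zip = sum(ZIP_SIZES) / len(ZIP_SIZES)
    print(max_stuck_zips(RSC_DISK_BOUND, avg_zip))
    # Hypothetical: the task's own files already take 200 MB.
    print(max_stuck_zips(RSC_DISK_BOUND, avg_zip, other_bytes=200e6))
```

With nothing else counted, 22 zips fit and the 23rd pushes the task over the bound; with 200 MB of other files, only 19 fit. Either way the limit lands around the zip counts mentioned in this thread.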

I have emailed Andy to ask him to check on the server.

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 24,535,832
RAC: 2,045
Message 70463 - Posted: 20 Feb 2024, 9:04:00 UTC - in response to Message 70444.  

Bernard. Just out of curiosity, how much disk space is the entire boinc data directory using?

Currently it uses 13.28 GB, with 190 GB allocated to BOINC. I'd rather not adjust the config files myself; would it be possible to change this server-side for the next batches? Some zips have cleared, and one WU (batch 1005) uploaded, but it ended with a computation error due to the disk-limit error above. It crashed at zip 21. The other WU (batch 1005) has 16 stuck zips plus restart.zip, and I think it will error out in a few days when it hits the space limit. Other batches seem to upload just fine.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 70464 - Posted: 20 Feb 2024, 9:07:25 UTC - in response to Message 70463.  

I would suspend computation for the NZ tasks until the zips clear. If you run out of work, you can turn them back on long enough to get more work, then suspend them again.
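On a client with many tasks, suspending just the NZ ones can be scripted with boinccmd's --task option, which accepts suspend and resume. This Python sketch (hypothetical helper names) builds the commands and only prints them unless dry_run is turned off. It assumes boinccmd is on your PATH and can reach the local client without extra --host/--passwd arguments, that "_nz25_" in a task name marks an NZ task, and that the project URL matches the https://climateprediction.net/ shown in the upload log; task names can be copied from BOINC Manager or from boinccmd --get_tasks.

```python
import subprocess
from typing import Iterable, List

PROJECT_URL = "https://climateprediction.net/"  # as shown in the upload log above


# Hypothetical helper: boinccmd --task commands for tasks whose names contain tag.
def nz_task_commands(task_names: Iterable[str], op: str, tag: str = "_nz25_") -> List[List[str]]:
    if op not in ("suspend", "resume"):
        raise ValueError("op must be 'suspend' or 'resume'")
    return [["boinccmd", "--task", PROJECT_URL, name, op] for name in task_names if tag in name]


# Hypothetical helper: print the commands, or execute them when dry_run is False.
def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running run(nz_task_commands(names, "suspend")) first shows exactly what would be sent; the same call with "resume" turns the tasks back on when you need to fetch more work.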

bernard_ivo
Posts: 438
Credit: 24,535,832
RAC: 2,045
Message 70465 - Posted: 20 Feb 2024, 9:13:15 UTC - in response to Message 70464.  

I would suspend computation for the NZ tasks till zips clear. If you run out of work, you can turn them back on long enough to get more work then suspend them again.

Thanks, Dave. I've paused it. A new task has started; all the others are EAS25.

©2024 climateprediction.net