Posts by ncoded.com

21) Message boards : Number crunching : The uploads are stuck (Message 67794)
Posted 17 Jan 2023 by ncoded.com
Post:
I have to wonder why I am bothering to upload 500GB worth of uploads, if they are going to just be abandoned as soon as they get reported.
22) Message boards : Number crunching : The uploads are stuck (Message 67791)
Posted 17 Jan 2023 by ncoded.com
Post:
Thank you PDW,

That is my point. There are no tasks in progress on this server. It is in jail so we only get 1 or 2 tasks per day. We complete them in around 15 hours, upload and report them. Yet this is showing 17 In-progress.

Thank you also for confirming there is some kind of lock. I guessed there was, and that it was using the MAC address and/or the local IP.

Obviously I am not sure what to do about the missing tasks now.

I have run 300 threads on CPDN since the large OpenIFS batch(es) were issued a couple of weeks ago. It's going to be a hard and bitter pill to swallow if it turns out that was all for nothing.
23) Message boards : Number crunching : The uploads are stuck (Message 67789)
Posted 17 Jan 2023 by ncoded.com
Post:
Dave, Richard, et al..

Can I ask, where have these completed tasks gone?

https://www.cpdn.org/results.php?hostid=1535374&offset=0&show_names=0&state=1&appid=

It says In-Progress but most (if not all) of these have already been completed, uploaded, and reported?

E.g. I uploaded and reported one task just a few minutes ago, which I had downloaded 15 hours earlier, but nothing is showing in the list except 'in-progress'.

**

I think the problem is that CPDN is treating 2 hosts as a single host.

E.g. L-7113-1 and L-7113-2 are two different hosts, but CPDN sees just 1 host. If I swap out the hard drive, all CPDN does is change the hostname of this device, rather than see it as a separate device (host).

The two disks are completely separate. Both have a full install of Ubuntu and BOINC. Only one drive is inserted into the server at any one time.

I have run this server on many BOINC projects, at different times with each drive, and all projects (except CPDN) see it as 2 different hosts.

**

Are all the tasks reported from this server since Dec 24th now orphaned? If so, that would mean it's not just the 17 tasks on this list, but also the 250+ tasks that this server has reported and is currently reporting?

If I report a task, should the In-progress count not decrease by 1, and the Valid, Invalid, or Error count increase by 1?

This has not happened for ANY of the 250+ tasks reported (or being reported) by this server since Dec 24th.

However it has and does happen for all our other hosts (devices) from before and after this date.
24) Message boards : Number crunching : The uploads are stuck (Message 67730)
Posted 15 Jan 2023 by ncoded.com
Post:
Zombie, we were in a similar situation just yesterday, with no uploads since the end of the year.

What changed things for us, I believe, is that we got militant with clicking the Retry option.

Every 5 mins, if the status was 'backing off..', we hit 'Retry pending transfers' in Menu->Tools.

After an hour or so of constant retrying we got a connection, and since then all hosts have had a connection much of the time.
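The manual retrying described above could be scripted. This is only a minimal sketch, assuming `boinccmd` is on the PATH and can talk to the local client, and assuming its `--network_available` option (documented as retrying deferred network communication) has the same effect as the Manager's 'Retry pending transfers'. The function name `retry_pending` and the 300-second interval are made up for illustration.

```python
import subprocess
import time

# Assumption: boinccmd is installed and allowed to reach the local BOINC client.
RETRY_CMD = ["boinccmd", "--network_available"]


def retry_pending(attempts, interval_s=300, run=subprocess.run, sleep=time.sleep):
    """Ask the client to retry deferred transfers `attempts` times.

    Waits `interval_s` seconds between attempts and returns how many
    calls exited with status 0. `run` and `sleep` are injectable so the
    loop can be exercised without a real client.
    """
    succeeded = 0
    for i in range(attempts):
        result = run(RETRY_CMD, check=False)
        if result.returncode == 0:
            succeeded += 1
        if i < attempts - 1:
            sleep(interval_s)
    return succeeded
```

Calling `retry_pending(12)` would roughly match an hour of retrying every 5 minutes. It does not check whether the transfers actually went through, so the Transfers tab still needs watching.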
25) Message boards : Number crunching : The uploads are stuck (Message 67704)
Posted 14 Jan 2023 by ncoded.com
Post:
Okay thanks.
26) Message boards : Number crunching : The uploads are stuck (Message 67701)
Posted 14 Jan 2023 by ncoded.com
Post:
I have changed the count of uploads from 2 to 1 now. Maybe I can see something...


I didn't change anything, I just occasionally clicked the 'retry pending transfers' option in Menu->Tools.

Not requesting tasks: too many uploads in progress


Anyone know what the uploads limit is? I guess it must be a few thousand files.
27) Message boards : Number crunching : The uploads are stuck (Message 67695)
Posted 14 Jan 2023 by ncoded.com
Post:
Stony666, we were in a similar position, but just in the last hour 3 out of 5 hosts started to upload, so hopefully you'll get a connection shortly.

We are seeing a single connection uploading at around 500-1500 KBps (as reported by BOINC).
28) Message boards : Number crunching : The uploads are stuck (Message 67676)
Posted 13 Jan 2023 by ncoded.com
Post:
Thank you SolarSyonyk, I honestly completely missed that and just assumed it would be the usual 12 months.
29) Message boards : Number crunching : The uploads are stuck (Message 67670)
Posted 13 Jan 2023 by ncoded.com
Post:
Can you confirm what you are saying here? It sounds like you are saying that the WUs issued on Dec 20th 2022 (#950) and Dec 21st 2022 (#951) will hit their upload deadline on 19th January 2023. I thought CPDN WUs had a 12-month deadline.

We have over 200 WUs to upload, across 2 different drives; I am not sure which, if any, are from the 2 batches mentioned.

Just to confirm, I have not seen *any* uploads (on the main server with the majority of completed tasks); all I see is 'back off ..'.
30) Message boards : Number crunching : The uploads are stuck (Message 67583)
Posted 11 Jan 2023 by ncoded.com
Post:
I haven't seen any uploads in the last week either, wujj123456.
31) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67275)
Posted 4 Jan 2023 by ncoded.com
Post:
Okay thanks wujj123456
32) Message boards : Number crunching : If you have used VirtualBox for BOINC and have had issues, please can you share these? (Message 67270)
Posted 3 Jan 2023 by ncoded.com
Post:
<removed>
33) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67220)
Posted 2 Jan 2023 by ncoded.com
Post:
I've been looking at these fails. Does this limit refer to the total disk usage under '../projects/climateprediction.net' plus '../slots/0' or is it specifically for the task allocated space in '../slots/0/'? Apologies if this has been asked before.


I have just repeated what someone explained to me: that there is an indirect 100GB limit, meaning a default value was set in one of the edit boxes in the UI. And if you leave a non-zero, non-empty value, then it uses that default value.

In terms of slots, I only know what I just read at:

https://boinc.berkeley.edu/trac/wiki/BoincFiles#:~:text=The%20slot%20directory%20contains%20links,file%20in%20the%20project%20directory).

Which doesn't really explain much, apart from them being XML files with paths in them.

do not try to oversubscribe tasks in terms of available memory


Now that I am aware of rsc_memory_bound, I can see what value is required not to 'over subscribe' in terms of available memory.
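As a rough illustration of that sizing rule, here is a small sketch that works out how many tasks fit in RAM if each one may use up to its rsc_memory_bound. The function name, the optional reserve for the OS, and the example figures are all made up for illustration; BOINC stores rsc_memory_bound in bytes, so it is assumed here to have been converted to GB first.

```python
def max_tasks_for_memory(total_ram_gb, rsc_memory_bound_gb, reserve_gb=0.0):
    """Return how many tasks fit in RAM without touching swap.

    Assumes every task can grow to the full rsc_memory_bound and that
    `reserve_gb` is kept back for the OS and other processes.
    """
    usable_gb = total_ram_gb - reserve_gb
    if usable_gb <= 0 or rsc_memory_bound_gb <= 0:
        return 0
    # Floor division: a task that only partly fits does not count.
    return int(usable_gb // rsc_memory_bound_gb)


# Hypothetical host: 512 GB RAM, 16 GB kept for the OS, 6 GB per task.
print(max_tasks_for_memory(512, 6, reserve_gb=16))
```

The result can then be compared against the thread count to pick a CPU usage percentage that stays under it.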
34) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67215)
Posted 2 Jan 2023 by ncoded.com
Post:
Well let's hope the upload issue gets sorted shortly so all can get back to crunching.
35) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67213)
Posted 2 Jan 2023 by ncoded.com
Post:
Actually there were 2 issues.

One was the 100GB limit which was the cause of some of the earlier errors.

196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED

https://www.cpdn.org/result.php?resultid=22264124

And I believe the 2nd was when it started not having enough memory and hence started to use the swap:

1 (0x00000001) Unknown error code

stderr: https://gist.github.com/ncoded/c875e9a955252dd2a15540914de2e059

And before anyone asks: no, we didn't click cancel or abort or anything like that in the client (no matter what it says in stderr).
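For sorting a long list of failures into these two groups without opening each one, something like the following might help. It is only a sketch: it assumes the exit status lines have already been copied out of the result pages into a list of strings in the same "code (hex) name" form as above, and the helper names are made up.

```python
import re
from collections import Counter

# Matches lines such as "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED".
STATUS_RE = re.compile(r"^\s*(-?\d+) \((0x[0-9A-Fa-f]+)\) (.+?)\s*$")


def parse_exit_status(line):
    """Return (code, name) for an exit status line, or None if it does not match."""
    match = STATUS_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(3)


def tally_exit_statuses(lines):
    """Count how often each (code, name) pair appears, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_exit_status(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts


sample = [
    "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED",
    "1 (0x00000001) Unknown error code",
]
for (code, name), n in tally_exit_statuses(sample).most_common():
    print(code, name, n)
```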
36) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67212)
Posted 2 Jan 2023 by ncoded.com
Post:
I actually did notice that most of the tasks that failed did so just at the very end.

If you can track down a task which uploaded a full stderr.txt to the web, but is still reported as a failure, that might contain some clues.

Not sure I want to look through 332 errors.
37) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67210)
Posted 2 Jan 2023 by ncoded.com
Post:
If the recent errors were due to your attempts to increase available disk space by installing an additional drive


Richard, are you saying that it is incorrect to think that these tasks had such a high failure rate because RAM use was above rsc_memory_bound and hence swap was constantly used?

That is how I understood it from what Glenn said.
38) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67208)
Posted 2 Jan 2023 by ncoded.com
Post:
667 successful tasks on *this* server, most in the last few weeks.

The only thing that changed was that CPDN put out a large number of tasks, which allowed large hosts to get loaded up with tasks; in our case to around 60%.

However, I doubt we were the only ones that ran with less memory per task than rsc_memory_bound, so I guess you might see a pretty high failure rate across the board (on large core-count hosts).

Thank you for your info that you cannot get out of BOINC Jail, until you complete AND report tasks; something that no-one can do presently.
39) Message boards : Number crunching : Best Swap file size for CPDN? (Message 67206)
Posted 2 Jan 2023 by ncoded.com
Post:
The server is now running at 30% load, i.e. 6+ GB per task, so there should be no reason for any further errors (from my end).


And just to be clear, because you are trying to imply we are just wasting power: we have done 667 successful tasks in the last few months (on this server), with the majority done in the last few weeks. And at 7000W for 256t, the premise that we are the ones who are inefficient is frankly absurd.

I am not psychic. If I can run at 50-99% load on every project on BOINC without getting a 50% failure rate, then I of course (like most) would assume this is the case at CPDN.

I already stated that the 1 task quota was related to the errors; I never mentioned network connections.

Thank you for all the advice and help, especially about rsc_memory_bound from Glenn.

We have everything we need now. We won't be running VirtualBox, or upgrading RAM anytime soon.
40) Message boards : Number crunching : If you have used VirtualBox for BOINC and have had issues, please can you share these? (Message 67202)
Posted 2 Jan 2023 by ncoded.com
Post:
I have no plans to run any OS inside of VirtualBox.

The only reason it would be installed is for the BOINC Project(s) that require it.



©2024 climateprediction.net