21)
Message boards :
Number crunching :
no credit awarded?
(Message 67752)
Posted 15 Jan 2023 by Bryn Mawr Post: "Over the last few weeks, a lot of my tasks have uploaded. And if I look at the CPDN web site for my tasks, almost all of them have received credit. Yet if I look at my projects list, my average work done has steadily decreased to about 50.54. If I look at the statistics graph for the last month, it has dropped from 1600 to about 50." I’d second this. Over the past month I’ve accumulated about 275,000 credits (about 240,000 over the past fortnight) and BOINC is showing a RAC of 12.05 overall and zero for the host that’s doing the work. BoincStats is showing no better. |
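As a rough sanity check on the numbers in this post, here is a minimal sketch of how a recent average credit (RAC) figure behaves if it is an exponentially decaying average. The one-week half-life and the one-update-per-day pattern are assumptions for illustration, not something stated in the thread, and `update_rac` and `simulate` are hypothetical helper names.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumption: RAC decays with a one-week half-life


def update_rac(rac, credit, days_since_last):
    """Fold newly granted credit into a decaying average (credit per day)."""
    weight = math.exp(-days_since_last * math.log(2) / HALF_LIFE_DAYS)
    return rac * weight + (1.0 - weight) * (credit / days_since_last)


def simulate(daily_credit, days, rac=0.0):
    """Apply one update per day at a steady credit rate."""
    for _ in range(days):
        rac = update_rac(rac, daily_credit, 1.0)
    return rac


if __name__ == "__main__":
    # About 240,000 credits over a fortnight, spread evenly (illustrative).
    daily = 240_000 / 14
    print(f"steady rate: {daily:.0f} credits/day")
    print(f"RAC after 14 days: {simulate(daily, 14):.0f}")
```

Under those assumptions a fortnight of steady credit should lift the RAC to roughly three quarters of the daily rate, which is far above the 12.05 reported, so the figures in the post point to credit not being exported to the client rather than to normal decay.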
22)
Message boards :
Number crunching :
The uploads are stuck
(Message 67569)
Posted 11 Jan 2023 by Bryn Mawr Post: I’ve now cleared my backlog and files are still trickling up slowly - only trouble is that I’m generating new files just slightly more often than the link is clearing them :-) Definitely progress as all of the outstanding WUs have gone and I can see my task list again. |
23)
Message boards :
Number crunching :
Best Swap file size for CPDN?
(Message 67137)
Posted 30 Dec 2022 by Bryn Mawr Post: Server: 128c/256t Zero. Adjust the number of tasks you run so that you don’t swap out - more efficient and less likely to crash the tasks. |
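The advice above amounts to picking a task count that fits in physical memory. A minimal sketch of that arithmetic follows; the memory figures are made-up illustrations rather than measured CPDN requirements, and `tasks_that_fit` is a hypothetical helper.

```python
def tasks_that_fit(total_ram_gb, reserved_gb, per_task_gb, threads):
    """Largest task count that stays within RAM and the thread count.

    reserved_gb is memory kept back for the OS and other work;
    per_task_gb is the peak memory of one task (both are assumptions
    the user has to supply from their own observations).
    """
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    usable = max(total_ram_gb - reserved_gb, 0)
    return min(threads, int(usable // per_task_gb))


if __name__ == "__main__":
    # Illustrative only: 256 GB box, 16 GB held back, 5 GB per task.
    print(tasks_that_fit(256, 16, 5, 256))
```

The result is the kind of number one would then enter as a concurrency limit, so that the machine never needs to swap.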
24)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 67018)
Posted 23 Dec 2022 by Bryn Mawr Post: Hi I’m glad I’m not alone - I just came in to report the same error. |
25)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 67010)
Posted 22 Dec 2022 by Bryn Mawr Post: "Is there any way that this could be made a user selectable option to set the default value before it is downloaded? I would want this on every WU I process and I can imagine so would all the other volunteers who process 24/7." "We've had this discussion about adjusting the checkpointing already in this (or another) thread - if I wasn't supposed to be wrapping Christmas presents I'd find it." So don’t make it infinitely variable, just give the users the choice between 2 or 3 “safe” values? |
26)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 67004)
Posted 22 Dec 2022 by Bryn Mawr Post: Adjusting write I/O from OpenIFS tasks Is there any way that this could be made a user selectable option to set the default value before it is downloaded? I would want this on every WU I process and I can imagine so would all the other volunteers who process 24/7. |
27)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 66573)
Posted 25 Nov 2022 by Bryn Mawr Post: "Add a project max concurrent to that and job's a good'un" Yes, the project max controls the overall count and the itemised list controls the individual apps, so you have as much control as you want. |
28)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 66564)
Posted 24 Nov 2022 by Bryn Mawr Post: "You would need to add a separate <app>...</app> section for each IFS variant, once we know the exact application names in use. You could then use <max_concurrent> to limit each IFS type, but I don't see a way to limit the total IFS number of all types, once multiple versions are in play at the same time." Add a project max concurrent to that and job's a good'un. |
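The combination described here, one <app> section per variant plus a project-wide cap, can be sketched by generating the app_config.xml content. The element names (`app_config`, `app`, `name`, `max_concurrent`, `project_max_concurrent`) are the standard BOINC client ones; the application names and limits below are placeholders, since the thread notes the exact names were not yet known.

```python
import xml.etree.ElementTree as ET


def build_app_config(per_app_limits, project_limit):
    """Return app_config.xml text with per-app caps and one overall cap."""
    root = ET.Element("app_config")
    for name, limit in per_app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    # Caps the total number of running tasks for the whole project.
    ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder app names: substitute the real ones from the project.
    limits = {"oifs_variant_a": 2, "oifs_variant_b": 2}
    print(build_app_config(limits, project_limit=3))
```

With these illustrative values each variant may run two tasks, but never more than three run across the project at once, which is the gap the quoted post was worried about.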
29)
Message boards :
Number crunching :
New work discussion - 2
(Message 66281)
Posted 2 Nov 2022 by Bryn Mawr Post: I will not restrict my 24 core box to running 4 cores with the other 20 waiting for memory - I’ll block the OpenIFS jobs if they won’t play happily. 24T, I run 3900 rather than 3900X as they only pull 65w. I always run a mix, no more than 4 CPDN, no more than 6 Rosetta and the rest WCG, TN-Grid and SIDock and I find that sort of mix is fairly happy. I agree, running fully loaded runs up against the peak package power of the CPU, running fewer threads pulls the same power by running faster clock speeds but I’m more the big kid than the deep analytical thinker :-) |
30)
Message boards :
Number crunching :
New work discussion - 2
(Message 66272)
Posted 29 Oct 2022 by Bryn Mawr Post: "I will not restrict my 24 core box to running 4 cores with the other 20 waiting for memory - I’ll block the OpenIFS jobs if they won’t play happily." "I might be wrong but I think in this situation you would not get the OpenIFS tasks anyway, because the server would see there's not enough free memory available. Remember it's boinc making the decisions, not the model." I’ll give them their chance and I certainly won’t shoot the messenger. These new tasks sound perfect for those with the kit to run them but, same as the Rosetta Python tasks, if my set-up is not up to the job of running them and trying restricts my ability to run other work then I’ll block them and run what work I can. |
31)
Message boards :
Number crunching :
New work discussion - 2
(Message 66265)
Posted 29 Oct 2022 by Bryn Mawr Post: "BOINC starts multiple OpenIFS tasks because there are free CPU slots, even though the total memory for the tasks exceeds what's available." I will not restrict my 24 core box to running 4 cores with the other 20 waiting for memory - I’ll block the OpenIFS jobs if they won’t play happily. |
32)
Message boards :
Number crunching :
New work discussion - 2
(Message 66125)
Posted 20 Sep 2022 by Bryn Mawr Post: "I'd like to put it more bluntly and say that CPDN tasks are definitely very sensitive to interruptions (and I believe it's relatively well documented in the forums). By far the worst of any project I'm aware of. Even a couple of LHC subprojects that must be run to completion without interruption, will just restart from the beginning. CPDN's error rate is at least 10%, Bryn Mawr's (who posted above) is over 11%. Mine is over 22%. Many of those are due to restarts (especially if it happens more than once). I'd expect CPDN to have a higher error rate than other projects due to valid reasons (i.e. 'Negative Pressure Detected'). But for a project that has workunits that take days to weeks to complete, 10%+ error rate is too high, I think, as that means that days' and weeks' worth of processing time is wasted because the tasks can't handle interruptions well. Glenn, it's encouraging to hear that you'd like to look into this and potentially fix it. I'm not sure which OS is worse but the issue affects Windows, macOS, and Linux tasks." Whilst I have had errors, mostly negative theta, I have not had a task fail on restart in a long time. Then again, I very rarely restart more than once during the running of a single task. |
33)
Message boards :
Number crunching :
New work discussion - 2
(Message 66120)
Posted 20 Sep 2022 by Bryn Mawr Post: "Dave, that's a very poor survival for the linux tasks. Other projects seem to handle a cold restart just fine. I am surprised because operational models are pretty resilient to hardware & data failures but it could be something in the wrapper code that's not tolerating restarts properly. I'll ask the CPDN team as I'm interested to find out." I can only report my experiences. I do not take any precautions when rebooting (Ubuntu 20.04) and I have not had any CPDN fails in a couple of years. |
34)
Message boards :
Number crunching :
New work Discussion
(Message 66048)
Posted 5 Sep 2022 by Bryn Mawr Post: "Milkyway for example has a 'project preferences' page under the user account which allows you to limit the number of cores in workunits sent to you. CPDN doesn't support this at present because until now they have not done any multicore work." That would quite happily limit Milky Way to running 4 WUs each using 4 cores and CPDN to running 4 WUs at any time but would not limit the number of cores that each CPDN WU used. It would be interesting to see how much work would need to be done on the CPDN server to implement average CPUs and total CPUs - it might just be filling a data field within each WU with the number of CPUs it is set up to grab, the rest of the checking might be part of the standard Boinc server software. |
35)
Questions and Answers :
Unix/Linux :
*** Running 32bit CPDN from 64bit Linux - Discussion ***
(Message 65577)
Posted 17 Jun 2022 by Bryn Mawr Post: "It's a Linux host." Use max_concurrent in app_config.xml, but be aware that this can lead to runaway downloads if you’re unlucky. |
36)
Message boards :
Number crunching :
New work Discussion
(Message 65562)
Posted 14 Jun 2022 by Bryn Mawr Post: "Sorry, I've been away from this project for a while, and hadn't kept up to date with recent changes. It would normally be on https://www.cpdn.org/prefs.php?subset=project, but I see it's been taken away." The setting that caused me grief of a similar nature was no_alt_platform in the cc_config.xml file. With this set on, the system worked fine with all other projects but would not download any CPDN WUs. |
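For anyone wanting to check whether they have the same setting switched on, here is a minimal sketch that inspects cc_config.xml content for the `no_alt_platform` option. It assumes the usual layout with the flag inside an `<options>` section and a value of 1 meaning enabled; the sample XML is illustrative, and `no_alt_platform_enabled` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def no_alt_platform_enabled(cc_config_xml):
    """True if <options><no_alt_platform> is present and set to 1."""
    root = ET.fromstring(cc_config_xml)
    node = root.find("options/no_alt_platform")
    return node is not None and (node.text or "").strip() == "1"


if __name__ == "__main__":
    # Illustrative file content, not taken from a real host.
    sample = """
    <cc_config>
      <options>
        <no_alt_platform>1</no_alt_platform>
      </options>
    </cc_config>
    """
    if no_alt_platform_enabled(sample):
        print("no_alt_platform is on: only primary-platform work is requested")
```

On a 64-bit Linux host that would explain the behaviour described, given that this thread sits alongside the discussion of running 32-bit CPDN applications from 64-bit Linux.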
37)
Message boards :
Cafe CPDN :
World Community Grid mostly down for 2 months while transitioning
(Message 65494)
Posted 4 Jun 2022 by Bryn Mawr Post: "But they already had this and will be using the same scientific programs as before, they're not going to change all that. And why on earth didn't they get this one up and running before they stopped using the other one?! Imagine if Google shut down for 3 months while they moved house." "It worked fine before, why are they messing about?" "Because they need back end systems to create WUs in the first place and validate and post process the WUs on return, all of which is project related and not part of Boinc." Evidently they have been changing all that, probably to make it easier to launch new projects in the future. |
38)
Message boards :
Cafe CPDN :
World Community Grid mostly down for 2 months while transitioning
(Message 65491)
Posted 3 Jun 2022 by Bryn Mawr Post: "If you look at the updates they’ve provided the development they’re doing is WebSphere / Message Broker and whilst MB has been around for a long time WS is quite new and the combination is very much current technology for real time transaction processing and is complex and difficult to get right - especially if you get into scenarios like dual centre working for security fallback." "It worked fine before, why are they messing about?" Because they need back end systems to create WUs in the first place and validate and post process the WUs on return, all of which is project related and not part of Boinc. |
39)
Message boards :
Cafe CPDN :
World Community Grid mostly down for 2 months while transitioning
(Message 65483)
Posted 3 Jun 2022 by Bryn Mawr Post:
If you look at the updates they’ve provided the development they’re doing is WebSphere / Message Broker and whilst MB has been around for a long time WS is quite new and the combination is very much current technology for real time transaction processing and is complex and difficult to get right - especially if you get into scenarios like dual centre working for security fallback. This I know having worked on just such a system and seen the problems first hand. |
40)
Message boards :
Number crunching :
Windows Work Units
(Message 65444)
Posted 15 May 2022 by Bryn Mawr Post: "Is this project generating any work for windows machines? I've had a Windows 10 machine attached for several months and nada . . . I used to get work up until a year or so -- stopped getting work so removed the project from my four machines and put them to work on other things. Then reconnected one machine awhile back and still nada . . ." No Windows work units for a long time. Currently Mac with a side portion of Linux, unless you spin up a VM to run the work in. |