Posts by mmonnin

1) Message boards : Number crunching : New work Discussion (Message 65327)
Posted 5 Apr 2022 by mmonnin
Post:
First one that started running had this at 16 minutes into processing.
Model crashed: ATM_DYN : NEGATIVE THETA DETECTED.

Another on same PC has made it to 20 min.
2) Questions and Answers : Unix/Linux : Incorrect Disk Space Available to BOINC (Message 64029)
Posted 5 Jun 2021 by mmonnin
Post:
Some things BOINC only looks at during client startup. If OpenCL drivers are added a client restart is needed for the client to see the drivers. Was the client restarted after the VM disk size change?
3) Message boards : Number crunching : Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true (Message 64025)
Posted 3 Jun 2021 by mmonnin
Post:
Several dies of near-bleeding-edge SRAM at 6mm/sq. It's not going to be cheap.
4) Message boards : Number crunching : Model crashed: INITTIME: Atmosphere basis time mismatch (Message 63997)
Posted 26 May 2021 by mmonnin
Post:
Resends are still being sent out. It's a lot of downloading just to abort within a minute. And since there are no app selections here, I've got a ton of N216 work.
5) Message boards : Number crunching : Model crashed: INITTIME: Atmosphere basis time mismatch (Message 63992)
Posted 26 May 2021 by mmonnin
Post:
Reviving an old thread as it was the 1st result on Google.

I am getting the thread title errors on HadSM4 at N144 tasks. No CPU usage then abort after around a minute. PC completes N216 tasks.
https://www.cpdn.org/result.php?resultid=22071424
https://www.cpdn.org/result.php?resultid=22071421

I've got several more paused as BOINC downloaded too many. I'd like to fix if possible before resuming.


This has been reported to the project. Looking through tasks from this batch, I have so far found one other with this type of crash and will let the project know. As of about 0100Hrs UTC there were only 46 of this batch running, so it is difficult to know yet how widespread the problem is, but having found a third one out of those 46, I suspect a problem with the ancillary files for the tasks.

Edit: As of 13 minutes ago, the batch has been paused while they do some checking. The subsequent batch, which was about to go out, has also been paused as part of the same experiment.


Ok thanks for checking. I resumed the other 4 tasks on that PC since it seemed like batch issues vs missing libs or something on the PC. Same result.
6) Message boards : Number crunching : Model crashed: INITTIME: Atmosphere basis time mismatch (Message 63989)
Posted 25 May 2021 by mmonnin
Post:
Reviving an old thread as it was the 1st result on Google.

I am getting the thread title errors on HadSM4 at N144 tasks. No CPU usage then abort after around a minute. PC completes N216 tasks.
https://www.cpdn.org/result.php?resultid=22071424
https://www.cpdn.org/result.php?resultid=22071421

I've got several more paused as BOINC downloaded too many. I'd like to fix if possible before resuming.
7) Questions and Answers : Unix/Linux : Run Linux work units with Windows 10 WSL (Message 63827)
Posted 10 Apr 2021 by mmonnin
Post:
We already have to open a second Ubuntu window to communicate with boinc, and all communication is via the command line (boinccmd).

Possibly the same thing could apply. boinccmd by default uses the same port to communicate with the client as the manager does, but I am guessing at things that are a fair bit out of my depth here.


Have you tried starting boinc with --daemon so that it's not tied to that terminal window? I have this in my terminal startup cmd for my 2nd GPU client instance.
sudo /usr/bin/boinc --daemon --allow_multiple_clients --gui_rpc_port 31422 --dir /var/lib/boinc2


https://boinc.berkeley.edu/wiki/client_configuration

--daemon
Linux: detach from controlling terminal; Windows: run as service.
8) Message boards : Number crunching : AMD Ryzen 7 2700X taking 50 days to complete a project running 24/7 (Message 63475)
Posted 3 Feb 2021 by mmonnin
Post:
I just suspend the extra CPDN tasks that I don't want to run. When 1st batch completes then I resume some more. This still allows other projects to DL work instead of using app_config.
9) Message boards : Number crunching : Upload failures (Message 60966)
Posted 21 Sep 2019 by mmonnin
Post:
I had an anz50 task stuck uploading the last 4 zips, a restart and out file. I think it happened around when the server went down. I aborted the uploads as the task had 20 trickles and then it changed to completed.
https://www.cpdn.org/result.php?resultid=21741910
10) Message boards : Number crunching : Irritated - some questions (Message 60745)
Posted 30 Jul 2019 by mmonnin
Post:
If you look at the due dates, they are one year out. Speed is not their thing.

At other times it is best to shut them down and save electricity.

Thanks


I, and others, run other projects when there is no CPDN work available. CPDN has a lack of work, not a lack of CPU power.
11) Message boards : Number crunching : Credits (Message 60722)
Posted 26 Jul 2019 by mmonnin
Post:
Credit at CPDN has been updated. Step 1 complete.

Now the external stats files need to be updated. These still haven't updated since 6/24:
https://www.cpdn.org/cpdnboinc/stats/

Data from 2015:
https://www.climateprediction.net/stats/
12) Message boards : Number crunching : Credit scoring question. (Message 60721)
Posted 26 Jul 2019 by mmonnin
Post:
Yup, the other tasks have the same 19,196.51 total credit as the other user.

Be glad this isn't CreditNew where your own credit will vary depending on your wingman's processor.
13) Message boards : Number crunching : BOINC Client Improvements (Message 60711)
Posted 24 Jul 2019 by mmonnin
Post:
I know it is possible to set specific hours to either do work or access the network. How easy would it be to make it possible to set specific hours to use all processors and a reduced number at other times, e.g. use 50% of processors between 08:30 and 20:30 and 100% at other times?


My first thought was a script run by the Windows scheduler to swap app_config files and change the max # of concurrent tasks. Restart the client after the swap unless there is a boinccmd option to re-read app_configs.
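
A rough sketch of that swap idea, in Python rather than a batch file. The two variant file names are made up for illustration, and the 08:30 to 20:30 window is taken from the question above. The script only copies the right variant over app_config.xml; getting the client to pick up the change (a restart, or a re-read of the config files) is left out on purpose.

```python
import shutil
from datetime import time
from pathlib import Path

# Hypothetical names: two pre-written variants kept next to app_config.xml.
DAYTIME_CONFIG = "app_config.day.xml"    # e.g. caps concurrent tasks at half the cores
NIGHT_CONFIG = "app_config.night.xml"    # e.g. allows all cores

# Window taken from the question: reduced load between 08:30 and 20:30.
DAY_START = time(8, 30)
DAY_END = time(20, 30)


def pick_config(now: time) -> str:
    """Return the name of the variant that applies at the given time of day."""
    if DAY_START <= now < DAY_END:
        return DAYTIME_CONFIG
    return NIGHT_CONFIG


def swap_config(project_dir: Path, now: time) -> Path:
    """Copy the chosen variant over app_config.xml in the project directory."""
    source = project_dir / pick_config(now)
    target = project_dir / "app_config.xml"
    shutil.copyfile(source, target)
    return target
```

Windows Task Scheduler or cron could call swap_config at 08:30 and 20:30 with the project's directory. As far as I know, boinccmd --read_cc_config also re-reads app_config.xml files, which would avoid a client restart, but that is worth checking against your client version.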
14) Message boards : Number crunching : Boinc config question (Message 60710)
Posted 24 Jul 2019 by mmonnin
Post:
I have no idea why it was set to take so long for resource share to get to the desired ratio between projects. I don't see the benefit of having a setting that takes a month to get to the desired effect.

Very long tasks like CPDN can mess with that as well. Users are better off with setting a project_max_concurrent. That can put a host in a spot where there are CPDN tasks available but limited by that option and no other tasks will download from other projects because the queue is full. Cores can end up idle.

If I end up downloading more CPDN tasks than I want to run at once, I suspend the buffer and any extra tasks I do not want to run. That stops more from downloading and clears up the queue for another project. Set CPDN to something like 1000 resource share so its tasks never get interrupted, and another project at 1 so it won't take over anything but the leftover cores. It's above 0, so a queue will still download.

When some CPDN tasks complete I release some more to run. Repeat.
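
For reference, project_max_concurrent is set in an app_config.xml file in the project's directory. A minimal sketch that builds such a file in Python; the function name and the limit of 4 are just examples, and the result still has to be saved into the project folder by hand.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent: int) -> str:
    """Build an app_config.xml body that caps a project's running tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    # project_max_concurrent limits how many of this project's tasks run at once.
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Example value only: limit the project to 4 running tasks.
print(build_app_config(4))
```

The limit only caps how many tasks run, not how many get downloaded, which is why the suspend-and-release routine above is still needed.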
15) Message boards : Number crunching : Credits (Message 60689)
Posted 21 Jul 2019 by mmonnin
Post:
Over/under on the # of weeks until this gets corrected?
16) Message boards : Number crunching : Long Project and other working projects disappeared, now few tasks performing. Questions. (Message 60679)
Posted 19 Jul 2019 by mmonnin
Post:
These?
https://www.cpdn.org/cpdnboinc/results.php?hostid=1328590
17) Message boards : Number crunching : Credits (Message 60672)
Posted 15 Jul 2019 by mmonnin
Post:
This time, the stats at your profile have not been affected. Only external stats sites. Read through the threads.

Sorry, not true. The number of credits on climateprediction.net is higher than those external sites show, but still short over 2 million credits from what they should be.


That is exactly what I said. Your profile, here at CPDN, is correct and is higher. Stats sites have not updated (BOINCStats) or have reverted to old stats (Free-DC). The stats here also did not update last week. Like the post right above yours said, an admin is being emailed about restarting the stats script.
18) Message boards : Number crunching : Credits (Message 60662)
Posted 14 Jul 2019 by mmonnin
Post:
On June 24 I lost about 2.3 million credits. This happened once before, and that time I saw messages that indicated that there was a general failure and other users besides myself were affected. Eventually the problem was fixed and the credits lost on that previous occasion were restored. This time I found no such messages and wonder if this time the loss was just for me. Is there anyone I can contact who can look into this?

Thanks.


That older situation was completely different. Previously the server crashed and a db was restored with lower credit for everyone. It was later restored, as mentioned.

This time, the stats at your profile have not been affected. Only external stats sites. Read through the threads.
19) Message boards : Number crunching : BOINC Client Improvements (Message 60638)
Posted 11 Jul 2019 by mmonnin
Post:
By BOINC client, you actually mean the part of BOINC behind the scenes and not the GUI? Or the whole BOINC install? Or BOINC Mgr? Some use the terms interchangeably.

If it includes the mgr, take a look at the program BOINCTasks. It already has a lot of improvements to the GUI that should be in the Mgr.

If it's the actual client, then many people have asked about managing hardware via slots. I guess kind of like how FAH handles hardware. GPU 1 can be set up with X settings in slot 1, while GPU 2 can be set up with Y settings in slot 2. 3 CPU threads can run this task in slot 4, while 5 more threads in slot 4. For BOINC that could be set as 3 threads on one project, 5 on another project. For finer control like this, separate/concurrent clients are needed to set up the GPUs on separate projects with separate queues. Or CPU queues different than GPU queues.

I never want tasks to be interrupted. Never ever stop a task to run something else. Ever. Discard a task if it will miss a deadline before an already running task is stopped for another.

Separate the Run priority/Resource Share into 2 parts. Which project has its tasks downloaded via resource share and which project has its tasks run next. I'd much rather have the tasks run FIFO.

Setting "No New work" should not stop tasks that have already been downloaded or queued from running. The client thinks it's OK to leave those tasks until the minute before their deadline. Again, FIFO, and let resource share control task download, not task running order.

Make resource share instant, not a gradual change over time. Or add an option for it. So many people complain at project sites because they don't know how this works and assume a new project is controlling task download when it's really the client.
20) Message boards : Cafe CPDN : BOINC CONFERENCE (Message 60629)
Posted 10 Jul 2019 by mmonnin
Post:
There are several possible origins of the nickname. The one mentioned on the Hancock tower tour is when Chicago lobbied to host the World Fair.

