Posts by geophi

1) Message boards : Number crunching : New work discussion - 2 (Message 66621)
Posted 8 hours ago by Profile geophi
Post:
Looks like about 13 hours running two at a time on an i7-4790K and about 9 hours running two at a time on my Ryzen 5 5600X.
2) Message boards : Number crunching : OpenIFS Frequently Asked Questions (Message 66572)
Posted 4 days ago by Profile geophi
Post:
I have moved the discussion that was in this thread to the "OpenIFS Discussion" thread. Please discuss OpenIFS in that thread and not in this FAQ.
3) Message boards : Number crunching : Task completed, but not all trickles acknowledged yet. Normal? (Message 66569)
Posted 4 days ago by Profile geophi
Post:
In a similar vein, I have This WU 22236909 that reported all trickles and seems to have been awarded full credit but still says it is on my computer and still running.

It uploaded with the last trickle so does anyone know what has happened to it?

I do not have it on my computer.

(there are 3 failed work units on that same computer reported today but they stem from a power failure which upset them)

Thanks
Conan

I've occasionally seen this before. It looks like your computer would have reported the success status sometime after 0000 GMT Sunday. This is usually when the credit scripts run, and that run is database intensive. I've had a few problems with tasks reporting during the credit run, and have read of others who have had the issue during that time as well. It doesn't happen frequently, but every once in a while the report status must get lost.
4) Message boards : Number crunching : New work discussion - 2 (Message 66456)
Posted 13 days ago by Profile geophi
Post:
Perhaps try some of the other Event Log flags ...
I always enable <cpu_sched> and <sched_op_debug> as a matter of routine. Compared to some of the others, they're lightweight - adding very little extra to the log - but they clarify what's going on during a work fetch very nicely.

15/11/2022 21:33:46 | climateprediction.net | Sending scheduler request: To fetch work.
15/11/2022 21:33:46 | climateprediction.net | Requesting new tasks for CPU
15/11/2022 21:33:46 | climateprediction.net | [sched_op] CPU work request: 4215.86 seconds; 0.00 devices
15/11/2022 21:33:46 | climateprediction.net | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
15/11/2022 21:33:47 | climateprediction.net | Scheduler request completed: got 1 new tasks
15/11/2022 21:33:47 | climateprediction.net | [sched_op] Server version 715
15/11/2022 21:33:47 | climateprediction.net | Project requested delay of 3636 seconds
15/11/2022 21:33:47 | climateprediction.net | [sched_op] estimated total CPU task duration: 87303 seconds
15/11/2022 21:33:47 | climateprediction.net | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
15/11/2022 21:33:47 | climateprediction.net | [sched_op] Deferring communication for 01:00:36

@Richard
Is the "estimated total CPU task duration" dependent on benchmark results?

I'm just wondering if that could be a problem since I see quite a few people with PCs where boinc has never run a benchmark. Could that in any way complicate whether tasks are sent to a host? I see Glenn has a couple of his Linux boxes that have the default 1 billion ops/sec for the benchmark.
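
For anyone who wants to compare these numbers between hosts, here is a minimal Python sketch that pulls the three figures of interest out of log text like the excerpt above. The function name and the dictionary keys are my own (hypothetical), and the patterns only match the wording shown in this particular log, so treat it as a starting point rather than a general parser. It does not answer the benchmark question; it only makes the values easier to line up.

```python
import re

# Three lines copied from the event log quoted above.
LOG = """\
15/11/2022 21:33:46 | climateprediction.net | [sched_op] CPU work request: 4215.86 seconds; 0.00 devices
15/11/2022 21:33:47 | climateprediction.net | Project requested delay of 3636 seconds
15/11/2022 21:33:47 | climateprediction.net | [sched_op] estimated total CPU task duration: 87303 seconds
"""

def summarize_work_fetch(log_text):
    """Extract requested seconds, estimated duration and project delay (hypothetical helper)."""
    patterns = {
        "requested_s": r"\[sched_op\] CPU work request: ([\d.]+) seconds",
        "estimated_s": r"\[sched_op\] estimated total CPU task duration: ([\d.]+) seconds",
        "delay_s": r"Project requested delay of ([\d.]+) seconds",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        # None means the line was not present in the supplied text.
        result[key] = float(match.group(1)) if match else None
    return result

summary = summarize_work_fetch(LOG)
print(summary)
print(f"estimated duration: {summary['estimated_s'] / 3600:.2f} hours")
```

The 3636 second delay is the same value the client later reports as "Deferring communication for 01:00:36".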
5) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66390)
Posted 15 days ago by Profile geophi
Post:
You explained hadsm4_8.02_i686-pc-linux-gnu and hadsm4_um_8.02_i686-pc-linux-gnu but then what are the ones like hadsm4_se_8.02_i686-pc-linux-gnu.so?

The .so files are somehow involved in the creation or zipping of output files for uploading. The libnsl.so.1 file is a dependency needed by hadsm4_se_8.02_i686-pc-linux-gnu.so and without it being installed, the model errors out at the end because it couldn't find the expected upload file/files.
6) Message boards : Number crunching : Windows & WSL give quite different CPU benchmark results in boinc? (Message 66241)
Posted 26 Oct 2022 by Profile geophi
Post:
I've seen the same thing in Linux vs. Windows boinc benchmarks for quite a while.

On a Ryzen 5 5600X (6 cores/12 threads) with a dual boot of Windows 10 and Ubuntu 20.04 and also a VMWare Ubuntu 20.04 VM in Windows 10:
Windows 10 boinc 7.20.2
~ 6300
Ubuntu 20.04 boinc 7.16.6
~ 8945
Ubuntu 20.04 boinc 7.16.6 in a VMWare VM on Windows 10 host
~ 8880

On an i7-3770 in Windows 10 and in an Ubuntu 20.04 VM on that Windows 10 host
Windows 10 boinc 7.20.2
~ 4800
Ubuntu 20.04 boinc 7.16.6 in a VMWare VM on Windows 10 host
~ 5350

It isn't specific to these boinc versions, as this difference in benchmarks has been seen for quite some time. The Whetstone FP benchmark is supposed to be compiled without optimizations, but my guess as to the difference in scores between Linux and Windows is that boinc was compiled with different compiler brands or versions and/or optimization switches, so Linux boinc uses more optimizations for the benchmark. This shouldn't change the project's science application speed, which depends on what compiler switches are used on that application. Over at World Community Grid, over the years, some science applications ran faster or as fast with Windows as with Linux on the same system, but others definitely give the edge to Linux (like ARP).

What i3 do you have, and what applications are you running that have the same speed as on your i7 11700K?
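
To put a number on the gap, here is a small Python sketch using the approximate scores listed above. The dictionary layout and the function name are mine (hypothetical); the scores themselves are the rounded figures from this post, so the percentages are only rough.

```python
# Approximate benchmark scores from the post above.
scores = {
    "Ryzen 5 5600X": {"windows": 6300, "linux_native": 8945, "linux_vm": 8880},
    "i7-3770": {"windows": 4800, "linux_vm": 5350},
}

def linux_advantage(windows, linux):
    """Return how much higher the Linux score is, as a percentage of the Windows score."""
    return (linux - windows) / windows * 100.0

for cpu, results in scores.items():
    for setup, value in results.items():
        if setup != "windows":
            gain = linux_advantage(results["windows"], value)
            print(f"{cpu} {setup}: {gain:+.1f}% vs Windows")
```

On these figures the Linux scores come out roughly 41 to 42 percent higher on the 5600X and about 11 percent higher on the i7-3770.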
7) Message boards : Number crunching : New work discussion - 2 (Message 66172)
Posted 5 Oct 2022 by Profile geophi
Post:
More HADCM3s tasks in testing. Now October is here, I am checking daily for when the OpenIFS start testing but suspect it won't be before mid month.

I don't think these are coming to the main site. The scientist running these experiments appears to be using the "testers" (mainly me) to run his experiment/experiments.
8) Message boards : Number crunching : Excuse me? Zero Credit? (Message 66169)
Posted 2 Oct 2022 by Profile geophi
Post:
Ok, that's something I wasn't aware of. I'll keep an eye out for a weekly update.

Looks like your tasks got their credit earlier today, so it still must be running once a week on UTC Sunday.
9) Message boards : Number crunching : Excuse me? Zero Credit? (Message 66166)
Posted 2 Oct 2022 by Profile geophi
Post:
It used to be that the credit script was run only once a week early UTC Sunday. But I thought that had changed recently. Respond back to this thread if no credit shows up by Monday and we'll alert Andy to the problem.
10) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66103)
Posted 14 Sep 2022 by Profile geophi
Post:
Hopefully soon there will be no need for this thread.

That would be fantastic! Hopefully your effort will be successful.
11) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66100)
Posted 12 Sep 2022 by Profile geophi
Post:
What makes me a little uncertain about the zlib package is that I recently saw a post that a user ran some tasks for like 40 days and they errored out because the files couldn't be zipped. The suggestion was to make sure that the lib32z1 was installed. Unfortunately I don't remember any more details and couldn't find that post again.


This may have been the people running Fedora 3x, which didn't have libnsl installed. The files do not upload for unknown reasons without libnsl. Is it zipping, renaming, or some type of inter-application communication breakdown? I don't know, and I'm not going to install Fedora 3x again to find out whether the files exist but are not zipped, are named wrong, or something else.
12) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66096)
Posted 12 Sep 2022 by Profile geophi
Post:
Yes, but what makes me not trust that completely is that lib32z1 is not listed anywhere but without it the upload files won't zip and the task will error out despite completing the calculations.

The libz is only needed for the hadcm3s tasks. Since, at this time, that is Mac only, it isn't needed for any hadam4x tasks, but is left in the command line in case they release the hadcm3s for Linux again.
13) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66093)
Posted 12 Sep 2022 by Profile geophi
Post:
These are all the dependencies for all the important 8.52 executables:

hadam4_8.52_i686-pc-linux-gnu:
	linux-gate.so.1 (0xf7f5d000)
	libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7f18000)
	libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7f12000)
	libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf7d33000)
	libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7c2e000)
	libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf7c0f000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7a20000)
	/lib/ld-linux.so.2 (0xf7f5f000)
hadam4_se_8.52_i686-pc-linux-gnu.so:
	linux-gate.so.1 (0xf7f20000)
	libnsl.so.1 => /lib/i386-linux-gnu/libnsl.so.1 (0xf7dc4000)
	libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf7be5000)
	libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7ae0000)
	libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf7ac1000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf78d2000)
	/lib/ld-linux.so.2 (0xf7f22000)
hadam4_um_8.52_i686-pc-linux-gnu:
	linux-gate.so.1 (0xf7f84000)
	libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7f5c000)
	libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7e57000)
	libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7e34000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7c45000)
	/lib/ld-linux.so.2 (0xf7f86000)


You can see libnsl is needed in the "se" file, which is used for some kind of communication:
"Functions in this library provide routines that provide a transport-level interface to networking services for applications, facilities for machine-independent data representation, a remote procedure call mechanism, and other networking services useful for application programs."
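
If you want to check a listing like the one above without reading it line by line, here is a minimal Python sketch that groups ldd-style output by binary and reports which ones reference libnsl.so.1. The function name is mine (hypothetical), and the parser assumes the layout shown above: a header line ending in a colon, followed by indented dependency lines. The sample is a shortened copy of that listing.

```python
def parse_ldd(text):
    """Map each binary name to the libraries listed under it (hypothetical helper)."""
    deps = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace() and line.rstrip().endswith(":"):
            # Header line such as "hadam4_se_8.52_i686-pc-linux-gnu.so:"
            current = line.strip()[:-1]
            deps[current] = []
        elif current is not None:
            # The first token is the library name, e.g. "libnsl.so.1".
            deps[current].append(line.split()[0])
    return deps

# Shortened copy of the listing above.
SAMPLE = (
    "hadam4_se_8.52_i686-pc-linux-gnu.so:\n"
    "\tlinux-gate.so.1 (0xf7f20000)\n"
    "\tlibnsl.so.1 => /lib/i386-linux-gnu/libnsl.so.1 (0xf7dc4000)\n"
    "hadam4_um_8.52_i686-pc-linux-gnu:\n"
    "\tlibdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7f5c000)\n"
)

deps = parse_ldd(SAMPLE)
needs_libnsl = [name for name, libs in deps.items() if "libnsl.so.1" in libs]
print(needs_libnsl)
```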
14) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 66090)
Posted 12 Sep 2022 by Profile geophi
Post:
Does anyone know what the library lib32ncurses6 does and why it's listed as required for CPDN for Ubuntu?

I'm not sure that it is needed anymore. The commands were originally taken from the boinc guidance on potentially needed libraries for 32bit apps in Ubuntu, in this section of the boinc 32bit app considerations:

https://boinc.berkeley.edu/wiki/Installing_BOINC#Ubuntu_2
15) Message boards : Number crunching : upload failure zip file not found (Message 65905)
Posted 20 Aug 2022 by Profile geophi
Post:
Thanks, this is a fully patched and current Fedora 36 installation. The trickles are successful, only the final transfer fails.

Tommy

@Tommy

Within the long list of messages in stderr.txt on the failed task webpages is this:

Unable to load library hadam4_se_8.52_i686-pc-linux-gnu.so
dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory


libnsl.so.1 is needed for either the creation or the transfer of the zip files.

The instructions for loading the proper libraries for various Linux distributions, including Fedora 32 (which should also work for Fedora 36), are here:

https://www.cpdn.org/forum_thread.php?id=8916#62038
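
Since the stderr.txt listing is long, a quick way to spot this kind of failure is to search it for the loader's "cannot open shared object file" message. Here is a minimal Python sketch; the function name is mine (hypothetical), and the sample text is just the two lines quoted above.

```python
import re

# Matches the library name in front of the loader's error message.
MISSING_LIB = re.compile(r"(\S+): cannot open shared object file")

def missing_libraries(stderr_text):
    """Return the sorted, de-duplicated names of libraries the loader could not open."""
    return sorted(set(MISSING_LIB.findall(stderr_text)))

# The two lines quoted from stderr.txt above.
STDERR = """\
Unable to load library hadam4_se_8.52_i686-pc-linux-gnu.so
dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory
"""

missing = missing_libraries(STDERR)
print(missing)
```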
16) Message boards : Number crunching : NZ25 file upload server problems? (Message 65809)
Posted 11 Aug 2022 by Profile geophi
Post:
Hmmm. On the 3rd retry, it finally went up so either they fixed it, or the problem is intermittent.
17) Message boards : Number crunching : NZ25 file upload server problems? (Message 65808)
Posted 11 Aug 2022 by Profile geophi
Post:
Anyone else getting file upload errors on these nz25 tasks? Mine uploads to 100% but then gives an error:

8/11/2022 5:52:48 PM | climateprediction.net | Started upload of wah2_nz25_a141_199505_25_936_012151203_0_r55943290_1.zip
8/11/2022 5:58:10 PM | climateprediction.net | [checkpoint] result wah2_nz25_a141_199505_25_936_012151203_0 checkpointed
8/11/2022 6:06:07 PM | climateprediction.net | [checkpoint] result wah2_nz25_a141_199505_25_936_012151203_0 checkpointed
8/11/2022 6:08:26 PM | climateprediction.net | [error] Error reported by file upload server: EOF on socket read : asked for 262144, got 150376
8/11/2022 6:08:26 PM | climateprediction.net | Temporarily failed upload of wah2_nz25_a141_199505_25_936_012151203_0_r55943290_1.zip: transient upload error
8/11/2022 6:08:26 PM | climateprediction.net | Backing off 01:12:41 on upload of wah2_nz25_a141_199505_25_936_012151203_0_r55943290_1.zip

This is reminiscent of previous problems with the ANZ model uploads that go to Tasmania/New Zealand servers. If others chime in with a problem, I'll notify Andy and Suzanne Rosier about it so someone can kick the server.
18) Message boards : Number crunching : New work Discussion (Message 65806)
Posted 11 Aug 2022 by Profile geophi
Post:
Not sure why the Linux batch #935 shows a submission date of the 21st of last month. I know I said I expected a further batch from that experiment though.

I think Sarah just made that comment on the Trello board for bookkeeping purposes. The batch had already been released in July, it just hadn't been properly placed on the Trello boards.
19) Message boards : Number crunching : New work Discussion (Message 65712)
Posted 31 Jul 2022 by Profile geophi
Post:
These errors were discussed in this thread a while back.

https://www.cpdn.org/forum_thread.php?id=8701#59554

I'm not sure anything was resolved. When I infrequently get them, it is usually in a task or tasks where I exited boinc and restarted.
20) Message boards : Number crunching : Computation Errors (Message 65666)
Posted 19 Jul 2022 by Profile geophi
Post:
A few weeks ago I decided to test and max out my Ryzen 5900X (12C/24T) with 50GB RAM dedicated to WSL2 Ubuntu 22.04. Ran 24 HadAM4 N144s at the same time and they all finished without errors. The CPU has 64MB of L3 cache so about 2.6MB per task available on average. They all got done in about 20 days so about 1.2 tasks per day average, not a bad throughput I thought.

I'm assuming you are talking about the 13 month HADAM4 N144 tasks. Running 5 at a time on my 5600X, each task takes about 4 days, so in 20 days it would finish about 25.

I really think that you should test this with no use of the SMT threads, running 12 at a time. My guess is that total model throughput would be considerably higher than what happened running 24 at a time.

Now I realize that the comparison of my PC with yours is not apples to apples as you are running these in a VM, with the associated performance penalty, and my 5600X is running these natively in Linux. Also, it was running at 4.4 to 4.5 GHz and I'm sure yours is throttling more running that many. But it's been a long time since running a significant number of models above the total number of cores resulted in more total model throughput. Perhaps with something like hadcm3s (if it were again to be released for Linux), using some of the SMT threads would increase throughput, but I doubt the HADAM4 N144 models would see much, if any, by running more tasks than cores.
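
The arithmetic behind that comparison, as a small Python sketch. The function name is mine (hypothetical), and the inputs are just the round figures from the two posts above, so this is a rough steady-state estimate and not a measurement. As noted, the two setups are not directly comparable.

```python
def tasks_per_day(concurrent_tasks, days_per_task):
    """Steady-state throughput when a fixed number of tasks run side by side."""
    return concurrent_tasks / days_per_task

# 5900X in WSL2: 24 tasks at once, all done in about 20 days.
smt_rate = tasks_per_day(24, 20)

# 5600X native Linux: 5 tasks at once, about 4 days each.
native_rate = tasks_per_day(5, 4)

print(f"5900X, 24 at a time: {smt_rate:.2f} tasks/day")
print(f"5600X, 5 at a time:  {native_rate:.2f} tasks/day")
print(f"5600X tasks finished in 20 days: {native_rate * 20:.0f}")
```

On these figures the 6-core 5600X running 5 tasks slightly out-produces the 12-core 5900X running 24, which is why a 12 at a time test seems worth trying.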



©2022 climateprediction.net