Message boards : Number crunching : New work Discussion

Bryn Mawr

Joined: 28 Jul 19
Posts: 148
Credit: 12,830,559
RAC: 228
Message 64783 - Posted: 3 Nov 2021, 22:43:36 UTC - in response to Message 64780.  

From an old memory, I think that the climate models checkpoint at the end of each model year.


Woah, that would be even worse. For me that would be once every 2 days with the trickle.
ID: 64783
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64787 - Posted: 4 Nov 2021, 0:18:45 UTC

It is an old memory, perhaps from the original Slab Ocean models at the start of the project.
I just let mine get on with it. They can checkpoint when they want to.
ID: 64787
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2173
Credit: 64,760,426
RAC: 3,180
Message 64788 - Posted: 4 Nov 2021, 0:53:19 UTC

Back in the old days, with the slab models, the models checkpointed every 3 model days. The WAH2, HADCM3S, and HADAM4 models checkpoint each model day. The HADAM4H (N216) models checkpoint every 6 model hours.

Most of the models upload and trickle once per month, on the first model day following the end of the month.
ID: 64788
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64790 - Posted: 5 Nov 2021, 5:08:07 UTC

Getting back to batches 920/921:

My batch 921 finished and uploaded OK in the early hours of this morning, so I didn't get to see the file sizes.

Now on another 921.

This one's on its second-to-last life, so fingers crossed.
ID: 64790
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1109
Credit: 17,121,631
RAC: 5,430
Message 64791 - Posted: 5 Nov 2021, 13:02:45 UTC - in response to Message 64790.  

Getting back to batches 920/921:


These completed successfully over the last day or two on my Computer 1511241
Red Hat Enterprise Linux release 8.4 (Ootpa)
4.18.0-305.25.1.el8_4.x86_64

Name hadam4h_h12y_200902_4_920_012116620_1
Workunit 12116620

Name hadam4h_10x3_209602_4_921_012118509_0
Workunit 12118509

Name hadam4h_11cx_209902_4_921_012119079_0
Workunit 12119079
ID: 64791
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1032
Credit: 36,242,218
RAC: 12,342
Message 64793 - Posted: 6 Nov 2021, 11:20:02 UTC

One machine has completed and reported its last batch 920 tasks. The combination of machine speed and line speed ensured that all uploads completed well before the danger point.

Got some batch 921 resends in return. All downloads are complete, and the upload size limit has been set to 200,000,000 bytes - that should be plenty, and should signal the end of that particular problem.
ID: 64793
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1109
Credit: 17,121,631
RAC: 5,430
Message 64799 - Posted: 8 Nov 2021, 4:19:05 UTC - in response to Message 64793.  

I seem to be getting those too.

Why is nbytes zero in all these?
<file>
    <name>hadam4h_h013_200602_4_920_012115257_2_r1137320672_4.zip</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>200000000.000000</max_nbytes>
    <status>0</status>
    <upload_url>http://upload11.cpdn.org/cgi-bin/file_upload_handler</upload_url>
</file>
<file>
    <name>hadam4h_h013_200602_4_920_012115257_2_r1137320672_restart.zip</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>200000000.000000</max_nbytes>
    <status>0</status>
    <upload_url>http://upload11.cpdn.org/cgi-bin/file_upload_handler</upload_url>
</file>
<file>
    <name>hadam4h_h013_200602_4_920_012115257_2_r1137320672_out.zip</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>200000000.000000</max_nbytes>
    <status>0</status>
    <upload_url>http://upload11.cpdn.org/cgi-bin/file_upload_handler</upload_url>
</file>

ID: 64799
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64800 - Posted: 8 Nov 2021, 4:55:57 UTC

That's where the actual size will be written once it's known, after the zip has been created.
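
Those look like the <file> entries BOINC keeps in client_state.xml (an assumption on my part, going by the tags rather than any project documentation). If anyone wants to watch the field fill in, a minimal sketch in Python along these lines would list the pending uploads; the path is the usual Linux default and may differ on your machine.

import xml.etree.ElementTree as ET

# Sketch: list upload entries from BOINC's client_state.xml, showing that
# <nbytes> stays at 0.0 until the zip has actually been written.
STATE_FILE = "/var/lib/boinc/client_state.xml"  # assumed default Linux path

for f in ET.parse(STATE_FILE).getroot().iter("file"):
    name = f.findtext("name", default="?")
    nbytes = float(f.findtext("nbytes", default="0"))
    max_nbytes = float(f.findtext("max_nbytes", default="0"))
    if max_nbytes > 0:  # only files with an upload size limit set
        size = "not created yet" if nbytes == 0 else f"{nbytes:,.0f} bytes"
        print(f"{name}: {size} (limit {max_nbytes:,.0f} bytes)")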
ID: 64800
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64801 - Posted: 8 Nov 2021, 5:04:01 UTC

The much larger <max_nbytes> is because Sarah has sent out new tasks with this increased value and is waiting to see what happens before doing anything about the original tasks.

As all appears to be well, we can relax and crunch. :)
ID: 64801
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4472
Credit: 18,448,326
RAC: 22,385
Message 64822 - Posted: 19 Nov 2021, 11:31:55 UTC

Time to order some more RAM.

Peak usage for latest OpenIFS tasks in testing is about 12GB!
ID: 64822
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1109
Credit: 17,121,631
RAC: 5,430
Message 64823 - Posted: 19 Nov 2021, 12:24:46 UTC - in response to Message 64822.  

Time to order some more RAM.

Peak usage for latest OpenIFS tasks in testing is about 12GB!


Is that total virtual memory size, or working-set size? How much is shared if more than one task of the same code (but different data, of course) is running?

Ready when you are -- I think. I run only eight BOINC tasks at a time, of which only four are CPDN.

Computer 1511241
Computer information

CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.el8.x86_64|libc 2.28 (GNU libc)]
BOINC version 	7.16.11
Memory 	62.4 GB
Cache 	16896 KB
Swap space 	15.62 GB
Total disk space 	117.21 GB
Free Disk Space 	91.53 GB
Measured floating point speed 	6.58 billion ops/sec
Measured integer speed 	31.49 billion ops/sec


And my RAM is now:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi        10Gi       2.7Gi       108Mi        49Gi        51Gi
Swap:          15Gi       105Mi        15Gi

ID: 64823
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64824 - Posted: 19 Nov 2021, 14:16:14 UTC - in response to Message 64822.  

Peak usage for latest OpenIFS tasks in testing is about 12GB!

I have already retired one machine (64 GB) waiting for this project. I am now up to 96 GB, and can do 128 GB if needed.
They are just trying to support the memory companies.
ID: 64824
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4472
Credit: 18,448,326
RAC: 22,385
Message 64825 - Posted: 19 Nov 2021, 16:08:42 UTC

Is that total virtual memory size, or working-set size? How much is shared if more than one task of the same code (but different data, of course) is running?


Not shared but per task. Some have been as low as 4GB/task in the past, so this small testing batch of three tasks is no guarantee that they will be as heavy on RAM when they finally make it to the main site; or it may be like the testing ones, where some are as bad and others are lower. But I am ordering some more RAM, as with 8 real cores it is pretty clear 32GB does not cut the mustard any more.
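
To put rough numbers on that, here is a back-of-envelope sketch; the 12GB/task peak is the testing figure mentioned a few posts up, and the OS overhead is purely my assumption.

# Back-of-envelope RAM sizing, assuming the ~12 GB per-task peak seen in
# testing plus a few GB for the OS and other processes (an assumption).
PER_TASK_GB = 12
OS_OVERHEAD_GB = 4

for tasks in range(1, 9):
    print(f"{tasks} task(s): ~{tasks * PER_TASK_GB + OS_OVERHEAD_GB} GB")

By that arithmetic a 32GB machine fits two such tasks at most (three would already want about 40GB), which is why 8 real cores and 32GB no longer cut the mustard.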
ID: 64825
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1109
Credit: 17,121,631
RAC: 5,430
Message 64826 - Posted: 19 Nov 2021, 18:10:45 UTC - in response to Message 64825.  

Is that total virtual memory size, or working-set size? How much is shared if more than one task of the same code (but different data, of course) is running?

Not shared but per task.


What I meant was that Linux will let processes share RAM if the RAM being shared is identical; it does not need to be explicitly coded into the program being run. For example, any libraries used in common would share the binary code in question. So if I am running four instances of UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu, there would be only one copy of that code in physical RAM, shared by all four running instances.
It seems to me that most of the RAM used by these tasks is data (which is not shared) rather than instructions.

top - 12:53:54 up 7 days, 22:57,  1 user,  load average: 8.45, 8.63, 8.67
Tasks: 462 total,   9 running, 453 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.2 sy, 49.6 ni, 47.3 id,  2.2 wa,  0.2 hi,  0.1 si,  0.0 st
MiB Mem :  63902.2 total,   3604.0 free,  10906.3 used,  49391.8 buff/cache
MiB Swap:  15992.0 total,  15874.7 free,    117.2 used.  52157.1 avail Mem 

    PID    PPID USER      PR  NI S    RES    SHR  %MEM  %CPU  P     TIME+ COMMAND                                        
 529767  529746 boinc     39  19 R   1.3g  19940   2.2  99.4  5   3595:28 /var/lib/boinc/projects/climateprediction.net+ 
 209079  209064 boinc     39  19 R   1.3g  19864   2.1  99.4  4   7314:56 /var/lib/boinc/projects/climateprediction.net+ 
 767343  767321 boinc     39  19 R   1.3g  19944   2.1  99.3 13 303:14.35 /var/lib/boinc/projects/climateprediction.net+ 
 721167  721157 boinc     39  19 R   1.3g  19920   2.1  99.2  1 939:53.41 /var/lib/boinc/projects/climateprediction.net+ 
 ...
  13809       1 boinc     30  10 S  36956  17404   0.1   0.1  4  77470:47 /usr/bin/boinc     [Boinc Client]     
                      
 209064   13809 boinc     39  19 S  19088  17340   0.0   0.0 12   5:46.09 ../../projects/climateprediction.net/hadam4_8+ 
 767321   13809 boinc     39  19 S  17808  17148   0.0   0.1 10   0:23.02 ../../projects/climateprediction.net/hadam4_8+ 
 529746   13809 boinc     39  19 S  17720  17288   0.0   0.0 10   2:26.96 ../../projects/climateprediction.net/hadam4_8+ 
 721157   13809 boinc     39  19 S  17348  17216   0.0   0.1 12   0:37.06 ../../projects/climateprediction.net/hadam4_8+ 

ID: 64826
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64827 - Posted: 19 Nov 2021, 21:01:14 UTC

We'll all find out when/if they get released.

But these models appear not to be for wimpy, under-resourced computers.
ID: 64827
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1109
Credit: 17,121,631
RAC: 5,430
Message 64828 - Posted: 20 Nov 2021, 1:34:38 UTC - in response to Message 64827.  

We'll all find out when/if they get released.

But these models appear not to be for wimpy, under-resourced computers.


I have two machines. My wimpy machine runs Windows 10, so I suppose it will not be getting any of these big work units when they come out.
Computer 1512658

CPU type 	GenuineIntel
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz [Family 6 Model 140 Stepping 1]
Number of processors 	8
Operating System 	Microsoft Windows 10
Memory 	15.64 GB
Cache 	256 KB
Swap space 	19.39 GB
Total disk space 	460.73 GB
Free Disk Space 	359.43 GB
Measured floating point speed 	4.24 billion ops/sec
Measured integer speed 	12.61 billion ops/sec


I think my main machine, that runs Linux, is not too wimpy.
Computer 1511241

CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.el8.x86_64|libc 2.28 (GNU libc)]
Memory 	62.4 GB
Cache 	16896 KB
Swap space 	15.62 GB
Total disk space 	117.21 GB
Free Disk Space 	92.64 GB
Measured floating point speed 	6.58 billion ops/sec
Measured integer speed 	31.49 billion ops/sec

ID: 64828
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64829 - Posted: 20 Nov 2021, 2:18:42 UTC - in response to Message 64825.  

Some have been as low as 4GB/task in the past, so this small testing batch of three tasks is no guarantee that they will be as heavy on RAM when they finally make it to the main site; or it may be like the testing ones, where some are as bad and others are lower. But I am ordering some more RAM, as with 8 real cores it is pretty clear 32GB does not cut the mustard any more.

Can you determine anything about cache requirements yet? That often determines how many work units we can run (efficiently), rather than the RAM requirements.
I am all in favor of using lots of RAM, but there is no point in buying it if it can't be used.
ID: 64829
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4472
Credit: 18,448,326
RAC: 22,385
Message 64830 - Posted: 20 Nov 2021, 8:49:00 UTC

I have two machines. My wimpy machine runs Windows 10, so I suppose it will not be getting any of these big work units when they come out.


OpenIFS is only for Linux and Mac; as far as I am aware there are no plans to develop a Windows version.

Can you determine anything about cache requirements yet? That often determines how many work units we can run (efficiently), rather than the RAM requirements.


I have forgotten how to look at cache usage. My only experience running several of these was on a wimpy, underpowered machine, and I found that running more than one task (up to the machine's maximum of four) did increase throughput, with adding a second task getting close to doubling it, but the third and fourth tasks only gave marginal gains. (Those ones peaked around 5GB/task and the machine had its maximum of 8GB installed.)
ID: 64830
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64832 - Posted: 20 Nov 2021, 13:15:41 UTC - in response to Message 64830.  

... with adding a second task getting close to doubling it, but the third and fourth tasks only gave marginal gains.

Thanks. I think that is a good first indication. It is not surprising that they use a lot of cache.
ID: 64832
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1109
Credit: 17,121,631
RAC: 5,430
Message 64833 - Posted: 20 Nov 2021, 15:48:18 UTC - in response to Message 64830.  

I have forgotten how to look at cache usage.


Does this help? (I have not tried it yet.)

https://www.geeksforgeeks.org/see-cache-statistics-linux/
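
That article covers the perf-based tools. As a first pass, the cache topology itself is visible under /sys; a minimal sketch, assuming a reasonably modern kernel exposing /sys/devices/system/cpu/cpu*/cache (untested here, like the link above):

import os

# Sketch: list cpu0's cache hierarchy from sysfs. The L3 entry's
# shared_cpu_list shows which logical CPUs compete for that cache,
# a first hint at how many heavy tasks fit before contention sets in.
CACHE_DIR = "/sys/devices/system/cpu/cpu0/cache"  # assumed standard path

def read(d, name):
    with open(os.path.join(d, name)) as f:
        return f.read().strip()

for entry in sorted(os.listdir(CACHE_DIR)):
    if entry.startswith("index"):
        d = os.path.join(CACHE_DIR, entry)
        print(f"L{read(d, 'level')} {read(d, 'type')}: {read(d, 'size')}, "
              f"shared by CPUs {read(d, 'shared_cpu_list')}")

For live hit/miss rates on a running task, something like perf stat -e cache-references,cache-misses -p <pid> (from the article's toolbox) is the usual approach.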
ID: 64833