climateprediction.net home page
The old -131 (file size too big) shows up again

nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 64867 - Posted: 12 Dec 2021, 15:09:44 UTC

It does annoy me that this error crashes the model right at the end. In this case, a 20-day jobbie. What a waste.
This one:
https://www.cpdn.org/result.php?resultid=22153705

3 others finished OK.
ID: 64867
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3568
Credit: 10,761,686
RAC: 5,528
Message 64868 - Posted: 12 Dec 2021, 15:21:33 UTC - in response to Message 64867.  
Last modified: 12 Dec 2021, 17:46:28 UTC

> It does annoy me that this error crashes the model right at the end. In this case, a 20-day jobbie. What a waste.
> This one:
> https://www.cpdn.org/result.php?resultid=22153705
>
> 3 others finished OK.

I raised this with Sarah after getting the same error on a task from either #920 or #921. I think the file size limit was increased for #922.

If you have a reasonably fast connection and internet access isn't turned off as the task nears its end, the zip gets uploaded before the file size check that runs when the task actually finishes, and you don't get a problem. If other files are already being uploaded when the zip is created, this can cause the problem on my broadband. I went through all of my tasks from those batches and added a 0 to the end of the file size in client_state.xml, but that intervention is at the user's own risk.
ID: 64868
nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 64870 - Posted: 12 Dec 2021, 17:07:51 UTC - in response to Message 64868.  

I thought these errors had been fixed. But there were 2 w/u finishing at the same time and the uploads were slow. I know there is a fix that involves changing the XML file, but doesn't it require a w/u restart? We all know how hit and miss that can be. I didn't even look at editing any files. Why can't the w/u be sent out with a large file size limit set as default? I'm sure there is a technical reason.
ID: 64870
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3568
Credit: 10,761,686
RAC: 5,528
Message 64871 - Posted: 12 Dec 2021, 18:02:00 UTC - in response to Message 64870.  
Last modified: 12 Dec 2021, 18:06:33 UTC

> I thought these errors had been fixed. But there were 2 w/u finishing at the same time and the uploads were slow. I know there is a fix that involves changing the XML file, but doesn't it require a w/u restart? We all know how hit and miss that can be. I didn't even look at editing any files. Why can't the w/u be sent out with a large file size limit set as default? I'm sure there is a technical reason.


Certainly batch 924 has the line <max_nbytes>200000000.000000</max_nbytes> for the zip files, compared to 150000000.000000 for #920 and #921, so I think it has now been increased for new batches since I raised it with the project.

And I have changed my initial reply to reflect the fact that it is client_state.xml, not cc_config.xml, that I edited. Doubtless as computers get faster, at some point the file sizes will increase still further with more complex models being computed. (In testing, OpenIFS tasks have had uploads of over 500 MB recently.)
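For readers wondering what the client_state.xml edit amounts to, here is a minimal Python sketch of "adding a 0" to the limit, i.e. multiplying each <max_nbytes> value by 10. It works on a text string only. The sample fragment, the file name in it, and the function name raise_upload_limits are made up for illustration; only the <max_nbytes> line itself comes from this thread. As said earlier in the thread, editing the real file is at the user's own risk.

```python
import re

# Matches a <max_nbytes> element like the one quoted in this thread.
MAX_NBYTES = re.compile(r"(<max_nbytes>)([0-9.]+)(</max_nbytes>)")


def raise_upload_limits(xml_text, factor=10):
    """Return xml_text with every <max_nbytes> value multiplied by factor."""
    def bump(match):
        new_value = float(match.group(2)) * factor
        # Keep the six-decimal formatting seen in the quoted line.
        return f"{match.group(1)}{new_value:.6f}{match.group(3)}"
    return MAX_NBYTES.sub(bump, xml_text)


# Hypothetical fragment; the surrounding tags and file name are assumptions.
sample = (
    "<file>\n"
    "    <name>example_upload.zip</name>\n"
    "    <max_nbytes>150000000.000000</max_nbytes>\n"
    "</file>"
)
print(raise_upload_limits(sample))
```

Applying something like this to a real client_state.xml would presumably need the BOINC client to be stopped first so the file is not overwritten; that is an assumption, not something confirmed in this thread.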
ID: 64871
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64872 - Posted: 12 Dec 2021, 20:37:28 UTC

As Dave said, the limit was increased some time ago, so all but the really slow connections should be OK.
But then, it depends on the computer. For example, one that's running 64 tasks at once, all finishing at about the same time, can STILL have problems, even with a very large limit.

With a slow connection, I'd suggest that you Suspend all except one task, wait until it's had a few hours' head start, then Resume the others one at a time, allowing a few hours before the next one.
That way they won't all finish at once.
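The staggering idea above can be sketched as a simple schedule. This is only an illustration in Python: the task names are hypothetical, and the 3-hour gap is an assumed value for "a few hours". The actual suspending and resuming would still be done by hand in the BOINC Manager.

```python
def stagger_schedule(task_names, gap_hours=3.0):
    """Pair each task with the number of hours to wait before resuming it.

    The first task gets 0 (it keeps running); each later task waits
    gap_hours longer than the one before it.
    """
    return [(name, index * gap_hours) for index, name in enumerate(task_names)]


# Hypothetical task names, for illustration only.
tasks = ["task_a", "task_b", "task_c", "task_d"]
for name, delay in stagger_schedule(tasks):
    print(f"resume {name} after {delay:g} h")
```

With four tasks and a 3-hour gap, the last task starts 9 hours after the first, so the finishes should be spread out by roughly the same amount if the tasks run at similar speeds.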

Also, that computer needs lots more memory for this new generation of models.
2-3 gigs per core.
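As a rough check of what that rule of thumb allows, the Python sketch below divides installed RAM by an assumed per-task figure. The function name and the 3 GB default are illustrative choices, taken from the upper end of the 2-3 GB range. It ignores memory used by the operating system and other programs, so treat the result as an upper bound.

```python
def max_concurrent_tasks(ram_gb, gb_per_task=3.0, cores=None):
    """Estimate how many tasks fit in RAM, optionally capped by core count."""
    by_memory = int(ram_gb // gb_per_task)
    return by_memory if cores is None else min(by_memory, cores)


# A 12 GB machine, the figure nairb gives later in the thread.
print(max_concurrent_tasks(12))                 # at 3 GB per task
print(max_concurrent_tasks(12, 2.0, cores=8))   # at 2 GB per task, 8 threads
```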
ID: 64872
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2127
Credit: 58,267,435
RAC: 8,229
Message 64873 - Posted: 13 Dec 2021, 1:14:12 UTC - in response to Message 64870.  

While by far most of the hadam4h N216 models in the queue to be sent out are from batches 922 to 925, there are still quite a few from 920 and 921. They can't change the xml files on the server for the batches already issued so they can't fix this for 920 and 921 server-side.
ID: 64873
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64874 - Posted: 13 Dec 2021, 3:43:04 UTC

I've just remembered something, vaguely, about the more recent batches having a check on the memory that a computer has, before download.
The batches in the 900s need lots more than the ones from a year ago.

If this has been implemented, it may explain why that computer is getting all old tasks, with the smaller upload limit, and not the more recent ones.
ID: 64874
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3568
Credit: 10,761,686
RAC: 5,528
Message 64875 - Posted: 13 Dec 2021, 8:38:00 UTC - in response to Message 64874.  

> If this has been implemented, it may explain why that computer is getting all old tasks, with the smaller upload limit, and not the more recent ones.

I think it would pass the test for sufficient memory: before my laptop died, it could still get 4 OpenIFS tasks despite only having 8 GB of RAM. The test only checks whether there is enough memory to run one task, not whether there is enough to run tasks on all cores. Or that is my understanding at any rate.
ID: 64875
nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 64878 - Posted: 17 Dec 2021, 4:35:16 UTC

Well, the machine might only have 12 GB of memory, but with 4 models running it still uses very little swap space. I try to fill the other 4 threads with low-impact tasks such as covid tasks from WCG or other projects. If I run 4 ARP & climate at once, it all grinds to a slow crawl. But still little swap used. Just 4 models at once seems OK.
I know an i7 processor is only a glorified i5 processor..... once upon a time a P4 was the mighty processor. I still have a couple of P75 machines.....
Regardless of the size of the processor etc, why can't the file size be set at the maximum size? Upload speed can be slow for a variety of reasons.

Anyway..... what's a decent upgrade? A Ryzen 7/motherboard combo? We always want more speed.
ID: 64878
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3568
Credit: 10,761,686
RAC: 5,528
Message 64879 - Posted: 17 Dec 2021, 6:10:27 UTC - in response to Message 64878.  

> Regardless of the size of the processor etc, why can't the file size be set at the maximum size? Upload speed can be slow for a variety of reasons.

As has been said, the limit has been increased on newer batches.

> Anyway..... what's a decent upgrade? A Ryzen 7/motherboard combo? We always want more speed.

I went for a Ryzen 7 with 32 GB of RAM. I am now thinking of swapping that 32 GB for 64 GB of slightly faster RAM. While they have yet to make it out of the testing branch of the project, some tasks use over 5 GB of memory each. My upgrade was from a Core 2 Duo desktop; the laptop had 4 cores with 8 GB of RAM. Running more than 5 N216 tasks still results in a slowdown, as that reaches the limit of the CPU cache memory. It will be interesting to see how much faster RAM mitigates that slowdown.
ID: 64879
nairb
Joined: 3 Sep 04
Posts: 84
Credit: 4,470,980
RAC: 0
Message 64880 - Posted: 18 Dec 2021, 5:52:04 UTC

I used to take the matching of CPU to m/b and type of RAM very seriously. This was many years ago, when a P3 700 MHz slot2 Xeon with 2 meg of L2 cache was the biz. I still have that machine (dual slot2). But today it's utterly useless for ..... well, anything. Fedora Core 4 is not quite leading edge anymore.

Nowadays I am content to copy what others do. I have spent a small amount of time looking at the various Ryzen setups. It would be nice to have a reasonably modern bit of kit for a while. I doubt I will be around long enough to see it consigned to the trash can like the dual slot2.
ID: 64880
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64881 - Posted: 18 Dec 2021, 6:53:39 UTC - in response to Message 64880.  

I suspect that your main problem is with your connection speed to the internet.
If this is slow, then the collection of files at the end gets clogged up trying to upload.
You should get about half an hour between the last big zip being created and the rest starting to show up.
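To put a number on that half-hour window, the Python sketch below works out the sustained upload rate needed to clear a zip before the remaining files appear. It is a back-of-envelope estimate under stated assumptions: the zip sizes are taken to be the 150000000 and 200000000 byte limits quoted earlier in the thread, the window is 30 minutes, and protocol overhead is ignored. The function name is made up for illustration.

```python
def required_upload_mbit_s(file_bytes, window_minutes=30.0, concurrent_files=1):
    """Sustained upload rate in Mbit/s needed to send the files within the window.

    concurrent_files models several tasks producing their last zip at once.
    """
    total_bits = file_bytes * 8 * concurrent_files
    return total_bits / (window_minutes * 60) / 1_000_000


for size in (150_000_000, 200_000_000):
    print(f"{size} bytes: {required_upload_mbit_s(size):.2f} Mbit/s")
```

A 150000000 byte zip needs roughly 0.67 Mbit/s of sustained upload to finish in 30 minutes, and two tasks finishing together double that, which is why slow links and simultaneous finishes are the cases that hit the error.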
ID: 64881

©2022 climateprediction.net