1)
Message boards :
Number crunching :
Lost another one!
(Message 69124)
Posted 6 Jul 2023 by nairb Post: I find that a power cut is good for killing w/u as well. All other projects' w/u recover OK.
2)
Message boards :
Number crunching :
New work discussion - 2
(Message 68863)
Posted 7 Jun 2023 by nairb Post: I must confess that I must have confused those 64-bit applications with the other apps that need those 32-bit libs. I did think for a moment that the 64-bit ones had been statically linked. It would be far better if those 32-bit jobbies could be re-compiled for 64-bit. It's only on the Fedora 32 machine that I have those extra libs installed. Whilst I have vbox installed on Fedora 36, I failed to get those 32-bit libs installed there. So I had better remember which climate w/u's are to be worked on and not download them to the wrong machine. Anyway, I'm sure the world will keep turning.
3)
Message boards :
Number crunching :
New work discussion - 2
(Message 68849)
Posted 5 Jun 2023 by nairb Post: I meant w/u's not needing those 32-bit libs. We had some of those earlier. They worked well on Fedora 36 without installing those 32-bit libs.
4)
Message boards :
Number crunching :
New work discussion - 2
(Message 68847)
Posted 5 Jun 2023 by nairb Post: Are all the future w/u's going to be statically linked? Hope so.
5)
Message boards :
Number crunching :
w/u failed at the 89th zip file
(Message 68514)
Posted 28 Feb 2023 by nairb Post: At least this w/u ran to the end before aborting with "Process still present 5 min after writing finish file; aborting". No other error messages. Only 50% success so far with the latest bunch.
6)
Message boards :
Number crunching :
w/u failed at the 89th zip file
(Message 68453)
Posted 25 Feb 2023 by nairb Post: Old friend is back... "double free or corruption (out)". I have been missing these.
7)
Message boards :
Number crunching :
w/u failed at the 89th zip file
(Message 68114)
Posted 30 Jan 2023 by nairb Post: With 1 cpdn task running by itself, it failed after zip no. 83 with: "13:30:43 STEP 2039 H=2039:00 +CPU= 18.156 double free or corruption (out)". I will be glad to see this problem solved. The machine will have been running with plenty of free memory and several idle threads.
8)
Message boards :
Number crunching :
w/u failed at the 89th zip file
(Message 68027)
Posted 25 Jan 2023 by nairb Post:
Yup. I agree it can be difficult to track down. I worked for years on call centre kit with over 1,000 concurrent users. We tested for 8,000 concurrent jobs on the machines. With multiple layers of software it was a challenge to find the culprit behind a memory leak or corruption problem. I always tried to get the application programming teams to "try" and give informative error messages... not always seen as the most important issue. But a useful error message can save endless hours later! The machine I use for cpdn seems able to run any combination of projects without issues. 8 of anything seems OK, and they all seem to recover from a power cut... unlike some cpdn w/u. With 4 cpdn w/u running at once it uses very little swap space and usually shows about 5~6 gig of memory free. I know peak usage will vary. Anyway, I hope the bug is found, since it will save a lot of frustration for everyone.
9)
Message boards :
Number crunching :
The uploads are stuck
(Message 68026)
Posted 25 Jan 2023 by nairb Post: "And WCG can't feed tasks either right now (they assign them, but most of them don't download)," Just got a bucket full of WCG with some ARP w/u as well. They do download after a while with a bit of prodding.
10)
Message boards :
Number crunching :
The uploads are stuck
(Message 67938)
Posted 21 Jan 2023 by nairb Post: Just had 3 uploads fail at 100% with the same message: "No space left on server".
11)
Message boards :
Number crunching :
w/u failed at the 89th zip file
(Message 67925)
Posted 20 Jan 2023 by nairb Post: "2) Do you have 'Leave non-GPU tasks in memory while suspended' enabled in Computing preferences? It's highly recommended, especially if the tasks often get interrupted for any reason like task swapping or BOINC/PC restarts."
Yes, it's ticked. It's a dedicated machine and I try to ensure that once a climate task starts it's not suspended by other work and runs to completion. When I checked the machine today, it seemed to be running almost idle, with 2 tasks using almost no CPU time. It needed a hard reboot. I should have done a memory check, but it looks to have come back to life, though it has dumped 2 of the working w/u's with computation errors. When it's cleared the running jobs I will run a memory checker just to be sure. I do tend to load the thing with 4 climate jobs and 4 WCG jobs at once... it's done OK so far. But maybe I have been lucky.
12)
Message boards :
Number crunching :
w/u failed at the 89th zip file
(Message 67915)
Posted 19 Jan 2023 by nairb Post: This w/u https://www.cpdn.org/result.php?resultid=22269116 failed with a most informative error: "double free or corruption (out)". Anybody had one of these? Just curious what it might mean? Ta, Nairb
13)
Message boards :
Number crunching :
The uploads are stuck
(Message 67710)
Posted 14 Jan 2023 by nairb Post: For some reason I seem to be blessed with endless uploads. It worked all through the night and now keeps up with the output of 4 running w/u. It does seem that once a connection is made it keeps uploading the zips until they are all gone... lucky me.
14)
Message boards :
Number crunching :
The uploads are stuck
(Message 67662)
Posted 13 Jan 2023 by nairb Post:
Well, on an optimistic note, I finally snared a connection and at a rattling 30+ kbits/s have managed to upload a whole w/u. And still the connection is holding. I might make it up to 65 kbits/s later in the evening. With luck all the rest of the zips will go overnight.
15)
Message boards :
Number crunching :
The uploads are stuck
(Message 67592)
Posted 12 Jan 2023 by nairb Post: Good job it's not the weekend... At least I got 2 complete w/u uploaded.
16)
Message boards :
Number crunching :
The uploads are stuck
(Message 67542)
Posted 11 Jan 2023 by nairb Post: Well, here in slow land, one zip file at a time is being uploaded... Hurray. It's happening as I write this.
17)
Message boards :
Number crunching :
The uploads are stuck
(Message 67358)
Posted 5 Jan 2023 by nairb Post: Well, I managed to get 2 w/u's worth of zip files uploaded, but it seems to have gone "pop" again. Nothing uploading. Good job those zip files are not 100+ MB in size or we might never catch up.
18)
Message boards :
Number crunching :
The uploads are stuck
(Message 67342)
Posted 4 Jan 2023 by nairb Post: Still going here. Took 5.5 hrs to upload all the zips for the first w/u. Could finish the lot by late tomorrow evening if all keeps going.
19)
Message boards :
Number crunching :
The uploads are stuck
(Message 67315)
Posted 4 Jan 2023 by nairb Post: It's looking hopeful... one just uploaded. 400+ to go.
20)
Questions and Answers :
Unix/Linux :
Fedora 36
(Message 66740)
Posted 3 Dec 2022 by nairb Post: These are 64-bit models, which they need to be to address the amount of RAM they use. I have had two that failed right at the end and 4 successes so far. The final one failed at the end and there was nothing in the stderr log. But at least the w/u are short. And static linking removes the problem of missing libs.
©2024 climateprediction.net