Posts by Dave Jackson

1) Message boards : Number crunching : Uploading files fails (Message 62539)
Posted 11 hours ago by Dave Jackson
Post:
Still have more than two screens of uploads (around 140, I think?) to go... All from batch 870.


Things are clearing now.
#870 now has 40 tasks showing as completed.
2) Message boards : Number crunching : Uploading files fails (Message 62534)
Posted 1 day ago by Dave Jackson
Post:
A couple of dozen zips have just uploaded for me; only two left.


And at least three tasks are now showing as completed, e.g.
https://www.cpdn.org/result.php?resultid=21935480

Please don't all hit the try again button at once. The server will be maxed out with file transfers for a good few hours yet, so do not despair if you get transient HTTP errors.
3) Message boards : Number crunching : Uploading files fails (Message 62518)
Posted 2 days ago by Dave Jackson
Post:
Plenty of time yet.
They've only been running long enough for "weather", not for "climate." :)


OK, got an idea of the length of these now: 32 months. One i5 at 3.3GHz has returned six out of 32 trickle-ups, so I would guesstimate at least a couple of days until the fastest machines complete.
4) Message boards : Number crunching : Uploading files fails (Message 62516)
Posted 2 days ago by Dave Jackson
Post:
The server down at the bottom of the planet is being attacked by thousands of Windows computers, and is getting a bit battered and bruised.
It may take some time to recover, so patience.

Ommmmmm ...


It may be a bit more than the server getting a bit battered and bruised. None of #870, the first of these to go out, have been reported as completed yet.
5) Message boards : Number crunching : Credits (Message 62510)
Posted 3 days ago by Dave Jackson
Post:
Last week's credits were calculated but not exported to BAM.


I don't think they are exported as such; rather, BAM harvests them from whatever URL keeps them, and that should get updated as part of the credit script. I know that when that URL changed it took some while for the various credit aggregation sites to all catch up, even once they had been informed of the change.
6) Message boards : Number crunching : Work available and being requested but none downloaded (Message 62494)
Posted 4 days ago by Dave Jackson
Post:
Still not spotting what the problem is. Next time you post from the event log it may be worth first clicking on one of the Climateprediction.net messages and then at the bottom right of the event log click on "Show only this Project."

It may not help us to spot the problem but I have certainly found it easier on my ageing eyes looking for stuff without the other projects' messages there to filter out with my brain.
7) Message boards : Number crunching : New work Discussion (Message 62481)
Posted 6 days ago by Dave Jackson
Post:
I STRONGLY DISAGREE! I have been running CPDN (almost exclusively) for more than 10 years, since the days of the BBC experiment. I have over 20,000,000 credits. All of this work has been run exclusively on laptops.

The time limit is the point, and also the error rate. If you can get past those, you can do them however you want insofar as I am concerned.
But on Windows, most of the errors I see are for too many suspends. I don't think that is from the desktops.


Two machines is hardly statistically significant, but I have one desktop and one laptop and I see no difference in the error rates between the two. I do, though, vary the number of cores according to time of year/temperature. The machine tells me it is still OK running all 4 cores on the laptop in summer, but even if that is true the fan noise is excessive, so I cut it to either 75% or 50% (three or two cores out of four). Two cores give about the same level of fan noise at 25C ambient that four give at about 17C.

I don't know if I would get more errors running the laptop hotter or just wear it out more quickly.
8) Message boards : Number crunching : New work Discussion (Message 62468)
Posted 7 days ago by Dave Jackson
Post:
Good that this has been taken care of. BOINC servers have had a default rule not to allow posting in forums unless you have RAC > 1. That was to prevent junk posts from users who are not interested in doing their bit for the project.


I am not sure if CPDN has ever had this restriction since I started. I know it is the default in the BOINC server code but because so many people had trouble installing the 32bit libs in Linux it didn't make sense to stop people posting globally.
9) Message boards : Number crunching : New work Discussion (Message 62462)
Posted 7 days ago by Dave Jackson
Post:
Yes as long as they get "their results".... of course if the 3000 WU's were spread 2 per system across the available Windows systems the researchers would get their 3000 WU's back faster than waiting on fewer systems with huge queued stacks of WU's waiting on long due dates.

IMHO

Bill F



I do not agree with this statement: "huge queued stacks of WU's waiting on long due dates". Long due dates are beside the point, and how many cores a machine has is also not much of a deciding point. "Store at least ___ days of work" is set at ten, and "store additional work" is also set at ten days of work maximum. So how many WU's a machine gets is still self-limiting. I have a twelve-thread machine which gets twenty-four WU's max. They report back pretty much at the expected time, within one month.

So where exactly are these "huge queued stacks of WU's waiting on long due dates" sitting? And sitting somewhere they are. In the old days we used to squirrel away WU's on floppies or alternative media. Is this still going on?
Then there are crashed hard drives which take WU's with them to the grave but they still get reported.


So you have 12 WUs you're working on and 12 queued, and the results will be back in a month. How much better if you have 12 WUs and someone else has 12 WUs, and the results get back in a fortnight?

The queued stacks of WUs tend to be systems where they have, say, 12 cores, download 24 WUs but then only allow 2 WUs to run concurrently alongside their other projects.


The real problem is machines that are only switched on for a couple of hours a day and grab lots of tasks. We still see tasks returning which take over a year to be completed! I for one don't really mind if a task takes two weeks or a month to complete. But a year may be after the researcher's deadline for submitting their PhD!
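The arithmetic behind the points above can be sketched roughly as follows. This is only an illustration: the function name, the 14 days of crunching per task and the assumption that every task takes the same time are hypothetical, not project figures.

```python
import math


def days_to_return_all(tasks, concurrent, days_per_task, hours_on_per_day=24.0):
    """Rough wall-clock days until the last queued task is returned.

    Assumes (hypothetically) that every task needs the same amount of
    crunching and that tasks run in equal-sized "waves" of `concurrent`.
    """
    waves = math.ceil(tasks / concurrent)   # how many rounds of tasks are needed
    compute_days = waves * days_per_task    # days of actual crunching
    # A machine that is only on part of the day stretches the wall-clock time.
    return compute_days * 24.0 / hours_on_per_day


# One machine holding 24 tasks and running 12 at a time: about a month.
print(days_to_return_all(24, 12, 14))
# The same 24 tasks split across two such machines: about a fortnight each.
print(days_to_return_all(12, 12, 14))
# 24 tasks downloaded but only 2 allowed to run at once.
print(days_to_return_all(24, 2, 14))
# 24 tasks, 12 at a time, on a machine switched on 2 hours a day.
print(days_to_return_all(24, 12, 14, hours_on_per_day=2))
```

Under these assumptions the last two cases come out at roughly half a year and nearly a year, which is the kind of turnaround being complained about.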
10) Message boards : Number crunching : New work Discussion (Message 62454)
Posted 7 days ago by Dave Jackson
Post:
As if by magic, New Work Announcements - so the RAC decay is a problem for another day.

Though only for the computers that managed to grab some of them. ;)
11) Questions and Answers : Unix/Linux : Failing tasks with exit code 12 and 25 (Message 62448)
Posted 9 days ago by Dave Jackson
Post:
Thanks for posting a solution.

I got confirmation from Richard, over on the BOINC forums, that this almost certainly was a problem with the CPDN setup. I have informed the project to see if someone knows how to fix it at their end.

I worked out after I last posted that with my system disk only being a 500MB SSD and my data disk being 1GB mechanical that I probably wouldn't see the problem. I will post your solution over on the BOINC forums in case anyone who reads them needs it.
12) Questions and Answers : Unix/Linux : Failing tasks with exit code 12 and 25 (Message 62445)
Posted 9 days ago by Dave Jackson
Post:
Or you can put it in VirtualBox, as some other projects have done, which should avoid both file system and library issues.


VirtualBox would, as you say, solve the problem of systems missing the 32bit libraries. However, there would be some performance hit, and from time to time over on the BOINC boards I see users who have problems with it, so it adds another layer where problems might occur. I don't know how straightforward it would be for the people at the project to set up VirtualBox for the Linux applications either, or whether anyone there has experience of doing so.

For other reasons, I am going to do a clean install of Ubuntu on my laptop when the work currently on it is finished, and will try using XFS to test it, but it is not a fast machine so it is likely to be over a month till I do so.

If there is anyone here using XFS who is either running tasks successfully or has the same problem it would be good if you could post to help us sort this one out and at least either confirm or disprove that XFS is the root of the problem.

Edit: If the XFS file system does prevent things working, I am a bit surprised that nothing came up on the BOINC forums when I did a search there.
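
For anyone who wants to check which file system their BOINC data directory sits on before reporting back, here is a minimal sketch. It only parses the text of /proc/mounts; the function name is hypothetical and the data directory path mentioned afterwards is an assumption that depends on how BOINC was installed.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, where each line is
    "device mount_point fs_type options dump pass".
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Keep the longest matching mount point, i.e. the most specific one.
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Made-up sample in /proc/mounts format, for illustration only.
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /var/lib/boinc-client xfs rw,relatime 0 0\n"
)
print(fs_type_for("/var/lib/boinc-client/projects", sample))
```

On a real machine you would pass in the contents of /proc/mounts and your own data directory; /var/lib/boinc-client is a common location for packaged installs but yours may differ.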
13) Message boards : Number crunching : New Work Announcements (Message 62430)
Posted 11 days ago by Dave Jackson
Post:
Yesterday another 600 of #869 were released. I don't know if this is because the researcher responsible only had three back and doesn't want to wait for the person who snagged the other 12 out of the original 15 or if it is because they have looked at the three and decided it is worth putting out some more?

I edited the original post first but just twigged that doing so doesn't show there has been a change unless you have another reason to click on it.
14) Questions and Answers : Unix/Linux : Failing tasks with exit code 12 and 25 (Message 62423)
Posted 18 days ago by Dave Jackson
Post:
I have seen this error before, when looking at tasks that have failed for other reasons.

I am clutching at straws a bit here but a couple of things worth checking.

1. That you have enough disk space allocated. (Unlikely this is a problem with only 8 cores.)

2. Something to do with RAM and/or cache memory. If the tasks complete fine when you restrict BOINC to only 4 cores at a time, then cache memory would be the most likely reason.

Sarah at the project replied to my post and thinks the strange error messages you are getting are likely not directly from the crash but because something doesn't clean up properly after the crash.

3. Just thought of this: it could be that they are crashing because you have a corrupted file downloaded. If you detach from CPDN then re-attach, that will download fresh copies of all the relevant files, which should resolve the problem if that is the cause. (Might be worth trying that one first.)

It isn't that common an issue I think as I have never seen it on my own boxes and only rarely when looking through crashed tasks looking for patterns.
15) Questions and Answers : Unix/Linux : Failing tasks with exit code 12 and 25 (Message 62422)
Posted 18 days ago by Dave Jackson
Post:
After looking at several of the tasks I found a few others failing with similar but not the exact same messages as you had. Took a while because most of the failures were due to missing 32bit libs. The fact that others also failed with similar errors suggests a problem with the tasks. I will let the project know.
16) Message boards : Number crunching : "No tasks sent" (Message 62419)
Posted 19 days ago by Dave Jackson
Post:
There should be some way to better spread these small test batches (if that’s what this micro-batch was) around so that they don’t all get sucked into this kind of black hole machine. Otherwise this will become more and more of a problem as processors get ever larger numbers of cores.


I agree. Usually this is done via the testing site. I have, I suspect, the slowest machine on there, and on that these tasks would take about 20 days. The machine I use for the testing site runs 24/7, which the machine with the tasks clearly doesn't, as it is a lot faster yet still takes 65 days to turn tasks around.

I don't know why these were sent out on the main site rather than testing. The other option if there is a good reason for using the main site for them would be to send out say 100 instead of 15 which would hopefully get enough data back quickly.
17) Message boards : Number crunching : "No tasks sent" (Message 62416)
Posted 20 days ago by Dave Jackson
Post:
There are no Windows tasks (wah2) at the moment, and none on the horizon that anyone knows about.


Though as I posted a few days ago,

#869 Micro-batch of 15 pnw tasks went out on the 29th April.


Sometimes these micro-batches are the herald of a larger release, though I have seen nothing on the moderator communications to suggest this on this occasion. Three of the fifteen have completed. The other twelve have all gone to a machine with an average turnaround time of 65 days, so even if it is the herald of a larger batch we may have to wait a while.
18) Questions and Answers : Getting started : Not receiving work for some reason (Message 62413)
Posted 20 days ago by Dave Jackson
Post:
Firstly, are your computers running BOINC under Windows or Linux? There has been no Windows work for a couple of months or more, bar a small 15 work unit batch recently. As your computers are hidden, we can't look and check out some of the common problems.

Assuming you are using Linux, check that your computers have the 32bit libs installed. There are instructions for various distributions under the Linux section of the forums. Without these even if you get any work all the tasks will crash.
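
As a rough way of checking, the sketch below scans the output of `ldconfig -p` for 32-bit entries. It is an assumption-laden illustration: the function name is hypothetical, the two library names are examples rather than the project's official list, and it assumes an x86-64 Linux system where 64-bit entries are tagged "x86-64" and 32-bit ones are not.

```python
import re

# Matches lines such as "libz.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libz.so.1"
ENTRY = re.compile(r"^\s*(\S+) \(([^)]*)\) => (.+)$")


def missing_32bit_libs(ldconfig_output, wanted):
    """Return the names in `wanted` that have no 32-bit entry in the cache listing."""
    found = set()
    for line in ldconfig_output.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        name, flags = match.group(1), match.group(2)
        # Assumption: entries without an "x86-64" or "x32" tag are 32-bit.
        if "x86-64" not in flags and "x32" not in flags:
            found.add(name)
    return sorted(set(wanted) - found)


# Made-up sample output for illustration only.
sample_output = (
    "\tlibz.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libz.so.1\n"
    "\tlibz.so.1 (libc6) => /lib32/libz.so.1\n"
    "\tlibstdc++.so.6 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libstdc++.so.6\n"
)
# Example library names only; see the Linux section of the forums for the real list.
print(missing_32bit_libs(sample_output, ["libz.so.1", "libstdc++.so.6"]))
```

To use it on a real machine, feed it the text printed by running `ldconfig -p` in a terminal. Anything it reports as missing still needs to be installed following the instructions for your distribution.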

Edit: When requesting help in these forums it is always worth posting your operating system and the version number of BOINC you are using, 7.17.6 being the most recent.
19) Message boards : climateprediction.net Science : Misconfigured Machine? (Message 62407)
Posted 22 days ago by Dave Jackson
Post:
Maybe you should just ban everyone until they can present a certificate of compliance.


Or competence, though on Mandrake/Mandriva which I think I was running when I first started, it wasn't always straightforward finding all the 32bit libs though I suspect I would have little difficulty tracking them down now.
20) Message boards : Number crunching : Updated BOINC Clients (Message 62402)
Posted 23 days ago by Dave Jackson
Post:
Updated BOINC Clients for Win 64, Mac OS X 64 and Linux 64 have been released for public use

https://boinc.berkeley.edu/wiki/Release_Notes_for_BOINC_7.16

Bill F


Download point URL https://boinc.berkeley.edu/download_all.php

Bill F


Interesting to see the return of the self contained version for Linux. I should ask on the BOINC forums whether there is a way to put the 32bit libraries into that version. I also see that for Linux it says, "Development version, may be unstable." So it goes against the convention of development releases having an odd number as in 7.17.0 which I am running and even numbers 7.16.x being allegedly stable release versions.
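
The odd/even convention described above can be written down as a small sketch. The function name is hypothetical and it encodes only the convention as stated in this post, not an official BOINC rule.

```python
def release_kind(version):
    """Classify a BOINC-style version string by its second number.

    Per the convention described in the post: odd means a development
    release, even means a (nominally) stable one.
    """
    minor = int(version.split(".")[1])
    return "development" if minor % 2 else "stable"


print(release_kind("7.17.0"))
print(release_kind("7.16.0"))
```

Only the second number is parsed, so a placeholder such as 7.16.x is classified the same way as 7.16.0.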

Edit: 7.16.0 has been around in some of the repositories for a couple of weeks or so now. What this one does mean is that when newbies ask questions about file locations etc., we will need to check how they installed their brand new shiny BOINC.


©2020 climateprediction.net