Posts by AndreyOR

1) Message boards : Number crunching : Batch 1015 Discussion/problems (Message 70825)
Posted 10 days ago by AndreyOR
Post:
It seems that the task directory & files that should go into the slots directory still go into the projects/climateprediction.net directory. When I ran out of work a couple of days ago I cleaned out all of the older ones, but when I got new work today, new ones appeared.
2) Message boards : Number crunching : Batch 1008, and test batches 1009 to 1014 for Windows - issues (Message 70729)
Posted 22 days ago by AndreyOR
Post:
All 6 on my Intel PC crashed too, the ones on AMD are humming along. Sounds like a version of the old Y2K problem, switch to a new year - crash. :-D
3) Message boards : Number crunching : Batch 1008, and test batches 1009 to 1014 for Windows - issues (Message 70702)
Posted 23 days ago by AndreyOR
Post:
I had 2 tasks crash just shy of the 12-hour mark on the same PC (i7-4790) with the same errors. Seems like the global model is crashing?

https://www.cpdn.org/result.php?resultid=22417487
https://www.cpdn.org/result.php?resultid=22416101
4) Message boards : Number crunching : Should full credit be given for time on non successful tasks? (Message 70686)
Posted 25 days ago by AndreyOR
Post:
Richard, could you please explain the time limit exceeded error a bit? From the error log of the last failed task of host 1548623:
exceeded elapsed time limit 823339.74 (38013881.53G/46.17G)
That's ~9.5 days, which is about when the task failed.

The numbers in the fraction seem off though. The numerator is 10x the estimated computation size from task properties and the denominator is ~2% lower than the reported number of 47.15. Is that what BOINC does to set time limits, multiply the estimated computation size by 10 and somewhat reduce the measured floating point speed?
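
The arithmetic in that log line can be checked directly. This is a minimal sketch, assuming the two figures in parentheses are the task's operations bound and the host speed estimate, both in giga units (my reading of the message, not something confirmed in this thread). The variable names are mine.

```python
# Figures copied from the error log and task details quoted above.
reported_limit_s = 823339.74   # "exceeded elapsed time limit" value, in seconds
fpops_bound_g = 38013881.53    # numerator (assumed: operations bound, in G)
speed_g = 46.17                # denominator (assumed: speed estimate, in G/s)
measured_speed_g = 47.15       # floating point speed reported for the host

# Dividing the bound by the speed should reproduce the time limit.
computed_limit_s = fpops_bound_g / speed_g
limit_days = computed_limit_s / 86400  # 86400 seconds per day

# How far the denominator sits below the measured speed.
speed_shortfall = 1 - speed_g / measured_speed_g

print(f"computed limit: {computed_limit_s:.0f} s = {limit_days:.2f} days")
print(f"denominator is {speed_shortfall:.1%} below the measured speed")
```

The quotient lands within about ten seconds of the logged limit (the printed figures are rounded), so the limit does look like the numerator divided by the denominator.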
5) Message boards : Number crunching : processors, memory, performance and heat. (Message 70613)
Posted 6 Mar 2024 by AndreyOR
Post:
Looking around, seems like i9-13900K is near the top of single-core performance. Multi-core too (outside of Threadrippers) but this doesn't make a difference for CPDN. Seems like a good chip.
6) Message boards : Number crunching : WaH v8.29 bug leaves files behind in BOINC/data/projects/climateprediction -- please delete by hand (Message 70544)
Posted 23 Feb 2024 by AndreyOR
Post:
On one of my PCs I found a number of these directories from old tasks, with not only the text files but those folders too: 3 from batch 996, 1 from 1002, 2 from 1003, 2 from 1004.
7) Message boards : Number crunching : New Work Announcements 2024 (Message 70512)
Posted 22 Feb 2024 by AndreyOR
Post:
CPDN models are very floating point intensive. Since a cpu core only has one set of floating point units, two threads have to compete for resource. That's why your throughput drops.

That's probably why there can be large discrepancies between Elapsed Time and CPU Time. I've seen Elapsed Time being days longer than CPU Time.
8) Message boards : Number crunching : Completing a WU? Impossible. What am i doing wrong? (Message 70020)
Posted 30 Oct 2023 by AndreyOR
Post:
Yes, I've noticed this apparent behaviour too. I think this is down to the way the code is compiled; the compiler is told to add in conditional code for different instruction sets depending what chip it finds. For instance, instructions for SSE2/3/4.* are included. When the code executes it will use different assembly instructions depending on what capabilities it finds on the chip. Older chips therefore will not necessarily be using the same assembler as more recent chips. That's great for speed but a bugger for debugging.

The executable sent out with WaH is also old. It's not been recompiled for many years and that could also be introducing issues.

The above sound like more likely explanations than some of the previous ones given, at least to me.

Have you or anyone else tried recompiling the executable and running CPDN via the Anonymous Platform setup (which allows you to use your own executables)? That could give some useful info in finding the problem. Anonymous Platform setup is described here: https://boinc.berkeley.edu/wiki/Anonymous_platform
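
When comparing hosts that behave differently, it may help to note which SSE levels each chip reports, since the quoted explanation is that the executable picks instructions at run time. A minimal sketch for Linux hosts only (including WSL2), assuming the usual /proc/cpuinfo layout with a "flags" line; the function name and the sample line are hypothetical.

```python
def simd_support(cpuinfo_text):
    """Map SSE level name -> True/False from /proc/cpuinfo-style text."""
    # Linux flag names; SSE3 is listed as "pni" in /proc/cpuinfo.
    wanted = {"SSE2": "sse2", "SSE3": "pni", "SSSE3": "ssse3",
              "SSE4.1": "sse4_1", "SSE4.2": "sse4_2"}
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            flags = set(line.partition(":")[2].split())
            break
    return {name: flag in flags for name, flag in wanted.items()}

# Hypothetical sample line, shortened for illustration.
sample = "flags\t\t: fpu sse sse2 pni ssse3"
print(simd_support(sample))
```

On a real Linux host the input would be the contents of /proc/cpuinfo, e.g. `simd_support(open("/proc/cpuinfo").read())`. This only shows what the chip advertises, not which code path the model actually takes.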
9) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69980)
Posted 24 Oct 2023 by AndreyOR
Post:
Sardis73,

Looks different from mine. For me, both the Client & Manager crash; in your case it seems to be just the Client. I've seen the Invalid Client RPC password error before but it usually happens right away when you start BOINC. The fact that yours happens some time after everything has started and been running for a while is new and a bit puzzling.
10) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69978)
Posted 23 Oct 2023 by AndreyOR
Post:
BOINC crashing when left unattended is a new one for me. The only times I can remember BOINC itself crashing as opposed to tasks is the recent bug that would sometimes make it crash switching between advanced and simple view then back again. I think that has been fixed now but not 100% certain.
This happens to me once in a while, I'll come check on it and find BOINC isn't running. I can't remember when the first time was, this year or last. I think this was the first time with the latest version (7.24.1). It was also the first time with CPDN running which was costly due to loss of tasks and a lot of processing time. I haven't tried to investigate it in any way yet.
If it happens again it would be good to check in the system logs to see why it failed. I'm assuming this is the Windows client only and not linux (I've never seen it myself).

Yes, it's a Windows 10 PC (I use WSL2 for any Linux BOINC work, which I haven't run for months now). I'd look but don't really know where or what to look for. I did look at Reliability History and Event Viewer after posting that first post but couldn't find anything, though I'm also not exactly sure what to look for.

It's definitely not related to switching between views as I don't switch and always use the Advanced view. It's also not just the Manager crashing as that'd be easy to tell (BOINC start up, CPU temperature changes). It also isn't due to a system reboot due to some critical system component crash or power failure as that'd also be easy to tell. I have RyzenMaster & MSI Afterburner that start up first to turn on undervolt settings before other things like BOINC start and I have to manually apply the settings & close those programs before anything else proceeds so I can tell when there was a system restart.
11) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69973)
Posted 22 Oct 2023 by AndreyOR
Post:
and others due to BOINC seemingly crashing
BOINC crashing when left unattended is a new one for me. The only times I can remember BOINC itself crashing as opposed to tasks is the recent bug that would sometimes make it crash switching between advanced and simple view then back again. I think that has been fixed now but not 100% certain.

This happens to me once in a while, I'll come check on it and find BOINC isn't running. I can't remember when the first time was, this year or last. I think this was the first time with the latest version (7.24.1). It was also the first time with CPDN running which was costly due to loss of tasks and a lot of processing time. I haven't tried to investigate it in any way yet.
12) Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25 (Message 69971)
Posted 22 Oct 2023 by AndreyOR
Post:
I am sure a lot of the hard fails are simply due to this and not because of an inherent problem with the model perturbations. Not sure whether CPDN will decide to rerun them or not yet.

Yeah, 12 of my 32 failures so far have been due to BOINC restarts. Some due to an unintentional PC shutdown and others due to BOINC seemingly crashing (came to check on things and found BOINC wasn't running). Sucks as most of those had run for ~12 days and had no more than a day to go to finish. Hopefully the remaining 21 will finish successfully, as I've only had 5 finish successfully so far.
13) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 68806)
Posted 23 May 2023 by AndreyOR
Post:
I was rather shocked recently to find a tool to discover what package contains a file you need to run programs. Useful if the package containing what you want changes with a new version or really useful if compiling BOINC from source.

apt-file find 
will if you type or paste the name of the missing file tell you the package you need to install to get it. Had I known of it earlier it could have saved me many hours of searching the interweb to discover what package I needed. I don't know if there is anything similar for RPM distros.

An interesting find. A few things I found from playing with it a bit: it doesn't come preinstalled (at least not on WSL2 Ubuntu) and so needs to be installed separately. Also, one needs to create a database and keep it up to date, so a database update should be run before trying to look up packages. Lastly, one can only look up one file at a time; I was really hoping to be able to list multiple files and have it tell me which packages, if any, contain all of them.

For looking up packages that contain certain files, at least for Ubuntu, I think I'd still use the Ubuntu Packages Search website https://packages.ubuntu.com/ as it'll give you the same info but it's easier to use and browse around and, presumably, the database is always up to date.
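
The multi-file lookup wished for above could be approximated by running one search per file and intersecting the results. A minimal sketch, assuming apt-file is installed (`sudo apt install apt-file`), its cache has been refreshed (`sudo apt-file update`), and its search output uses the usual `package: /path` line format. The helper names and the sample outputs are hypothetical.

```python
import subprocess

def parse_apt_file_output(text):
    """Return the set of package names from 'package: /path' lines."""
    packages = set()
    for line in text.splitlines():
        name, sep, _path = line.partition(": ")
        if sep:  # skip lines that do not look like 'package: /path'
            packages.add(name)
    return packages

def packages_with_all(outputs):
    """Intersect package sets, one apt-file output string per file."""
    sets = [parse_apt_file_output(text) for text in outputs]
    return set.intersection(*sets) if sets else set()

def apt_file_search(filename):
    """Run 'apt-file search' for one file and return its raw output."""
    result = subprocess.run(["apt-file", "search", filename],
                            capture_output=True, text=True, check=False)
    return result.stdout

# Hypothetical sample outputs, made up for illustration.
out_a = "libfoo1: /usr/lib/libfoo.so.1\nfoo-utils: /usr/share/foo/libfoo.so.1\n"
out_b = "libfoo1: /usr/lib/libbar.so.2\nlibbar2: /usr/lib/libbar.so.2\n"
print(packages_with_all([out_a, out_b]))
```

With apt-file available, the real call would be something like `packages_with_all([apt_file_search(f) for f in files])`. It still runs one search per file, so it only automates the repetition.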
14) Message boards : Number crunching : Big credit jump! (Message 68712)
Posted 10 May 2023 by AndreyOR
Post:
The credit hike does seem a bit high. I had a bit over 80 tasks missing credit since last fall. Doing the math, it turns out that over 40k of credit per task was awarded for those tasks, which is definitely too high. I think the highest credit per task I've seen is in the 30k range and those weren't common. It looks like after the BOINC server update, credit changed for some of the OIFS task types from what was originally awarded. I guess it's OK if it was done uniformly for everyone. The biggest thing is that the missing credit issue seems to be resolved.
15) Questions and Answers : Wish list : Merge computers despite different OS (Message 68643)
Posted 11 Apr 2023 by AndreyOR
Post:
I'd like to be able to delete the old, obsolete computers in my account, some of which were scrapped over ten years ago.

This may not necessarily be a good idea as if I'm not mistaken, credit would be lost if a computer with credit is deleted. The only computers that would be safe to delete are those that for one reason or another have produced 0 total credit.
16) Message boards : Number crunching : East Asia testing. (Message 68599)
Posted 17 Mar 2023 by AndreyOR
Post:
My hope is that those in charge have learned from the effectiveness of the shorter deadlines given to OIFS tasks.

While it was helpful to add a 30 day grace period in addition to the 30 day deadline for OIFS tasks during the storage outage, it seems to me that it should've been removed once things stabilized. There are still almost 200 PS tasks out and from what I remember, the contract deadline was supposed to be end of February. The BL and regular OIFS apps also still have 100-150 tasks out each, although I'm not sure if 30 days have passed yet on those.
Just so I do not be considered greedy, I do not wish to hog an unfair number of tasks, but OTOH, I do want a large enough number of tasks on my machine to coast me over the dead spots.

I agree. Since work availability is not constant or consistent, I also like to have a large enough cache, but not so big that I can't finish all work by the 30 day deadline. It's not always possible as there are limits based on one's consecutive valid tasks.
17) Message boards : Number crunching : East Asia testing. (Message 68593)
Posted 15 Mar 2023 by AndreyOR
Post:
Because the region covered is much bigger than the ANZ region, these tasks will be long if they get here. Currently looking like about 50 days on my Ryzen. Slower machines will be well over 2 months even if running 24/7.

Sounds like these are Windows Hadley models?
18) Message boards : Cafe CPDN : Tropical cyclones modelling via DreamLab app (Message 68585)
Posted 10 Mar 2023 by AndreyOR
Post:
Yes DreamLab only works when power is plugged, whether charging or not. I've not noticed any temp bump from extra power. It's a well behaved app. I use both my phone & chromebook, though the CBook is significantly faster.

Sounds like it's better than BOINC on phones. Makes me wonder how much work one can get done under those circumstances though. The tasks must be tiny.

It uses a very small amount of memory. Have been trying to find out exactly what the Tropical Storms app is computing in such a small footprint but there's scant details about it on the web (other than the publicity). From what I can gather it's crunching small amounts of data rather than running a real model but it's hard to find out more details.

Yes, not a lot of details. The only thing that's mentioned is: "... the three-year DreamLab project is not about weather forecasting, but about risk assessment to help civic authorities prepare for such natural disasters." That's pretty broad so who knows what exactly is being calculated.
19) Message boards : Cafe CPDN : Tropical cyclones modelling via DreamLab app (Message 68573)
Posted 10 Mar 2023 by AndreyOR
Post:
Hopefully it's not too long of a break but I wonder if it might be.

I tried using a phone for BOINC and I feel like the demand on the battery is too high and it'll wear out prematurely. Better options would probably be a PC or laptop, perhaps even a macOS or Android VM if the PC is Windows or Linux.
20) Message boards : Cafe CPDN : World Community Grid mostly down for 2 months while transitioning (Message 68572)
Posted 10 Mar 2023 by AndreyOR
Post:
sigh

WCG has been hard-down again, for some hard drive failure issue or another.

To say that WCG has struggled since leaving IBM would be an understatement.

Some projects that have plenty of CPU work that I find interesting enough to run, and are not just pure math projects, are: Rosetta, Einstein, Universe, MilkyWay, LHC.

Yes, as mentioned above, Rosetta is a bit unique in that on the home page they have a stats section that lists, among other things, total queued jobs. The traditional server status page just shows the buffer from which clients get tasks; they seem to keep it at around 30k. Einstein also has that kind of info on their server status page, broken down by sub-project, although that section sometimes disappears.



©2024 climateprediction.net