Posts by SolarSyonyk

21) Message boards : Cafe CPDN : Off-Grid Solar/Renewable Energy Discussion (Message 70728)
Posted 3 Apr 2024 by SolarSyonyk
Post:
If only I could simply have the computers suspend (or just pause Boinc) when battery voltage is under x. Sounds like a very simple thing, but unless you're a programmer, impossible.


I can think of half a dozen ways to do it. But I'm a programmer, and I operate in the Linux ecosystem.


Do you mean 5kW peak? That's not very much.


Yes. I've got 5200W of panel hung for my office in various orientations. Which is an 8' x 12' shed of about 86 square feet internal floor space (8 square meters, if you prefer).

The house system is 15.9kW of panel, though optimized for "long solar day" production vs "peak kWh per panel kW," given what I expected to happen with net metering agreements out here.

... plus another kW of panel on a solar trailer.


If it's only 5kW peak it shouldn't be far more. But then you bought it a while ago and prices are plummeting.


Yes, the bulk of my system was built 8 years ago. I added panels a few years ago (2019 or so?) with a small test A-frame before I launched into building the large house A-frames. For a DIY, code-compliant, ground-mount, grid-tie solar install around here, $1/W is doable (I know someone who's done that), but batteries are just expensive.


The only setting I have on my ones is "power limit". I guess that would do the job, I've never tried putting it under the standard "0%". I only raise it to max (usually +20% or +50%). The +50% one then immediately melted the power connector, so I had to solder the wires on.


That adds a LOT of power consumption for only marginal additional compute power. I'd be surprised if you got more than about 10-15% extra compute from the 50% bump in power limit. If you're power limited, go the other way. I run my 2080 (240W nominal power limit) at 125W quite a bit of the time, and it definitely isn't halving compute power.
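
To put rough numbers on that trade-off, here is a minimal sketch of the compute-per-watt arithmetic. The function name and the 0.80 throughput figure are illustrative assumptions, not measurements; only the 240W and 125W limits and the 10-15% estimate come from the post above.

```python
def relative_efficiency(throughput_ratio: float, power_ratio: float) -> float:
    """Compute-per-watt relative to stock settings (1.0 means unchanged)."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return throughput_ratio / power_ratio


# Raising the power limit by 50% for an estimated 15% more compute.
overclocked = relative_efficiency(1.15, 1.50)

# Capping a 240W card at 125W. The 0.80 throughput ratio is a
# hypothetical placeholder; measure your own workload.
capped = relative_efficiency(0.80, 125 / 240)

# Break-even: the cap improves efficiency whenever throughput stays
# above the power ratio, here 125/240 of stock throughput.
break_even = 125 / 240

print(f"+50% power limit: {overclocked:.2f}x stock compute-per-watt")
print(f"125W cap:         {capped:.2f}x stock compute-per-watt")
print(f"break-even throughput at 125W: {break_even:.0%} of stock")
```

In other words, as long as the capped card keeps more than about 52% of its stock throughput, the cap is a net efficiency win. On NVIDIA cards the limit can usually be set with `nvidia-smi -pl <watts>`, within whatever range the card allows.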
22) Message boards : Cafe CPDN : Off-Grid Solar/Renewable Energy Discussion (Message 70725)
Posted 3 Apr 2024 by SolarSyonyk
Post:
Ah, so pretty much manual then, I'll stick to that too.


Yup. I poke the "suspend time" a couple times throughout the year, will manually change things as needed, but I've avoided too much automation on it. I just don't find automation helpful for a lot of this stuff, and I've seen it go badly wrong more than a few times. It's not a production system, it's a hobby system, but "the rest of my office" is a production system (I work from here), so I err on the side of battery state of charge when needed.


You must live in an alternate reality where weather forecasting is remotely correct.


It's not, but it generally will give me a day or two of advance guidance on total solar flux. I don't really care if it's raining or not, I care if there's a heavy cloud layer or not, and the forecast for that is at least better. Sometimes. Or sometimes I'm wrong and hook the backup power up. But I'm seriously overpaneled out here, so can make up a lot of it on "Well, I screwed that up, but 5kW of panels solves a lot."


... but that was 3 grand (UK money) of panels and batteries


I haven't totaled up my system... it's certainly far more, but I also have a system suited to a small house on and around my shed out here. It's also partly R&D for me on other projects - knowing my way around off grid power systems is useful for some other side projects I have.

I'll probably start buying some newer more efficient CPU/GPUs instead of more solar.


Take a look at the power settings on your CPUs. If I'm not mistaken, you have some 3900X chips too. I've disabled turbo on mine, because it adds a lot of power consumption for rather marginal additional compute throughput when the system is loaded up, and on hotter days it starts throttling more. With turbo on, I've seen a net loss in throughput at higher power consumption (that's on a Wraith Pro cooler, which genuinely can't keep up with a loaded 3900X unless it's very cold in here).

The same goes for GPUs - you can lower the power use on them, and substantially increase compute-per-joule.
23) Message boards : Cafe CPDN : Off-Grid Solar/Renewable Energy Discussion (Message 70720)
Posted 3 Apr 2024 by SolarSyonyk
Post:
But you can have a sunny week or a rainy week, where presumably you adjust something. Do you get computers to run for half the day? Or reduce the cores used?


If it's a sunny week, there's nothing to adjust. The computers just run. If it's particularly hot, I may let them sleep overnight to avoid cooling loads, though I'm planning to move some hardware into a box outside with filtered vent fans this year to help reduce the summer cooling demand.

If it's rainy, I'll only run one computer at a time, or let them sleep, though I try to avoid grabbing too many short-lived tasks if I know the weather is going to be cloudy in the next week or so - I just set my machines to not grab new tasks, drain them out, and wait until I have more power.


Reducing the number of computers running causes tasks to not get done for a week, which is no good.


That depends entirely on what the deadlines are. If it's a week deadline, it's probably a fairly short (4-6h task). If it's got a 90 day deadline, not working for a week doesn't matter much. Though outside the dead of winter, it's rare to get a week of "no compute at all." It's more a day or two at a time where I'll let the machines idle.
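
The deadline reasoning above can be sketched as a small slack calculation. This is only an illustration: the function names, the 10 hours of daily runtime, and the 300 remaining CPU hours are hypothetical values, while the one-week and 90-day deadlines and the 4-6h task length come from the post.

```python
def slack_days(deadline_days: float, remaining_cpu_hours: float,
               run_hours_per_day: float) -> float:
    """Days a machine can sit idle and still finish before the deadline."""
    if run_hours_per_day <= 0:
        raise ValueError("run_hours_per_day must be positive")
    days_needed = remaining_cpu_hours / run_hours_per_day
    return deadline_days - days_needed


def can_idle(idle_days: float, deadline_days: float,
             remaining_cpu_hours: float, run_hours_per_day: float) -> bool:
    """True if idling for idle_days still leaves time to finish the task."""
    return slack_days(deadline_days, remaining_cpu_hours,
                      run_hours_per_day) >= idle_days


# Short task (6h left) with a one-week deadline: a full idle week misses it.
print(can_idle(7, deadline_days=7, remaining_cpu_hours=6, run_hours_per_day=10))

# Long task (hypothetical 300h left) with a 90-day deadline: a week off is fine.
print(can_idle(7, deadline_days=90, remaining_cpu_hours=300, run_hours_per_day=10))
```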

But I can also swap around which machines run - a few of them share a power supply, so I try to run those machines together, and just toggle between what machines run one day to another to help spread the load out. Lots of ways to solve it.


I'm on windows and I'm not sure this can be automated at all.


I don't automate any of this beyond "Sometimes I have a cron job that will S3 suspend machines at a certain time of the evening." I used to have wake on LAN capability to power machines on remotely (from either powered off or to wake them from sleep), but I just walk out and turn the machines on physically now.


What I'd really like is a way to tell windows "you're on battery", which will suspend Boinc, or even the computer (although the computer would require manual turning back on if I went that far).


If you really want to do it, you can do wake on LAN with most hardware - you'll have to tweak some settings in your BIOS or NIC or something, but I had a Windows box that would wake-on-LAN just fine years back.
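
For reference, the wake-on-LAN side is small enough to script. The sketch below builds the standard magic packet (six 0xFF bytes followed by the target MAC address repeated 16 times) and broadcasts it over UDP. The function names, the placeholder MAC address, and the choice of port 9 are assumptions for illustration; the target machine still needs wake-on-LAN enabled in its BIOS and NIC settings.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the packet over UDP; port 9 is a common convention."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC address; substitute the target NIC's address.
    packet = build_magic_packet("00:11:22:33:44:55")
    print(len(packet))  # 102 bytes: 6 + 16 * 6
```

Calling `send_magic_packet("00:11:22:33:44:55")` from another machine on the same LAN segment should wake the target, provided its firmware and NIC are configured to listen for it.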

But if I wanted to automate more, I'd just have scripts on the systems that poll the state of my battery (I have a VM that reads charge controller state and reports useful things) and act locally. I've just not found it to be worth the hassle, over doing it manually and keeping an eye on things. If I miss a couple hours of compute because I'm out, oh well. So be it.
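
As a rough illustration of the "poll the battery and act locally" idea, here is a minimal sketch, not a description of any running setup. It assumes the stock `boinccmd` tool is on the PATH and can authenticate to the local client. The voltage thresholds are placeholders, and `read_voltage` is a hypothetical caller-supplied function, since how the bank voltage is read depends entirely on the charge controller and monitoring in use.

```python
import subprocess


def next_run_mode(voltage: float, current_mode: str,
                  suspend_below: float = 48.0,
                  resume_above: float = 51.0) -> str:
    """Pick the BOINC run mode for a battery voltage reading.

    Two thresholds give hysteresis, so a voltage hovering near one
    limit does not flip BOINC on and off repeatedly. The default
    values are placeholders, not recommendations.
    """
    if voltage < suspend_below:
        return "never"   # suspend all computation
    if voltage > resume_above:
        return "auto"    # run according to the normal preferences
    return current_mode  # inside the dead band: leave things alone


def apply_run_mode(mode: str) -> None:
    """Tell the local BOINC client to switch run mode via boinccmd."""
    subprocess.run(["boinccmd", "--set_run_mode", mode], check=True)


def poll_once(read_voltage, current_mode: str) -> str:
    """Read the battery once and change BOINC's mode only if needed.

    read_voltage is a caller-supplied function (hypothetical here) that
    returns the bank voltage from whatever monitoring is available.
    """
    mode = next_run_mode(read_voltage(), current_mode)
    if mode != current_mode:
        apply_run_mode(mode)
    return mode
```

Run from a scheduled job every few minutes, `poll_once` only touches BOINC when the mode actually changes. Depending on the install, `boinccmd` may need to be run from the BOINC data directory or given the GUI RPC password.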
24) Message boards : Number crunching : New Work Announcements 2024 (Message 70716)
Posted 3 Apr 2024 by SolarSyonyk
Post:
It's great there's new work, but it'd be even better if it would actually let machines requesting it HAVE some.


At least as of right now, there's nothing meaningful in the unsent queue. The Windows machines tend to suck up the tasks in a prompt hurry.
25) Message boards : Cafe CPDN : Off-Grid Solar/Renewable Energy Discussion (Message 70698)
Posted 2 Apr 2024 by SolarSyonyk
Post:
Solar and Boinc is no good for long tasks. Solar requires variation of the computer power.


My track record of tasks completed says, "It works just fine the way I do it." https://www.cpdn.org/results.php?userid=744912 - I don't think I've set that or my computers to be private.

I don't try to match instant power demand to solar production. My machines are either running at rated load (typically either 8 or 12 tasks on a 12C/24T AMD chip; loading up the hyperthreads reduces total instructions retired per second, which is what I try to optimize), or they're in S3 suspend, which is the "everything in RAM" suspend state. It works fine with native Linux tasks, it works fine with Windows tasks in a VM, and it works fine with the old 32-bit MacOS Intel tasks that showed up a while back. I vary power demand by modulating how many machines are powered on at any given point in time, and I still meet the deadlines even doing this - though I'm not inclined to pull too many CPDN tasks during the dead of winter. The tasks are never suspended or resumed - only the full system, which doesn't trigger the suspend/resume problems a number of the binaries have. From their perspective, they run continuously from start to end.

I've been doing it this way for 8 years now. And at this point, given a string of sunny weather, I have enough battery bank to actually just run tasks throughout the night and make up for it the next day.

But there's also another thread, specifically related to off-grid solar, better suited to informing me how what I'm doing can't possibly be working.
26) Message boards : Cafe CPDN : Off-Grid Solar/Renewable Energy Discussion (Message 70696)
Posted 2 Apr 2024 by SolarSyonyk
Post:
Grabbed a few for my VMs, and they're running with no trouble. Of course, as soon as there's new work, I've got half a week of clouds, rain, and snow coming...
27) Message boards : Number crunching : Should full credit be given for time on non successful tasks? (Message 70684)
Posted 27 Mar 2024 by SolarSyonyk
Post:
Really just floating it to get an idea of whether it would deter those who crash most tasks even if they gain a substantial amount of credit first.


Given how hard it was to get people to just install 32-bit libraries, I don't think most people interact with a project beyond "Select it in the BOINC add project interface and let it run."

I would doubt that anyone crashing many tasks has ever even posted on the forum, or even gone to look at the forum threads. The forum is filled with the sort of people who don't breathe on their computers too hard, in case it crashes a W@H task (seriously, Glenn, thank you so much for your work on improving reliability!). ;) So I'm not sure if it really matters - will crashers even notice zero credit for tasks?

I support aligning "credit rewarded" with "scientific usefulness of results," but I also don't think it's worth a lot of effort to change things up in an attempt to get through to people whose computers simply don't work.
28) Message boards : Number crunching : Should full credit be given for time on non successful tasks? (Message 70679)
Posted 27 Mar 2024 by SolarSyonyk
Post:
I don't mind a system that's credit only for completed tasks, once the tasks are well enough behaved that they don't regularly crash of their own accord, and assuming that "world goes physically impossible" is still considered a completed task.

"Me getting no credit for a failure of one of my machines in terms of hardware or configuration" is totally fine with me, but I don't think "The task crashes because of code issues or world physics issues" should lead to no credit.

How scientifically useful are trickles of partially completed tasks? If they're of no substantial value, then not giving credit until the work is properly done makes sense to me.
29) Message boards : Number crunching : New Work Announcements 2024 (Message 70658)
Posted 18 Mar 2024 by SolarSyonyk
Post:
sigh

WCG's feeder is down, Rosetta's out of tasks again, and my compute rigs are all...



There's a lot of silicon impatiently waiting for things to chew on! Windows, Linux, MacOS 32-bit Intel, whatever!

And it's sunny enough, with the new battery bank, to be able to mostly crunch 24/7 this year!
30) Message boards : Number crunching : processors, memory, performance and heat. (Message 70617)
Posted 6 Mar 2024 by SolarSyonyk
Post:
I've got a pair of 3900Xs that do most of my computation (12C/24T), and I've found that I see almost no "net system throughput" improvements between 8 and 12 threads running with CPDN tasks - it may be marginally faster at 12, but not by much (mine are typically retiring 50-60G instructions per second when loaded). Going up past 12 actually reduces net system throughput. I think turbo might increase that slightly, but I generally keep it disabled to avoid the corner of "tons of extra power for a slight bit extra performance."

There doesn't seem to be any benefit to hyperthreading with CPDN tasks (making sense, they're floating point/vector engine heavy), and they seem to prefer "enough cache" - though I think there's still a ton spilling to main DRAM, based on counters on my Intel boxes.

I don't care a bit about single threaded speed for CPDN, just total system throughput. But Dave clearly needs some test chips! ;)
31) Message boards : Number crunching : WaH batches 996 & 1001 have been closed (Message 70609)
Posted 5 Mar 2024 by SolarSyonyk
Post:
It's not 996 or 1001, so crunch it, far as I know.
32) Message boards : Number crunching : WaH batches 996 & 1001 have been closed (Message 70606)
Posted 5 Mar 2024 by SolarSyonyk
Post:
Yeah, I understand they're exceedingly sensitive to disturbances and such, with non-linear follow on effects. I just work in a space where if the code generated different results based on the compiler, we'd be running down the bugs. But I also like to think x86 floating point, even vector, is well enough defined that you shouldn't get differences between chips, and I'm aware that's a falsehood - I just don't work in floating point spaces. Just interesting. I suppose if it's reordering some of the rounding operations and such you can get subtly different output from a series of operations. I like my computers deterministic, darn it! :p
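
The reordering effect is easy to reproduce in miniature. The snippet below is only a toy illustration of floating-point addition not being associative; it says nothing about what the WaH code or any particular compiler actually does.

```python
values = [0.1, 0.2, 0.3]

# Same three numbers, same operation, different grouping.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
```

A compiler or vector unit that regroups a long sum can shift the last bits in the same way, and in a chaotic model those last bits eventually grow into visible differences.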

*rant* I just don't understand how a "Windows Update" can be allowed to stop and restart the system anytime, idiotic..


There's probably some way to disable it. I don't really "do Windows" anymore, so I'm not sure how to do it. I have Linux compute rigs, and when there's non-Linux work (Windows, 32-bit Intel Mac, etc), I spin up VMs for the duration of the work, and then destroy them when done, because I don't have enough disk space to store all of them on the compute rigs, and "copying VMs around between hosts" causes some very interesting failures when two systems are identical enough that they get the same computer ID and start smashing each other's work allocation.

What's extra double special is that unless you change some other notification settings, it's likely to install updates, reboot, and then sit at the "But would you pretty please make an Online Microsoft Account????" nag screen (which doesn't allow any compute to start). No, you blasted OS, I created an offline account, through your increasingly troublesome process (now you have to actually not have a network connection at all to even see the option), because I wanted an offline account!
33) Message boards : Number crunching : WaH batches 996 & 1001 have been closed (Message 70600)
Posted 5 Mar 2024 by SolarSyonyk
Post:
Comparison shows that the newer v8.29, recently recompiled, produced slightly warmer temperatures in the winter months, compared to the old version 8.24. The differences are not statistically significant (and not unexpected).


Good news! Is there a reason model result changes were "not unexpected"? Fixing correctness issues shouldn't alter results... I'd think... but the WaH stuff seems a bit special case as far as code goes.


WaH v8.29 is much more stable with very few hard fails and correctly restarts on a host power cycle. There will be no more new batches using WaH v8.24.


Even better news! I won't miss Windows Update or a power outage trashing a CPU-month or two of work.

Thank you so much for your work on improving this code!
34) Message boards : Number crunching : processors, memory, performance and heat. (Message 70574)
Posted 29 Feb 2024 by SolarSyonyk
Post:
I just think reducing the number of cores in use is a better option.


That also helps with improving cache-per-task, which can make a big difference in per-task performance.

Though on most systems, if you're having thermal problems, the right answer is to tweak the power limit settings in the BIOS or with some mainboard utilities. You can clamp that down and not worry about what's loading up the cores, and you usually get a pretty nice boost in compute-per-watt.
35) Questions and Answers : Unix/Linux : New Work Coming? (Message 70571)
Posted 29 Feb 2024 by SolarSyonyk
Post:
Excellent. I'll be moving some compute rigs of mine outdoors this summer so I can run more compute, and I've got more battery bank that should let me run them overnight - so I'm estimating being able to run a couple 3900X systems plus a few older Intel boxes 24/7 on use-it-or-lose-it solar.
36) Message boards : Number crunching : WaH v8.29 bug leaves files behind in BOINC/data/projects/climateprediction -- please delete by hand (Message 70549)
Posted 24 Feb 2024 by SolarSyonyk
Post:
The OpenIFS tasks were designed this way and I'm planning on making this change for WaH too.


Yay! More code quality of life tweaks from Glenn! :D
37) Message boards : Number crunching : WaH v8.29 bug leaves files behind in BOINC/data/projects/climateprediction -- please delete by hand (Message 70543)
Posted 23 Feb 2024 by SolarSyonyk
Post:
Ah, thanks. Found it, I'll clean it up at some point after tasks are done.

... and c:\Programdata is a hidden folder. So helpful, Microsoft... so helpful.
38) Message boards : Number crunching : WaH v8.29 bug leaves files behind in BOINC/data/projects/climateprediction -- please delete by hand (Message 70539)
Posted 23 Feb 2024 by SolarSyonyk
Post:
Where is this directory usually located for Windows machines?
39) Message boards : Number crunching : New Work Announcements 2024 (Message 70538)
Posted 23 Feb 2024 by SolarSyonyk
Post:
The anonymous owner grabbed 16 tasks for their 8 core Ryzen on 19 February: 13 of them have returned just one trickle each since then. One has crashed, showing many, many quit requests from BOINC: I got the resend, which is how it came to my attention.


Either the tasks will crash, (eventually) complete, or time out and be reassigned. The timeout is only 3 months now on the tasks, so they won't sit churning for a year like they used to. I'm certain there are plenty of Windows machine owners who've forgotten they have BOINC requesting CPDN tasks, it's been so long since there's been a solid supply of them (the past few, if I recall, were grabbed within a day or two - not that this set lasted too much longer).

Some of my machines compute a lot faster than others, though I try very hard to make sure they're making steady progress (they S3 suspend at night because I'm doing the work from an off-grid solar install - though my new battery pack is letting me run some of them through the night).

Use at most x% of cpu time should be removed from boinc source code.


I believe the core BOINC software forums would be a good place to discuss that. Does it interact particularly badly with CPDN tasks or something? I've never actually used it.
40) Message boards : Number crunching : Time taken anomaly. (Message 70529)
Posted 22 Feb 2024 by SolarSyonyk
Post:
The Fortran models call C++ and shared memory is used for the 3 processes to talk to each other. Would need work to check sizes.


Ew. :( Yeah, that's not going to be trivial, then.


Ok, looks like best to move to 64bit first. The other option would be to distribute on macOS as a VM. More urgent things to do first. Interesting discussion though.


As you're porting to 64-bit, it's worth keeping MacOS-isms in mind - may as well prepare for it while you're in there. I don't think distributing MacOS as a VM is viable from a licensing perspective, and most hypervisors won't run it out of the box either - there's some hacking around and odd configuration to do to set the environment up right. As much as "Old MacOS VMs" would be slick, it's probably not really viable. Easier to just get everything into the modern, 64-bit world, and then look at Apple Silicon support from there. If it's FORTRAN and C++, and not vectorized, it shouldn't be too hard to get it working over there. But projects for a later date! Or when someone else has the time to throw at it.



©2024 climateprediction.net