Posts by wolfman1360

1) Questions and Answers : Unix/Linux : No work being downloaded on Linux host (Message 64820)
Posted 17 Nov 2021 by wolfman1360
Post:
Update - I just randomly installed Nextcloud. On the next check for updates, work downloaded - 1 N216 - and began crunching with, so far, no errors.
Your guess is as good as mine, at this point. I will test on the 4790.
2) Questions and Answers : Unix/Linux : No work being downloaded on Linux host (Message 64819)
Posted 17 Nov 2021 by wolfman1360
Post:
This is my output with no debugging options enabled. This tells me nothing.
11/17/2021 11:15:05 AM | climateprediction.net | No tasks sent
11/17/2021 11:15:05 AM | climateprediction.net | Project requested delay of 3636 seconds
11/17/2021 12:15:42 PM | climateprediction.net | No tasks sent
11/17/2021 12:15:42 PM | climateprediction.net | Project requested delay of 3636 seconds
This is on a machine with no other projects running, all benchmarks ran, all libraries installed that I know of. It doesn't even try and fetch work to fail later.
I don't know how to fix this. There are over 4000 WUs available.
3) Questions and Answers : Unix/Linux : No work being downloaded on Linux host (Message 64817)
Posted 17 Nov 2021 by wolfman1360
Post:
@wolfman1360

I'm not sure what is going on. First things first on another of your computers, your Core 2 8400 is requesting work and all tasks are failing. The stderr on the failed task webpages states a missing library. Try to reinstall the needed compatibility libraries for that version of Ubuntu.

Libraries installed, my apologies. That's a machine I generally set and forget - last I checked it was getting work just fine, but I think I absently installed Ubuntu 20.04 and forgot to update those.

On the i7 4770, the benchmark scores are 1 billion ops/sec for both floating point and integer. While this would not be causing your work fetch problem, it would give ridiculously large estimates for time to task completion in boinc manager. Run a boinc benchmark when you get a chance.

Will do right now. Benchmarks are in progress.
How many other projects are the two work fetch problem computers attached to? And did you attach to the other projects before cpdn?

The i7-4790 is also attached to WCG. The 4770 is attached to nothing but CPDN, and WCG was attached before CPDN. Oddly enough, a tiny little i3-2130 that I just attached recently is getting work alongside WCG.

Perhaps suspend all other projects in boinc manager and then allow boinc to try to request work from cpdn. I'm not sure if that will do anything, but it's worth a try.

I can do this. I left both machines sitting overnight - the 4770 is still completely idle, with work fetches attempted throughout that time.
I really don't understand what is going on or what I am missing to make this work.
4) Questions and Answers : Unix/Linux : No work being downloaded on Linux host (Message 64815)
Posted 17 Nov 2021 by wolfman1360
Post:
I am once again having this problem.
Machines 1524147 and 1524121 are not able to download work, despite work being available. Tried rebooting without success.
Ubuntu 20.04 fresh installs on both. The following libraries are installed, as per the instructions for 32-bit on 64-bit Linux systems:
lib32ncurses6 lib32z1 lib32stdc++-7-dev.

I will have more information from the event log now that I've enabled more debug options, but right now I am simply seeing this. I do apologize if this is too much info.

11/15/2021 6:12:21 PM | climateprediction.net | Project requested delay of 3636 seconds
11/15/2021 7:13:00 PM | climateprediction.net | No tasks sent
11/15/2021 7:13:00 PM | climateprediction.net | Project requested delay of 3636 seconds
11/15/2021 8:13:40 PM | climateprediction.net | No tasks sent
11/15/2021 8:13:40 PM | climateprediction.net | Project requested delay of 3636 seconds
11/16/2021 3:44:55 PM | climateprediction.net | No tasks sent
11/16/2021 3:44:55 PM | climateprediction.net | Project requested delay of 3636 seconds
11/16/2021 11:39:31 PM | | Re-reading cc_config.xml
11/16/2021 11:39:31 PM | | Config: GUI RPC allowed from any host
11/16/2021 11:39:31 PM | | Config: GUI RPCs allowed from:
11/16/2021 11:39:31 PM | | log flags: file_xfer, sched_ops, task, cpu_sched, cpu_sched_debug, work_fetch_debug
11/16/2021 11:39:31 PM | | [cpu_sched_debug] Request CPU reschedule: Core client configuration
11/16/2021 11:39:31 PM | | [work_fetch] Request work fetch: Core client configuration
11/16/2021 11:39:31 PM | | [cpu_sched_debug] schedule_cpus(): start
11/16/2021 11:39:31 PM | | [cpu_sched_debug] enforce_run_list(): start
11/16/2021 11:39:31 PM | | [cpu_sched_debug] preliminary job list:
11/16/2021 11:39:31 PM | | [cpu_sched_debug] final job list:
11/16/2021 11:39:31 PM | | [cpu_sched_debug] enforce_run_list: end
11/16/2021 11:39:32 PM | | choose_project(): 1637127572.607807
11/16/2021 11:39:32 PM | | [work_fetch] ------- start work fetch state -------
11/16/2021 11:39:32 PM | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
11/16/2021 11:39:32 PM | | [work_fetch] --- project states ---
11/16/2021 11:39:32 PM | climateprediction.net | [work_fetch] REC 0.000 prio -0.000 can request work
11/16/2021 11:39:32 PM | | [work_fetch] --- state for CPU ---
11/16/2021 11:39:32 PM | | [work_fetch] shortfall 87971.46 nidle 0.00 saturated 68116.50 busy 0.00
11/16/2021 11:39:32 PM | climateprediction.net | [work_fetch] share 0.500
11/16/2021 11:39:32 PM | | [work_fetch] ------- end work fetch state -------
11/16/2021 11:39:32 PM | climateprediction.net | choose_project: scanning
11/16/2021 11:39:32 PM | climateprediction.net | can fetch CPU
11/16/2021 11:39:32 PM | | [work_fetch] No project chosen for work fetch
11/16/2021 11:40:31 PM | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
11/16/2021 11:40:31 PM | | [cpu_sched_debug] schedule_cpus(): start
11/16/2021 11:40:31 PM | | [cpu_sched_debug] enforce_run_list(): start
11/16/2021 11:40:31 PM | | [cpu_sched_debug] preliminary job list:
11/16/2021 11:40:31 PM | | [cpu_sched_debug] final job list:
11/16/2021 11:40:31 PM | | [cpu_sched_debug] enforce_run_list: end
11/16/2021 11:40:32 PM | | choose_project(): 1637127632.758010
11/16/2021 11:40:32 PM | | [work_fetch] ------- start work fetch state -------
11/16/2021 11:40:32 PM | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
11/16/2021 11:40:32 PM | | [work_fetch] --- project states ---
11/16/2021 11:40:32 PM | climateprediction.net | [work_fetch] REC 0.000 prio -0.000 can request work
11/16/2021 11:40:32 PM | | [work_fetch] --- state for CPU ---
11/16/2021 11:40:32 PM | | [work_fetch] shortfall 88175.35 nidle 0.00 saturated 68075.60 busy 0.00
11/16/2021 11:40:32 PM | climateprediction.net | [work_fetch] share 0.500
11/16/2021 11:40:32 PM | | [work_fetch] ------- end work fetch state -------
11/16/2021 11:40:32 PM | climateprediction.net | choose_project: scanning
11/16/2021 11:40:32 PM | climateprediction.net | can fetch CPU
11/16/2021 11:40:32 PM | | [work_fetch] No project chosen for work fetch
11/16/2021 11:41:31 PM | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
11/16/2021 11:41:31 PM | | [cpu_sched_debug] schedule_cpus(): start
11/16/2021 11:41:31 PM | | [cpu_sched_debug] enforce_run_list(): start
11/16/2021 11:41:31 PM | | [cpu_sched_debug] preliminary job list:
11/16/2021 11:41:31 PM | | [cpu_sched_debug] final job list:
11/16/2021 11:41:31 PM | | [cpu_sched_debug] enforce_run_list: end
11/16/2021 11:41:33 PM | | choose_project(): 1637127692.995801
11/16/2021 11:41:33 PM | | [work_fetch] ------- start work fetch state -------
11/16/2021 11:41:33 PM | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
11/16/2021 11:41:33 PM | | [work_fetch] --- project states ---
11/16/2021 11:41:33 PM | climateprediction.net | [work_fetch] REC 0.000 prio -0.000 can request work
11/16/2021 11:41:33 PM | | [work_fetch] --- state for CPU ---
11/16/2021 11:41:33 PM | | [work_fetch] shortfall 88362.31 nidle 0.00 saturated 68042.88 busy 0.00
11/16/2021 11:41:33 PM | climateprediction.net | [work_fetch] share 0.500
11/16/2021 11:41:33 PM | | [work_fetch] ------- end work fetch state -------
11/16/2021 11:41:33 PM | climateprediction.net | choose_project: scanning
11/16/2021 11:41:33 PM | climateprediction.net | can fetch CPU
11/16/2021 11:41:33 PM | | [work_fetch] No project chosen for work fetch
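A long event log like the one above is easier to read once it is tallied. The following is a minimal Python sketch, assuming every line has the `timestamp | project | message` shape shown in these excerpts; the function names and the `SAMPLE` list are illustrative only and are not part of BOINC. It only counts messages and does not diagnose why no work is sent.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpts above, e.g.
#   "11/16/2021 11:39:32 PM | climateprediction.net | [work_fetch] share 0.500"
# The project field is empty for client-wide messages ("... PM | | message").
LINE_RE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
    r" \| (?P<project>[^|]*)\| (?P<message>.*)$"
)
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def parse_line(line):
    """Split one event log line into (timestamp, project, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("stamp"),
        match.group("project").strip(),
        match.group("message").strip(),
    )


def summarize(lines):
    """Count 'No tasks sent' and 'No project chosen' messages; collect delays."""
    counts = Counter()
    delays = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that does not look like a log line
        message = parsed[2]
        if message == "No tasks sent":
            counts["no_tasks_sent"] += 1
        elif message.endswith("No project chosen for work fetch"):
            counts["no_project_chosen"] += 1
        delay = DELAY_RE.search(message)
        if delay:
            delays.append(int(delay.group(1)))
    return counts, delays


# Three lines copied from the logs quoted in these posts.
SAMPLE = [
    "11/17/2021 11:15:05 AM | climateprediction.net | No tasks sent",
    "11/17/2021 11:15:05 AM | climateprediction.net | Project requested delay of 3636 seconds",
    "11/16/2021 11:39:32 PM | | [work_fetch] No project chosen for work fetch",
]
```

Calling `summarize(SAMPLE)` returns the tallies and the list of requested delays in seconds; the same call works on the lines of a saved log file.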

Edit: Since this was the first time I had installed minimal Ubuntu, I went ahead and installed the full desktop packages.
sudo apt install ubuntu-desktop
We will see what difference this brings, if any, overnight. All other projects I have attached work without issue.
5) Message boards : Number crunching : New work Discussion (Message 63094)
Posted 4 Dec 2020 by wolfman1360
Post:
One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think.
I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something.

The speed can be represented in units of seconds/timestep (sec/TS) and after trickles are uploaded, can be seen on each task's webpage. The lower the average number of sec/TS, the relatively faster the model is running and the less CPU time a completed model will take.

One can also see the sec/TS on running models by going into the .../projects/climateprediction.net/{task name} directory and looking at the file stdout_mon.txt, which is a log of the timesteps throughout the model run. In Linux, one can be in that directory and do a

tail -f stdout_mon.txt

and it will output a display to the terminal window continuously as the model runs. Depending on the Linux distribution and how it handles permissions for that directory, one might need to be a superuser to maneuver to that directory and tail that file.

Edit...For the same PC, the value of the sec/TS for a given model will be dependent on how complex one model type may be relative to another. So for the same PC, the sec/TS for a hadam4 N144 model will be lower than the sec/TS for a hadam4 N216 model which is run at a higher resolution.
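As a rough illustration of why a lower sec/TS means less CPU time, the arithmetic can be sketched in Python. The timestep counts used here are hypothetical placeholders, since the posts do not say how many timesteps a given model has, and the function names are made up for this example.

```python
def estimated_cpu_hours(sec_per_ts, total_timesteps):
    """CPU hours for a whole model at a steady seconds-per-timestep rate."""
    return sec_per_ts * total_timesteps / 3600.0


def remaining_cpu_hours(sec_per_ts, total_timesteps, completed_timesteps):
    """CPU hours still to go, never negative."""
    remaining = max(total_timesteps - completed_timesteps, 0)
    return sec_per_ts * remaining / 3600.0


# Hypothetical model length, chosen only to make the comparison concrete.
HYPOTHETICAL_TIMESTEPS = 10_000

hours_at_20 = estimated_cpu_hours(20, HYPOTHETICAL_TIMESTEPS)
hours_at_30 = estimated_cpu_hours(30, HYPOTHETICAL_TIMESTEPS)
```

With the same number of timesteps, the machine averaging 20 sec/TS needs two thirds of the CPU time of the one averaging 30 sec/TS, which is the sense in which "20 is faster than 30".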

Thank you so much!
I'm trying to figure out if it is even worthwhile to keep running these on my struggling i7-920 and Xeon W3520.
I've been trying to run them into the ground but they just won't quit. My Ryzen 3700X is absolutely flying through whatever I throw at it, and I may end up going for a 5000 series of some sort. It will probably do as much work as 4 of the older i7s put together.
6) Message boards : Number crunching : New work Discussion (Message 63059)
Posted 30 Nov 2020 by wolfman1360
Post:
Random question because I have forgotten over the years.
Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)?
thanks!

Yes though the size of the bits of credit varies depending on the amount of computing that needs to go into it. Credit is currently updated on Thursdays but that has been a moveable feast over the years. At some point Andy plans to introduce a credit script that needs less work by the server running it at which point we should move to daily but there is no news of a date for that to happen and it is something he works on when he has spare work hours rather than being a priority task. I am pretty sure he could have sorted it by now had that been the case.

Thanks!
One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think.
I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something.
7) Message boards : Number crunching : New work Discussion (Message 63057)
Posted 30 Nov 2020 by wolfman1360
Post:
Random question because I have forgotten over the years.
Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)?
thanks!
8) Message boards : Number crunching : New work Discussion (Message 63051)
Posted 29 Nov 2020 by wolfman1360
Post:
0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that whatever BOINC is telling you it isn't actually running.

Sorry, it is climbing again. What I meant was it seemed to have encountered an error to make it start over again after 4 days of work.
9) Message boards : Number crunching : New work Discussion (Message 63049)
Posted 29 Nov 2020 by wolfman1360
Post:
I forgot myself and restarted one of my Windows machines. The WU at least started over but is now claiming 4 days elapsed at 0%.
These poor Sandy Bridge machines (and even a few older ones) are trucking along. I think some of them need to be retired soon.
10) Message boards : Number crunching : New work Discussion (Message 62994)
Posted 20 Nov 2020 by wolfman1360
Post:
Out of curiosity, will the Windows Subsystem for Linux allow one to download Linux tasks on Windows if the correct libraries are installed? Or how does that work?
Thanks!
11) Message boards : Number crunching : New work Discussion (Message 62125)
Posted 15 Feb 2020 by wolfman1360
Post:
@Wolfman1360
I vaguely remember discussion of Rosetta eating up L3 cache as well, but can't find the discussion anywhere.
Is this still true today and should I be limiting it alongside the N216 and N144?

Jim1348 has referred to local threads where this has come up; if you look in the threads about UK Met Office HadAM4 at N216 resolution and UK Met Office HadAM4 at N144 resolution you'll find several mentions of L3 cache bashing, especially in the N216 thread. In this message in the N144 thread I actually replied to one of your posts, talking about workload mixes (and again in this message)... Jim1348 (and others) had some good contributions in those threads too. I don't recall many explicit references to Rosetta, but WCG MIP1 (which uses Rosetta) got some dishonourable mentions...

You may also have seen (or even participated in) threads about MIP1 at WCG -- because of the model construction it uses, the rule of thumb is one MIP1 per 4 or 5 MB of L3 cache! I haven't got time to track those down at the moment - sorry!

For what it's worth, if you run MIP1 alongside N216 you'll see the same sort of hit as if running extra N216 tasks; N144 is nowhere near as bad!

Cheers - Al.

[Edited to fix a broken link, then to fix a typo I'd missed!]


Thanks for all of these.
So far I seem to be doing okay, but I may have bitten off a little more than I can chew. I have an old dual Opteron plugging away at 3 N216 - I figure a month in which they are actually worked on is better than a month of sitting there with nothing grabbing them. I am exaggerating, of course - it shouldn't take quite that long since it is a dedicated cruncher, but who knows.
I tend to stay away from MIP at WCG and have recently been crunching Asteroids@home alongside CPDN and Rosetta, though I do have one machine running TN-Grid and it seems to be doing fine as well. My RAC has drastically decreased but should be rising soon enough after playing with the config for CPDN. I am still being very conservative since I'd rather not have computing errors, as has happened a few times already on my Ryzen 1700.
12) Message boards : Number crunching : New work Discussion (Message 62046)
Posted 26 Jan 2020 by wolfman1360
Post:
And on my i7-9700 (which has eight full cores), it checkpoints at 23 minutes. But that is again with limiting the N216 to running on only four cores. The other four cores are on TN-Grid, which seems to be an easy project for this purpose.

In general, I find that I need to limit any of my CPUs (Intel coffee lake or Ryzen) to four cores for the N216, but can put just about anything else on the other cores without much ill effect. Beyond four cores, it drops off a cliff.

I vaguely remember discussion of Rosetta eating up L3 cache as well, but can't find the discussion anywhere.
Is this still true today and should I be limiting it alongside the N216 and N144?
13) Message boards : Number crunching : New work Discussion (Message 62034)
Posted 25 Jan 2020 by wolfman1360
Post:
The N216 tasks will definitely take a while.
My poor Core 2 Duo E8400 was one of the machines I forgot to set up an app_config on. Consequently, as a prime example of set it and forget it, 36% of an N216 completed after 8 days of work.
An N144 on a W3530 has taken 8 days to reach 70%, on the other hand.
Of course these are examples of 10-year-old hardware at this point, so this is the worst case, though the machines in question do not get shut down or have work suspended.
Remember there used to be single workunits completed on single core machines that in some cases took months. So long as the work is getting done and being sent up to the servers when it is completed in a reasonable (of course there is the subjective part) amount of time and can be of help to the researchers, this is what is important to me.
I may invest in a Ryzen 3600 or 3700x at some point. Not right now. May even wait for the 4000.
14) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 61973)
Posted 12 Jan 2020 by wolfman1360
Post:
It's not really about how BOINC sees it. It is complicated to answer because it has to do with how the kernel's process scheduler and memory management move data into and out of RAM to and from the CPUs. The kernel can only get data in/out of RAM at the speed of the RAM and it can only move data between CPU cores at the speed of the CPU die interconnect and it can only move data between CPU sockets at the speed of the motherboard's North Bridge controller.

My advice is to start with less and work up to more until you find a sweet spot.

That makes complete sense and clarifies a lot. Thanks. This probably goes for a lot of projects.
15) Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution (Message 61967)
Posted 11 Jan 2020 by wolfman1360
Post:
Making sure I understand this correctly.
I have a dual Xeon E5-2670 v2 with 25 MB of L3 cache per processor.
Will BOINC and the workunits see this as 50 MB total - so that I should be able to run 10-12 concurrently? (I'm leaning toward fewer at this point.)
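The 10-12 figure is consistent with dividing the combined cache by a budget of 4 to 5 MB per task, the rule of thumb quoted elsewhere in these posts for WCG MIP1. A minimal Python sketch of that division follows; it assumes the same budget applies to N216 tasks and that cache on two sockets can simply be added together, neither of which the posts confirm. The function name is illustrative.

```python
def max_tasks_by_cache(l3_mb_per_socket, sockets, mb_per_task):
    """Upper bound on concurrent tasks from a per-task L3 cache budget.

    Assumes each task needs roughly mb_per_task of L3 and that every
    socket contributes its own cache independently.
    """
    tasks_per_socket = int(l3_mb_per_socket // mb_per_task)
    return tasks_per_socket * sockets


# Dual E5-2670 v2: 25 MB of L3 per processor, two processors.
conservative = max_tasks_by_cache(25, 2, 5)  # 5 MB per task
optimistic = max_tasks_by_cache(25, 2, 4)    # 4 MB per task
```

This is only a starting estimate; memory bandwidth and inter-socket traffic also matter, so starting low and working up is the safer approach.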
16) Message boards : Number crunching : Dual Opteron 6128, Dual Xeon E5-2670, both, or neither? (Message 61966)
Posted 11 Jan 2020 by wolfman1360
Post:
This definitely isn't going to be something I'll be using full time or even for years to come.

Oops, I made the wrong assumption. If it's not for purpose of crunching 24/7, ignore what I said. :-P EPYC is just an example for power difference. I don't mean to add that to the choices here anyway.

Then latest generation stuff makes no sense in this case. Release of Ryzen 3 dropped Ryzen 2 to a much lower price, like <$150 during holiday season for a new 2700X. You can probably get really good deals on used ones soon. A used Ryzen 2 is likely something you want to consider instead, even if you want something decently recent. Since you have all other parts already, I would go with E5. E5 2670 should be quite a bit better than Opteron 6128.

The office. Now that's a fantastic idea I'll have to consider....maybe this can be written off as a work expense?

LOL. I just realized "the office" is not from your reply...

Well, lucky break for me. Both of these, as it turns out, are E5-2670 v2s - so Ivy Bridge with 10 real cores per CPU. Can't even consider the Opteron anymore - not that I was for very long anyway. It might do okay for WCG, but I could do better with 2 Phenoms and use less power to boot.
I'm debating grabbing a Ryzen 5 3400G to put in a tiny case - though I'm eagerly waiting to see what Intel NUC equivalents the companies using AMD will come out with.
17) Message boards : Number crunching : Dual Opteron 6128, Dual Xeon E5-2670, both, or neither? (Message 61958)
Posted 10 Jan 2020 by wolfman1360
Post:
I would say neither. I grabbed dual E5-2670 when they were dirt cheap three years ago. That 700W system was less productive than my EPYC 7401P system which is like 250W and I've recycled those E5s last year. Ryzen 3 is really awesome in terms of power efficiency. It looks a bit pricier up front, but unless your electricity is free, a 400-500W saving running 24/7 will make up for it in a few months. (Though I guess you are probably not paying the power bill for your office, which is nice. :-P) In addition, if you don't have the memory already, eight DDR3-1333 RDIMMs needed to populate all channels for dual E5-2670 is going to cost more than the two DDR4-3600 sticks for Ryzen 3.
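The "makes up for it in a few months" claim can be checked with simple arithmetic. The Python sketch below works it through; the electricity price and the extra up-front cost are hypothetical inputs, since the post gives neither, and only the 400-500 W saving and the 24/7 duty cycle come from the text.

```python
def energy_saved_kwh(watts_saved, days, hours_per_day=24):
    """Energy saved in kWh for a constant power reduction."""
    return watts_saved * hours_per_day * days / 1000


def payback_months(extra_upfront_cost, watts_saved, price_per_kwh, days_per_month=30):
    """Months until the power saving covers the extra purchase cost."""
    monthly_saving = energy_saved_kwh(watts_saved, days_per_month) * price_per_kwh
    return extra_upfront_cost / monthly_saving


# Range quoted in the post: a 400-500 W saving, running 24/7.
low_kwh_per_month = energy_saved_kwh(400, 30)
high_kwh_per_month = energy_saved_kwh(500, 30)
```

For a concrete payback figure, pass your own price per kWh and the price difference between the two systems to `payback_months`.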

This definitely isn't going to be something I'll be using full time or even for years to come.
I have 2 dual-socket 2670 systems right now; both were bought for less than 3 cases of beer. I just need time to put everything together. I have 32 GB of DDR3 RAM for each, as well as drives and power supplies. DDR4 is more expensive (32 GB costs four times as much as I spent on these 4 processors and 2 motherboards). 64 GB costs as much as the Ryzen processor at this point in Canada, and that EPYC you mention is about $1500 CAD on its own, even 2 years after its release.


I don't have thousands to throw around up front on machines for simply crunching - though I am saving for a new Ryzen 5 or 7 around spring or summer, or whenever I see a good deal. It also appears Ryzens are selling out mighty quickly - places around here haven't had many in supply. The Ryzen G series (Ryzen 5 3400G) looks tempting too. I really like the power efficiency figures I'm seeing. I love that AMD is putting Intel in its place.

Before I got natural gas, I was heating my place with two 1000 W infrared heaters. At least my two space heaters will be doing something useful until it gets above 70 °F or so outside. Old faithful FX-8350 might have to be replaced with a Ryzen 3600.
The office. Now that's a fantastic idea I'll have to consider....maybe this can be written off as a work expense?
18) Questions and Answers : Unix/Linux : 64bit on Ubuntu19.10 (Message 61947)
Posted 10 Jan 2020 by wolfman1360
Post:
Curious. Does 19.10 bring any improvements to crunching over 18.04? That's all my Linux machines are used for.
These n216 tasks sure run a while.
19) Message boards : Number crunching : New work Discussion (Message 61923)
Posted 7 Jan 2020 by wolfman1360
Post:
While I would really like to test how well my CPUs can cope with multiple N216's

Your Ryzen 3700x will work fine if you limit the N216 to four at a time.

Will that work for my 1700x as well?
Unfortunately the 1800x is running Windows and I have no plans to install Linux, but does the same (running all cores) apply to WAH tasks?
Will be setting max concurrent to 8 just in case, that way WCG can go ahead and scoop up the remainder.
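One way to apply such a limit is an `app_config.xml` file in the project directory (the `.../projects/climateprediction.net/` folder). The Python sketch below generates a project-wide limit using BOINC's `<project_max_concurrent>` element, assuming a client version that supports it. A per-application `<max_concurrent>` limit would also need the application's name, which the post does not give, so it is not shown. The function name is illustrative.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent):
    """Return app_config.xml content limiting concurrent tasks for one project."""
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# The limit mentioned in the post: at most 8 tasks from this project at once.
config_text = build_app_config(8)
```

Save `config_text` as `app_config.xml` in the project directory, then restart the client or ask it to re-read its config files for the limit to take effect.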
20) Message boards : Number crunching : Dual Opteron 6128, Dual Xeon E5-2670, both, or neither? (Message 61908)
Posted 4 Jan 2020 by wolfman1360
Post:
Your main concern (and your children's next seven generations' concern) will be "Performance per Watt"


Though this time of year, I wouldn't mind a little extra heating in my office!

I'm likely not going to be using these full time, but I've now been offered a Nehalem Xeon as well, which is likely faster than the Opterons, at least.
These two Xeons plus boards are going to cost me less than a Ryzen 3. Yes, performance per watt is terrible, but I don't plan on using these for years to come - if anything, until spring or summer.
I think the sweet spot will be the Ryzen 5 3600 or 3600X. Heck, even the G series would be decent at this point if you threw them in a smaller case. I'm not even looking at Intel for a new build. AMD is stomping all over them.
That 45 W mode is very interesting - maybe a Ryzen 7 could fit in one of those SFF cases without too much fuss.



©2024 climateprediction.net