climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Message boards : Number crunching : New work Discussion

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 63344 - Posted: 19 Jan 2021, 12:28:07 UTC

They certainly didn't last long, and given the number of N216s in the queue, it will be a while until I get to test the Ryzen with Windows tasks running under WINE.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63344
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63345 - Posted: 19 Jan 2021, 13:53:17 UTC - in response to Message 63340.  

12? [giggle] Sorry. I have 89 on 7 computers. Global warming is occurring inside my house, these things make a lot of heat!
My machines are 3 laptops with only 4 cores (2 physical and 2 hyperthreaded) each. So 12 is a lot for me. Also I managed to snag 10 more before the supply ran out.
Sorry if I sounded rude. I spend way too much money on computers (and parrots), but they're 2 of my 3 hobbies. The other is hillwalking/outdoor swimming/canoeing/etc., which costs a lot in petrol and car maintenance for travel. The parrots are supposed to breed to earn their keep, and the computers heat the house. If only there were a way to make computers run as air conditioners in summer; you'd have to take the heat off the CPU to run a refrigeration pump through some kind of steam engine system.
ID: 63345
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63347 - Posted: 19 Jan 2021, 19:43:12 UTC

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
ID: 63347
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 63348 - Posted: 20 Jan 2021, 0:06:42 UTC - in response to Message 63347.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.

One of my PCs grabbed one of that first batch. It errored out with a Signal 11 between the 5th and 6th trickles.
ID: 63348
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 63349 - Posted: 20 Jan 2021, 3:56:42 UTC - in response to Message 63332.  

If I may add to this discussion on WU crashes and system reboots: I have two computers running the same version of Boinc. When I reboot (which I have to), I normally exit Boinc beforehand. One of the computers was crashing WUs. What was happening in my case was that, after exiting Boinc on this computer, the WUs were still running in Task Manager. Now, what will happen if you have exited Boinc up front but the WUs are still running and the system goes through a reboot? I have reinstalled Boinc again, but the problem is still there. The misbehaving computer is an Acer, while the Dell is cooperating. I have still not been able to solve this one. The version of Boinc is the current one.
When you close Boinc, do you get asked "do you wish to stop tasks running"? If not, you may have ticked a "don't ask again" box in there. If you don't get the dialog, in Boinc Manager go to the Options menu, Other options, General tab, and enable the manager exit dialog.

Also, make sure you have: Options menu, computing preferences, disk and memory tab, "leave non-GPU tasks in memory when suspended". This stops the climate tasks screwing up if Boinc pauses them to run another project.

________________________
Look at my date of joining carefully. I am not a child learning his first steps. No, this is a Boinc fault, platform dependent. Instead of obfuscating an observed fault, it would be better if others whose WUs are crashing checked their Task Manager when exiting. If more people observe this fault, then either I or Les can take it up on the Boinc Developers Forum. For now it is an isolated incident.
ID: 63349
KAMasud
Posts: 204
Credit: 7,608,986
RAC: 0
Message 63350 - Posted: 20 Jan 2021, 4:05:47 UTC - in response to Message 63341.  

KAMasud

The last few versions of BOINC have had some features removed, to meet the requirements of some organisations that don't want people to be able to fiddle.
This may be why your tasks are still running when you exit.
You'll need to check what options you have in the menu, possibly under File.

As for cpdn models, they each have a lot of files open, which all need to be saved before shutdown.
If shutdown occurs in the middle of a model doing a save, then some of what is saved is "old", and some is "new", and the program can't restart that model.
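To illustrate why an interrupted save leaves a model that can't restart, here is a minimal Python sketch. It assumes, purely for illustration, that each restart file records the timestep it was written at and that a restart needs every file from the same step; `Checkpoint`, `save`, `can_restart` and the file names are hypothetical and are not CPDN's actual code or file layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    # Hypothetical restart state: file name -> timestep the file was last written at
    files: Dict[str, int] = field(default_factory=dict)


def save(checkpoint: Checkpoint, names: List[str], step: int,
         interrupt_after: Optional[int] = None) -> bool:
    """Write each restart file in turn; optionally simulate a shutdown after k files."""
    for i, name in enumerate(names):
        if interrupt_after is not None and i >= interrupt_after:
            # Shutdown mid-save: the remaining files still hold the previous step ("old")
            return False
        checkpoint.files[name] = step  # this file is now "new"
    return True


def can_restart(checkpoint: Checkpoint) -> bool:
    # A restart is only consistent if every file was written at the same timestep
    return len(set(checkpoint.files.values())) == 1


# Usage with hypothetical file names
names = ["atmosphere", "ocean", "land_surface"]
cp = Checkpoint()
save(cp, names, step=1)                     # a complete save
print(can_restart(cp))                      # True
save(cp, names, step=2, interrupt_after=2)  # power lost after 2 of the 3 files
print(can_restart(cp))                      # False: mixes step 2 and step 1
```

Real models may detect or avoid this in other ways; the sketch only shows why a half-finished save of many files is a problem.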

____________________________
Thank you Les. I had just given an isolated observation and am waiting for others to check their Task Manager when exiting, to see whether this also happens on other computers.
This is happening on an Acer Predator. If it does, either one of us can take the issue up on the Boinc forums.
ID: 63350
Mr. P Hucker
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63351 - Posted: 20 Jan 2021, 13:44:54 UTC - in response to Message 63347.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
I have 6 of them up to 39% without any sign of an error at this end. https://www.cpdn.org/results.php?hostid=1512477 All I can see is that they've done 4 trickle-ups. Can you get more details at your end?
ID: 63351
Mr. P Hucker
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63352 - Posted: 20 Jan 2021, 13:48:21 UTC - in response to Message 63348.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.

One of my PCs grabbed one of that first batch. It errored out with a Signal 11 between the 5th and 6th trickles.
That's some speed you do those. What's your secret? That computer shouldn't be that much faster than mine!
ID: 63352
Mr. P Hucker
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63353 - Posted: 20 Jan 2021, 13:52:03 UTC - in response to Message 63349.  
Last modified: 20 Jan 2021, 14:06:11 UTC

Look at my date of joining carefully. I am not a child learning his first steps. No, this is a Boinc fault, platform dependent. Instead of obfuscating an observed fault, it would be better if others whose WUs are crashing checked their Task Manager when exiting. If more people observe this fault, then either I or Les can take it up on the Boinc Developers Forum. For now it is an isolated incident.
Sorry, but I have no way of knowing your technical expertise. The date is meaningless; there are people who joined Boinc right at the start who know nothing about computers at all. I can agree it may well be a Boinc fault; it's riddled with bugs. Isn't there meant to be a rewritten 64-bit-only version coming out soon?

As for your query, I just tried it on my Ryzen, which has 23 CPDN tasks running. All of them stopped and disappeared from Task Manager within a second or two. With other projects, I regularly do the same as you and have never seen anything stay running on any of my 7 vastly different machines. The one exception is VirtualBox-based tasks on computers with a slow disk: they take a while to stop, and sometimes, even though I can see no sign of life, Windows tells me on reboot that VirtualBox has "active connections". Ignoring that doesn't seem to harm anything.
ID: 63353
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 63354 - Posted: 20 Jan 2021, 14:39:41 UTC

One of my PCs grabbed one of that first batch. It errored out with a Signal 11 between the 5th and 6th trickles.

That's some speed you do those. What's your secret? That computer shouldn't be that much faster than mine!


Yours has half the cache George's does, and he was only running one task at once. I know the Windows tasks don't thrash the level 3 cache the way the N216 resolution Linux ones do, but unless you have an I/O bottleneck on your system, which I think unlikely, I would guess that is the answer. As you say, the actual CPU processing power doesn't seem that different.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63354
Mr. P Hucker
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63355 - Posted: 20 Jan 2021, 15:29:40 UTC - in response to Message 63354.  
Last modified: 20 Jan 2021, 16:08:52 UTC

One of my PCs grabbed one of that first batch. It errored out with a Signal 11 between the 5th and 6th trickles.
That's some speed you do those. What's your secret? That computer shouldn't be that much faster than mine!
Yours has half the cache George's does and he was only running one task at once. I know the Windows tasks don't thrash the level3 cache the way the N216 resolution Linux ones do but unless you have an i/o bottleneck on your system which I think unlikely I would guess that is the answer. As you say, the actual cpu processing power doesn't seem that different.
Yes, GEN3 improved the cache quite a bit. Although benchmarks don't show it being that much faster, I guess it is for CPDN. I didn't buy one because they cost a lot more for not much more benchmark speed, and are also hard to find! A lot of UK stores have a waiting list of 2 weeks or more, which I wasn't prepared to accept.

Are there any utilities for Windows that monitor how much the RAM is being accessed, number of cache misses etc? Something similar to Linux's perf. I can only find an Intel utility, which would obviously be useless for my Ryzen.

Also, I'm using HT, I'm guessing George isn't. More throughput overall, but less per task.

When you say I/O bottleneck, do you mean disk? I have SSDs (not NVMe) on some and hard disks on others. The Ryzen has an SSD; the i5 (which is furthest ahead as it's faster per core) has an HDD. I can't see much disk activity even on the slow ones. Both have dual-channel memory as fast as the CPU will take (no overclocking).
ID: 63355
Dave Jackson
Volunteer moderator
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 63356 - Posted: 20 Jan 2021, 17:46:16 UTC

Are there any utilities for Windows that monitor how much the RAM is being accessed, number of cache misses etc? Something similar to Linux's perf. I can only find an Intel utility, which would obviously be useless for my Ryzen.


Task Manager lets you see how much cache memory (L1, L2, L3) each process is using. If it is maxed out, that would give you a clue.

Not that surprised that disk activity isn't the bottleneck, I only raised it as a slight possibility.

I am sure I read somewhere about a utility to check cache usage on Windows, but it is not something I paid a lot of attention to, not having had Windows on my computers this century.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63356
Mr. P Hucker
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63357 - Posted: 20 Jan 2021, 18:22:29 UTC - in response to Message 63356.  

Are there any utilities for Windows that monitor how much the RAM is being accessed, number of cache misses etc? Something similar to Linux's perf. I can only find an Intel utility, which would obviously be useless for my Ryzen.
Task manager lets you see how much cache memory L1,L2,L3 each process is using. If it is maxed out that would give you a clue.
I thought it might, but I can't find that function anywhere.
ID: 63357
geophi
Volunteer moderator
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 63358 - Posted: 20 Jan 2021, 18:43:00 UTC - in response to Message 63355.  

Are there any utilities for Windows that monitor how much the RAM is being accessed, number of cache misses etc? Something similar to Linux's perf. I can only find an Intel utility, which would obviously be useless for my Ryzen.

I'm not sure how useful it would be, but Windows Performance Monitor has more than one section on cache. Fooling around with it, it looked like more than I wanted to dig into.
ID: 63358
Mr. P Hucker
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63359 - Posted: 20 Jan 2021, 19:23:41 UTC - in response to Message 63358.  

Are there any utilities for Windows that monitor how much the RAM is being accessed, number of cache misses etc? Something similar to Linux's perf. I can only find an Intel utility, which would obviously be useless for my Ryzen.
I'm not sure how useful it would be, but Windows Performance Monitor has more than one section on cache. Fooling around with it, it looked like more than I wanted to dig into.
Thanks, I didn't know that was in there. With my Ryzen 3900XT running a CPDN task on each of 23 of its 24 threads and WCG on the remaining thread, I'm getting a peak on the graph of about 10000 cache faults per second roughly every 40 seconds (watch the weird scales, which are the other way round from what you'd expect), and in between I see peaks of about 1000 cache faults per second every 10 seconds. I have absolutely no idea whether that's bad or not.
ID: 63359
Dave Jackson
Volunteer moderator
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 63360 - Posted: 20 Jan 2021, 20:21:42 UTC - in response to Message 63359.  

This might help
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63360
mikey
Posts: 21
Credit: 6,635,794
RAC: 2,524
Message 63361 - Posted: 20 Jan 2021, 22:33:57 UTC - in response to Message 55515.  

The exact wording is important.
Does it say "Project has no work"?

********

There is a back off timer intended to limit hoarding of tasks, so that it's fairer to all users.
The timer is nominally 1 hour, but is actually 1 hour and (I think) 30 seconds.
Any contact with the server either starts the timer, or resets it to 1 hour, hence the "Don't keep clicking the Update button" advice.
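Les's description of the back-off can be modelled with a small Python sketch of why repeated Update clicks only push the next request further away. It assumes the delay is exactly 1 hour plus 30 seconds (Les wasn't sure of the extra 30 seconds) and that every server contact, including a manual Update, restarts it; `RequestBackoff` and its methods are hypothetical names, not BOINC's actual client or server code.

```python
from typing import Optional


class RequestBackoff:
    """Hypothetical model of the per-host back-off described above (not BOINC code)."""

    def __init__(self, delay_seconds: float = 3600 + 30) -> None:
        # Assumed delay: nominally 1 hour, plus roughly 30 seconds
        self.delay = delay_seconds
        self.last_contact: Optional[float] = None  # time of the most recent server contact

    def contact(self, now: float) -> None:
        # Any contact with the server starts the timer, or resets it if already running
        self.last_contact = now

    def next_allowed(self) -> float:
        # Earliest time the next request for work would be accepted
        if self.last_contact is None:
            return 0
        return self.last_contact + self.delay

    def can_request(self, now: float) -> bool:
        return now >= self.next_allowed()


# Usage: times in seconds
b = RequestBackoff()
b.contact(now=0)           # scheduler contact at t = 0
print(b.can_request(3600)) # False: exactly 1 hour is not quite enough
print(b.can_request(3630)) # True
b.contact(now=1800)        # clicking Update at t = 30 min resets the timer
print(b.next_allowed())    # 5430
```

Under these assumptions, each click simply restarts the full wait, which is the reasoning behind the "Don't keep clicking the Update button" advice.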

And this latest lot was only 2 small batches, so the 10 thousand plus computers lying in wait for work have quickly grabbed all of them.

The Server Status page only updates about every 3 hours.


One of the problems for those of us getting work infrequently is the people with a large resource share and a large cache: they fill their cache and the rest of us get the scraps. BUT DON'T get me wrong... as long as they are producing valid results, I have ZERO problems with this at all!!! It just means we get the units at other projects and they don't, so it all works out in the end. For example, I didn't get any units during the last year, but I crunched A LOT of Covid units in that time frame!!!
ID: 63361
Dave Jackson
Volunteer moderator
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 63362 - Posted: 21 Jan 2021, 17:35:26 UTC - in response to Message 63348.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.

One of my PCs grabbed one of that first batch. It errored out with a Signal 11 between the 5th and 6th trickles.


Each of the three most recent Windows batches has one success so far: two on i5 machines and one on an i7, the last being one of Iain's machines.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63362
Iain Inglis
Volunteer moderator
Posts: 1084
Credit: 7,879,198
RAC: 4,930
Message 63363 - Posted: 21 Jan 2021, 18:22:52 UTC - in response to Message 63362.  

Each of the three most recent Windows batches has one success so far: two on i5 machines and one on an i7, the last being one of Iain's machines.

Yes! I'm painfully reconstructing the new Windows 10 machine that finished one from batch #894, and that was without the SSD that's finally arrived (after being delayed by you-know-what).
ID: 63363
Mr. P Hucker
Credit: 4,391,754
RAC: 6,918
Message 63364 - Posted: 21 Jan 2021, 19:04:13 UTC - in response to Message 63360.  

This might help
I can't see anything on cache in there. Disk cache yes, but not RAM cache.
ID: 63364

©2024 cpdn.org