climateprediction.net home page
New work Discussion

Message boards : Number crunching : New work Discussion
Bryn Mawr

Joined: 28 Jul 19
Posts: 147
Credit: 12,814,088
RAC: 261,385
Message 64759 - Posted: 2 Nov 2021, 1:46:46 UTC - in response to Message 64756.  
Last modified: 2 Nov 2021, 1:47:09 UTC

This project could easily do ten or twenty times as much work if they'd just make some improvements.
Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.


The main problem with that is not owning the source code - they’re not allowed to make changes to most of it.
ID: 64759 · Report as offensive
Bill F

Joined: 17 Jan 09
Posts: 116
Credit: 1,381,644
RAC: 1,646
Message 64760 - Posted: 2 Nov 2021, 13:41:21 UTC - in response to Message 64758.  

It's amazing how much math my generation can do in our heads compared to kids today who need a calculator to do the most rudimentary arithmetic.

At my engineering school, order-of-magnitude calculations were emphasized, to catch the mistakes that people made with more precise methods.
Also, it gave you a greater physical feel for the subject matter. I think many political mistakes are made by people who have not the slightest idea of the magnitude of what they are talking about.


Pretty profound, but it rings with truth.

Bill F
ID: 64760 · Report as offensive
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64761 - Posted: 2 Nov 2021, 20:55:05 UTC - in response to Message 64759.  

This project could easily do ten or twenty times as much work if they'd just make some improvements.
Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.
The main problem with that is not owning the source code - they’re not allowed to make changes to most of it.
I assume the UK MetOffice owns the code. Or is it someone else?
The biggest problem I see is the CPU cache congestion problem. Running too many WUs on a computer slows it down to a snail's pace. I keep playing around trying to figure out the most CP work units I can run on a computer. I've tried disabling hyperthreading and that works better but I still can't run all CPUs because it still slows down. Besides if I can't run every CPU thread with CP then I'd like to support ARP etc. Right now as my older WUs complete I detach from CP and then reattach to sweep up the debris it leaves behind. Then I specify a max of two CPUs and under BOINC preferences use at most 33/36=92%. That leaves some headroom but it's still noticeably faster if I run only one CP WU. It's frustrating when I know I could be running 18 or more if not for the CPU Congestion Issue.
Last time I suggested this someone said they'd have to rewrite a million lines of Fortran. I'm not a coder but I would think they'd only need to modify aspects of the code.
https://www.ibm.com/docs/en/aix/7.2?topic=implementation-design-coding-effective-use-caches
"Repackaging techniques can yield significant improvements without recoding..."
https://hackernoon.com/programming-how-to-improve-application-performance-by-understanding-the-cpu-cache-levels-df0e87b70c90
This guy says his code ran 50x faster after optimizing for CPU cache usage.
I've even seen a book dedicated to efficient CPU cache coding.
ID: 64761 · Report as offensive
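A note on the "max of two" approach in the post above: one way to cap how many tasks from a single project run at the same time is an app_config.xml file placed in that project's folder inside the BOINC data directory, using the client's project_max_concurrent setting. The Python below is only a sketch that generates such a file. The helper name build_app_config is made up for illustration, the value 2 simply mirrors the post, and it assumes a client recent enough to understand that element. It limits running tasks only; it does not change how many tasks the server sends.

```python
def build_app_config(max_concurrent):
    """Return the text of an app_config.xml that caps concurrent tasks for one project.

    build_app_config is a hypothetical helper name; project_max_concurrent is
    the BOINC client setting being illustrated.
    """
    if int(max_concurrent) < 1:
        raise ValueError("max_concurrent must be at least 1")
    return (
        "<app_config>\n"
        f"    <project_max_concurrent>{int(max_concurrent)}</project_max_concurrent>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Print the file contents for a limit of two running tasks.
    print(build_app_config(2), end="")
```

Save the output as app_config.xml in the project's folder, then ask the client to re-read its config files or restart it.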
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64762 - Posted: 2 Nov 2021, 21:19:31 UTC - in response to Message 64757.  

What improvements do you have in mind?
Nothing even comes close to fixing the CPU cache issue but a few upgrades could make this project a whole lot more user-friendly.
I'd start by fixing the work delivery bugs. Several projects use the "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. Also fix the perpetual 60-minute project backoff. It makes no sense how work is delivered; it's just feast or famine. I either go days or weeks getting no WUs on a particular computer, even though the Server Status page says there's work available and another computer is getting work, or I get a year's worth of work in one delivery and must abort almost all of it. I can't think of another BOINC project that behaves this way.
16946 climateprediction.net 11/2/2021 2:14:19 PM update requested by user
16950 climateprediction.net 11/2/2021 2:14:25 PM Sending scheduler request: Requested by user.
16951 climateprediction.net 11/2/2021 2:14:25 PM Not requesting tasks: don't need (CPU: ; NVIDIA GPU: )
16952 climateprediction.net 11/2/2021 2:14:27 PM Scheduler request completed
16953 climateprediction.net 11/2/2021 2:14:27 PM Project requested delay of 3636 seconds
"Don't need" is not true. I have one 921 WU running and would like to run another. If I do get lucky and I'm blessed with a second WU I'd switch to "No new work" and switch back after one completed.

Then if it's at all possible make the checkpoints closer together.
ID: 64762 · Report as offensive
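For anyone trying to track the backoff described in the post above, here is a rough Python sketch that pulls the requested delay out of event log text. The sample lines are copied from the post; the function names and the regular expression are my own assumptions about the log wording, not anything official. As far as I understand it, the "Not requesting tasks: don't need" line is written by the client when it decides not to ask for work, so it is worth separating from the server's requested delay.

```python
import re

# Sample lines copied from the event log quoted in the post above.
LOG = """\
16950 climateprediction.net 11/2/2021 2:14:25 PM Sending scheduler request: Requested by user.
16951 climateprediction.net 11/2/2021 2:14:25 PM Not requesting tasks: don't need (CPU: ; NVIDIA GPU: )
16953 climateprediction.net 11/2/2021 2:14:27 PM Project requested delay of 3636 seconds
"""

# Assumed wording of the delay message, taken from the sample above.
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def requested_delays(log_text):
    """Return every project-requested delay found in the log, in seconds."""
    return [int(m.group(1)) for m in DELAY_RE.finditer(log_text)]


def not_requesting_lines(log_text):
    """Return the lines where the client chose not to request tasks."""
    return [ln for ln in log_text.splitlines() if "Not requesting tasks" in ln]


if __name__ == "__main__":
    for seconds in requested_delays(LOG):
        print(f"delay: {seconds} s = {seconds / 60:.1f} min")
    print(len(not_requesting_lines(LOG)), "line(s) where no tasks were requested")
```

The 3636 seconds in the sample works out to just over an hour, which matches the "60-minute backoff" complaint.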
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 64763 - Posted: 2 Nov 2021, 21:43:15 UTC - in response to Message 64762.  

or I get a year's worth of work in one delivery and must abort almost all of it.


I have never received close to even six months of work, even with the work cache set to maximum.

"Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer.
In the past, CPDN used to allow users to specify which types of task they could receive, N216, N144 etc though this was before those particular types of task made it onto the drawing board but you get what I mean. I and at least one or two of the other moderators would like this but we have been told it isn't going to be changed, at least in the short term.

I assume I have never had some of the scheduling problems you have because I only run projects other than CPDN when there is no work available here.

Windows tasks all get snapped up within a couple of days of appearing or even less, so on that front the only way more work can be done is for there to be more scientists who want to do the areas of research that are suited to that task type.
ID: 64763 · Report as offensive
Alan K

Joined: 22 Feb 06
Posts: 484
Credit: 29,579,234
RAC: 4,572
Message 64764 - Posted: 2 Nov 2021, 23:42:51 UTC - in response to Message 64762.  



Then if it's at all possible make the checkpoints closer together.


In the computing preferences menu item in "Options" there is a box: "checkpoint at most every ... seconds".
ID: 64764 · Report as offensive
Bryn Mawr

Joined: 28 Jul 19
Posts: 147
Credit: 12,814,088
RAC: 261,385
Message 64765 - Posted: 3 Nov 2021, 3:46:50 UTC - in response to Message 64761.  

This project could easily do ten or twenty times as much work if they'd just make some improvements.
Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.
The main problem with that is not owning the source code - they’re not allowed to make changes to most of it.
I assume the UK MetOffice owns the code. Or is it someone else?
The biggest problem I see is the CPU cache congestion problem. Running too many WUs on a computer slows it down to a snail's pace. I keep playing around trying to figure out the most CP work units I can run on a computer. I've tried disabling hyperthreading and that works better but I still can't run all CPUs because it still slows down. Besides if I can't run every CPU thread with CP then I'd like to support ARP etc. Right now as my older WUs complete I detach from CP and then reattach to sweep up the debris it leaves behind. Then I specify a max of two CPUs and under BOINC preferences use at most 33/36=92%. That leaves some headroom but it's still noticeably faster if I run only one CP WU. It's frustrating when I know I could be running 18 or more if not for the CPU Congestion Issue.
Last time I suggested this someone said they'd have to rewrite a million lines of Fortran. I'm not a coder but I would think they'd only need to modify aspects of the code.
https://www.ibm.com/docs/en/aix/7.2?topic=implementation-design-coding-effective-use-caches
"Repackaging techniques can yield significant improvements without recoding..."
https://hackernoon.com/programming-how-to-improve-application-performance-by-understanding-the-cpu-cache-levels-df0e87b70c90
This guy says his code ran 50x faster after optimizing for CPU cache usage.
I've even seen a book dedicated to efficient CPU cache coding.


Yes, it’s the Met Office, not CPDN or BOINC or the researchers we are helping.

The Met Office have no involvement in what we are doing and optimise their code to run on their mainframes. The licence we are using to run the code does not allow us to change it to suit our PCs.
ID: 64765 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64766 - Posted: 3 Nov 2021, 5:37:11 UTC

And the researchers are well aware that these models take a long time to run.
This "BOINC stuff" is only a small part of the research, more "a special treat", rather than "the main course(s)".
ID: 64766 · Report as offensive
Harri Liljeroos
Joined: 9 Dec 05
Posts: 110
Credit: 12,038,780
RAC: 1,393
Message 64767 - Posted: 3 Nov 2021, 9:23:21 UTC - in response to Message 64764.  



Then if it's at all possible make the checkpoints closer together.


In the computing preferences menu item in "Options" there is a box: "checkpoint at most every ... seconds".

This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application; the BOINC client can't force checkpoints to happen.
ID: 64767 · Report as offensive
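To put the point above into numbers, here is a simplified Python model of how the preference and the application interact. It assumes the application can only checkpoint at its own fixed points, and that the preference merely suppresses a checkpoint that would come too soon after the previous one. The function name and the example figures (a 2-hour application interval, 10-minute and 3-hour preferences) are illustrative assumptions, not measurements of any CPDN model.

```python
import math


def effective_checkpoint_interval(app_interval_s, pref_min_interval_s):
    """Seconds between checkpoints under the simplified model described above.

    app_interval_s: how often the application reaches a point where it can checkpoint.
    pref_min_interval_s: the "checkpoint at most every N seconds" preference.
    """
    if app_interval_s <= 0 or pref_min_interval_s < 0:
        raise ValueError("intervals must be positive (preference may be zero)")
    # The preference can only make the client skip checkpoint opportunities,
    # so the result is always a whole multiple of the application's interval.
    opportunities = max(1, math.ceil(pref_min_interval_s / app_interval_s))
    return opportunities * app_interval_s


if __name__ == "__main__":
    two_hours = 2 * 3600
    print(effective_checkpoint_interval(two_hours, 600))    # 10-minute preference
    print(effective_checkpoint_interval(two_hours, 10800))  # 3-hour preference
```

Under this model a 10-minute preference leaves the 2-hour interval untouched, while a 3-hour preference stretches it to 4 hours, which fits the "increase only" behaviour described in the post.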
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64768 - Posted: 3 Nov 2021, 11:57:45 UTC - in response to Message 64764.  

Then if it's at all possible make the checkpoints closer together.
In the computing preferences menu item in "Options" there is a box: "checkpoint at most every ... seconds".
That does nothing. Mine is set to 10 minutes.
ID: 64768 · Report as offensive
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64769 - Posted: 3 Nov 2021, 11:59:21 UTC

Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned.
ID: 64769 · Report as offensive
Richard Haselgrove

Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 64770 - Posted: 3 Nov 2021, 12:03:33 UTC - in response to Message 64769.  

Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned.
That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done.
ID: 64770 · Report as offensive
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64771 - Posted: 3 Nov 2021, 12:05:37 UTC - in response to Message 64763.  

or I get a year's worth of work in one delivery and must abort almost all of it.
I have never received close to even six months of work, even with the work cache set to maximum.

"Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer.
In the past, CPDN used to allow users to specify which types of task they could receive, N216, N144 etc though this was before those particular types of task made it onto the drawing board but you get what I mean. I and at least one or two of the other moderators would like this but we have been told it isn't going to be changed, at least in the short term.

I assume I have never had some of the scheduling problems you have because I only run projects other than CPDN when there is no work available here.

Windows tasks all get snapped up within a couple of days of appearing or even less, so on that front the only way more work can be done is for there to be more scientists who want to do the areas of research that are suited to that task type.

I've gotten a year's worth of work several times, most recently a couple of days ago.
The main point is to specify the number of WUs to send.
ID: 64771 · Report as offensive
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64772 - Posted: 3 Nov 2021, 12:09:25 UTC - in response to Message 64766.  

And the researchers are well aware that these models take a long time to run.
This "BOINC stuff" is only a small part of the research, more "a special treat", rather than "the main course(s)".
And it really shows by how poorly they run a BOINC server.
They're so lazy they don't even send out a Server Abort when they abandon a project. Last night I completed 7 N144 WUs and they called them Abandoned. That's shameless. That's about seven CPU months of work I could've done for a project that actually cares.
ID: 64772 · Report as offensive
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64773 - Posted: 3 Nov 2021, 12:10:54 UTC - in response to Message 64770.  

Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned.
That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done.
Are you saying it's BOINC's fault that Oxford did not send out a Server Abort signal when they abandoned the N144 project???
ID: 64773 · Report as offensive
Aurum
Joined: 15 Jul 17
Posts: 94
Credit: 18,354,396
RAC: 6,590
Message 64774 - Posted: 3 Nov 2021, 12:12:19 UTC
Last modified: 3 Nov 2021, 12:12:39 UTC

So how do I know that any of my work will actually be used??? How do I prevent wasting my time and money doing futile work???
ID: 64774 · Report as offensive
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 64775 - Posted: 3 Nov 2021, 12:14:49 UTC - in response to Message 64767.  
Last modified: 3 Nov 2021, 12:19:08 UTC

This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application; the BOINC client can't force checkpoints to happen.


Why would people want checkpoints closer together? If you have 8 Boinc tasks running and you could set the checkpoint interval to 8 minutes, you would be writing a checkpoint every minute on the average. How much load do you want to put on your disk system? I figure out how much I would want to re-run in case of problems. Since N216 tasks take me about a week, I would normally make the interval an hour or so.
ID: 64775 · Report as offensive
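The disk-load arithmetic in the post above can be written out as a tiny Python sketch. It assumes every running task checkpoints at the same fixed interval and that the writes are spread evenly over time; the function name is made up for illustration.

```python
def checkpoint_writes_per_minute(num_tasks, interval_minutes):
    """Average checkpoint writes per minute across all running tasks."""
    if num_tasks < 0 or interval_minutes <= 0:
        raise ValueError("need num_tasks >= 0 and interval_minutes > 0")
    return num_tasks / interval_minutes


if __name__ == "__main__":
    # 8 tasks, 8-minute interval: the example from the post.
    print(checkpoint_writes_per_minute(8, 8))   # 1.0
    # Same 8 tasks with an hourly interval, as suggested for week-long tasks.
    print(checkpoint_writes_per_minute(8, 60))
```

Moving from 8 minutes to an hour cuts the average write rate from one per minute to roughly one every seven or eight minutes.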
Bryn Mawr

Joined: 28 Jul 19
Posts: 147
Credit: 12,814,088
RAC: 261,385
Message 64776 - Posted: 3 Nov 2021, 13:07:42 UTC - in response to Message 64775.  

This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application; the BOINC client can't force checkpoints to happen.


Why would people want checkpoints closer together? If you have 8 Boinc tasks running and you could set the checkpoint interval to 8 minutes, you would be writing a checkpoint every minute on the average. How much load do you want to put on your disk system? I figure out how much I would want to re-run in case of problems. Since N216 tasks take me about a week, I would normally make the interval an hour or so.


In the case of CPDN my systems checkpoint every 2 hours or so. If you don’t leave your system crunching 24/7 then you might well wish that to be a shorter period so that you lose less work each time you shut down.
ID: 64776 · Report as offensive
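A back-of-the-envelope Python version of the trade-off in the post above. It assumes a shutdown lands at a random moment between two checkpoints, so on average half an interval of run time is lost per running task. The function name and the "4 tasks, shut down daily for a week" figures are hypothetical; only the 2-hour interval comes from the post.

```python
def expected_lost_hours(checkpoint_interval_hours, running_tasks=1, shutdowns=1):
    """Expected task-hours of work redone, assuming shutdowns at random moments."""
    if checkpoint_interval_hours < 0 or running_tasks < 0 or shutdowns < 0:
        raise ValueError("arguments must not be negative")
    # On average a shutdown falls halfway between two checkpoints.
    return (checkpoint_interval_hours / 2) * running_tasks * shutdowns


if __name__ == "__main__":
    # One task, one shutdown, 2-hour checkpoints.
    print(expected_lost_hours(2))          # 1.0
    # Hypothetical: 4 tasks, machine shut down once a day for a week.
    print(expected_lost_hours(2, 4, 7))    # 28.0
```

For a machine that runs 24/7 and reboots every week or two, the same estimate comes out small, which is why the two posters see the checkpoint interval so differently.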
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 64777 - Posted: 3 Nov 2021, 17:49:39 UTC - in response to Message 64776.  

In the case of CPDN my systems checkpoint every 2 hours or so. If you don’t leave your system crunching 24/7 then you might well wish that to be a shorter period so that you lose less work each time you shut down.


I did not think of people shutting their machines down often, since I leave my machine up 24/7 except for updates requiring reboots, which I do every week or two.
ID: 64777 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64780 - Posted: 3 Nov 2021, 20:40:46 UTC

From an old memory, I think that the climate models checkpoint at the end of each model year.
ID: 64780 · Report as offensive

©2024 climateprediction.net