Tasks by application = hoarding

Aurum
Joined: 15 Jul 17
Posts: 74
Credit: 17,409,387
RAC: 7,944
Message 64423 - Posted: 5 Sep 2021, 14:47:36 UTC

The Tasks by Application table at the bottom of the Server Status page https://www.cpdn.org/server_status.php shows that most of the WUs are being hoarded and not even running. They are just sitting there going to waste. E.g., UK Met Office HadSM4 at N144 resolution says it has no Unsent WUs but 324 In Progress, with 2 users in the last 24 hours. I have 7 of those WUs and they're all running. They only run the credits once a week, so how can they know who is running what in the last 24 hours??? The applications page says this project has 221 GigaFLOPS average computing (over what period?). My computers running these WUs have about 5 GFLOPS each, for about 35 GFLOPS total. I wonder if over 38 other computers are running the other 317 WUs??? That would be on the order of 1585 GFLOPS, so it implies that most of those WUs are sitting idle. They are being hoarded when someone waiting for work could be running them now.
If you can't actually complete them in the next 2 weeks you should Abort them and let others run them.
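
The back-of-the-envelope estimate in this post can be reproduced with a short sketch. It assumes, as the post does, that each running task accounts for about 5 GFLOPS and that the 221 GFLOPS figure applies to this one application; both are the poster's assumptions, not confirmed project figures, and the function name is hypothetical.

```python
def estimate_idle_tasks(tasks_in_progress, my_tasks, gflops_per_task, reported_gflops):
    """Reproduce the post's rough estimate of how many in-progress tasks look idle."""
    other_tasks = tasks_in_progress - my_tasks
    # Throughput that would be needed if every other task were actually running.
    gflops_needed = other_tasks * gflops_per_task
    # Reported throughput left over once the poster's own tasks are subtracted.
    gflops_left = reported_gflops - my_tasks * gflops_per_task
    # Number of tasks that the remaining throughput could keep running at that rate.
    tasks_running = gflops_left / gflops_per_task
    return {
        "other_tasks": other_tasks,
        "gflops_needed": gflops_needed,
        "tasks_running_elsewhere": tasks_running,
        "tasks_apparently_idle": other_tasks - tasks_running,
    }


# Figures quoted in the post: 324 in progress, 7 held by the poster,
# about 5 GFLOPS per task, 221 GFLOPS reported average.
estimate = estimate_idle_tasks(324, 7, 5.0, 221.0)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

With the quoted figures, the 317 other tasks would need 1585 GFLOPS, while the reported average leaves room for roughly 37 of them at that rate. The sketch says nothing about why the rest appear idle; a machine that computes only part of the day would also lower the average.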
ID: 64423 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3204
Credit: 8,978,317
RAC: 7,027
Message 64424 - Posted: 5 Sep 2021, 16:27:04 UTC - in response to Message 64423.  

The "Users in the past 24 hours" figure is, I believe, the number of users who have completed tasks in that time, as opposed to the number of computers that have returned the trickle-up files that are concurrent with the zip files. (I base this on times when, on the testing site, I have returned trickle-up files and the testing site still shows no users in the past 24 hours, but once I complete a testing task, it then shows one user at the next server update.)

The problem is with the very long deadlines that CPDN uses. If tasks were sent out with a deadline of, say, three months instead of typically around one year, the problem would be greatly reduced. (This has been suggested to the project by moderators on more than one occasion.)
ID: 64424 · Report as offensive     Reply Quote
Jean-David Beyer

Joined: 5 Aug 04
Posts: 532
Credit: 8,370,709
RAC: 14,651
Message 64426 - Posted: 5 Sep 2021, 18:03:13 UTC - in response to Message 64423.  

The N216 tasks take me about 8 1/2 days each; the N144 ones take much less. These are the HadAM4 tasks.
Currently I have 6 of these tasks and five of them are running.
Task	Work unit	Sent	Deadline	Status	Run time (sec)	CPU time (sec)	Credit	Application
22028284	12067898	4 Sep 2021, 22:34:50 UTC	18 Aug 2022, 3:54:50 UTC	In progress	---	---	---	UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
22142652	12043851	3 Sep 2021, 17:40:30 UTC	16 Aug 2022, 23:00:30 UTC	In progress	---	---	---	UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
22141616	12100818	1 Sep 2021, 3:55:45 UTC	14 Aug 2022, 9:15:45 UTC	In progress	---	---	3,230.53	UK Met Office HadAM4 at N144 resolution v8.09 i686-pc-linux-gnu
22140418	12068902	29 Aug 2021, 19:14:03 UTC	12 Aug 2022, 0:34:03 UTC	In progress	---	---	13,636.74	UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
21996027	12050242	28 Aug 2021, 3:06:19 UTC	10 Aug 2022, 8:26:19 UTC	In progress	---	---	20,375.94	UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
22044908	12067218	27 Aug 2021, 3:20:04 UTC	9 Aug 2022, 8:40:04 UTC	In progress	---	---	20,375.94	UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
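
To put a number on the "around one year" deadlines mentioned earlier in the thread, the gap between the two timestamps of each task can be computed directly. This is a minimal sketch that assumes the first timestamp is when the task was sent and the second is its deadline (the usual meaning for tasks still in progress); the helper name `parse_stamp` is hypothetical, and only three of the six rows are included.

```python
from datetime import datetime, timezone

# Timestamp layout used in the task list, e.g. "4 Sep 2021, 22:34:50 UTC".
STAMP = "%d %b %Y, %H:%M:%S UTC"


def parse_stamp(text):
    """Parse one timestamp from the task list into an aware UTC datetime."""
    return datetime.strptime(text, STAMP).replace(tzinfo=timezone.utc)


# (task ID, sent, assumed deadline), copied from the list above.
rows = [
    ("22028284", "4 Sep 2021, 22:34:50 UTC", "18 Aug 2022, 3:54:50 UTC"),
    ("22142652", "3 Sep 2021, 17:40:30 UTC", "16 Aug 2022, 23:00:30 UTC"),
    ("22141616", "1 Sep 2021, 3:55:45 UTC", "14 Aug 2022, 9:15:45 UTC"),
]

# Time allowed for each task: deadline minus send time.
windows = {task: parse_stamp(deadline) - parse_stamp(sent) for task, sent, deadline in rows}
for task, window in windows.items():
    print(task, window, window.total_seconds())
```

For these rows the window comes out the same for N216 and N144 tasks alike: a little over 347 days.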

ID: 64426 · Report as offensive     Reply Quote
SolarSyonyk

Joined: 7 Sep 16
Posts: 20
Credit: 13,374,073
RAC: 31,494
Message 64427 - Posted: 6 Sep 2021, 16:31:27 UTC - in response to Message 64423.  

If you can't actually complete them in the next 2 weeks you should Abort them and let others run them.


Aw. :( Do you apply that to computers actually running workunits too? I've got a total of 22 CPDN workunits running, making perfectly sane progress, but most of them won't be done in the next 2 weeks, simply because I run them as I have surplus solar - and as we go into winter, that becomes less, so I make less progress each day on the WUs (I typically can get 8-10 hours of compute per calendar day; one box makes 24h, but even that's going to start getting put to sleep at nights as the days get shorter and nights get longer). The stuff that is estimated at around 14d takes me closer to 2 months to compute, but the tasks do get done, and within the timeouts provided.

If the timeout were far shorter, I simply wouldn't be able to contribute to CPDN.

However, you recognize that the BOINC reporting framework isn't really "right" for the very long running sort of tasks CPDN runs, but then turn around and use it to demonstrate evidence of hoarding? One of the two can be true, but not both.
ID: 64427 · Report as offensive     Reply Quote
wateroakley

Joined: 6 Aug 04
Posts: 118
Credit: 17,920,192
RAC: 4,236
Message 64428 - Posted: 6 Sep 2021, 19:38:06 UTC - in response to Message 64423.  
Last modified: 6 Sep 2021, 19:42:12 UTC

If you can't actually complete them in the next 2 weeks you should Abort them and let others run them.
Hmm. This i7-based PC is running four tasks in an ubuntu VM. The timescales range from about 7 days for a hadam4 to 31 days for a hadam4h. In the early days of CPDN, tasks took around 3 months on an Intel P4. I'm quite used to nursing the long deadlines, even though M$ up-chucked today, crashing all bar one of the VM tasks. I can't afford the luxury of a 10th generation AMD or Intel CPU machine. Not sure how many of us would do anything if you want us to abort the long hadam4h's or set the deadline at 2 weeks?

I'd rather not waste my time looking for ET.
ID: 64428 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3204
Credit: 8,978,317
RAC: 7,027
Message 64429 - Posted: 6 Sep 2021, 20:34:59 UTC

I can't afford the luxury of a 10th generation AMD or Intel CPU machine. Not sure how many of us would do anything if you want us to abort the long hadam4h's or set the deadline at 2 weeks?


I wouldn't worry about the deadline being set to two weeks because it ain't going to happen.

My personal preference would be to set it somewhere between three and six months though I am sure others have their own ideas of what would be ideal. My currently dead laptop would take about a month to complete the N216 tasks which my Ryzen7 gets through in about nine days if running 5 at once. I think I can get it down to just over 7 if I restrict how many are running even more. The lower resolution N144 tasks complete in about 3 days.

However, the real issue isn't slow computers but computers that are rarely switched on. Some tasks that come back past the deadline are still on fast computers.
ID: 64429 · Report as offensive     Reply Quote
Jean-David Beyer

Joined: 5 Aug 04
Posts: 532
Credit: 8,370,709
RAC: 14,651
Message 64430 - Posted: 6 Sep 2021, 23:07:59 UTC - in response to Message 64429.  

My machine has a 16-core Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz processor, but the Boinc-Client is currently allowing only 9 cores to be used; in hot weather, that is set to 8. I normally allow only 4 cores to be used for ClimatePrediction, though I tried 5 for that for a couple of weeks. With four cores running CPDN, the N216 tasks take about 8 days to complete, and with five cores they take 8 1/2 to almost 9 days, so I have cut that down to 4 at a time again. Also, the boinc client sometimes does not run 4 of those tasks at once, since then the WCG tasks do not get enough time.

I do not have any N144 tasks at the moment, but my impression is that they take a little over three days each.

My machine normally runs 24/7, but I reboot it every week or two to put in OS updates. I run Red Hat Enterprise Linux 8.4.
ID: 64430 · Report as offensive     Reply Quote
KAMasud

Joined: 6 Oct 06
Posts: 151
Credit: 6,238,310
RAC: 8,678
Message 64436 - Posted: 9 Sep 2021, 2:37:19 UTC

I live in arid, hot South Asia. At night when I switch on my Air conditioning(to sleep), all my threads are running. In the morning I switch off the air conditioning and as the day heats up and my machines start heating up, I start suspending tasks. Clock speed increases.
If that is hoarding, then so be it.
Anyway, the project seems to be undergoing maintenance frequently and spends most of its time in dry-dock. Hoarding is a good idea. Now, any good ideas as to how to hoard WU's?
ID: 64436 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3204
Credit: 8,978,317
RAC: 7,027
Message 64437 - Posted: 9 Sep 2021, 5:15:39 UTC - in response to Message 64436.  

I live in arid, hot South Asia. At night when I switch on my Air conditioning(to sleep), all my threads are running. In the morning I switch off the air conditioning and as the day heats up and my machines start heating up, I start suspending tasks. Clock speed increases.
If that is hoarding, then so be it.
Anyway, the project seems to be undergoing maintenance frequently and spends most of its time in dry-dock. Hoarding is a good idea. Now, any good ideas as to how to hoard WU's?


While it is possible to hoard tasks by, say, temporarily setting your number of available cores far above what you use in practice (counting virtual ones I have 16 cores, but much of the time, depending on task type, I run only 5), hoarding can increase the time taken for work units to be returned to the project, which really doesn't help the science. However, having enough work in the buffer to last ten or even twenty days at the rate you get through the tasks isn't what I count as hoarding.

Also, the major problem with this is computers that are either switched off or doing other work to the extent that tasks take six months or more, which often renders the results useless for those doing PhD research.
ID: 64437 · Report as offensive     Reply Quote
KAMasud

Joined: 6 Oct 06
Posts: 151
Credit: 6,238,310
RAC: 8,678
Message 64447 - Posted: 10 Sep 2021, 3:21:27 UTC - in response to Message 64437.  

I live in arid, hot South Asia. At night when I switch on my Air conditioning(to sleep), all my threads are running. In the morning I switch off the air conditioning and as the day heats up and my machines start heating up, I start suspending tasks. Clock speed increases.
If that is hoarding, then so be it.
Anyway, the project seems to be undergoing maintenance frequently and spends most of its time in dry-dock. Hoarding is a good idea. Now, any good ideas as to how to hoard WU's?


While it is possible to hoard tasks by, say, temporarily setting your number of available cores far above what you use in practice (counting virtual ones I have 16 cores, but much of the time, depending on task type, I run only 5), hoarding can increase the time taken for work units to be returned to the project, which really doesn't help the science. However, having enough work in the buffer to last ten or even twenty days at the rate you get through the tasks isn't what I count as hoarding.

Also, the major problem with this is computers that are either switched off or doing other work to the extent that tasks take six months or more, which often renders the results useless for those doing PhD research.

________________
I was feeling a bit feline. Anyway, without hoarding I can download and keep at least forty-eight tasks minimum but as I said, the ambient temperature forces me to decide how many tasks to run at any given time. So, hoarding is not possible or feasible for me.
I know people hoard and it is irritating in the extreme, but it is human nature and it has been going on ever since this project started up. That is why I keep bringing up these year-long completion dates. They should be slashed to four months, but that is a different story and, except for banging your head against a granite wall, useless to even mention. Ask Les, he is the resident expert.
ID: 64447 · Report as offensive     Reply Quote
[SG]Felix

Joined: 4 Oct 15
Posts: 19
Credit: 1,562,953
RAC: 5,043
Message 64448 - Posted: 10 Sep 2021, 10:55:13 UTC

I think we have to differentiate between Linux tasks and Windows tasks here.
As I am running mostly Linux tasks in 2 VMs, I have downloaded the number of tasks I can run at a time, plus 1 extra task for every VM. Why one extra task? If one task is finished, it could happen that I am in the one-hour window between requests to the server. So with one extra task per VM, the next task can start before the finished one is reported.
When there were more Windows tasks available, I usually downloaded double or triple the amount that I could run at a time; this way they mostly lasted until the next batch became available.

The big difference for me is, I have never seen the server running out of Linux Tasks. So they are available at every time, except for when the Server is down due to maintenance or something like this.
ID: 64448 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 3204
Credit: 8,978,317
RAC: 7,027
Message 64449 - Posted: 10 Sep 2021, 12:35:29 UTC - in response to Message 64448.  

The big difference for me is, I have never seen the server running out of Linux Tasks. So they are available at every time, except for when the Server is down due to maintenance or something like this.


Though for many years, it was the other way around (in the period between the current "Lots for Linux" and the days when tasks would run on Linux, Windows or Mac). It is possible that there may be periods when it goes back to lots of Windows work, but that depends almost entirely on universities away from Oxford where research is being done.
ID: 64449 · Report as offensive     Reply Quote
KAMasud

Joined: 6 Oct 06
Posts: 151
Credit: 6,238,310
RAC: 8,678
Message 64450 - Posted: 11 Sep 2021, 4:12:37 UTC

I was also thinking over the matter. I am running Linux tasks, so what is the argument about, Windows Tasks? With the amount of memory these Linux tasks use, a person not in his right mind may quite possibly be hoarding a few but that is about it.
As for Windows tasks when and if they are available I might shift back from World Community Grid. They have a Climate Africa Model which runs on GPU's.
ID: 64450 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7528
Credit: 23,789,883
RAC: 3,624
Message 64451 - Posted: 11 Sep 2021, 6:49:18 UTC

None of any of this may matter.
It's the researchers who decide what happens to generated data.
If some part(s) of their data set is not being returned when they need it, they can just re-issue it under a different label, and ignore the original request if/when it eventually gets returned, months after they have produced their results.

If you want your efforts to be useful, then get the work crunched and returned as fast as possible.
ID: 64451 · Report as offensive     Reply Quote
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2035
Credit: 55,359,893
RAC: 20,056
Message 64452 - Posted: 11 Sep 2021, 7:28:16 UTC - in response to Message 64450.  


As for Windows tasks when and if they are available I might shift back from World Community Grid. They have a Climate Africa Model which runs on GPU's.

Not to my knowledge unless it started today. And I didn't get notified of a new beta. Their OPNG COVID tasks run on GPUs, but that's it I believe. Their climate model tasks run on linux/windows/mac in 32 bit and 64 bit.
ID: 64452 · Report as offensive     Reply Quote
Bryn Mawr

Joined: 28 Jul 19
Posts: 97
Credit: 7,727,978
RAC: 20,329
Message 64453 - Posted: 11 Sep 2021, 10:45:11 UTC

Is the impression of hoarding created by the very high number of tasks shown as in progress for applications that have not issued tasks for some time and where the active users shows zero?

This, surely, is an historical issue of failed tasks that have not been crossed off the list of tasks outstanding.

Would it be possible to synchronise the number shown as outstanding with the number of tasks that are still being processed?
ID: 64453 · Report as offensive     Reply Quote
bernard_ivo

Joined: 18 Jul 13
Posts: 396
Credit: 19,304,914
RAC: 13,356
Message 64465 - Posted: 14 Sep 2021, 17:27:48 UTC - in response to Message 64453.  
Last modified: 14 Sep 2021, 17:29:51 UTC

Is the impression of hoarding created by the very high number of tasks shown as in progress for applications that have not issued tasks for some time and where the active users shows zero?

This, surely, is an historical issue of failed tasks that have not been crossed off the list of tasks outstanding.

Would it be possible to synchronise the number shown as outstanding with the number of tasks that are still being processed?


Yes, there are ghost tasks. I have two out of 8 WUs in progress. One of the ghosts was issued in 2014 and its deadline is 2023. So yeah, I have been running it for 7 years. Several times there have been requests to clean up the ghosts. Not much result. Yes, detaching from and reattaching to the project sometimes works, but not always.

And yes, a shorter deadline of circa 4-6 months is completely reasonable to accommodate older machines that run other projects as well.

Reissuing tasks might be useful for researchers, but I've numerous times crunched batches that were no longer of interest to anyone. Yeah, my machines saved the last 3rd or 5th attempt of the WU after a few years idling on someone's computer. Old batches are not always pulled out.

Sometimes I had to manually abort WUs so as not to waste resources on WUs of no interest. A shorter deadline could fix that as well, but hey, it seems too much to ask every time this pops up.
ID: 64465 · Report as offensive     Reply Quote
KAMasud

Joined: 6 Oct 06
Posts: 151
Credit: 6,238,310
RAC: 8,678
Message 64468 - Posted: 16 Sep 2021, 4:30:22 UTC - in response to Message 64452.  


As for Windows tasks when and if they are available I might shift back from World Community Grid. They have a Climate Africa Model which runs on GPU's.

Not to my knowledge unless it started today. And I didn't get notified of a new beta. Their OPNG COVID tasks run on GPUs, but that's it I believe. Their climate model tasks run on linux/windows/mac in 32 bit and 64 bit.

_______________________
Most of my COVID run on ARM architecture which has no GPU. As to the laptops, I have allowed all types to run. They come, they go. Maybe I might catch which WU is making use of my GPU. Anyway, hoarding is a zero-sum game. I can get hold of a lot of Windows tasks in cache mode. My settings are such. 24 threads, ten plus ten days but I myself mark no further WU's after 36 WU's. It is useless, selfish to grab further WU's.
As to Linux tasks, it is useless to hoard those also.
However, I have a lot of ghost WU's on my account, of which I have no knowledge or the server has no knowledge. Maybe, lost in transmission. Which reminds me of a secret internet Black Hole. WU's enter it and then vanish. I wish someone would clean up our accounts pages. My account page shows I have twenty in progress but the fact is, I only have eight WU's. The rest, I have no idea except for the Black Hole theory.
P.S. I checked, COVID 19 is quietly using my GPU on the laptops.
ID: 64468 · Report as offensive     Reply Quote
WB8ILI

Joined: 1 Sep 04
Posts: 156
Credit: 69,564,362
RAC: 32,519
Message 64469 - Posted: 16 Sep 2021, 11:28:21 UTC
Last modified: 16 Sep 2021, 11:29:36 UTC

KAMasud -

As pointed out elsewhere, on WCG the Africa Rainfall Project does NOT have GPU tasks. Only the Open Pandemics Project does.

As far as your "ghost" tasks, I am not sure what you mean by your "accounts page".

According to the CPDN server your Computer #1 (the one with the GPU) has 1 task in progress, Computer #2 (I7-10750H) has 7 in progress, and Computer #3 (I7-8750H) has 6 in progress. All of these tasks were downloaded in the last month or two.

If the Tasks tab in the BOINC Manager on your computer shows more than these, you have "lost" or "ghost" tasks.

To clear this up, when you have NO tasks on your computer according to the CPDN Server, go to the Projects tab and Remove the project. Wait 10 minutes. Add the Project back. This has always worked for me.
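
The mismatch discussed in this exchange, where the account page lists twenty tasks in progress but the computer holds only eight, amounts to a set difference between the two lists. The sketch below is an illustration only: the task names are hypothetical placeholders sized to match that twenty-versus-eight example, and `find_ghosts` is not part of BOINC.

```python
def find_ghosts(server_in_progress, local_tasks):
    """Return task names the server lists as in progress but the client does not hold."""
    return sorted(set(server_in_progress) - set(local_tasks))


# Hypothetical task names: 20 listed by the server, 8 actually on the computer.
server_side = [f"task_{n:02d}" for n in range(1, 21)]
client_side = [f"task_{n:02d}" for n in range(1, 9)]

ghosts = find_ghosts(server_side, client_side)
print(len(ghosts), "ghost tasks:", ghosts)
```

In practice the two lists would be read off the project's task page and the Tasks tab of the BOINC Manager; swapping the arguments finds the opposite case, tasks on the computer that the server does not know about.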
ID: 64469 · Report as offensive     Reply Quote
Jean-David Beyer

Joined: 5 Aug 04
Posts: 532
Credit: 8,370,709
RAC: 14,651
Message 64470 - Posted: 16 Sep 2021, 12:07:57 UTC - in response to Message 64465.  

Yeah, my machines saved the last 3rd or 5th attempt of the WU after a few years idling on someone's computer. Old batches are not always pulled out.


I notice this quite frequently, when I bother to look. I am always amazed to see four failures, and yet my machine has no trouble processing the work unit.
Most recently, Workunit 12043851 and Workunit 12100818.
ID: 64470 · Report as offensive     Reply Quote

©2021 climateprediction.net