climateprediction.net home page
no credit awarded?

Brummig

Send message
Joined: 3 Nov 05
Posts: 26
Credit: 667,486
RAC: 1,046
Message 68244 - Posted: 10 Feb 2023, 13:28:44 UTC - in response to Message 68240.  

Jumping back to a checkpoint is usually very obvious (representing many hours of work lost), and I didn't notice that happening without obvious cause (such as a power failure or manual suspension). I wonder if it's something to do with hibernation causing a small jump back every evening, but not one that's big enough to notice on the progress? The longer a task runs, the more that would multiply up.

I have just taken a quick scan through the Universe@Home results for the host, and I can't see any sign that the virtual machine started running faster when the last CPDN task finished (unfortunately I can't go back very far, and other projects have an even shorter results record).
ID: 68244 · Report as offensive     Reply Quote
Yeti

Send message
Joined: 5 Aug 04
Posts: 171
Credit: 10,090,895
RAC: 26,991
Message 68247 - Posted: 10 Feb 2023, 13:47:31 UTC - in response to Message 68244.  

Jumping back to a checkpoint is usually very obvious (representing many hours of work lost), and I didn't notice that happening without obvious cause (such as a power failure or manual suspension). I wonder if it's something to do with hibernation causing a small jump back every evening, but not one that's big enough to notice on the progress? The longer a task runs, the more that would multiply up.

I have just taken a quick scan through the Universe@Home results for the host, and I can't see any sign that the virtual machine started running faster when the last CPDN task finished (unfortunately I can't go back very far, and other projects have an even shorter results record).
Or you started every day with the same checkpoint because you didn't reach the next one


Supporting BOINC, a great concept !
ID: 68247 · Report as offensive     Reply Quote
Brummig

Send message
Joined: 3 Nov 05
Posts: 26
Credit: 667,486
RAC: 1,046
Message 68248 - Posted: 10 Feb 2023, 17:06:07 UTC - in response to Message 68247.  

Then I wouldn't have made any progress :)
ID: 68248 · Report as offensive     Reply Quote
solskinn

Send message
Joined: 6 Sep 05
Posts: 24
Credit: 21,529
RAC: 0
Message 68262 - Posted: 11 Feb 2023, 16:11:15 UTC
Last modified: 11 Feb 2023, 16:12:25 UTC

A question for you: is a thread's view count also increased when users who are not logged in read it?

I read the thread first without logging in and only had to log in to post, so I'm guessing the view counter had already been increased by that first access.
ID: 68262 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 804
Credit: 13,570,391
RAC: 6,523
Message 68263 - Posted: 11 Feb 2023, 17:14:43 UTC - in response to Message 68244.  
Last modified: 11 Feb 2023, 17:15:01 UTC

Jumping back to a checkpoint is usually very obvious (representing many hours of work lost), and I didn't notice that happening without obvious cause (such as a power failure or manual suspension). I wonder if it's something to do with hibernation causing a small jump back every evening, but not one that's big enough to notice on the progress? The longer a task runs, the more that would multiply up
It's not much repeated (or 'lost') work for the OpenIFS tasks: about 5-15 minutes, depending on your CPU. It won't be obvious in the percentage progress because it makes a tiny difference.

As hibernate writes the contents of RAM to swap, yes, that will push the model out of memory (true for any BOINC task), causing it to restart from a checkpoint when the machine wakes up (I usually 'suspend'). However, if you only hibernate once a day, that's not going to make much difference. The task will still do plenty of work whilst the machine is awake. To run as slowly as you noticed, it's got to be frequently dropping out of RAM, or you have a lot of CPU contention and the task is barely running. Perhaps try watching it in top or htop to see how much of the machine's resources it gets?
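
To put rough numbers on the point above, here is a minimal Python sketch. The 5-15 minute loss per restart is the figure quoted in this post; the ten awake hours per day and the restart counts are assumptions chosen purely for illustration, and `restart_overhead` is a hypothetical helper, not part of BOINC or OpenIFS.

```python
def restart_overhead(restarts_per_day: float,
                     minutes_lost_per_restart: float,
                     awake_hours_per_day: float) -> float:
    """Fraction of awake compute time spent repeating work after restarts."""
    awake_minutes = awake_hours_per_day * 60
    lost_minutes = restarts_per_day * minutes_lost_per_restart
    # Cap at 1.0: a task cannot lose more time than the machine is awake.
    return min(lost_minutes / awake_minutes, 1.0)


# One hibernate per day, worst-case 15 minutes repeated, 10 awake hours (assumed).
print(f"{restart_overhead(1, 15, 10):.1%}")   # 2.5%

# Twenty restarts per day (assumed), e.g. the task keeps dropping out of RAM.
print(f"{restart_overhead(20, 15, 10):.1%}")  # 50.0%
```

Under these assumptions a single nightly hibernate costs only a few percent, so a task that crawls would need many restarts per day or heavy CPU contention.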
ID: 68263 · Report as offensive     Reply Quote
bullschuck

Send message
Joined: 22 May 21
Posts: 37
Credit: 496,060
RAC: 1,924
Message 68360 - Posted: 16 Feb 2023, 13:47:53 UTC

Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks?

Further news on proposed Mac Intel IFS tasks would be appreciated as well.

Thanks!
ID: 68360 · Report as offensive     Reply Quote
bullschuck

Send message
Joined: 22 May 21
Posts: 37
Credit: 496,060
RAC: 1,924
Message 68451 - Posted: 24 Feb 2023, 19:27:15 UTC - in response to Message 68360.  

Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks?

Further news on proposed Mac Intel IFS tasks would be appreciated as well.

Thanks!


Still nothing?
ID: 68451 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 804
Credit: 13,570,391
RAC: 6,523
Message 68464 - Posted: 25 Feb 2023, 12:04:34 UTC - in response to Message 68451.  

Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks?
Further news on proposed Mac Intel IFS tasks would be appreciated as well.
Thanks!
Still nothing?
I can bring this up at the next CPDN tech meeting. Is it just your HadCM3 tasks? I can't see your computers so I can't get the task ids you've run.

I know they have had problems with the credit script a while ago. I'm not sure if that's the reason. I'll see if I can get an answer for you.

As for the Mac Intel OpenIFS tasks, I'm working on that this week. It'll go to testing first, though, before appearing in production.
---
CPDN Visiting Scientist
ID: 68464 · Report as offensive     Reply Quote
bullschuck

Send message
Joined: 22 May 21
Posts: 37
Credit: 496,060
RAC: 1,924
Message 68469 - Posted: 25 Feb 2023, 19:32:38 UTC - in response to Message 68464.  

I can bring this up at the next CPDN tech meeting. Is it just your HadCM3 tasks? I can't see your computers so I can't get the task ids you've run.


Yes. I only run CPDN on some older Intel Macs, so only HadCM3 tasks.

I know they have had problems with the credit script a while ago. I'm not sure if that's the reason. I'll see if I can get an answer for you.


Thanks loads! This is for all the tasks I've completed since November of last year.

As for the Mac Intel OpenIFS tasks, I'm working on that this week. It'll go to testing first, though, before appearing in production.


Sweet! I'm looking forward to trying that out. Any OS limitations? I've seen some BOINC projects that have Intel-based tasks that will run on M1 Macs. Any chance that these OpenIFS tasks will?

Again, thanks loads for bringing this up again at a tech meeting. That's as much as I could ask for.

Bull
ID: 68469 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 804
Credit: 13,570,391
RAC: 6,523
Message 68470 - Posted: 25 Feb 2023, 21:38:31 UTC - in response to Message 68469.  

As for the Mac Intel OpenIFS tasks, I'm working on that this week. It'll go to testing first, though, before appearing in production.
Sweet! I'm looking forward to trying that out. Any OS limitations? I've seen some BOINC projects that have Intel-based tasks that will run on M1 Macs. Any chance that these OpenIFS tasks will?
I only have an Intel iMac running High Sierra to develop on. I assume that the M1/M2 Macs will use Rosetta, but until we try I honestly don't know whether the code will run OK or not. It usually comes down to how good the low-level system support is (for things like filesystem functions, CPU time, etc.). If it doesn't run and it's not something I can fix in a week, we'll probably leave it, as it's really not the highest priority.
---
CPDN Visiting Scientist
ID: 68470 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4342
Credit: 16,502,925
RAC: 5,640
Message 68476 - Posted: 26 Feb 2023, 9:35:41 UTC - in response to Message 68475.  

Will this work unit ever get validated, or does it need an admin to intervene?
It may need intervention to get the credit when Andy gets a chance but validation isn't used by CPDN. Credit is based on the trickle up files that generally go at the same time as the zips are uploaded.
ID: 68476 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 943
Credit: 34,192,107
RAC: 7,153
Message 68483 - Posted: 26 Feb 2023, 14:17:56 UTC - in response to Message 68476.  

Credit is based on the trickle up files that generally go at the same time as the zips are uploaded.
That's the way it always used to be, but something seems to have slipped in the last three months or so.

There was a credit run late last night or very early this morning (UTC, 25/26 Feb), just as I was finishing up the last of my batch 993 tasks. One task reported at 23:50 has been awarded full credit; the next, reported at 04:27, still shows zero. In the 'trickle' days, that one would have received credit for the trickles received before, say, midnight.
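
The 'trickle days' behaviour described above can be sketched roughly as follows. This is only an illustration and not the CPDN credit script: the function name, the trickle times, the run time and the credit per trickle are all assumptions.

```python
from datetime import datetime


def credit_at_run(trickle_times, run_time, credit_per_trickle):
    """Credit a task would hold after a credit run that counts only the
    trickles received at or before run_time (hypothetical rule)."""
    received = sum(1 for t in trickle_times if t <= run_time)
    return received * credit_per_trickle


# Hypothetical task: four trickles, two before midnight and two after.
trickles = [
    datetime(2023, 2, 25, 20, 0),
    datetime(2023, 2, 25, 22, 30),
    datetime(2023, 2, 26, 2, 0),
    datetime(2023, 2, 26, 4, 27),
]
credit_run = datetime(2023, 2, 26, 0, 0)  # assumed time of the credit run

print(credit_at_run(trickles, credit_run, credit_per_trickle=100.0))  # 200.0
```

With per-trickle accounting like this, a task still running at the time of the credit run would show partial credit rather than zero.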

Another strange thing: my event log has an entry for

26-Feb-2023 00:51:38 [climateprediction.net] [sched_op] handle_scheduler_reply(): got ack for task oifs_43r3_01i7_2019110100_123_993_12215389_0
That's task 22316800, which the server says is still in progress. The event log timing (also UTC) suggests that it was reported right in the middle of the period when I'm suggesting the credit script was running. Could that have interfered with the status update?
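
For anyone wanting to check their own event log for the same coincidence, here is a minimal Python sketch that parses an ack line of the form shown above and tests whether it falls inside a time window. The regular expression is written only against this one sample line, the window times are assumptions (the actual credit run times are not known here), and `parse_ack` and `in_window` are hypothetical helper names. Note that `%b` month parsing assumes an English locale.

```python
import re
from datetime import datetime

LINE = ("26-Feb-2023 00:51:38 [climateprediction.net] [sched_op] "
        "handle_scheduler_reply(): got ack for task "
        "oifs_43r3_01i7_2019110100_123_993_12215389_0")

ACK_RE = re.compile(
    r"^(?P<ts>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] \[sched_op\] "
    r"handle_scheduler_reply\(\): got ack for task (?P<task>\S+)$"
)


def parse_ack(line):
    """Return (timestamp, project, task name) for an ack line, else None."""
    m = ACK_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d-%b-%Y %H:%M:%S")
    return ts, m["project"], m["task"]


def in_window(ts, start, end):
    """True if ts lies within the closed interval [start, end]."""
    return start <= ts <= end


ts, project, task = parse_ack(LINE)
# Assumed credit-run window, for illustration only.
window = (datetime(2023, 2, 26, 0, 30), datetime(2023, 2, 26, 1, 30))
print(task, in_window(ts, *window))
```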

There have been suggestions on the message boards that we currently have two different credit scripts running on different servers, an old one and a new one. But it seems to be more complicated than that. I quite understand that the project team have had their hands full with the testing and launch of the new apps, and delivering the results to the commissioning scientists in spite of problems with the upload servers. But there will come a time when - I hope - they will be able to take a step back and review the health of the project as a whole.
ID: 68483 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 804
Credit: 13,570,391
RAC: 6,523
Message 68486 - Posted: 26 Feb 2023, 18:07:33 UTC - in response to Message 68483.  

There have been suggestions on the message boards that we currently have two different credit scripts running on different servers, an old one and a new one. But it seems to be more complicated than that. I quite understand that the project team have had their hands full with the testing and launch of the new apps, and delivering the results to the commissioning scientists in spite of problems with the upload servers. But there will come a time when - I hope - they will be able to take a step back and review the health of the project as a whole.
The 'two scripts' is a reference to the dev and production sites running different versions. The 'old' version is on the production site and the 'new' one is active on the dev site. They are not both active together. CPDN want to roll out the 'new' one to production, but it will completely alter how credit is computed, so they want to prepare something to go out to users first.

That's as much as I know. Richard, I suspect you know more about the differences between the 'new' and 'old' boinc credit scripts than I do. I'm sure I've seen you talk about it in other posts.
---
CPDN Visiting Scientist
ID: 68486 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 943
Credit: 34,192,107
RAC: 7,153
Message 68488 - Posted: 26 Feb 2023, 19:30:51 UTC - in response to Message 68486.  

The 'two scripts' is a reference to the dev and production sites running different versions. The 'old' version is on the production site and the 'new' one is active on the dev site. They are not both active together. CPDN want to roll out the 'new' one to production, but it will completely alter how credit is computed, so they want to prepare something to go out to users first.

That's as much as I know. Richard, I suspect you know more about the differences between the 'new' and 'old' boinc credit scripts than I do. I'm sure I've seen you talk about it in other posts.
Yes, those were the references I was alluding to (one script on each server, but different).

But the question - in reference to bullschuck's question - becomes "How old is old?". His machines (1526736, 1519502) clearly show a problem. For tasks completed in July, trickles were displayed on the result pages, and credit was awarded - including partial credit according to the trickle reached, for tasks which didn't complete. But tasks completed in December or later aren't showing their trickles, and aren't getting any credit, either.

But IFS tasks are getting credit on the production site, for completed tasks at least - even though they aren't showing their trickles. And tasks on the dev site are showing their trickles for both IFS and Hadley tasks. So we seem to have at least three scripts in play: should we call them old, middle-aged, and young?

I did do some work for Milo Thurston back in the day, when we had a RAC problem on one particular application. But any knowledge I gained on that occasion is positively geriatric by comparison. That's why I'm suggesting that the time has come (subject to other constraints, which come first) for a thorough re-examination of the current situation. I'm happy to lend a hand in that process, if it would help.
ID: 68488 · Report as offensive     Reply Quote
[SG]Felix

Send message
Joined: 4 Oct 15
Posts: 34
Credit: 9,069,332
RAC: 14,637
Message 68489 - Posted: 26 Feb 2023, 20:24:47 UTC - in response to Message 68464.  

Is it just your HadCM3 tasks?


For me, it's everything except OpenIFS, and since at least the 4th of December last year.
ID: 68489 · Report as offensive     Reply Quote
Profile geophi
Volunteer moderator

Send message
Joined: 7 Aug 04
Posts: 2167
Credit: 64,489,576
RAC: 4,654
Message 68493 - Posted: 26 Feb 2023, 21:53:40 UTC - in response to Message 68483.  
Last modified: 26 Feb 2023, 21:56:48 UTC

Another strange thing: my event log has an entry for

26-Feb-2023 00:51:38 [climateprediction.net] [sched_op] handle_scheduler_reply(): got ack for task oifs_43r3_01i7_2019110100_123_993_12215389_0
That's task 22316800, which the server says is still in progress. The event log timing (also UTC) suggests that it was reported right in the middle of the period when I'm suggesting the credit script was running. Could that have interfered with the status update?

Absolutely. I started seeing that behavior sometimes early in the hadam4h era. I've lost getting a status for several completed tasks over the last 3 or 4 years if they reported during the credit run. It doesn't always happen, but occasionally.

There are others who posted about this situation as well but those posts are probably scattered among several threads.
ID: 68493 · Report as offensive     Reply Quote
AndreyOR

Send message
Joined: 12 Apr 21
Posts: 247
Credit: 11,849,481
RAC: 20,641
Message 68495 - Posted: 27 Feb 2023, 1:09:31 UTC - in response to Message 68476.  

Will this work unit ever get validated, or does it need an admin to intervene?
It may need intervention to get the credit when Andy gets a chance but validation isn't used by CPDN. Credit is based on the trickle up files that generally go at the same time as the zips are uploaded.

Even though there's no cross-task validation as in other projects, validation does seem to happen. I've seen tasks that were just reported show up as Validation Pending for a very brief period of time, under 30 seconds. Perhaps some internal checks get done to make sure the result is valid and hasn't been tampered with or corrupted in some way. Perhaps that task wasn't checked for some reason, or the check wasn't registered?
ID: 68495 · Report as offensive     Reply Quote
AndreyOR

Send message
Joined: 12 Apr 21
Posts: 247
Credit: 11,849,481
RAC: 20,641
Message 68496 - Posted: 27 Feb 2023, 2:01:04 UTC - in response to Message 68488.  

That's why I'm suggesting that the time has come (subject to other constraints, which come first) for a thorough re-examination of the current situation.

It's most definitely time. It's been 3 months since Hadley models stopped getting credit. From what I've been able to gather, the problem started at the end of November. There's also the RAC problem, which has been ongoing for weeks now. CPDN has a relatively small and patient user base. I'd be willing to bet that almost everyone likes credit and to a small or large degree cares about getting it. It seems a bit neglectful of the user base to let credit problems be anything but short-term problems. It's the only tangible/visible thing users get out of volunteering. I know that CPDN runs on minimal resources; at the same time, when do we as users become a high enough priority?
ID: 68496 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 804
Credit: 13,570,391
RAC: 6,523
Message 68503 - Posted: 27 Feb 2023, 13:49:21 UTC - in response to Message 68496.  

That's why I'm suggesting that the time has come (subject to other constraints, which come first) for a thorough re-examination of the current situation.

It's most definitely time. It's been 3 months since Hadley models stopped getting credit. From what I've been able to gather, the problem started at the end of November. There's also the RAC problem, which has been ongoing for weeks now. CPDN has a relatively small and patient user base. I'd be willing to bet that almost everyone likes credit and to a small or large degree cares about getting it. It seems a bit neglectful of the user base to let credit problems be anything but short-term problems. It's the only tangible/visible thing users get out of volunteering. I know that CPDN runs on minimal resources; at the same time, when do we as users become a high enough priority?
To be rather blunt, they have had other priorities: trying to get OpenIFS projects run for paying customers, battling with cloud providers, and battling with university IT provision (a long story I'm not at liberty to divulge). Andy is their only IT guy, and he manages the lot. I appreciate credit matters to some, but you can't spend it, can't eat it and can't take it with you, so I'd rather they spent time getting a working system that's attractive for scientists to use. Otherwise, no-one will.

Unfortunately there are no tech CPDN meetings this week or next due to interviews for MSc students and prep for the international BOINC meeting. I will bring it up when I get the chance to get an answer.
---
CPDN Visiting Scientist
ID: 68503 · Report as offensive     Reply Quote
Brummig

Send message
Joined: 3 Nov 05
Posts: 26
Credit: 667,486
RAC: 1,046
Message 68517 - Posted: 1 Mar 2023, 9:01:26 UTC - in response to Message 68503.  

When you hold the door open for someone, it's only polite for that person to say "thank you". You can't spend it, can't eat it and can't take it with you, but if someone doesn't take the trouble to thank you, then the next time around you'll likely be letting the door slam shut in their face. That's why the door to my computer is currently shut to CPDN, and those paying customers can go elsewhere. And since they are paying, how about them paying for the expensive electricity that powers all those BOINC hosts?
ID: 68517 · Report as offensive     Reply Quote