climateprediction.net home page
New work Discussion

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63121 - Posted: 18 Dec 2020, 13:17:01 UTC - in response to Message 62986.  

Bryn Mawr -

Sorry to read you can't get any work. I have 5 or 6 computers and it has worked for me every time.

You have the same OS and AMD processors as me. There must be something that I haven't encountered yet.


Frustrating but that’s life.


Further to this, I have been looking through the server’s scheduler code on GitHub, and it seems to me that there are two conditions where it blocks the sending of work and logs the fact but does not appear to return an error message to the user:

1. During a work request it sets a lock file on the host ID. If the next work request finds that the lock file still exists, it exits.

2. It receives an unrecognised code sign key.


Now, obviously, I cannot check the server for an uncleared lock file, but is there any way I can change my host ID, and is there any way I can resend my code sign key?
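
To make the first condition concrete, here is a minimal Python sketch of a per-host lock file. It is only an illustration of the general pattern, not the BOINC scheduler's actual code: the function names, the `host_<id>.lock` file naming and the lock directory are all hypothetical.

```python
import os
import tempfile


def try_lock_host(lock_dir, host_id):
    """Try to take the lock for a host. Return False if a lock file already exists."""
    # Hypothetical naming scheme, chosen only for this example.
    path = os.path.join(lock_dir, f"host_{host_id}.lock")
    try:
        # O_EXCL makes creation fail if the file is already there.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def unlock_host(lock_dir, host_id):
    """Release the lock by deleting the lock file."""
    os.remove(os.path.join(lock_dir, f"host_{host_id}.lock"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        print(try_lock_host(lock_dir, 12345))  # first request takes the lock
        print(try_lock_host(lock_dir, 12345))  # lock never cleared: request is refused
        unlock_host(lock_dir, 12345)
        print(try_lock_host(lock_dir, 12345))  # lock cleared: request goes through again
```

If a request handler in a scheme like this dies between taking the lock and releasing it, the file stays behind and every later request for that host is refused, which would fit the symptom of no work and no error message.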

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3183
Credit: 8,795,408
RAC: 7,129
Message 63122 - Posted: 18 Dec 2020, 14:29:29 UTC - in response to Message 63121.  

Now, obviously, I cannot check the server for an uncleared lock file but is there any way I can change my host id and is there any way I can resend my code sign key?


Not an area I have experience in. I would try removing the project using BOINC Manager and then try re-attaching if you haven't already tried this. If no joy, my next step would be to ask over on the BOINC forums.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63123 - Posted: 18 Dec 2020, 15:26:27 UTC - in response to Message 63122.  
Last modified: 18 Dec 2020, 15:27:32 UTC

Now, obviously, I cannot check the server for an uncleared lock file but is there any way I can change my host id and is there any way I can resend my code sign key?


Not an area I have experience in. I would try removing the project using BOINC Manager and then try re-attaching if you haven't already tried this. If no joy, my next step would be to ask over on the BOINC forums.


I’ve tried that many times. The last thing I tried was to create a new user ID and attach to that, but still no joy :-(

I’ll ask at BOINC and see what they say.

lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 63124 - Posted: 18 Dec 2020, 16:36:05 UTC - in response to Message 63121.  
Last modified: 18 Dec 2020, 16:54:17 UTC

Bryn Mawr -

...Now, obviously, I cannot check the server for an uncleared lock file...


Try:
ls /var/run/lock

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63125 - Posted: 18 Dec 2020, 18:45:24 UTC - in response to Message 63124.  

Bryn Mawr -

...Now, obviously, I cannot check the server for an uncleared lock file...


Try:
ls /var/run/lock


This would be a lock file on the server, set after it has looked up my user ID and before it starts updating the DB.

JIM
Joined: 31 Dec 07
Posts: 1147
Credit: 21,159,011
RAC: 9
Message 63126 - Posted: 18 Dec 2020, 19:08:34 UTC

YES! For the first time in about 6 months I got 6 new Windows tasks. Hopefully this marks the return of work for computers running Windows.

KAMasud
Joined: 6 Oct 06
Posts: 151
Credit: 5,994,432
RAC: 5,199
Message 63129 - Posted: 19 Dec 2020, 3:35:13 UTC

Yes, but I wish there was a way to attach hooters to CPDN. They should start hooting before work is sent by the server. Last night when new WUs were dispatched my machine was already happily munching on Linux WUs.
L3 cache keeps appearing in various threads. L3 has a story, poor L3. I have an i7; well, both are i7s. 9 MB of L3 cache is divided amongst six physical cores and six Hyper-Threaded virtual cores, on top of which I had a VM running with three Linux tasks. Which makes fifteen WUs scrambling after 9 MB. (On one very stable machine, I have switched off Hyper-Threading, so six physical cores fight over 9 MB of L3.) Last night work landed on the un-switched-off machine. By the time the propeller sound switched to turbo-prop mode and warned me about the shenanigans going on, sadly, two WUs had errored out :'(.
Any ideas as to how to fit air-raid sirens which should scream when CPDN is in the mood?
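
For a rough feel of the numbers, here is a small Python sketch of the cache arithmetic. It assumes the L3 cache is shared evenly between running tasks, which real CPUs do not guarantee, and the helper name `l3_share_mb` is made up for the example.

```python
def l3_share_mb(cache_mb, tasks):
    """Even split of L3 cache per running task, in MB (a deliberate simplification)."""
    if tasks <= 0:
        raise ValueError("tasks must be positive")
    return cache_mb / tasks


if __name__ == "__main__":
    # Six physical cores + six Hyper-Threaded cores + three tasks in the VM.
    busy_machine_tasks = 6 + 6 + 3
    print(l3_share_mb(9, busy_machine_tasks))  # share per task with fifteen WUs

    # Hyper-Threading switched off: six tasks on six physical cores.
    print(l3_share_mb(9, 6))
```

On those assumptions each task gets about 0.6 MB on the fully loaded machine against 1.5 MB with Hyper-Threading off.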

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63158 - Posted: 23 Dec 2020, 4:32:25 UTC - in response to Message 63123.  

Now, obviously, I cannot check the server for an uncleared lock file but is there any way I can change my host id and is there any way I can resend my code sign key?


Not an area I have experience in. I would try removing the project using BOINC Manager and then try re-attaching if you haven't already tried this. If no joy, my next step would be to ask over on the BOINC forums.


I’ve tried that many times. The last thing I tried was to create a new user ID and attach to that, but still no joy :-(

I’ll ask at BOINC and see what they say.


I might, just might, have resolved this.

I found that I had no_alt_platform set in the cc_config.xml file. I removed that and re-read the config file, and it made no never mind, but 3 days later I rebooted and promptly started receiving work.

Later this morning I’ll try rebooting the other machine and see if that one wakes up as well.

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63159 - Posted: 23 Dec 2020, 10:44:05 UTC - in response to Message 63158.  



I might, just might, have resolved this.

I found that I had no_alt_platform set in the cc_config.xml file. I removed that and re-read the config file, and it made no never mind, but 3 days later I rebooted and promptly started receiving work.

Later this morning I’ll try rebooting the other machine and see if that one wakes up as well.


I think that confirms it: within an hour of rebooting I have work.

So, changing no_alt_platform from 1 to 0 followed by a reboot appears to have fixed the fault.
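
For anyone who wants to check their own setup, here is a minimal Python sketch that looks for the flag. It assumes the usual cc_config.xml layout with `no_alt_platform` inside `<options>`; the function name and the sample text are only for illustration, and in practice you would read the file from your BOINC data directory.

```python
import xml.etree.ElementTree as ET


def no_alt_platform_enabled(xml_text):
    """Return True if <no_alt_platform> is present under <options> and set to 1."""
    root = ET.fromstring(xml_text)
    node = root.find("./options/no_alt_platform")
    return node is not None and (node.text or "").strip() == "1"


# Sample content for the example only, not a real machine's configuration.
SAMPLE = """<cc_config>
  <options>
    <no_alt_platform>1</no_alt_platform>
  </options>
</cc_config>"""

if __name__ == "__main__":
    if no_alt_platform_enabled(SAMPLE):
        print("no_alt_platform is set to 1")
    else:
        print("no_alt_platform is not set")
```

A missing tag and a value of 0 both count as "not set" here, which matches the two ways of clearing it mentioned above.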

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3183
Credit: 8,795,408
RAC: 7,129
Message 63160 - Posted: 23 Dec 2020, 12:39:18 UTC - in response to Message 63159.  
Last modified: 23 Dec 2020, 12:41:19 UTC

Well done!

Next question is why some clean installs put it in and others do not, or were these all carrying over old cc_config.xml files?

Edit: Only just noticed that this was a carry-over from running WCG.

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63161 - Posted: 23 Dec 2020, 13:00:29 UTC - in response to Message 63160.  

Well done!

Next question is why some clean installs put it in and others do not, or were these all carrying over old cc_config.xml files?

Edit: Only just noticed that this was a carry-over from running WCG.


All we need now is to see whether others suffering the same problem are using the same setting.

Les Bayliss is one possibility :-)

Bill F
Joined: 17 Jan 09
Posts: 93
Credit: 1,062,655
RAC: 0
Message 63162 - Posted: 23 Dec 2020, 14:20:50 UTC

Can this be shared on the BOINC message boards... It sounds like something that both Development and perhaps WCG might want to look at.

Bill F
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3183
Credit: 8,795,408
RAC: 7,129
Message 63163 - Posted: 23 Dec 2020, 15:17:01 UTC - in response to Message 63162.  

Can this be shared on the BOINC message boards... It sounds like something that both Development and perhaps WCG might want to look at.

Bill F


It is here on the BOINC forums.

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63164 - Posted: 23 Dec 2020, 15:27:59 UTC - in response to Message 63162.  

Can this be shared on the BOINC message boards... It sounds like something that both Development and perhaps WCG might want to look at.

Bill F


Aye, it would be very helpful if BOINC put out a message giving the reason no task was sent, even if only as part of the debug stream.

Iceberg
Joined: 28 Dec 17
Posts: 16
Credit: 998,481
RAC: 0
Message 63165 - Posted: 23 Dec 2020, 21:31:41 UTC

I managed to snag four WUs of the new SAFR50s from HAPPI, but sadly two of them so far got computational errors (signal 11, segment stuff yet again). I would really like to help with this project given its importance, but I think the other two will eventually fail, too. My computer isn't overtaxed with processes, but it is old. Not sure what you would advise. Hopefully the people who take over my WUs have success.

:(

Bryn Mawr
Joined: 28 Jul 19
Posts: 89
Credit: 7,036,779
RAC: 7,277
Message 63166 - Posted: 23 Dec 2020, 22:21:35 UTC - in response to Message 63165.  

I managed to snag four WUs of the new SAFR50s from HAPPI, but sadly two of them so far got computational errors (signal 11, segment stuff yet again). I would really like to help with this project given its importance, but I think the other two will eventually fail, too. My computer isn't overtaxed with processes, but it is old. Not sure what you would advise. Hopefully the people who take over my WUs have success.

:(


It looks like your computer is suspending processing quite frequently. This has been known to cause computation errors.

Try setting your preferences to not suspend and to keep work in memory if it does suspend.
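
If you would rather do that in a file than through the Manager, here is a rough Python sketch that builds a local preferences override. The tag names `run_if_user_active` and `leave_apps_in_memory` and the `global_preferences` root are how I understand the BOINC `global_prefs_override.xml` format, so treat them as assumptions and check them against the BOINC documentation before relying on them. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(leave_in_memory=True, run_if_user_active=True):
    """Build the text of a preferences override with the two settings discussed."""
    # Assumed root element and tag names; verify against the BOINC docs.
    root = ET.Element("global_preferences")
    ET.SubElement(root, "run_if_user_active").text = "1" if run_if_user_active else "0"
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_in_memory else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML rather than writing it, so nothing on disk is touched.
    print(build_prefs_override())
```

The sketch only prints the text; where the file lives and how the client is told to re-read it depend on your installation.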

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3183
Credit: 8,795,408
RAC: 7,129
Message 63167 - Posted: 23 Dec 2020, 22:49:09 UTC - in response to Message 63165.  

I managed to snag four WUs of the new SAFR50s from HAPPI, but sadly two of them so far got computational errors (signal 11, segment stuff yet again). I would really like to help with this project given its importance, but I think the other two will eventually fail, too. My computer isn't overtaxed with processes, but it is old. Not sure what you would advise. Hopefully the people who take over my WUs have success.

:(

Last time I looked, 14 had completed successfully. I think the number that had hard failed, i.e. all three attempts had failed, was 15. It was certainly more than the successes. That suggests the failures have little to do with the computer in question but more to do with the particular tasks. This issue has been seen by one of the moderators who has reported it back to the project.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1049
Credit: 6,463,915
RAC: 2
Message 63168 - Posted: 23 Dec 2020, 22:54:53 UTC - in response to Message 63165.  

I managed to snag four WUs of the new SAFR50s from HAPPI, but sadly two of them so far got computational errors (signal 11, segment stuff yet again). I would really like to help with this project given its importance, but I think the other two will eventually fail, too. My computer isn't overtaxed with processes, but it is old. Not sure what you would advise. Hopefully the people who take over my WUs have success.

:(

This batch, 890, has a very high error rate, so I wouldn’t worry about the machine.

Some models in the batch have finished, so please persist with any model that is still running — but don’t be surprised if the model doesn’t finish. Mine all crashed on all my machines.

KAMasud
Joined: 6 Oct 06
Posts: 151
Credit: 5,994,432
RAC: 5,199
Message 63169 - Posted: 24 Dec 2020, 8:05:28 UTC

Could someone please look at my results page and tell me why 100% of my WUs have errored out? This is embarrassing, and I do not see any reason why.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3183
Credit: 8,795,408
RAC: 7,129
Message 63170 - Posted: 24 Dec 2020, 9:22:41 UTC - in response to Message 63169.  

Could someone please look at my results page and tell me why 100% of my WUs have errored out? This is embarrassing, and I do not see any reason why.


If you are talking about the latest SAFR batch, as noted in other posts in this thread, it is a problem with the batch. I just looked: only 20 have succeeded so far, and well over that number have hard failed, which means all three computers they have run on have failed them.

©2021 climateprediction.net