VANISHING WU'S
Author | Message |
---|---|
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Right, but aborting them seems of little use as long as they are picked up by someone else immediately. True, I had the same experience with one of mine. I've raised it with Andy. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Alex These models are set to run five times until there's a success of 1, so someone's going to have to run them. There's still a lot of "serial killers" out there, plus all of the "set and forget, and never look at the forums" people, so let them run this lot and get them off the queue. Just abort them if you find any on your machine. Backups: Here |
Joined: 3 Sep 04 Posts: 126 Credit: 26,363,193 RAC: 0 |
It looks like they crash anyway, so aborting them saves processor time. |
Joined: 9 Sep 04 Posts: 228 Credit: 30,287,282 RAC: 2,285 |
aborted, and forget about them |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,100,600 RAC: 2,970 |
The number of "tasks in progress" is back up to 48054. When I looked at it before the shutdown it was at about 43000. So it would seem that the bogus release pumped about 5000 old WU's into the stream. That's about 25,000 aborts to get rid of all of them! |
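The estimate in the post above follows from the five-attempt rule mentioned earlier in the thread: each model is reissued until it succeeds once or has been tried five times. A rough Python check of that arithmetic, assuming every bogus work unit fails all five attempts (the variable names are illustrative only, not project terminology):

```python
# Back-of-envelope check of the abort estimate quoted in the thread.
tasks_before = 43_000   # approximate "tasks in progress" before the shutdown
tasks_after = 48_054    # "tasks in progress" after the bogus release
max_attempts = 5        # each model is tried up to five times (per the thread)

bogus_wus = tasks_after - tasks_before         # roughly 5,000 work units
worst_case_aborts = bogus_wus * max_attempts   # assumes no attempt ever succeeds

print(bogus_wus, worst_case_aborts)
```

This is a worst case: any work unit that completes on an earlier attempt, as one poster later reports for 7l9l, needs fewer aborts.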
Joined: 30 Aug 06 Posts: 27 Credit: 1,518,823 RAC: 1,419 |
If I keep on aborting these jobs will my machines get blacklisted? |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
If you mean will one of us ask the programmers to minus your computer(s) so that you can't receive new tasks until you've posted on the forum and sorted out the problem, then no, of course not. When we come across computers that appear to be serial model crashers the first thing we do is look in detail at a selection of the models to see what the problem seems to consist of. We also look at whether the problem of crashed models has been going on for quite a while without the owner doing anything about it and without posting on the forum for advice. The last thing we want to do is penalise people for one-off or short-term problems which any of us could have to deal with from time to time. You and your computers don't fall into these serial-crasher categories. In fact it's rare for members who post on the forum to be serial model crashers. If instead you were wondering whether the server will reduce the number of tasks you're allowed per core per day, then yes, I'd expect the server to treat each aborted model like a crashed task. This is the BOINC rule. However, even if you abort so many models that your daily quota is reduced to zero new models per core, after your midnight the quota will be increased to one per core per day again. I've aborted quite a few of these defective models myself and advise everybody else to do the same. I believe that these models are likely to crash anyway at some stage so there's nothing to be gained through wasting electricity and computer power on crunching that cannot be of any help to the researchers. Cpdn news |
Joined: 30 Aug 06 Posts: 27 Credit: 1,518,823 RAC: 1,419 |
Thanks mo.v I had in mind the latter, not the former, but didn't think the process through. On my "value" machines and with my workday it would be rare indeed if there was more than one or two WUs to abort per day, let alone per CPU. Energise main phasers, All weapons to full power. Target that WU (hadcm3n_7k6p_1980_40_008437444_1) Phaser one, fire. Phaser two, fire. Target destroyed. Confirmed. |
Joined: 22 Feb 06 Posts: 485 Credit: 29,638,939 RAC: 3,372 |
I'd picked up 7: hadcm3n_7l9l, _7aou, _7aol, _7mok, _7gl5, _7c00 and _7c07. 7l9l is still running; the rest are all aborted, and it looks as if someone else has got them. All had been marked no resubmission :-(( |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,100,600 RAC: 2,970 |
I see that the hadcm3n_7xxx WU's are still with us. I aborted 2 of them this morning. Judging from the "tasks in progress" I'd say that there are about 2K of them still in the hopper. One reason it is taking so long to get rid of them is that, with all the recent download problems with hadam3p and the need to abort all the hadcm3n "7's", my daily download quota is probably about 1 or 2 WU's per day per machine. Most of the rest of us are probably in the same condition. Is there something the project staff can do to increase these quotas so we can clear them faster? I don't mind downloading and aborting them (all-you-can-eat bandwidth in America from most ISPs), but I am only getting about 1 per machine per day. |
Joined: 22 Feb 06 Posts: 485 Credit: 29,638,939 RAC: 3,372 |
7l9l completed OK for me yesterday without any apparent problems. Mine was the fourth go at it. |
Joined: 16 Jan 10 Posts: 1081 Credit: 7,026,771 RAC: 4,684 |
There are also some legitimate resends and the project staff may be reluctant to tinker with the general BOINC settings to sort out a specific problem. I got one valid model in among the 7-series. [Edit: by "valid" I mean not from the 7-series.] These reissues and a number of other problems may perhaps come from the project's original and creditable desire to "reward" participants by retaining job information that on other projects would be rapidly deleted. However, the graphics that used to provide some interest in the longer jobs have gone, partly through the chopping up of jobs into smaller pieces (at the request of participants) and partly through neglect. Rather than fix bugs in user facilities the project invariably seems to opt for deleting them instead! There needs, I think, to be a major design correction: either adopt the more common BOINC "easy come, easy go" philosophy or seriously invest in the kind of reliability and user experience that would make the big jobs a pleasure to complete. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,100,600 RAC: 2,970 |
I am always careful to check whether a download is one of the 7 series before I abort it. The problem is that all of these aborts have lowered my daily quota so much that when real WU's come along I may not get any. My daily quota may have already been filled for that day with "7's". One of my machines has not been able to download any real (i.e. non-7) WU's for several days now and is completely out of CPDN work. It is running WCG projects 24/7. Every day or two I delete 1 (or 2) 7 series WU's. |
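The manual check described above could be sketched as a small Python helper. This is only an illustration built on the task names quoted in this thread (for example hadcm3n_7k6p_1980_40_008437444_1); the function name is hypothetical, and the assumption that the second underscore-separated field identifies the series is inferred from those names, not from any project documentation.

```python
def is_seven_series_candidate(task_name: str) -> bool:
    """Flag a task whose name looks like one of the hadcm3n 7-series.

    Assumption: names have the form <model>_<run id>_..., as in the
    examples quoted in the thread. A match is only a prompt to look
    closer, not proof that the task belongs to the bad batch.
    """
    parts = task_name.split("_")
    if len(parts) < 2:
        return False
    model, run_id = parts[0], parts[1]
    return model == "hadcm3n" and run_id.startswith("7")


# Task name taken from an earlier post in this thread.
print(is_seven_series_candidate("hadcm3n_7k6p_1980_40_008437444_1"))
```

A name match alone should not trigger an abort: the thread also mentions 7-prefixed tasks that were legitimate resends, so the task details are still worth checking by hand.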
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Every night at midnight your time you'll still have your work quota put back to one model per core per day. While waiting for more good models we just have to get on with work from other projects, of which we should all be able to choose several we think are worth spending computer time on. The WCG projects are all highly respected. Cpdn news |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Hi Jim The best way to avoid these bad tasks is to set the project for No new tasks, and just keep an eye on the News thread and the Server Status page. |
Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 1 |
I have a work unit: hadcm3n_7vwx_1980_40_008452644_1 It has been running successfully and is at about 25% complete....should I abort it?? |
Joined: 7 Aug 04 Posts: 2169 Credit: 64,553,422 RAC: 6,017 |
I have a work unit: hadcm3n_7vwx_1980_40_008452644_1 That was a resend of a task that was sent out originally in September and never completed. So, it wasn't one of the bad batch and you should probably continue crunching it. |
Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 1 |
Will do... it's the only WU I have and it's now at 33%, so I'll keep crunching. Art |
Joined: 15 Feb 06 Posts: 137 Credit: 33,485,532 RAC: 4,393 |
With the recent backup of the server and reset of the updating there were a few tasks available this morning. Some of them were the "No Resubmission" variety. I noticed that those had a completion year of 2023, so they should be easy to recognize in your Task list. |
Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 1 |
Any idea when there will be some actual new work units?? I have three machines working on other projects, but just wondering why we are seeing such a long gap in new work units. |
©2024 climateprediction.net