Intel I7 Woes....No successful completion since April 2015

Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53579 - Posted: 5 Mar 2016, 19:42:23 UTC

I have a number of machines processing files, but my Intel I7-based machine running Windows 7 has not successfully completed a data file since April 2015. For the heck of it, I recently tried reinstalling an old version of the BOINC client (7.0.44), the last version with which I processed a file successfully, but I'm still getting errors/failures. I'm about to give up, unless someone has a bright idea.
ID: 53579
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53580 - Posted: 5 Mar 2016, 20:51:03 UTC - in response to Message 53579.  

Hi, Art,

Walked thru a couple pages of your i7's failed tasks. Diagnostics are similar throughout, though a few tasks produced at least one Trickle before crashing.

My i7 went crazy after M$ sent an upgrade. Boinc 6.2.19 wouldn't restart, wouldn't "repair," wouldn't reinstall nor would a later ver.6. On a whim, rather than try ver.7, an old copy of boinc 5.10.45 was tried -- successfully. So, I'm running an i7-4790 with an antique boinc version in Win.10 upgrade of Upgrade Version (which boinc reports as Vista)!

I'm not trying to steal your Thread, Art, merely trying to grasp what is probably a chimera. I mention my flaky i7-box because I wonder about the possibility of some strange interaction between/among i7/other hardware/OS/boinc configurations. In my case, suspicion rests heavily on Win.10 upgrade. (I was considering reinstalling Win.7 when CPDN workload is depleted but now wonder about that step...) The CPU is on a Gigabyte MB, with 2*8GB RAM, no add-on graphics board, runs only five simultaneous CPDN copies and does nothing else.

Does anyone else have strange results from an i7 machine, some problem which began after a history of success?
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 53580
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53582 - Posted: 5 Mar 2016, 21:56:04 UTC - in response to Message 53580.  

Thanks...Doesn't seem to be Win 7/Win 10 upgrade related because I've been on Win 7 SP 1 consistently through all of this. Something happened in the April 2015 time-frame on my machine or there is some strange data-driven problem which affected only my Intel I7 box (machine ID 1266353) in that time-frame. Could have been an MS OS update, but no idea.

At this point I'm probably going to remove Climateprediction.net processing from this machine, since it's just wasting cycles unless someone can help diagnose this. Haven't tried going back to boinc ver 5, do you think that might actually work??

Art Masson
St. Charles, IL
ID: 53582
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53583 - Posted: 5 Mar 2016, 21:59:55 UTC - in response to Message 53582.  

One more bit of info. My CPU is an Intel Core I7-3770 running at 3.4 GHz.
Art Masson
ID: 53583
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53584 - Posted: 6 Mar 2016, 0:52:08 UTC - in response to Message 53582.  

Haven't tried going back to boinc ver 5, do you think that might actually work??


Possibly. In my case I consider it sheer dumb luck.

The box had three HadCM3n tasks, none of which showed CPU time, percent completed, or time remaining, nor was the graphics option available. (On other machines [Win.10 & Vista], if not everything normal, at least graphics were available.) One Task crashed. Trickles show in my account, though, and eight Trickles showed for that Task -- cause of death, common for this series: "INVALID THETA." One of the three downloaded to that machine currently has 32 Trickles. (Fingers crossed.)

(All my machines run at stock speed, expecting Intel to 'do right' with its current accelerate-under-load technology.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 53584
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53590 - Posted: 6 Mar 2016, 15:14:01 UTC - in response to Message 53584.  

I've downloaded and installed version 5.8.16 from the Boinc site, and will see what happens. At the moment there are no tasks available so no processing is occurring. Will advise.
ID: 53590
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 53591 - Posted: 6 Mar 2016, 15:48:40 UTC
Last modified: 6 Mar 2016, 15:56:05 UTC

Some of it is just bad luck, when you look at all the work units that errored out on other machines also. But still, some should have completed successfully, since the WAH2 and Australia-New Zealand ones are fairly robust. It could be other programs running on your PC. I have found that some early versions of VirtualBox caused problems with other programs, though not necessarily CPDN. It could be an anti-virus problem also; the exclusions don't always help, but at least exclude the BOINC program and data folders. And your disk drive may have trouble keeping up with the high write rates of some tasks; try running only one at a time. You will find it eventually.
ID: 53591
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 53593 - Posted: 6 Mar 2016, 16:25:32 UTC

I'm with Jim1348. Possibly an anti-virus/anti-malware problem. Perhaps a change or upgrade to such software last year resulted in problems with boinc and cpdn?
ID: 53593
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53594 - Posted: 6 Mar 2016, 17:24:13 UTC - in response to Message 53593.  

Anything is possible. I run the same version of Norton 360 on 3 other machines with no problems, however, and it seems to be allowing updates to the BOINC data. If I can get a CPDN work unit downloaded on version 5.8.16, I'll limit the work to a single work unit and see what happens.

My new machine number ID on version 5.8.16 is 1392340

For whatever it's worth this machine processes other BOINC projects (SETI@HOME, MILKYWAY@HOME, EINSTEIN, and a couple others) with no problems. It's only the CPDN work units which never complete successfully (since April 2015).....

Will report results on version 5.8.16 when I can get a work unit!

Art Masson
St. Charles, IL
ID: 53594
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53610 - Posted: 8 Mar 2016, 12:27:30 UTC - in response to Message 53594.  

Getting this error using version 5.8.16 (can't download work unit)
Any advice appreciated:

3/8/2016 6:24:44 AM|climateprediction.net|[file_xfer] Started download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:24:45 AM|climateprediction.net|[file_xfer] Temporarily failed download of CRED_SIC_rcp85_a50_1939_1950.gz: http error
3/8/2016 6:24:45 AM|climateprediction.net|Backing off 3 hr 41 min 43 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:25:06 AM|climateprediction.net|[file_xfer] Started download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:25:07 AM|climateprediction.net|[file_xfer] Temporarily failed download of CRED_SIC_rcp85_a50_1939_1950.gz: http error
3/8/2016 6:25:07 AM|climateprediction.net|Backing off 2 hr 39 min 18 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz
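
For anyone who wants to total up those back-off delays, here is a minimal Python sketch. It assumes the "Backing off N hr N min N sec" wording seen in the log lines above; the regular expression and the `parse_backoff` helper are illustrative names of my own, not part of BOINC.

```python
import re

# Two of the client log lines quoted above (format as shown in this post).
LOG_LINES = [
    "3/8/2016 6:24:45 AM|climateprediction.net|Backing off 3 hr 41 min 43 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz",
    "3/8/2016 6:25:07 AM|climateprediction.net|Backing off 2 hr 39 min 18 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz",
]

# Hypothetical pattern: it assumes the "N hr N min N sec" wording seen above,
# with the hour and minute parts treated as optional.
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec on download of file (\S+)"
)


def parse_backoff(line):
    """Return (file name, back-off in seconds), or None if the line does not match."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, file_name = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return file_name, total


backoffs = [parse_backoff(line) for line in LOG_LINES]
for file_name, seconds in backoffs:
    print(f"{file_name}: retry in {seconds} s")
```

Both lines refer to the same file, so the client is simply rescheduling one failed download rather than failing on several different files.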
ID: 53610
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53615 - Posted: 9 Mar 2016, 5:19:06 UTC - in response to Message 53610.  

I'm trying a different approach, since I couldn't download a work unit on Boinc 5.8.16. I've reloaded Boinc 7.6.23 and have suspended all work units and all projects except for a single CPDN work unit. I'll let this single work unit run and see if it will complete as the only running task.

Art Masson
ID: 53615
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53686 - Posted: 18 Mar 2016, 10:42:32 UTC - in response to Message 53615.  

First completion in a year! Ran to completion -- with only one work unit running (and no other projects!). Now will try running BOINC with one CPDN work unit but allowing BOINC to run with other projects simultaneously. Work Unit which completed is as follows:

Name: wah2_sas50_fdct_201412_13_348_010324402_0
Workunit: 10324402
Created: 23 Feb 2016 12:47:05 UTC
Sent: 26 Feb 2016 9:24:47 UTC
Received: 18 Mar 2016 7:31:51 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 1266353
Report deadline: 7 Feb 2017 14:44:47 UTC
Run time (sec): 259,668.46
CPU time (sec): 259,215.70
Validate state: Initial
Claimed credit: 0.00
Granted credit: 2,299.53
Application version: Weather At Home 2 (wah2) v7.08

Latest Trickles Received

Time Sent (UTC)       Host ID  Result ID  Result Name                                Phase  Timestep  CPU Time (sec)  Average (sec/TS)
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      138,539   239,327         1.7275
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      127,019   219,603         1.7289
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      115,499   199,816         1.7300
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      103,979   179,918         1.7303
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      92,459    159,741         1.7277
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      80,939    139,458         1.7230
14 Mar 2016 18:34:34  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      69,419    119,401         1.7200
10 Mar 2016 11:25:58  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      57,899    99,361          1.7161
10 Mar 2016 05:57:28  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      46,379    79,424          1.7125
09 Mar 2016 22:07:38  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      34,859    59,465          1.7059
09 Mar 2016 16:45:19  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      23,339    39,858          1.7078
09 Mar 2016 11:14:50  1266353  19303979   wah2_sas50_fdct_201412_13_348_010324402_0  1      11,819    20,324          1.7196
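
As a quick sanity check on the trickle figures, the Average (sec/TS) column appears to be just cumulative CPU time divided by the timestep count. The short Python sketch below recomputes it from the numbers above; the `seconds_per_timestep` helper is my own illustrative name, and the formula is inferred from the table rather than taken from any CPDN documentation.

```python
# (timestep, cumulative CPU seconds, reported sec/TS) copied from the trickle table above.
TRICKLES = [
    (138539, 239327, 1.7275),
    (127019, 219603, 1.7289),
    (115499, 199816, 1.7300),
    (103979, 179918, 1.7303),
    (92459, 159741, 1.7277),
    (80939, 139458, 1.7230),
    (69419, 119401, 1.7200),
    (57899, 99361, 1.7161),
    (46379, 79424, 1.7125),
    (34859, 59465, 1.7059),
    (23339, 39858, 1.7078),
    (11819, 20324, 1.7196),
]


def seconds_per_timestep(timestep, cpu_seconds):
    """Assumed formula: average CPU seconds spent per model timestep so far."""
    return cpu_seconds / timestep


computed = [round(seconds_per_timestep(ts, cpu), 4) for ts, cpu, _ in TRICKLES]
for (ts, cpu, reported), value in zip(TRICKLES, computed):
    print(f"timestep {ts:>7}: {value:.4f} s/TS (reported {reported:.4f})")
```

The rate stays within a narrow band (roughly 1.71 to 1.73 sec/TS) for the whole run, which suggests the machine was crunching steadily rather than stalling partway through.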
ID: 53686
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53688 - Posted: 18 Mar 2016, 23:16:11 UTC - in response to Message 53686.  

Next WU failed. Trying again with all other projects suspended and only one CPDN WU processing....
ID: 53688
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 53689 - Posted: 19 Mar 2016, 2:44:11 UTC - in response to Message 53688.  
Last modified: 19 Mar 2016, 2:44:50 UTC

That is a strange amount of memory you have - 14293.39 MB. Have you done a memtest? Also, Win7 has a built-in "Memory Diagnostics Tool" if you want to give it a test.
ID: 53689
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53690 - Posted: 19 Mar 2016, 4:54:39 UTC - in response to Message 53689.  

Memory tests normally. Windows reports 14GB of memory.
ID: 53690
_Ryle_
Joined: 17 Aug 05
Posts: 21
Credit: 15,091,561
RAC: 22,127
Message 53692 - Posted: 19 Mar 2016, 7:53:02 UTC - in response to Message 53690.  

Hello Art, out of curiosity, how hot does your CPU run? And is there good enough airflow inside your case?
Is it a stock CPU cooler? I don't use Windows myself anymore, but there are some apps for temperature measurements. If your CPU is higher than, say, 65 degrees Celsius, it could interfere with stability IMO. Mine is just under 50 (it's an older i7, I admit), but seems rock stable.
ID: 53692
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 53693 - Posted: 19 Mar 2016, 9:47:12 UTC

Are you using the internal Intel graphics adapter? I found a problem (posted on this board) a couple of years ago on my I5-3550 machine (Biostar Z77 motherboard), where all the CPDN work units errored out unless I disabled the internal graphics adapter in the BIOS and used only a PCIe card (Nvidia GTX 970 now, but it probably does not matter which).

That was before WAH2, but the error rate was 100% on all the work units at the time, so I presume it still applies. Whether that is true on all hardware is another matter, but if you have an external card, I would give it a try.
ID: 53693
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53696 - Posted: 19 Mar 2016, 15:15:43 UTC - in response to Message 53692.  

I have an external graphics adapter -- NVIDIA GT620
ID: 53696
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 53697 - Posted: 19 Mar 2016, 15:28:21 UTC - in response to Message 53692.  

My CPUs (all 8) run between 42 and 50 degrees Centigrade.

As background, I've been running BOINC for years on this machine. I run 5 different projects managed by BOINC across the 8 processors on my Intel I7. The projects I run are CPDN, Einstein@Home, Enigma@Home, SETI@Home, and Milkyway@Home.

If I enable all the projects and let them run, everything runs fine -- except for the CPDN projects, which inevitably all fail (since approx April 2015). All other projects run fine. What I've done (so far) is demonstrate that if I suspend all other projects except for a single work unit in CPDN, the single CPDN work unit will finish. I'm trying that one more time on a single CPDN work unit to verify. After that I will try multiple CPDN work units (with all other projects suspended). I suspect that (for some reason) starting in April 2015, something happened which started causing failures in CPDN processing. This feels like some kind of interaction problem within BOINC -- but whatever it is, it only affects CPDN work units...more later after more testing. (This could also be any number of other things including some strange Windows 7 interaction with an update in March/April 2015). I'll continue to try to see if I can determine which combination of project BOINC processing causes the CPDN work units to fail...
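
To keep track of which combinations still need testing, something like the Python sketch below could list the trials in order, starting with CPDN alone and adding the other projects one at a time. The project names come from this post; the `trial_plan` helper and the ordering are just one possible way to organize it, not anything BOINC provides.

```python
from itertools import combinations

# Project names as listed in this post; CPDN runs in every trial.
OTHER_PROJECTS = ["Einstein@Home", "Enigma@Home", "SETI@Home", "Milkyway@Home"]


def trial_plan(others):
    """Yield the sets of projects to enable alongside CPDN, smallest sets first."""
    for size in range(len(others) + 1):
        for combo in combinations(others, size):
            yield combo


plan = list(trial_plan(OTHER_PROJECTS))
for number, combo in enumerate(plan, start=1):
    enabled = ", ".join(combo) if combo else "(no other projects)"
    print(f"Trial {number}: CPDN + {enabled}")
```

With four other projects that is 16 trials in the worst case, but if one of the single-project trials already makes CPDN fail, the rest can probably be skipped.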

I'm currently running BOINC 7.6.29....but I've had the same problem back to 7.0.44 as best I can tell. I tried to go back to 7.0.44 to see if the problem goes away, but it surprisingly did not...

Art Masson
St. Charles, IL
USA
ID: 53697
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 53698 - Posted: 19 Mar 2016, 15:40:18 UTC - in response to Message 53697.  
Last modified: 19 Mar 2016, 15:41:20 UTC

That is a good, methodical approach. I have had Folding on the GPU interfere with some of the older CPDN tasks, but not with WAH2. And I don't know whether the BOINC GPU projects you are running could do it too. You may be the first to find out. (Einstein, POEM and GPUGrid are no problems on the GPU for me though).
ID: 53698

©2024 climateprediction.net