Task ... exited with zero status but no 'finished' file

BigDave67
Joined: 27 Nov 08
Posts: 2
Credit: 306,339
RAC: 673
Message 48097 - Posted: 5 Feb 2014, 0:42:42 UTC

Hello everyone,

I have been getting the following error messages for a while now. For some reason the WUs keep exiting and restarting. These are the only tasks running on my host at this time. This is a two-CPU machine, so each task gets its own CPU. Computing preferences are set so that the WUs run even while my computer is in use.

Any ideas as to what is going on? Should I reset?


2/1/2014 7:36:06 PM | | cc_config.xml not found - using defaults
2/1/2014 7:36:07 PM | | Starting BOINC client version 7.2.33 for windows_x86_64
2/1/2014 7:36:07 PM | | log flags: file_xfer, sched_ops, task
2/1/2014 7:36:07 PM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
2/1/2014 7:36:07 PM | | Data directory: C:\ProgramData\BOINC
2/1/2014 7:36:07 PM | | Running under account s67bigdave
2/1/2014 7:36:07 PM | | No usable GPUs found
2/1/2014 7:36:07 PM | | Host name: db001
2/1/2014 7:36:07 PM | | Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ [Family 15 Model 107 Stepping 2]
2/1/2014 7:36:07 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm rdtscp 3dnowext 3dnow
2/1/2014 7:36:07 PM | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
2/1/2014 7:36:07 PM | | Memory: 7.75 GB physical, 20.49 GB virtual
2/1/2014 7:36:07 PM | | Disk: 309.51 GB total, 183.96 GB free
2/1/2014 7:36:07 PM | | Local time is UTC -8 hours
2/1/2014 7:36:07 PM | | VirtualBox version: 4.3.6
2/1/2014 7:36:07 PM | | Config: GUI RPCs allowed from:
2/1/2014 7:36:07 PM | | 192.168.137.1
2/1/2014 7:36:07 PM | | 192.168.137.2
2/1/2014 7:36:07 PM | | 192.168.137.3
2/1/2014 7:36:07 PM | climateprediction.net | URL http://climateprediction.net/; Computer ID 1279988; resource share 100
2/1/2014 7:36:07 PM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 7109305; resource share 100
2/1/2014 7:36:07 PM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 514459; resource share 100
2/1/2014 7:36:07 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6978254; resource share 100
2/1/2014 7:36:07 PM | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 190654; resource share 100
2/1/2014 7:36:07 PM | SETI@home | General prefs: from SETI@home (last modified 04-Dec-2013 16:58:57)
2/1/2014 7:36:07 PM | SETI@home | Computer location: home
2/1/2014 7:36:07 PM | SETI@home | General prefs: no separate prefs for home; using your defaults
2/1/2014 7:36:07 PM | | Reading preferences override file
2/1/2014 7:36:07 PM | | Preferences:
2/1/2014 7:36:07 PM | | max memory usage when active: 3967.68MB
2/1/2014 7:36:07 PM | | max memory usage when idle: 7141.83MB
2/1/2014 7:36:07 PM | | max disk usage: 37.00GB
2/1/2014 7:36:07 PM | | max download rate: 256000 bytes/sec
2/1/2014 7:36:07 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
2/1/2014 7:36:07 PM | | Not using a proxy
2/1/2014 7:36:29 PM | climateprediction.net | Restarting task hadcm3n_84yl_1980_40_008464385_2 using hadcm3n version 607 in slot 0
2/1/2014 7:36:29 PM | climateprediction.net | Restarting task hadcm3n_703u_1980_40_008411395_2 using hadcm3n version 607 in slot 1
....
2/4/2014 10:00:02 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
2/4/2014 10:00:02 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:00:02 AM | climateprediction.net | Restarting task hadcm3n_703u_1980_40_008411395_2 using hadcm3n version 607 in slot 1
2/4/2014 10:01:10 AM | climateprediction.net | Task hadcm3n_84yl_1980_40_008464385_2 exited with zero status but no 'finished' file
2/4/2014 10:01:10 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:01:10 AM | climateprediction.net | Restarting task hadcm3n_84yl_1980_40_008464385_2 using hadcm3n version 607 in slot 0
2/4/2014 10:01:54 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
2/4/2014 10:01:54 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:01:54 AM | climateprediction.net | Restarting task hadcm3n_703u_1980_40_008411395_2 using hadcm3n version 607 in slot 1
2/4/2014 10:03:07 AM | climateprediction.net | Task hadcm3n_84yl_1980_40_008464385_2 exited with zero status but no 'finished' file
2/4/2014 10:03:07 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:03:07 AM | climateprediction.net | Restarting task hadcm3n_84yl_1980_40_008464385_2 using hadcm3n version 607 in slot 0
2/4/2014 10:04:23 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
2/4/2014 10:04:23 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:04:23 AM | climateprediction.net | Restarting task hadcm3n_703u_1980_40_008411395_2 using hadcm3n version 607 in slot 1
2/4/2014 10:04:42 AM | climateprediction.net | Task hadcm3n_84yl_1980_40_008464385_2 exited with zero status but no 'finished' file
2/4/2014 10:04:42 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:05:20 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
2/4/2014 10:05:20 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:05:20 AM | climateprediction.net | Restarting task hadcm3n_703u_1980_40_008411395_2 using hadcm3n version 607 in slot 1

____________
I live in and am from the Mojave Desert in Southern California.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 48098 - Posted: 5 Feb 2014, 2:14:23 UTC - in response to Message 48097.

That message is usually benign, although a nuisance if it keeps happening.
I think that it was intended for other projects.

Anyway, just keep plodding on, and you should finish them.


____________
Backups: Here

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,546,884
RAC: 10,274
Message 48099 - Posted: 5 Feb 2014, 14:34:35 UTC

Yes, like Les says, just ignore them.


They can be caused by a number of things: something taking a lot of CPU time on the PC, or interruptions to internet access (if the BOINC Manager makes a DNS request which takes a long time, then these messages will appear).
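
To see how often each task is affected, the event log can be tallied. The following is a minimal sketch, assuming the log has been copied out of the BOINC Manager as plain text in the same `time | project | message` layout quoted in this thread; the function name `count_premature_exits` and the `SAMPLE` text are illustrative, not part of BOINC.

```python
import re
from collections import Counter

# Matches the client message quoted in this thread. The layout
# "time | project | message" is assumed from the pasted logs.
PATTERN = re.compile(
    r"\|\s*(?P<project>[^|]*?)\s*\|\s*Task (?P<task>\S+) "
    r"exited with zero status but no 'finished' file"
)


def count_premature_exits(log_text):
    """Return a Counter mapping task name -> number of such exits."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


# A few lines copied from the first post, used only as sample input.
SAMPLE = """\
2/4/2014 10:00:02 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
2/4/2014 10:00:02 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/4/2014 10:01:10 AM | climateprediction.net | Task hadcm3n_84yl_1980_40_008464385_2 exited with zero status but no 'finished' file
2/4/2014 10:01:54 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
"""

if __name__ == "__main__":
    for task, n in count_premature_exits(SAMPLE).most_common():
        print(f"{n:3d}  {task}")
```

A handful of these per day is probably what the replies above mean by "benign"; a restart every minute or two, as in the first post, suggests something on the host is regularly stalling the client.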
____________
I'm a volunteer and my views are my own.
News and Announcements and FAQ

BigDave67
Joined: 27 Nov 08
Posts: 2
Credit: 306,339
RAC: 673
Message 48100 - Posted: 5 Feb 2014, 15:40:35 UTC
Last modified: 5 Feb 2014, 15:42:52 UTC

Well, it's kind of moot now; both tasks errored out this morning.

Task hadcm3n_84yl_1980_40_008464385_2 was about 43% complete and task hadcm3n_703u_1980_40_008411395_2 was about 56% complete when I posted my message last night.

BOINC is now downloading WUs for other projects.

...
2/5/2014 4:55:57 AM | climateprediction.net | Task hadcm3n_84yl_1980_40_008464385_2 exited with zero status but no 'finished' file
2/5/2014 4:55:57 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/5/2014 4:55:57 AM | climateprediction.net | Restarting task hadcm3n_84yl_1980_40_008464385_2 using hadcm3n version 607 in slot 0
2/5/2014 4:56:48 AM | climateprediction.net | Task hadcm3n_703u_1980_40_008411395_2 exited with zero status but no 'finished' file
2/5/2014 4:56:48 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/5/2014 4:56:48 AM | climateprediction.net | Restarting task hadcm3n_703u_1980_40_008411395_2 using hadcm3n version 607 in slot 1
2/5/2014 4:59:28 AM | climateprediction.net | Task hadcm3n_84yl_1980_40_008464385_2 exited with zero status but no 'finished' file
2/5/2014 4:59:28 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
2/5/2014 4:59:28 AM | climateprediction.net | Computation for task hadcm3n_703u_1980_40_008411395_2 finished
2/5/2014 4:59:28 AM | climateprediction.net | Output file hadcm3n_703u_1980_40_008411395_2_2.zip for task hadcm3n_703u_1980_40_008411395_2 absent
2/5/2014 4:59:28 AM | climateprediction.net | Output file hadcm3n_703u_1980_40_008411395_2_3.zip for task hadcm3n_703u_1980_40_008411395_2 absent
2/5/2014 4:59:28 AM | climateprediction.net | Output file hadcm3n_703u_1980_40_008411395_2_4.zip for task hadcm3n_703u_1980_40_008411395_2 absent
2/5/2014 4:59:28 AM | climateprediction.net | Restarting task hadcm3n_84yl_1980_40_008464385_2 using hadcm3n version 607 in slot 0
2/5/2014 5:01:16 AM | climateprediction.net | Computation for task hadcm3n_84yl_1980_40_008464385_2 finished
2/5/2014 5:01:16 AM | climateprediction.net | Output file hadcm3n_84yl_1980_40_008464385_2_3.zip for task hadcm3n_84yl_1980_40_008464385_2 absent
2/5/2014 5:01:16 AM | climateprediction.net | Output file hadcm3n_84yl_1980_40_008464385_2_4.zip for task hadcm3n_84yl_1980_40_008464385_2 absent
2/5/2014 5:05:14 AM | climateprediction.net | Sending scheduler request: To report completed tasks.
2/5/2014 5:05:14 AM | climateprediction.net | Reporting 2 completed tasks
2/5/2014 5:06:18 AM | climateprediction.net | Scheduler request completed
...

Why does BOINC report tasks as completed, when they have obviously errored out?
Tasks are also reported as complete when downloads fail. (I know I should ask these questions on the BOINC forum, and will.)

climateprediction.net needs better download resuming. For me, downloads will often error out and not complete. This doesn't happen with the other projects I'm running; they resume downloading fine.

BigDave
____________
I live in and am from the Mojave Desert in Southern California.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 48101 - Posted: 5 Feb 2014, 20:17:04 UTC - in response to Message 48100.

BOINC is only a "traffic controller", for uploading and downloading.
All that it knows is that a model has stopped running. It doesn't know why, and it doesn't NEED to know why. That's the job of the software on the project's server.

It also doesn't know why output files are missing.
It was told at the start what output files to expect, and where to send them. But when the time came to tidy up, it realised that it couldn't find some of these files, and it's informing you of this. (They're missing because the model crashed before the program reached the point of creating them.)
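
The tidy-up step described above can be pictured with a short sketch. This is not BOINC's code, only an illustration of the idea: compare the list of expected output files against what is actually in the slot directory and report the missing ones. The task name `example_task_1`, the number of zip files, and the function names are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_absent_outputs(slot_dir, expected_names):
    """Return the expected output file names that are not present in slot_dir."""
    slot = Path(slot_dir)
    return [name for name in expected_names if not (slot / name).is_file()]


def demo():
    """Simulate a model that crashed after writing only its first zip."""
    task = "example_task_1"  # hypothetical task name
    expected = [f"{task}_{i}.zip" for i in range(1, 5)]
    with tempfile.TemporaryDirectory() as slot_dir:
        # Only the first output exists; the rest were never created.
        (Path(slot_dir) / expected[0]).write_bytes(b"")
        return find_absent_outputs(slot_dir, expected)


if __name__ == "__main__":
    for name in demo():
        print(f"Output file {name} absent")
```

Note that the check only knows which files are missing, not why, which matches the point made above: diagnosing the crash is left to the project's server.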

As for the download failures, this gets talked about in the Download failures thread every time it happens.
There is a defect in BOINC (as far as this program is concerned), where after a long period of time, if a task hasn't reached its max error/total/success task limits, BOINC tries to re-issue it.
But by that time, only the main file is still on the server; all of the auxiliary files have long since been removed. They get removed for various reasons; the main one, for big batches of tasks, is that they were not needed for some reason.
It's these auxiliary files that tie up BOINC in the endless download failures, as it tries to find them. But eventually various timeouts are reached, or the person has looked at the forum and found either a News and Announcements post saying to abort them, or the thread MORE DOWNLOAD ERRORS, where the same advice gets posted.

There are currently 2 posts in News and Announcements about tasks that need to be aborted.



____________
Backups: Here

DadX
Joined: 30 Aug 06
Posts: 23
Credit: 1,150,223
RAC: 1
Message 48124 - Posted: 8 Feb 2014, 3:29:27 UTC

On my only PNW WU from the most recent batch I got a boatload of these messages:
2/7/2014 9:35:43 PM | climateprediction.net | Starting task hadam3p_pnw_ucto_2007_1_008509672_1 using hadam3p_pnw version 722 in slot 2
2/7/2014 9:35:57 PM | climateprediction.net | Task hadam3p_pnw_ucto_2007_1_008509672_1 exited with zero status but no 'finished' file
2/7/2014 9:35:57 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.


Then 13 messages (one for each zip) like:
2/7/2014 10:00:21 PM | climateprediction.net | Output file hadam3p_pnw_ucto_2007_1_008509672_1_1.zip for task hadam3p_pnw_ucto_2007_1_008509672_1 absent

Then the WU rolled over and died. Any guess as to why?

Windows 7 64bit
12GB memory with 7GB available.
AMD A6 running 2 WCG tasks and a Linux session in VirtualBox
CPU utilization at 75% on average before the PNW task

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 48125 - Posted: 8 Feb 2014, 4:18:21 UTC - in response to Message 48124.

Any guess as to why?

Not yet. It's being discussed in this thread.

DJStarfox
Joined: 27 Jan 07
Posts: 287
Credit: 1,948,877
RAC: 4,461
Message 48408 - Posted: 15 Mar 2014, 0:53:01 UTC - in response to Message 48097.

BigDave67,
In your Computing Preferences, be sure to set "Leave tasks in memory while suspended?" = Yes. This will pause climate models when necessary but won't force them to unload from memory. This setting can save time and reduce the likelihood of errors if your machine switches from idle to busy frequently. Works for me; YMMV.

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 48413 - Posted: 15 Mar 2014, 16:08:23 UTC
Last modified: 15 Mar 2014, 16:32:56 UTC

A very nice solution would be if the BOINC user could create a file in the project folder that serves as a flag to disable the heartbeat checking in the BOINC project API for a specific project.

As this is not supported by Berkeley, it has to be re-done with each update of the API version. But ... as this is not supported by Berkeley, they cannot remove it from the sources either ;-)

In this case, the user himself would be responsible for identifying dead (e.g. looping or stuck) tasks, but for people who know what they are doing it would help a lot.

p.s.: Leaving tasks in memory is always a good idea, especially for CPDN, it makes the results run smoother. It does not help much when the BOINC core client is too busy to send the heartbeat though, as the project API enforces an exit in this case.

On one of my machines, that has a slow HDD, unpacking a CPDN workunit keeps the core client busy for about 1.5 minutes, but the project API allows only 30 seconds. Trouble with the name to IP resolution (usually an ISP problem on client side) has the potential for keeping the core client busy for several minutes too.
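
The timing problem described above can be illustrated with a small simulation. This is a simplified model, not the BOINC API's actual logic: it assumes the application gives up once more than 30 seconds (the figure quoted above) pass without a heartbeat from the core client. The names `heartbeat_lost` and `simulate` and the one-second clock step are assumptions made for the sketch.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds; the limit quoted in the post above


def heartbeat_lost(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """True if more than `timeout` seconds have passed since the last heartbeat."""
    return (now - last_heartbeat) > timeout


def simulate(heartbeat_times, run_until, step=1.0):
    """Walk a simulated clock and return when the app would give up, or None.

    heartbeat_times: sorted times (in seconds) at which the client
    managed to send a heartbeat.
    """
    pending = list(heartbeat_times)
    last = 0.0
    now = 0.0
    while now <= run_until:
        # Consume every heartbeat that has arrived by the current time.
        while pending and pending[0] <= now:
            last = pending.pop(0)
        if heartbeat_lost(last, now):
            return now
        now += step
    return None


if __name__ == "__main__":
    # Healthy client: one heartbeat per second for two minutes.
    print(simulate(range(0, 121), run_until=120))
    # Client busy for 90 s (e.g. unpacking a workunit on a slow disk).
    print(simulate([0, 1, 2, 92, 93], run_until=120))
```

In the second case the simulated application gives up well before the client becomes responsive again, which is the situation in which leaving tasks in memory does not help.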

p.p.s.: On Windows, you can probably improve your HDD speed by disabling the Windows indexing service for the HDD where you have your BOINC files, and don't use file system compression. And, if BOINC has its own partition, you could try FAT32 instead of NTFS. NTFS is less likely to be damaged when a power failure occurs, but it is slower.

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 48414 - Posted: 15 Mar 2014, 19:57:16 UTC - in response to Message 48413.
Last modified: 15 Mar 2014, 20:07:51 UTC

Developer options (users cannot do this):

Switching off the heartbeat check: http://boinc.berkeley.edu/trac/wiki/OptionsApi

Works fine in the wrapper for RNA-World :-)



Copyright © 2019 climateprediction.net