climateprediction.net home page
Workunit "stuck" in the middle of calculation.

Letharion

Joined: 29 Aug 04
Posts: 2
Credit: 1,373,302
RAC: 0
Message 46861 - Posted: 24 Aug 2013, 22:51:32 UTC

I have a workunit that's been stuck at 24.944% for a very long time.

Every time I start the computer up, the WU is at 24.944% and 134 hours. Today I left it running 12 hours straight. Hours predictably increased to 146, but the % was still the same. After rebooting my computer, the WU is still at the same %, and goes back to 134 elapsed hours.

Should I just abort the task? Any idea what's wrong? This isn't the first time this has happened to me. Previously I assumed it was a one-time thing and didn't post here, but now that it has recurred, I figured it might be useful to get some input.

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 46862 - Posted: 24 Aug 2013, 23:40:45 UTC - in response to Message 46861.  

Not sure about this, but it shouldn't hurt anything. Stop BOINC and run 'chown -R user:group /path/to/data/directory'. User and group should be your user name if you're running BOINC standalone, or "boinc" if it was installed from a repository.
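As a rough sketch of that suggestion, the fragment below first lists anything that is not already owned by the expected account, so chown only needs to run when there is something to fix. The path /var/lib/boinc-client, the boinc user and group, and the service name boinc-client are assumptions based on a typical Debian/Ubuntu package install, and not_owned_by is just a name made up for illustration; substitute your own values.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory NOT owned by the given user.
# An empty result means ownership is already correct.
not_owned_by() {
    # $1 = directory, $2 = user name (or numeric uid)
    find "$1" ! -user "$2" -print
}

# Assumed values for a repository install; adjust for a standalone install.
# DATA_DIR=/var/lib/boinc-client
# not_owned_by "$DATA_DIR" boinc
#
# If anything is listed, stop BOINC first, then fix ownership and restart:
# sudo service boinc-client stop
# sudo chown -R boinc:boinc "$DATA_DIR"
# sudo service boinc-client start
```

If the listing comes back empty, ownership is probably not the cause and chown will change nothing.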

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 46863 - Posted: 24 Aug 2013, 23:44:05 UTC
Last modified: 24 Aug 2013, 23:44:57 UTC

Sounds like one of the "25%" problems, where a task gets stuck at 25% (or 50 or 75) and doesn't go past it. On some PCs the task just crashes at that point; on others it just stops making progress.

I would abort the task.

Another user had some suggestions for settings in Linux that may help out with the 25% problems here. I made the changes on my Linux PCs and haven't had crashes at these 25% marks since then. But the Linux PCs I run are pretty much dedicated to crunching, and I seldom stop BOINC or restart the PCs.

Edit: but try Belfry's suggestion first, just in case.

Letharion
Joined: 29 Aug 04
Posts: 2
Credit: 1,373,302
RAC: 0
Message 46868 - Posted: 25 Aug 2013, 12:05:53 UTC
Last modified: 25 Aug 2013, 12:06:51 UTC

Thank you for your replies. :)

I checked, and everything already appeared to be owned by boinc, which is correct, since it's installed from repo. I ran chown anyway, just to be sure, restarted, and tried again.

Unfortunately that didn't help, so the task was aborted.

While I could mess with the swap settings, that doesn't seem like much of a solution, especially since this is on an SSD, where sluggish disk response should be the least of my concerns.

I'll just not run this project on that particular system until the issue gets a permanent fix.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 46870 - Posted: 25 Aug 2013, 20:13:09 UTC

I'll just not run this project on that particular system until the issue gets a permanent fix.


That might be some time, from what I have read on the different fora. It would appear that the file corruption doesn't actually happen at the 25/50/75/100% points; that is just when it is picked up, presumably as the zip file is created. The problem seems to occur on all platforms, not just *nix. Whether it is the same work units that fail with it on each platform is anyone's guess.

My impression is that it is happening less often on my machine than it did.

Bob Knippel
Joined: 3 Mar 13
Posts: 2
Credit: 13,423,511
RAC: 0
Message 47526 - Posted: 11 Nov 2013, 0:52:49 UTC

I have had the same issue: stuck at 52.195% somewhere around the 700 hour mark. However, as the time remaining continued to count down, I let it run, assuming it was still crunching. Now I am not so sure. Today it finally reached zero hours remaining at 1435 hours, but it is still running about six hours later. I have no idea if this is corrupted or not, but I intend to give it a couple of days more and see what happens.

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 47527 - Posted: 11 Nov 2013, 1:50:33 UTC - in response to Message 47526.  

Bob, the 25% / 50% / 75% problem is one where the tasks crash and terminate themselves at those points. They may leave behind directories named after themselves in the boinc-client/projects/climateprediction.net directory. That particular problem is less common than it was in 2012.

Your problem sounds like a "zombie" task. It's dead, but won't lie down. In all cases that I know of, the only possible action for tasks that stop advancing is to terminate them.
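One way to tell a crunching task from a zombie without the graphics window is to record each task's fraction done, wait a few hours, and compare. A minimal sketch, assuming boinccmd is installed and that its task listing contains "name:" and "fraction done:" lines (my understanding of the usual output; older clients may need --get_results instead of --get_tasks). The helper name task_progress and the snapshot file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reduce a boinccmd task listing (read from stdin)
# to one "task_name fraction_done" pair per line.
task_progress() {
    awk '
        /^ *name: /          { name = $2 }       # remember the current task name
        /^ *fraction done: / { print name, $3 }  # emit it with its progress
    '
}

# Usage: take two snapshots a few hours apart and compare them.
# boinccmd --get_tasks | task_progress > before.txt
# ...wait a few hours...
# boinccmd --get_tasks | task_progress > after.txt
# diff before.txt after.txt   # a running task that shows no diff has not advanced
```

A task whose elapsed time keeps climbing while this number stays frozen matches the zombie description above.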

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47529 - Posted: 11 Nov 2013, 4:18:33 UTC - in response to Message 47526.  

There's only one reliable way to see what the model is doing, and that's to look at the data provided on the graphics page for each model.

Click the Show graphics button to get there. And if, like a lot of Linux users, you can't get that to work, then the next best way to see if it's completed is to look at the list of trickles.

Here is the page for one of mine, so that you can see what the last one is, and work out the 25% points.

tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 47530 - Posted: 11 Nov 2013, 5:15:14 UTC - in response to Message 47529.  

I can see an Earth on my Linux box, but the graphics window seems to be transparent, unlike what happens with Einstein@home on another Linux box.
Maybe the window's parameters are not set right.
Tullio

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47531 - Posted: 11 Nov 2013, 6:20:48 UTC

That's what you get for running Linux. Everyone knows that Windows runs flawlessly. Don't believe anything you read on those other threads that say different. ;-)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47532 - Posted: 11 Nov 2013, 6:38:25 UTC - in response to Message 47530.  

Tullio

That's more than I've got so far on my Linux box, (The button fades for 1 second, then pops back to normal, with no window), but I haven't had much chance to try different things yet. Perhaps update drivers.

tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 47533 - Posted: 11 Nov 2013, 12:36:23 UTC - in response to Message 47532.  

Thanks, Les. Anyway, I am running a hadcm3n model alongside 2 Astropulse units from SETI@home and a Gravitation unit from Albert@home, which is a beta project of Einstein@home. This is on my Sun WS of 2008 vintage, while I have confined Test4Theory@home, SETI@home and Einstein@home to my newer HP laptop. All this on Linux, obviously. I am also running VirtualBox 4.2.18 on the HP, which is needed by Test4Theory@home. But at this moment I am mostly struck by the tragedy of the Philippine Islands. We have many Filipinos in Italy and they are honest, hardworking people, always on the cell phone talking to their relatives at home.
Tullio

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47558 - Posted: 13 Nov 2013, 10:24:47 UTC - in response to Message 47529.  

My experience when a task stops just short of a decade (or quarter-way) point, and stops trickling at its usual rate --
Either:
kill it, and let it be re-issued.
Or, if you have a good, clean backup, and no other models or projects running, go back to your latest good clean backup before the previous decade point.

There's only one reliable way to see what the model is doing, and that's to look at the data provided on the graphics page for each model.

Click the Show graphics button to get there. And if, like a lot of Linux users, you can't get that to work, then the next best thing to see if it's completed, is to look at the list of trickles.

Here is the page for one of mine, so that you can see what the last one is, and work out the 25% points.



If "Show graphics" doesn't work on Linux, what has worked for me is the sometimes annoyingly slow process of running ldd on the graphics executable:
    cpdn@ilex:~$ ldd BOINC/projects/climateprediction.net/hadcm3n_graphics_6.07_i686-pc-linux-gnu 
    	linux-gate.so.1 =>  (0xf77bd000)
    	libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7740000)
    	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7588000)
    	libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7568000)
    	libGL.so.1 => /usr/lib/i386-linux-gnu/mesa/libGL.so.1 (0xf7508000)
    	libX11.so.6 => /usr/lib/i386-linux-gnu/libX11.so.6 (0xf73d0000)
    	libXext.so.6 => /usr/lib/i386-linux-gnu/libXext.so.6 (0xf73b8000)
    	libXt.so.6 => /usr/lib/i386-linux-gnu/libXt.so.6 (0xf7358000)
    	libXmu.so.6 => /usr/lib/i386-linux-gnu/libXmu.so.6 (0xf7338000)
    	libXi.so.6 => /usr/lib/i386-linux-gnu/libXi.so.6 (0xf7320000)
    	libjpeg.so.62 => /usr/lib/i386-linux-gnu/libjpeg.so.62 (0xf72f8000)
    	libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf72d8000)
    	/lib/ld-linux.so.2 (0xf7798000)
    	libglapi.so.0 => /usr/lib/i386-linux-gnu/libglapi.so.0 (0xf72c0000)
    	libXdamage.so.1 => /usr/lib/i386-linux-gnu/libXdamage.so.1 (0xf72b8000)
    	libXfixes.so.3 => /usr/lib/i386-linux-gnu/libXfixes.so.3 (0xf72b0000)
    	libX11-xcb.so.1 => /usr/lib/i386-linux-gnu/libX11-xcb.so.1 (0xf72a8000)
    	libxcb-glx.so.0 => /usr/lib/i386-linux-gnu/libxcb-glx.so.0 (0xf7290000)
    	libxcb-dri2.so.0 => /usr/lib/i386-linux-gnu/libxcb-dri2.so.0 (0xf7288000)
    	libxcb.so.1 => /usr/lib/i386-linux-gnu/libxcb.so.1 (0xf7260000)
    	libXxf86vm.so.1 => /usr/lib/i386-linux-gnu/libXxf86vm.so.1 (0xf7258000)
    	libdrm.so.2 => /usr/lib/i386-linux-gnu/libdrm.so.2 (0xf7248000)
    	libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7240000)
    	libSM.so.6 => /usr/lib/i386-linux-gnu/libSM.so.6 (0xf7230000)
    	libICE.so.6 => /usr/lib/i386-linux-gnu/libICE.so.6 (0xf7210000)
    	libXau.so.6 => /usr/lib/i386-linux-gnu/libXau.so.6 (0xf7208000)
    	libXdmcp.so.6 => /usr/lib/i386-linux-gnu/libXdmcp.so.6 (0xf7200000)
    	libuuid.so.1 => /lib/i386-linux-gnu/libuuid.so.1 (0xf71f8000)


If any of the shared libs lib***.so.* (DLLs, for Windows users and others) shows "not found", that's the 32-bit library you need to get from your distro. Getting it, or getting a more recent version, can be a small PITA (a kind of Middle Eastern pan bread, best with garlic), because it's sometimes not trivial to find the 32-bit package that contains the shared library you need. Most of us are running 64-bit these days, 32-bit on 64-bit is still not perfectly supported, and distros vary in how easy they make it to look up which package has the 32-bit version needed by the CPDN 32-bit graphics application (NOT meaning 32-bit graphics; that's something else again).
But the slow tedious process of finding and installing/upgrading those graphics libs has worked for me, when I've had graphics problems on recent Linux.
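To avoid reading the whole ldd listing by eye, the output can be filtered down to just the missing names. This is only a sketch: missing_libs is a made-up helper name, and the path is the one from the listing above, so adjust it to wherever your BOINC directory lives.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print only the names
# of libraries that ldd reports as "not found".
missing_libs() {
    # ldd prints unresolved entries as: "libjpeg.so.62 => not found"
    awk '/not found/ { print $1 }'
}

# Usage (path copied from the listing above; adjust to your install):
# ldd BOINC/projects/climateprediction.net/hadcm3n_graphics_6.07_i686-pc-linux-gnu | missing_libs
#
# On Debian-family distros, one way to look up the owning package
# (assuming the apt-file tool is installed and its cache is updated):
# apt-file search libjpeg.so.62
```

No output means every library resolved, in which case the graphics problem lies somewhere else, such as drivers.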


tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 47565 - Posted: 14 Nov 2013, 1:04:37 UTC

I finished one hadcm3n task on my Linux box and started another. Show graphics shows an Earth in a transparent window, but it is stable. My Linux is SuSE 12.1 on this box and SuSE 12.3 on another running Test4Theory@home with VirtualBox 4.2.18. I had trouble running hadcm3n tasks while VirtualBox and T4T were running on this box, so I selected the HADAM3P models. But since they are not available, I got one hadcm3n task, finished it and started another.
Tullio

Ron Crouch
Joined: 24 Feb 05
Posts: 45
Credit: 11,332,534
RAC: 0
Message 47569 - Posted: 14 Nov 2013, 4:12:20 UTC

For those trying to understand why the graphics for BOINC will work on some projects on some distros and not others, please refer to this.
6,000?? Give it a rest.

Göbekli Tepe is more than 10,000 years old. And quite intricate I might add.

Explain that!

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47571 - Posted: 14 Nov 2013, 5:09:09 UTC

This is drifting a bit off topic, but ...

There are several reasons why my graphics don't work:
1) It's a new machine, using Mint, with Cinnamon as the desktop. The install DVD did everything, with just a few questions. Then it couldn't find BOINC in a repository, so I went to the BOINC site, where I was offered the current version.
2) After it was downloaded, there was a pop up screen, with a button saying Install. So I clicked it, and Mint put it into a root directory. And, I think, installed it as a system program.
3) The processor is an i7-3770K, and I'm using the built in display chip.

So, an unknown system version, (although it sees the full 16 Gigs of ram, and the desktop is 32 bit), an unknown BOINC version, unknown chipset drivers, and a system install.

Worst, though, is:
4) There are no models with which to test changes.

I've deleted BOINC and started again, watching closely, and putting things where I want them. This time BOINC is in /home/Leslie. So now I can look at the various folders and files without being root. Or having to type those long strings for the directories.

Still, the first time through, everything was dead easy to get set up. And no dreaded: "... requires that you be familiar with the UNIX command-line interface".

And knowing what others have done to fix things is always good to know.


tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 47577 - Posted: 14 Nov 2013, 16:13:57 UTC

I am still using BOINC 6.10.58 on 6 BOINC projects, including Test4Theory@home, which requires a 7.x.y client to run its latest version; they are still maintaining the old version for those unwilling or unable to update their BOINC client. The result is that I am able to use VirtualBox 4.2.18 while the BOINC 7.x.y users must stop at VBox 4.2.16. So upgrading BOINC is a mixed blessing.
Tullio

WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,421,805
RAC: 1,225
Message 47586 - Posted: 15 Nov 2013, 18:17:10 UTC

Graphics don't work -

I had the problem of graphics not working. I am using an Ubuntu distribution, so this may or may not help.

1) Open the Terminal
2) Start the BOINC Manager after navigating to the appropriate directory
3) Go to the BOINC Manager, select a task, click on Show Graphics
4) Go back to the Terminal and all of the missing libraries will be shown

You may have to repeat steps 3 and 4
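The steps above could be scripted roughly as follows, saving the Manager's terminal output to a log and pulling out the library names. Assumptions: boincmgr is the Manager binary in your BOINC directory, the log file name is arbitrary, missing_from_log is a made-up helper, and the missing-library messages use the dynamic loader's usual "error while loading shared libraries" wording.

```shell
#!/bin/sh
# Hypothetical helper: read terminal output on stdin and print each missing
# shared library once, based on the loader's standard error wording.
missing_from_log() {
    sed -n 's/.*error while loading shared libraries: \([^:]*\):.*/\1/p' | sort -u
}

# Usage (steps 1-2: start the Manager from a terminal, keeping its output):
# cd /path/to/BOINC && ./boincmgr 2>&1 | tee boincmgr.log
#
# Step 3: click Show graphics in the Manager, then (step 4) in another terminal:
# missing_from_log < /path/to/BOINC/boincmgr.log
```

The loader stops at the first library it cannot find, which is why steps 3 and 4 may need repeating after each install.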


