Posts by Pilot_51

1) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 57279)
Posted 1 Nov 2017 by Pilot_51
Post:
I received a couple of HadCM3 tasks a few hours ago, and both failed within a minute with repeated segmentation violation crashes. It appears to be another batch that is failing on Linux while so far working fine on Windows and Mac.
2) Questions and Answers : Unix/Linux : BOINC crashes when running CPDN (Message 57054)
Posted 5 Oct 2017 by Pilot_51
Post:
I kind of doubt they can restrict WUs by BOINC version, though that would be an ideal solution.

If I understand correctly, only the PNW models were crashing. Couldn't they isolate those to Windows and let Mac/Linux get the rest of the models that aren't known to crash? Or are OS restrictions an application-wide thing?
3) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 57053)
Posted 5 Oct 2017 by Pilot_51
Post:
On any recent version of Ubuntu, I just run this and it takes care of everything.

sudo apt-get install lib32ncurses5 lib32z1 gcc-4.7-multilib


I'm sure it installs some items that aren't strictly necessary for getting cpdn running on 64 bit distributions, but it works.


My server is Debian and gcc-4.7-multilib isn't available in the repo. I would think that if all dependencies are satisfied according to ldd, as accomplished by installing lib32z1 in this case, nothing more would be needed.
4) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 57037)
Posted 4 Oct 2017 by Pilot_51
Post:
Unfortunately, the remaining two tasks failed with the same error. Once the second of the three tasks failed, I restarted boinc-client to reload everything in hopes of saving the last task, but it didn't work.

It would be great if BOINC checked dependencies before starting a task, displaying a warning if they aren't satisfied and requiring the user to resolve it before the task starts. It's a waste of resources to spend 15 days on a task that was doomed to fail from the beginning.
5) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 56993)
Posted 28 Sep 2017 by Pilot_51
Post:
Doh! A bit off topic, but I'll use this opportunity for a reminder. I lost one on my server because it didn't have libz.so.1. I think it happened at the very end as it was wrapping up. I made sure to install dependencies on my main system and forgot to do it on my server. I just installed them (lib32ncurses5 and lib32z1) and verified with ldd, so that should prevent the same thing from happening to the remaining two tasks, which have about 1.5 and 8 days remaining.

For anyone getting started, remember to check/install dependencies on all systems!
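
A minimal sketch of the ldd check mentioned above, assuming ldd marks unresolved libraries with "=> not found". The function names and the sample output are made up for illustration; the path to the actual CPDN binary is not given here, so it is left as a parameter.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libz.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list[str]:
    """Run ldd on a binary and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Hypothetical ldd output, not copied from a real system.
sample = """\
	linux-gate.so.1 (0xf7f3a000)
	libz.so.1 => not found
	libc.so.6 => /lib32/libc.so.6 (0xf7d4e000)
"""
print(missing_libraries(sample))
```

Running check_binary on each model executable on every machine before letting tasks start would catch a missing libz.so.1 early instead of at the end of a long run.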
6) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 56960)
Posted 25 Sep 2017 by Pilot_51
Post:
Thanks for the heads-up, that helps clear up what was going on.

Interestingly, all 3 tasks given to my Debian server are still going great, currently at 41%, 58%, and 70%.

It would appear that at least one bad batch was pnw25, since that is not running on my server and it was always running on my main system when the client crashed, including the very last task which was running alone. The second-to-last task that got further along and ran out of storage was cam25. All the earlier tasks were running alongside several others including two pnw25 tasks.

So, I think it's safe to say that the location of the data dir had nothing to do with the crashes, and I honestly don't know how it could have. Without knowing exactly what was causing the crash in pnw25, I doubt there was anything that could be done short of using WINE to prevent it from crashing. If I were to receive more WUs with what I know now, assuming the issue wasn't fixed, I'd just abort any pnw25 tasks.
7) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 56955)
Posted 24 Sep 2017 by Pilot_51
Post:
Yeah, I noticed fewer than 300 unsent WUs this morning and now it's 0, so now it's a wait for more to become available.

I know I could use WINE and do use it for the occasional Windows-only game, but I'd rather not make that compromise and reduce the importance of them making things work correctly on Linux. If there's one thing I like less than running Windows-only software in WINE, it's running cross-platform software in WINE because the native build is broken or buggy, so I'd either deal with the bugs or not use it at all. I know, I'm weird.

Once things get going again and I'm receiving WUs, I'll make sure it completes a task on the main partition and then see if simply moving the data dir breaks the next task. I'm still quite determined to find a stable solution that lets me store the CPDN data on another drive, though I probably won't go as far as reinstalling the OS or changing the location of /var.
8) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 56929)
Posted 23 Sep 2017 by Pilot_51
Post:
I completely wiped and reinstalled boinc-client and kept the data folder in the default location, making a backup copy of the fresh directory just in case. I also managed to free up about 7GB of space by uninstalling some software I hadn't used in a while, giving BOINC 9GB to work with and a 1GB margin. Unfortunately, that didn't fix the 0 tasks issue.

I think the developers are going to do something next week. It might be deprecating Mac and Linux apps until the cpdn problem is found, or creating a win only app for the problem task sets. Hopefully we'll have some news up next week on what the path forward will be.

That sounds very plausible and I hope the lack of tasks is intentional in an effort to prevent and ultimately fix the crash issue. Can any other Linux/Mac users confirm whether they've received new WUs since a day or two ago? I suppose it's possible the server just didn't like how all 17 tasks it sent this computer failed, 16 of which were abandoned.

For now, I'll stick with WCG and continue checking CPDN for tasks, as well as keeping an eye out for any news on the crash issue.
9) Questions and Answers : Unix/Linux : Trying to get tasks to not crash Linux client, now not receiving tasks (Message 56919)
Posted 22 Sep 2017 by Pilot_51
Post:
I recently started using BOINC again (previously used it in 2003-2004, other programs until 2013) to contribute to climate modeling. Unfortunately, I've been having a lot of trouble getting CPDN tasks to work properly on my main PC running Linux Mint 18.2.

Here's the task list.

I installed the BOINC client and manager from the Mint/Ubuntu repository and moved /var/lib/boinc-client to another partition with much more space, making sure to change the BOINC_DIR line in /etc/default/boinc-client. That seemed to work with up to 8 CPDN tasks running, but in the morning I found that the boinc client had crashed and would crash immediately after starting it again.

This was what appeared in syslog when it crashed initially:
Sep 19 05:29:38 mark-main systemd[1]: boinc-client.service: Main process exited, code=exited, status=193/n/a
Sep 19 05:29:40 mark-main systemd[1]: boinc-client.service: Unit entered failed state.
Sep 19 05:29:40 mark-main systemd[1]: boinc-client.service: Failed with result 'exit-code'.

The exact same errors occurred each time I tried restarting the client.
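
For anyone wanting to pull these exit events out of a longer syslog, here is a rough sketch. It assumes the systemd message format shown in the lines above; the regex and the function name are illustrative, not part of any BOINC or systemd tool.

```python
import re

# Matches e.g. "boinc-client.service: Main process exited, code=exited, status=193/n/a"
EXIT_RE = re.compile(
    r"(?P<unit>\S+\.service): Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)"
)


def find_service_exits(lines):
    """Return (unit, code, status) for each 'Main process exited' line."""
    exits = []
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            exits.append((m["unit"], m["code"], int(m["status"])))
    return exits


log = [
    "Sep 19 05:29:38 mark-main systemd[1]: boinc-client.service: "
    "Main process exited, code=exited, status=193/n/a",
    "Sep 19 05:29:40 mark-main systemd[1]: boinc-client.service: "
    "Unit entered failed state.",
]
print(find_service_exits(log))
```

Counting how often the same status repeats makes it easier to tell a one-off crash from a crash loop on every restart.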

I tried modifying client_state.xml and deleting files to clear the problem task, but whatever I did didn't help and appeared to cause the remaining tasks to go into an error state at the next client start/crash. I then removed all references to the project I could find and moved the data directory back to /var/lib/boinc-client and reverted BOINC_DIR, thinking maybe it didn't like that I moved the directory. The client started and I started one task which appeared to get farther along, but that ran out of space as I was doing something else that used a lot of /tmp and caused a computation error in the task. I moved the /var/lib/boinc-client directory back to the other partition as before, but this time just used a symbolic link without changing BOINC_DIR. I also made sure to chown boinc:boinc on the moved directory.
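
A small sketch for sanity-checking a relocated data directory: whether the path is a symlink, where it resolves to, which uid owns the target, and how much space is free there. The function name is made up, and the demo uses a temporary directory rather than the real /var/lib/boinc-client.

```python
import os
import shutil
import tempfile


def describe_data_dir(path: str) -> dict:
    """Summarise symlink status, resolved target, owner uid and free space."""
    real = os.path.realpath(path)       # follows symlinks to the real location
    st = os.stat(real)
    usage = shutil.disk_usage(real)     # usage of the filesystem holding the target
    return {
        "is_symlink": os.path.islink(path),
        "resolved": real,
        "owner_uid": st.st_uid,
        "free_gib": usage.free / 2**30,
    }


# Demo with a throwaway directory and a symlink pointing at it.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "boinc-data")
    link = os.path.join(tmp, "boinc-client")
    os.mkdir(target)
    os.symlink(target, link)
    info = describe_data_dir(link)

print(info["is_symlink"], round(info["free_gib"], 1))
```

On a real system the owner uid could be compared against pwd.getpwnam("boinc").pw_uid to confirm that the chown took effect on the moved directory.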

I started one task again, but again it reached around 10% and crashed the client. I found that it appeared to be trying to send a result around the same time, so I deleted just the result part from client_state.xml and that allowed the client to restart. However, even though just about every other reference to the task was automatically removed and I even reset the project, I wasn't receiving more tasks and the website kept the failed task as 'In Progress' until I removed and re-added the project. I was going to try suspending network activity to see if that would prevent it from crashing before completion, but CPDN hasn't sent any new tasks for about a day now. I just keep getting this in the log:
Fri 22 Sep 2017 03:16:29 PM EDT | climateprediction.net | Sending scheduler request: To fetch work.
Fri 22 Sep 2017 03:16:29 PM EDT | climateprediction.net | Requesting new tasks for CPU
Fri 22 Sep 2017 03:16:31 PM EDT | climateprediction.net | Scheduler request completed: got 0 new tasks
Fri 22 Sep 2017 03:16:31 PM EDT | climateprediction.net | No tasks sent
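
To track how many tasks each scheduler request returned, the event log can be scanned with something like the sketch below. It assumes the "timestamp | project | message" layout shown in the lines above; the function name and regex are illustrative only.

```python
import re

GOT_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def tasks_received(log_lines, project):
    """Return the task count from each completed scheduler request for a project."""
    counts = []
    for line in log_lines:
        # Expected layout: "<timestamp> | <project> | <message>"
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3 or parts[1] != project:
            continue
        m = GOT_RE.search(parts[2])
        if m:
            counts.append(int(m.group(1)))
    return counts


log = [
    "Fri 22 Sep 2017 03:16:29 PM EDT | climateprediction.net | Sending scheduler request: To fetch work.",
    "Fri 22 Sep 2017 03:16:29 PM EDT | climateprediction.net | Requesting new tasks for CPU",
    "Fri 22 Sep 2017 03:16:31 PM EDT | climateprediction.net | Scheduler request completed: got 0 new tasks",
    "Fri 22 Sep 2017 03:16:31 PM EDT | climateprediction.net | No tasks sent",
]
print(tasks_received(log, "climateprediction.net"))
```

A long run of zeros for one project while others keep receiving work points at the project's server rather than at the local client.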


I can see in the server status that there are still plenty of unsent tasks in wah2, the same application that I was receiving before.

Because of these issues and since the request_delay (communication deferred) time is so long, I've started contributing to WCG to fill the time, but I'd really prefer my resources go toward helping us understand the climate. I have not had a single issue with WCG after about 100 tasks.

Fortunately, all is not lost for my CPDN efforts. I have an Intel NUC server running Debian that has so far been crunching without issue on 3 of its 4 cores, currently 23-39% between the tasks.

Any help in resolving this would be appreciated.
©2024 climateprediction.net