Message boards : Number crunching : Tasks available, but I am not getting them.
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
A couple of days ago, my Linux machine received a CPDN task, which it has not done in quite a while. Task 22463632 Name wah2_nz25_2296_209805_25_1019_012300899_1 Workunit 12300899 Created 22 Jul 2024, 18:59:57 UTC Sent 22 Jul 2024, 19:00:03 UTC Report deadline 30 Oct 2024, 19:00:03 UTC Server state In progress Client state New Exit status 0 (0x00000000) Computer ID 1511241 It seems to be running just fine and has already uploaded 10 trickles. My app_config file allows two of these to run at a time. Now there are a lot more of these available on the server ready for download, but when my machine tries to get some, I get: Wed 24 Jul 2024 07:24:44 PM EDT | climateprediction.net | Sending scheduler request: To fetch work. Wed 24 Jul 2024 07:24:44 PM EDT | climateprediction.net | Requesting new tasks for CPU Wed 24 Jul 2024 07:24:47 PM EDT | climateprediction.net | Scheduler request completed: got 0 new tasks Wed 24 Jul 2024 07:24:47 PM EDT | climateprediction.net | No tasks sent Wed 24 Jul 2024 07:24:47 PM EDT | climateprediction.net | Project requested delay of 3636 seconds Is this a problem with my client, or is the server refusing me for some reason? Is it the server refusing to send me more until I successfully send one completed task? It has been so long that I do not remember how this works. The application details for host 1511241 show this as the last entry: Weather At Home 2 (wah2) 8.27 i686-pc-linux-gnu Number of tasks completed 0 Max tasks per day 4 Number of tasks today 0 Consecutive valid tasks 0 Average turnaround time 0.00 days |
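As a rough illustration of the backoff reported in the log above, the following Python sketch pulls the server-requested delay out of the client messages and works out when that delay has elapsed. The function names are hypothetical, and the timestamp is typed in by hand rather than parsed from the log; this is not part of BOINC itself.

```python
import re
from datetime import datetime, timedelta

# Matches the client message seen in the post above.
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def requested_delay_seconds(log_text):
    """Return the most recent server-requested delay in the log, or None."""
    matches = DELAY_RE.findall(log_text)
    return int(matches[-1]) if matches else None


def earliest_next_request(logged_at, delay_seconds):
    """Return the time at which the requested delay has elapsed."""
    return logged_at + timedelta(seconds=delay_seconds)


log = (
    "Wed 24 Jul 2024 07:24:47 PM EDT | climateprediction.net | No tasks sent\n"
    "Wed 24 Jul 2024 07:24:47 PM EDT | climateprediction.net | "
    "Project requested delay of 3636 seconds\n"
)

delay = requested_delay_seconds(log)
# Local time of the log entry (07:24:47 PM), entered manually for this example.
logged_at = datetime(2024, 7, 24, 19, 24, 47)
print(delay, earliest_next_request(logged_at, delay))
```

A delay of 3636 seconds is just over an hour, so the next scheduler contact for this host would not be expected before about 8:25 PM local time.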
Joined: 12 Apr 21 Posts: 318 Credit: 15,000,104 RAC: 9,568 |
These new runs use v8.32, which is only for Windows. On a side note, I didn't realize WAH2 is available for Linux now. The Linux version is 8.27, which batch 1019 and maybe earlier batches appear to have used. |
Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
I'm not sure how you got that task. The linux app should be disabled and not used for Weather@Home. I will check with CPDN. All weather@Home batches should be Windows only. Let it finish. It'll be ok. After fixing a few things in the code, the Linux version of W@H works fine. But before using it for batches, CPDN need to assess the differences in results to the Windows version. --- CPDN Visiting Scientist |
Joined: 14 Sep 08 Posts: 127 Credit: 43,257,301 RAC: 72,605 |
In case you need more data for investigation, I got two for my linux host as well. They are both from older batches but sent a few hours apart. https://www.cpdn.org/workunit.php?wuid=12296077 https://www.cpdn.org/workunit.php?wuid=12300636 |
Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
CPDN have found the problem. The linux app was deprecated but accidentally got re-enabled when the new wah2-ri v8.32 was installed. It's been deprecated again and should stop any more linux tasks going out. But let the tasks complete normally - there's no need to abort them. --- CPDN Visiting Scientist |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I will let it finish. It has now completed 13 trickles, a little over 53% complete. Task 22463632 Name wah2_nz25_2296_209805_25_1019_012300899_1 Workunit 12300899 Created 22 Jul 2024, 18:59:57 UTC Sent 22 Jul 2024, 19:00:03 UTC Report deadline 30 Oct 2024, 19:00:03 UTC This work unit looks like this. Mine is the run still in progress. 22463632 1511241 22 Jul 2024, 19:00:03 UTC 30 Oct 2024, 19:00:03 UTC In progress --- --- 10,789.80 Weather At Home 2 (wah2) v8.27 i686-pc-linux-gnu 22456746 1542760 24 Jun 2024, 18:35:02 UTC 22 Jul 2024, 18:59:56 UTC Error while computing |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
It finished for me and completed successfully on Linux, machine 1511241. It failed for my wingman (who got it first) on Windows. 22463632 1511241 22 Jul 2024, 19:00:03 UTC 27 Jul 2024, 23:11:04 UTC Completed 446,355.27 440,722.90 20,729.77 Weather At Home 2 (wah2) v8.27 i686-pc-linux-gnu 22456746 1542760 24 Jun 2024, 18:35:02 UTC 22 Jul 2024, 18:59:56 UTC Error while computing 178,741.50 83,465.81 4,163.15 Weather At Home 2 (wah2) v8.24 windows_intelx86 |
Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
There was a bug in version 8.24 of WaH which tended to make it crash when the model was restarted. This was fixed in version 8.27 which ran on your linux machine. That's why it worked. Glad to know it completed the task. --- CPDN Visiting Scientist |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Mine didn't. After the 20th trickle, about when it was finishing, it failed on file transfer: Unable to load library wah2_se_8.27_i686-pc-linux-gnu.so dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory. I must not have had any 32-bit libraries installed, so it could not find it. I have now installed the file it is complaining about, even though I probably won't need it, as I wanted to have only 64-bit applications running. Conan |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Unable to load library wah2_se_8.27_i686-pc-linux-gnu.so" On my machine, it does not seem to need libnsl.so, although I do have it: $ locate libnsl.so.1 /usr/lib/libnsl.so.1 /usr/lib64/libnsl.so.1 $ ls -l /usr/lib/libnsl.so.1 /usr/lib64/libnsl.so.1 lrwxrwxrwx. 1 root root 14 Apr 26 14:27 /usr/lib64/libnsl.so.1 -> libnsl-2.28.so lrwxrwxrwx. 1 root root 14 Apr 26 14:26 /usr/lib/libnsl.so.1 -> libnsl-2.28.so [/var/lib/boinc/projects/climateprediction.net] $ ldd wah2_8.27_i686-pc-linux-gnu linux-gate.so.1 (0xf7f52000) libpthread.so.0 => /lib/libpthread.so.0 (0xf7f1c000) libdl.so.2 => /lib/libdl.so.2 (0xf7f17000) libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7d84000) libm.so.6 => /lib/libm.so.6 (0xf7cb2000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7c95000) libc.so.6 => /lib/libc.so.6 (0xf7aec000) /lib/ld-linux.so.2 (0xf7f54000) |
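For anyone wanting to check a host for unresolved libraries, here is a minimal Python sketch that scans `ldd` output for entries marked `not found`. The function name and the sample text are made up for illustration; the sample is not output from any machine in this thread. Note that `ldd` on the main executable only lists libraries linked at build time, so a library loaded at run time with dlopen (as the error message suggests for the `wah2_se_8.27` file) would presumably need `ldd` run on that `.so` file itself.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved entries look like "name => /path (0xaddr)";
        # unresolved ones look like "name => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Hypothetical ldd output for a host without the 32-bit libnsl installed.
sample = "\n".join([
    "    linux-gate.so.1 (0xf7f52000)",
    "    libnsl.so.1 => not found",
    "    libc.so.6 => /lib/libc.so.6 (0xf7aec000)",
])

print(missing_libraries(sample))
```

On a real host, the text passed to the function would come from running `ldd` on the file in question, for example by pasting the output or capturing it with `subprocess.run`.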
Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
"Unable to load library wah2_se_8.27_i686-pc-linux-gnu.so" The error message is coming from libnsl; it doesn't mean it can't find libnsl. That is, libnsl can't find the dynamically loaded file: wah2_se_8.27_i686-pc-linux-gnu.so. --- CPDN Visiting Scientist |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
I didn't have libnsl.so.1 on my computer, so I have now installed it in case I need it later. Conan |
Joined: 31 May 18 Posts: 53 Credit: 4,725,987 RAC: 9,174 |
I only run Windows in a VM but have been able to get work in the past. Currently the server simply refuses to send me any work for this instance. I've tried several times, left it overnight to sort itself out, reset the project, and rebooted any number of times, and it still simply will not give me any work on the Windows instance. I managed to get a few Linux units on the host machine (OpenIFS work), and those that have completed have been successful, but the Windows machine just sits idle. Has something changed in the requirements? Have I not allocated enough RAM (4GB per core, four cores)? I get nothing from the logs in any useful time frame thanks to the hour backoff from the server every time. |
Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
"I only run Windows in a VM but have been able to get work in the past. Currently the server simply refuses to send me any work for this instance." That you can get OIFS work demonstrates it is not the issue with your residing somewhere that has an IP address that is blacklisted. (We have had a couple of those with the project over the years.) Have you checked the disk space settings? Both for the VM and for BOINC running in the VM? I have more than once when setting up a VM found this to be an issue that has stopped me getting work. Currently, I am just running tasks using BOINC under WINE, though I have a VM set up as well. |
Joined: 31 May 18 Posts: 53 Credit: 4,725,987 RAC: 9,174 |
I've expanded the virtual hdd and added another 60GB to the filesystem so I'll see how that goes. |
Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
I have increased the disk space requirement for Weather@Home tasks recently as we prepare to roll out a new version of the app which puts all the task files into the slot directory instead of the project directory. So it's quite possible it's a limit on the disk space. But you should see a message in the BOINC messages log that the server is unable to give you a task because there's not enough space? |
Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
"But you should see a message in the BOINC messages log that the server is unable to give you a task because there's not enough space?" Yes, I have always seen that when it's a problem. Guessing that isn't it, as the machine in question has been in contact with the server since and still has no work. I assume you are not hitting the update button for the project before the one-hour backoff time after requesting new work has passed? If it isn't that, I am running out of ideas. |
Joined: 31 May 18 Posts: 53 Credit: 4,725,987 RAC: 9,174 |
I'm avoiding hitting the update button precisely because of the backoff issue resetting every time. I haven't seen any specific messages about storage space, and I got no work overnight. This morning I wiped the client, rebooted, deleted all residual files, and installed fresh. I enabled work_fetch_debug: 4/08/2024 20:41:27 | | [work_fetch] ------- start work fetch state ------- 4/08/2024 20:41:27 | | [work_fetch] target work buffer: 432000.00 + 432000.00 sec 4/08/2024 20:41:27 | | [work_fetch] --- project states --- 4/08/2024 20:41:27 | climateprediction.net | [work_fetch] REC 0.000 prio 0.000 can't request work: scheduler RPC backoff (2729.13 sec) 4/08/2024 20:41:27 | | [work_fetch] --- state for CPU --- 4/08/2024 20:41:27 | | [work_fetch] shortfall 3456000.00 nidle 4.00 saturated 0.00 busy 0.00 4/08/2024 20:41:27 | climateprediction.net | [work_fetch] share 0.000 4/08/2024 20:41:27 | | [work_fetch] ------- end work fetch state ------- I have the project set to 1000 priority. Previously it was set to 100. I don't recall ever setting this project to zero. |
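The numbers in the work_fetch log above can be checked with a little arithmetic. This Python sketch converts the two buffer values to days and reproduces the logged shortfall, assuming the shortfall is simply the total buffer multiplied by the number of idle CPU instances. That assumption matches the figures in this log, but the BOINC client's real calculation may be more involved, and the function names here are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def buffer_days(seconds):
    """Convert a work buffer given in seconds to days."""
    return seconds / SECONDS_PER_DAY


def idle_shortfall(min_buffer_s, extra_buffer_s, idle_instances):
    """Assumed shortfall: the full buffer is unfilled on every idle instance."""
    return (min_buffer_s + extra_buffer_s) * idle_instances


# Values copied from the log: "target work buffer: 432000.00 + 432000.00 sec"
# and "shortfall 3456000.00 nidle 4.00".
print(buffer_days(432000.0))                   # days in each half of the buffer
print(idle_shortfall(432000.0, 432000.0, 4))   # compare with the logged shortfall
```

Each half of the buffer works out to 5 days, and four idle CPUs with a 10-day total buffer give 3,456,000 seconds, the same value the client logged. The line that blocks the request is the "scheduler RPC backoff (2729.13 sec)" entry, which is a wait of about 45 minutes.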
Joined: 31 May 18 Posts: 53 Credit: 4,725,987 RAC: 9,174 |
Finally got some units. The last thing I did was to set the work cache to 10 and 10. I don't know if that was the winning combination or if someone did something at the server end. |
Joined: 29 Oct 17 Posts: 1051 Credit: 16,655,437 RAC: 10,602 |
I've had to do that before and never understood why. Once the tasks completed, I could set it back down to my usual values. Maybe it's something to do with the client never having seen the tasks before and not knowing how long they take in reality? It definitely wasn't anything at the scheduler end. Not on a Sunday evening! --- CPDN Visiting Scientist |
©2024 cpdn.org