Posts by PDW

1) Message boards : Number crunching : What does "Didn't need" mean on work-unit status webpage? (Message 68197)
Posted 4 Feb 2023 by Profile PDW
Post:
A task is usually marked "Didn't need" because the server has already received a valid result for the work unit, so it marks any unstarted, and therefore unwanted, tasks as "Didn't need".

It was created 21 Dec 2022, 9:57:22 UTC, so I'm guessing it never made it out to a host for processing, given the problems around that time.
Why it didn't escape when the networking problems got resolved is a mystery, but since it went past its 1 month deadline it has a new chance to create life.
2) Message boards : Number crunching : w/u failed at the 89th zip file (Message 68104)
Posted 29 Jan 2023 by Profile PDW
Post:
Well, the BOINC code comments say:

// If we already found a finish file, abort the app;
// it must be hung somewhere in boinc_finish();

I do find this comment thought-provoking:

// process is still there 5 min after it wrote finish file.
// abort the job
// Note: actually we should treat it as successful.
// But this would be tricky.
3) Message boards : Number crunching : How to Prevent OpenIFS Download (Message 67999)
Posted 23 Jan 2023 by Profile PDW
Post:
Is it valid to set <max_concurrent> for each of the OpenIFS apps to 0 (I can set it to 1 and try again once the hadam4 has completed)? The official manual doesn't say.
0 is used to indicate no limit, so the client will try to run as many as <project_max_concurrent> allows, or as many as it thinks it can run if that isn't set (or is also set to 0).
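As a rough illustration of how the two limits sit together in app_config.xml, the sketch below writes a sample file to a scratch directory rather than to a real project directory. The app name oifs_43r3_ps and the limits 1 and 4 are assumptions chosen for the example; use the app names your own client reports.

```shell
# Write a sample app_config.xml to a scratch directory (not the real project directory).
# The app name and the numbers are illustrative assumptions, not recommendations.
# <max_concurrent> limits one app; 0 there would mean "no limit" for that app.
# <project_max_concurrent> caps the whole project; 0 or absent means no cap.
cfg_dir=$(mktemp -d)
cat > "$cfg_dir/app_config.xml" <<'EOF'
<app_config>
  <app>
    <name>oifs_43r3_ps</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <project_max_concurrent>4</project_max_concurrent>
</app_config>
EOF

# Read the per-app limit back out of the file.
per_app=$(sed -n 's:.*<max_concurrent>\([0-9]*\)</max_concurrent>.*:\1:p' "$cfg_dir/app_config.xml")
echo "per-app limit: $per_app"
```

The client reads this file at startup, so after editing the real one a restart (or the Manager's "Read config files" option) is needed before the limits apply.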
4) Questions and Answers : Unix/Linux : Help requested - using new hard disk under Linux Mint 21 [SOLVED] (Message 67908)
Posted 19 Jan 2023 by Profile PDW
Post:
I was using the UUID of the underlying hardware, when I should have been using the UUID of the formatted partition.
What was the UUID if it wasn't "174E3A15-CC2D-47DE-8C5A-AB698A1E8AAF" ?
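For anyone unsure which identifier is which: 'sudo blkid' (or 'lsblk -f') reports the filesystem UUID of each formatted partition, and that is the value a UUID= entry in fstab expects. The sketch below parses one line in blkid's output format; the device name and both identifiers are invented for the example.

```shell
# One line in the format printed by `sudo blkid`; device name and IDs are invented.
sample='/dev/sdb1: UUID="0a1b2c3d-1111-2222-3333-444455556666" TYPE="ext4" PARTUUID="AAAA0000-BBBB-1111-CCCC-222233334444"'

# Filesystem UUID: created when the partition is formatted; used as UUID=... in fstab.
fs_uuid=$(printf '%s\n' "$sample" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
# Partition UUID: stored in the partition table; fstab would need PARTUUID=... for it.
part_uuid=$(printf '%s\n' "$sample" | sed -n 's/.* PARTUUID="\([^"]*\)".*/\1/p')

echo "fstab UUID=     -> $fs_uuid"
echo "fstab PARTUUID= -> $part_uuid"
```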
5) Message boards : Number crunching : The uploads are stuck (Message 67802)
Posted 17 Jan 2023 by Profile PDW
Post:
As you didn't make an effort to make the second OS drive look different from the first, when you installed BOINC on it the new host came up with the same (or possibly very similar) identifier. When that host was attached to CPDN it was recognised as the same host that you had been using, resulting in abandonment of the old results. It is much like running multiple clients on the same disk without using the allow_multiple_clients flag in cc_config.xml.
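For reference, the allow_multiple_clients flag lives in the <options> section of cc_config.xml. The sketch below only writes a sample file to a scratch directory; whether the flag is wanted at all depends on the setup.

```shell
# Write a sample cc_config.xml to a scratch directory (not the real BOINC data directory).
cfg_dir=$(mktemp -d)
cat > "$cfg_dir/cc_config.xml" <<'EOF'
<cc_config>
  <options>
    <allow_multiple_clients>1</allow_multiple_clients>
  </options>
</cc_config>
EOF

# Confirm the flag is present and set to 1.
flag=$(sed -n 's:.*<allow_multiple_clients>\([01]\)</allow_multiple_clients>.*:\1:p' "$cfg_dir/cc_config.xml")
echo "allow_multiple_clients=$flag"
```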
6) Message boards : Number crunching : The uploads are stuck (Message 67795)
Posted 17 Jan 2023 by Profile PDW
Post:
I think the problem is that CPDN is treating 2 hosts as a single host.
e.g. L-7113-1 and L-7113-2 are two different hosts, but CPDN sees just 1 host. If I swap out the hard drive, all CPDN does is change the hostname of this device, rather than see it as a separate device (host).
The link you give shows ONLY in-progress tasks; this link shows all tasks for that host: https://www.cpdn.org/results.php?hostid=1535374
There is a server setting that doesn't allow multiple clients. The way you have your 2 drives setup means BOINC sees them as the same, so when you swap them over the tasks will get abandoned, as shown in your full list.
This should work - it's equivalent to running two clients on the same machine, and just shutting one down whilst the 2nd drive is in the machine. It's perfectly possible to run 2 clients on the same host for CPDN (I do it), but there must be two separate client ids. CPDN's server does not see the MAC address, only your external (router) IP.

To swap out the disks you'd need to have created a new client instance on the 2nd disk, whilst keeping the original client on the first disk without detaching from the project. That way, CPDN's server will see two clients, one for each disk, and that should work. If each disk's BOINC client datadir has the same client id (check the 'client_state.xml' file) then I suspect you'll get the behaviour you describe.
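A minimal sketch of that check, assuming the identifier of interest is the <host_cpid> element in client_state.xml (the per-project <hostid> could be compared the same way). The two scratch directories stand in for the data directory on each disk, and the values are invented.

```shell
# Two scratch "data directories" standing in for the BOINC data dir on each disk.
# The paths and the <host_cpid> value are invented for the demonstration.
work=$(mktemp -d)
mkdir "$work/disk1" "$work/disk2"
printf '<client_state>\n<host_info>\n<host_cpid>abc123</host_cpid>\n</host_info>\n</client_state>\n' > "$work/disk1/client_state.xml"
cp "$work/disk1/client_state.xml" "$work/disk2/client_state.xml"

# Pull the identifier out of a client_state.xml file.
get_cpid() {
    sed -n 's:.*<host_cpid>\(.*\)</host_cpid>.*:\1:p' "$1"
}

id1=$(get_cpid "$work/disk1/client_state.xml")
id2=$(get_cpid "$work/disk2/client_state.xml")
if [ "$id1" = "$id2" ]; then
    verdict="same"        # the two disks may be treated as one host
else
    verdict="different"
fi
echo "disk1=$id1 disk2=$id2 -> $verdict"
```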

I didn't know how CPDN was using the setting. I do know there is one, but I wasn't going to try running multiple clients to test it before posting.

As I said, "The way you have your 2 drives setup means BOINC sees them as the same" so ncoded could change their setup to make it work if you say it works for you.
7) Message boards : Number crunching : The uploads are stuck (Message 67790)
Posted 17 Jan 2023 by Profile PDW
Post:
Dave, Richard, et al..

Can I ask, where have these completed tasks gone?

https://www.cpdn.org/results.php?hostid=1535374&offset=0&show_names=0&state=1&appid=

It says In-Progress but most (if not all) of these have already been completed, uploaded, and reported?

e.g. I uploaded and reported one task just a few minutes ago, which I downloaded 15 hours ago, but there is nothing showing in the list, just 'in-progress'.

I think the problem is that CPDN is treating 2 hosts as a single host.

e.g. L-7113-1 and L-7113-2 are two different hosts, but CPDN sees just 1 host. If I swap out the hard drive, all CPDN does is change the hostname of this device, rather than see it as a separate device (host).

The two disks are completely separate. Both have a full install of Ubuntu and BOINC. Only one drive is inserted into the server at any one time.

I have run this server on many BOINC projects, at different times with each drive, and all projects (except CPDN) see it as 2 different hosts.

The link you give shows ONLY in-progress tasks; this link shows all tasks for that host: https://www.cpdn.org/results.php?hostid=1535374
There is a server setting that doesn't allow multiple clients. The way you have your 2 drives setup means BOINC sees them as the same, so when you swap them over the tasks will get abandoned, as shown in your full list.

Go and try GPUGrid: that does not allow multiple clients, and its results will be abandoned if you swap your drives over whilst tasks are still active on the drive you swap out.
8) Message boards : Number crunching : no credit awarded? (Message 67759)
Posted 15 Jan 2023 by Profile PDW
Post:
If they have a credit script running once a week and are not using the default BOINC process, do they also run a script to update the RAC? If so, it might need a kick.
9) Message boards : Number crunching : The uploads are stuck (Message 67570)
Posted 11 Jan 2023 by Profile PDW
Post:
All mine have uploaded this afternoon and new ZIP files are disappearing and not being held up.
10) Questions and Answers : Unix/Linux : Help requested - using new hard disk under Linux Mint 21 [SOLVED] (Message 67434)
Posted 8 Jan 2023 by Profile PDW
Post:
The 42 GB free is what's left to use on the ADATA drive.
If the fstab mount request worked then everything created/appearing below /hdd (or /wibble) would be using the space on the new SSD.
It's like a symlink that points to somewhere else, in this case a completely separate drive.
11) Questions and Answers : Unix/Linux : Help requested - using new hard disk under Linux Mint 21 [SOLVED] (Message 67432)
Posted 8 Jan 2023 by Profile PDW
Post:
At the moment, with the new hardware present but not active, the machine looks like this:
<snipped>
Now I look at it, that needs some work, but I need to get a clearer understanding of what belongs to what. I can't find the label 'hdd' anywhere, and I have no idea where 'File system' lives - but it's got lots of bytes in/on it.

The 'hdd' isn't a label; it is a directory you created in the root of the ADATA drive using 'sudo mkdir /hdd'. You could just as well make another directory called, for example, wibble with 'sudo mkdir /wibble'.
I'm assuming it is still there if your restoration session just involved editing the fstab file.
The fstab entry is using that /hdd (or /wibble) as the mount point where all the disk space and files can be put/found.
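To make the pieces concrete, here is a sketch that only builds and prints the fstab line, so nothing on the system is changed. The UUID and the ext4 type are placeholder assumptions; substitute whatever blkid reports for the formatted partition.

```shell
# Placeholder values: substitute the filesystem UUID reported by blkid
# and the filesystem type the partition was actually formatted with.
fs_uuid="0a1b2c3d-1111-2222-3333-444455556666"
mount_point="/hdd"      # any empty directory works, e.g. /wibble
fs_type="ext4"

# Fields: device, mount point, type, options, dump, fsck pass.
fstab_line=$(printf 'UUID=%s  %s  %s  defaults  0  2' "$fs_uuid" "$mount_point" "$fs_type")
echo "$fstab_line"

# The steps that need root are left as comments so the sketch changes nothing:
#   sudo mkdir /hdd        # create the mount point
#   (append the printed line to /etc/fstab)
#   sudo mount -av         # try every fstab entry before rebooting
#   df -h /hdd             # confirm which device now backs /hdd
```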

Good to know that the drive is found and works when you mount it manually.
12) Questions and Answers : Unix/Linux : Help requested - using new hard disk under Linux Mint 21 [SOLVED] (Message 67430)
Posted 8 Jan 2023 by Profile PDW
Post:
I'm assuming you read all the comments on the page as you are using the UUID format.

When did it brick the system: when you ran the mount command or when you rebooted?
Did you check the fstab file by doing "sudo mount -av" before rebooting ?

What are the permissions and ownership set to for the /hdd directory ?
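A quick way to answer that last question is stat. The sketch below runs against a scratch directory so it needs no root access; it assumes GNU coreutils stat, as shipped with Linux Mint.

```shell
# Use a scratch directory in place of /hdd so the sketch needs no root access.
mnt=$(mktemp -d)
chmod 755 "$mnt"

# %U = owner, %G = group, %a = permission bits in octal (GNU coreutils stat).
owner=$(stat -c '%U' "$mnt")
group=$(stat -c '%G' "$mnt")
mode=$(stat -c '%a' "$mnt")
echo "owner=$owner group=$group mode=$mode"
```

On the real system the equivalent is stat -c '%U:%G %a' /hdd, or simply ls -ld /hdd. Once the drive is mounted, the values shown are those of the mounted filesystem's root rather than of the empty directory underneath.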
13) Questions and Answers : Unix/Linux : Help requested - using new hard disk under Linux Mint 21 [SOLVED] (Message 67428)
Posted 8 Jan 2023 by Profile PDW
Post:
From that command in the link, you are making a directory called hdd at the top level, so if you cd to / you'd see everything in the top level, including your hdd directory.
You could call it ssd if you wanted to; the /hdd isn't part of the command, just the name (of the mount point) to be used.
14) Questions and Answers : Unix/Linux : Help requested - using new hard disk under Linux Mint 21 [SOLVED] (Message 67425)
Posted 8 Jan 2023 by Profile PDW
Post:
Has the mount point /hdd already been created, and has it got the right permissions?
15) Message boards : Number crunching : Tasks failing on Ubuntu 22 (Message 67408)
Posted 6 Jan 2023 by Profile PDW
Post:
Slot 14, so yes.
Okay :)
Glenn has his answer.
16) Message boards : Number crunching : Tasks failing on Ubuntu 22 (Message 67405)
Posted 6 Jan 2023 by Profile PDW
Post:
Ok, just ls through all slots until I found CPDN.
There's one srf* file in the slot.

That's good: you only want 1 per slot, the latest one; older ones take up space.

Was it in a slot with a number >9 ?
17) Message boards : Number crunching : Tasks failing on Ubuntu 22 (Message 67403)
Posted 6 Jan 2023 by Profile PDW
Post:
You have 1 or more tasks running at the moment ?
18) Message boards : Number crunching : Tasks failing on Ubuntu 22 (Message 67401)
Posted 6 Jan 2023 by Profile PDW
Post:
Try
sudo ls -l ?/ | grep srf
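One thing to watch with the ?/ pattern: ? matches exactly one character, so it only covers slots 0 to 9. A slot numbered 10 or higher needs ??/ or */ instead. The sketch below shows the difference in a throwaway directory; the slot numbers and file names are made up.

```shell
# Build a throwaway directory that mimics a BOINC slots directory
# (the slot numbers and file names are made up for the demonstration).
demo=$(mktemp -d)
mkdir "$demo/3" "$demo/14"
touch "$demo/3/srf_example" "$demo/14/srf_example"

# '?' matches exactly one character, so only slot 3 is listed.
one_char=$(ls -d "$demo"/?/)
# '*' matches names of any length, so slots 3 and 14 are both listed.
any_len=$(ls -d "$demo"/*/)
echo "?/ matches: $one_char"
echo "*/ matches: $any_len"

# Count srf files across every slot, whatever its number.
restart_files=$(ls "$demo"/*/ | grep -c '^srf')
echo "srf files found: $restart_files"
```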
19) Message boards : Number crunching : OpenIFS tasks : make sure boinc client option 'Leave non-GPU tasks in memory' is selected! (Message 67357)
Posted 5 Jan 2023 by Profile PDW
Post:
Reporting, certainly; whether it was actually checkpointing as it was supposed to, I don't know.
How often? If it's once per second, it's a false report. Real checkpoints happen every few minutes.
Every second. In this instance it should have been every 10 hours.
20) Message boards : Number crunching : OpenIFS tasks : make sure boinc client option 'Leave non-GPU tasks in memory' is selected! (Message 67354)
Posted 5 Jan 2023 by Profile PDW
Post:
Reporting, certainly; whether it was actually checkpointing as it was supposed to, I don't know.



©2024 climateprediction.net