Posts by Belfry

41) Questions and Answers : Windows : Can't Re-Attach To Project In BOINC (Message 46524)
Posted 26 Jun 2013 by Belfry
Post:
Nice.
42) Message boards : climateprediction.net Science : Misconfiguration e-mail (Message 46444)
Posted 18 Jun 2013 by Belfry
Post:
This user's AMD machines error out all the time with a variety of stderr messages.

He/she's our #2 RAC contributor. Would be nice if a few of those machines were producing better science.
43) Message boards : Cafe CPDN : Team Minnesota stuff! (Message 46295)
Posted 28 May 2013 by Belfry
Post:
Welcome tlsh0 and congratulations bearcatrp on 1,000,000!

Unfortunately with the loss of the PHP board I can't add crazy GIFs and excessive emoticons. =:(
44) Questions and Answers : Wish list : Smaller Work Units (Message 46204)
Posted 13 May 2013 by Belfry
Post:
It is viable to have smaller work units without compromising the integrity of the data.... Setting up the modeling in this way would increase the server traffic by a factor proportional to the decrease in work unit time (maybe a factor of 100 would be ideal), which might put a strain on the server hardware.



Hi Gene, welcome to the forum. I think model integrity and network traffic are less of a concern than the increased time it would take to complete a run, since many pieces would end up on unstable and frequently turned-off machines. Anyway, since we both joined, hadcm3's have been halved and hadam3's divided by three. And with newer processors turning hadcm3's around in one to two weeks at stock clocks, this has become less of an issue for many users.
45) Questions and Answers : Preferences : CPDN hogging disk (Message 46107)
Posted 29 Apr 2013 by Belfry
Post:

...
It'll need to:
Check that each climate model is nowhere near a checkpoint.

I don't think this is necessary, although it's nice to pause them for electricity considerations. With eight or more CPDN tasks running something is always near checkpointing.
46) Questions and Answers : Preferences : CPDN hogging disk (Message 46106)
Posted 29 Apr 2013 by Belfry
Post:
When I crunched on Windows I used UK_Nick's vbs backup (unfortunately I can't link to climateprediction.net), which nimbly waited for each task to checkpoint, paused them, then stopped BOINC. Of course it didn't do anything with crashed models.

Here's my Linux Bash backup script. It won't wait for checkpoints, but when combined with a Cron job like "30 10 * * * PATH=$PATH:/sbin /etc/scripts/boinc-backup.sh" it will make a backup every fourth day at 10:30 AM.

#!/bin/bash
# Back up the BOINC data directory (Debian/Ubuntu package paths).

/bin/pidof boinc > /dev/null 2>&1 || exit  # do nothing if the client isn't running

[ $(( ($(date +%s)/86400) % 4 )) -ne 0 ] && exit  # run only every fourth day
find /var/lib/boinc-backup -name '*.tgz' -mtime +5 -delete  # delete backups over five days old (the 2nd-to-last one)

/etc/init.d/boinc-client stop

# archive the whole data directory, hidden files included (no parsing of ls output)
nice -n 1 tar -czpf /var/lib/boinc-backup/boinc-client_backup-$(date +%d%m%Y).tgz -C /var/lib/boinc-client .

/etc/init.d/boinc-client start
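The every-fourth-day test counts whole days since the Unix epoch (UTC) and lets the script continue only when that count is divisible by four, while Cron itself fires daily. A minimal sketch of the same arithmetic as reusable functions, so the schedule can be checked before trusting it; the function names are hypothetical, and `date -d @...` assumes GNU date.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when the given Unix timestamp
# falls on a backup day, i.e. (whole UTC days since epoch) % 4 == 0.
# Same arithmetic as the "run only every fourth day" line in the script.
is_backup_day() {
    local ts=${1:-$(date +%s)}   # default to "now"
    [ $(( (ts / 86400) % 4 )) -eq 0 ]
}

# Hypothetical helper: print the next four backup dates (today included
# if it qualifies), stepping forward one day at a time. Needs GNU date.
next_backup_days() {
    local ts=$(date +%s) found=0
    while [ "$found" -lt 4 ]; do
        if is_backup_day "$ts"; then
            date -u -d "@$ts" +%Y-%m-%d
            found=$((found + 1))
        fi
        ts=$((ts + 86400))
    done
}
```

Because the count uses UTC days, the backup day can differ by one from the local calendar date, but in most setups the four-day spacing holds.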
47) Questions and Answers : Preferences : CPDN hogging disk (Message 46104)
Posted 29 Apr 2013 by Belfry
Post:
As I understand it, hadcm3n's most often process through errors and then crash at 25, 50, 75 and 100% points. So stopping times alone won't be enough to tell interrupted tasks from buggy ones. In any case some of the functionality you ask for already happens at the project level: crashed tasks simply get reissued to different computers, with the general hope that hardware and OS diversity increases the chances for completion.
48) Questions and Answers : Preferences : CPDN hogging disk (Message 46091)
Posted 28 Apr 2013 by Belfry
Post:
Automatic backups would require some way of discriminating between host-machine causes and buggy parameters/models. Some tasks are just born to crash, and we're seeing this right now with some recent hadcm3n's.
49) Message boards : Number crunching : Task ID Details Not Available? (Message 46072)
Posted 27 Apr 2013 by Belfry
Post:
It is at least possible to see the trickles:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/trickle.php?resultid=[task id here]
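A small sketch for building that URL from a task ID, so it can be pasted into a browser or fetched with curl. The function name is hypothetical, and the climateapps2 address dates from 2013 and may no longer respond.

```shell
#!/bin/bash
# Hypothetical helper: print the trickle-page URL for a CPDN task ID.
# Base URL taken from the post above; it may have moved since.
trickle_url() {
    local task_id=$1
    # Task IDs are numeric; refuse anything else rather than build a bad URL.
    case $task_id in
        ''|*[!0-9]*) echo "usage: trickle_url <numeric task id>" >&2; return 1 ;;
    esac
    printf 'http://climateapps2.oerc.ox.ac.uk/cpdnboinc/trickle.php?resultid=%s\n' "$task_id"
}

# Example (needs network access):  curl -s "$(trickle_url 12345678)"
```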
50) Message boards : Number crunching : Sound playback choppy with all cores crunching. (Message 45720)
Posted 24 Mar 2013 by Belfry
Post:
I don't think memory is the problem: I run four CPDN tasks on 4GB without issue (with an integrated GPU). Is this Clarkdale, Sandy, or Ivy? What's the distribution?

Maybe 2D acceleration is the culprit. Try enabling SNA (it'll work on Clarkdale). Here are a couple of links:
https://wiki.archlinux.org/index.php/Intel_Graphics
http://www.webupd8.org/2012/10/how-to-enable-intel-sna-acceleration-in.html

This really shouldn't be happening on modern hardware, but Linux drivers can be tricky to get right.
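Per the Arch wiki page linked above, SNA is selected with the `AccelMethod` option of the intel Xorg driver. A hedged sketch that writes such a snippet; the default path `/etc/X11/xorg.conf.d/20-intel.conf` and the `Identifier` string are assumptions (some distributions of that era used `/usr/share/X11/xorg.conf.d` or a single `/etc/X11/xorg.conf` instead), and X has to be restarted before it takes effect.

```shell
#!/bin/bash
# Hypothetical helper: write an Xorg "Device" section that enables SNA
# acceleration for the intel driver. Needs root for the default path.
write_sna_conf() {
    local target=${1:-/etc/X11/xorg.conf.d/20-intel.conf}
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "AccelMethod" "sna"
EndSection
EOF
}

# Usage (as root): write_sna_conf && echo "restart X to apply"
```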
51) Message boards : Number crunching : Sound playback choppy with all cores crunching. (Message 45703)
Posted 23 Mar 2013 by Belfry
Post:
Does it only happen with internet content? What about a local video file? How about just an audio file?
52) Message boards : Number crunching : Sound playback choppy with all cores crunching. (Message 45690)
Posted 22 Mar 2013 by Belfry
Post:
... I assume that is what I am running it as.


Open a terminal, type top [enter] and read the "NI" column, [q] to quit. But yes, I believe as long as you're starting via the script it should be 19.
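For a non-interactive check, `ps` can print the nice value directly. A sketch assuming the procps `ps` shipped with most Linux distributions (`-C` selects by command name; BusyBox `ps` lacks it); the helper name is hypothetical, and the client process is assumed to be called `boinc`, as in the `pidof` call of the backup script.

```shell
#!/bin/bash
# Hypothetical helper: print "PID NICE" for every process named $1
# (default: boinc). The "=" after each column name suppresses the
# header line; ps exits non-zero when nothing matches.
show_nice() {
    local name=${1:-boinc}
    ps -o pid=,ni= -C "$name"
}

# Usage: show_nice boinc   # a client started by the script should show 19
```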

If you're already using Pulseaudio you might want to check if any other applications (like Skype) are configured to use the sound card via Alsa.

Hope you're able to get it working. I can say my AMD laptop and desktop machines with 2.6 kernels and Pulseaudio produce great sound with BOINC tasking all cores. I'll be trying kernels 3.2 and 3.5 soon and I'll report back if anything sounds funny.
53) Message boards : Number crunching : Sound playback choppy with all cores crunching. (Message 45682)
Posted 22 Mar 2013 by Belfry
Post:
Hi Dave, just to throw out a quick guess--is BOINC running with nice=19?

Is BOINC installed via the distribution? Is it a laptop or desktop?
54) Questions and Answers : Unix/Linux : Running on an Opteron processor (Message 45513)
Posted 28 Jan 2013 by Belfry
Post:
I don't think any compatibility libraries are needed...


I think the confusion arises because someone installs a new operating system and then soon after installs something that pulls in 32-bit libraries--like Adobe Reader. Then they install BOINC and everything runs splendidly. "Who are these geeks who wail about 32-bit libraries?", they think. But the person who installs BOINC right after a new install will crash all 32-bit tasks until the cows come home. Later the two meet in a bar and get into a fist-fight over it; worse, they're Americans with concealed-carry permits (yes, there are states which allow guns in bars).

At least with distribution packages the BOINC people should make the 32-bit libraries a dependency. Lives may be at stake.
55) Questions and Answers : Unix/Linux : Running on an Opteron processor (Message 45507)
Posted 27 Jan 2013 by Belfry
Post:
That's perfectly normal, as there is no work right now on the server: link. The work comes out sporadically. If you leave network activity enabled in BOINC you may catch a reissued task.

As to why you're receiving some message saying your machine isn't suitable--I can't see why: your processors and memory all seem up to the standards. Maybe check to see if you have the 32-bit compatibility libraries installed: link.
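Two quick checks related to the 32-bit question: whether a given binary is 32- or 64-bit (read from the ELF class byte at offset 4), and whether the 32-bit x86 dynamic loader, `/lib/ld-linux.so.2`, is present. A sketch; the helper names are hypothetical, and which package provides the 32-bit libraries varies by distribution and release.

```shell
#!/bin/bash
# Hypothetical helper: print 32 or 64 for an ELF file, based on the
# EI_CLASS byte at offset 4 (1 = 32-bit, 2 = 64-bit).
elf_class() {
    local f=$1 magic class
    magic=$(od -An -t x1 -N 4 "$f" 2>/dev/null | tr -d ' \n')
    [ "$magic" = 7f454c46 ] || { echo "not an ELF file: $f" >&2; return 1; }
    class=$(od -An -t u1 -j 4 -N 1 "$f" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown ELF class: $class" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if the 32-bit x86 loader exists.
has_32bit_loader() {
    [ -e "${1:-/lib/ld-linux.so.2}" ]
}

# Usage: elf_class /path/to/cpdn/binary
#        has_32bit_loader || echo "32-bit loader missing"
```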
56) Message boards : Number crunching : hadcm3n affecting other projects, computer crash if running a long time (Message 45448)
Posted 12 Jan 2013 by Belfry
Post:
Definitely a permissions problem. Those commands I wrote might help.

Joe, just how are you starting BOINC? Installations through package managers generally require root to start up via the /etc/init.d/boinc-client script. In Ubuntu, "sudo /etc/init.d/boinc-client start" works. I'm not sure about Mageia (although it should be Red Hat-like: "su - ; systemctl start boinc-client").
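A sketch of choosing between the two start methods mentioned, assuming the service is named `boinc-client` either way (some packages may name it differently). It only prints the command, so it can be checked before running it with root privileges; the function name is hypothetical. The `/run/systemd/system` test is the usual way to tell that systemd is actually managing the machine.

```shell
#!/bin/bash
# Hypothetical helper: print the command that would start the BOINC
# client service. Assumes the service is called "boinc-client".
boinc_start_cmd() {
    if command -v systemctl >/dev/null 2>&1 && [ -d /run/systemd/system ]; then
        echo "systemctl start boinc-client"
    elif [ -x /etc/init.d/boinc-client ]; then
        echo "/etc/init.d/boinc-client start"
    else
        echo "no boinc-client service found" >&2
        return 1
    fi
}

# Usage: cmd=$(boinc_start_cmd) && sudo $cmd
```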
57) Message boards : climateprediction.net Science : Misconfiguration e-mail (Message 45446)
Posted 12 Jan 2013 by Belfry
Post:
Mac: 1192040. Maybe that Mac + BOINC 6.12 issue again?
58) Message boards : Number crunching : hadcm3n affecting other projects, computer crash if running a long time (Message 45438)
Posted 11 Jan 2013 by Belfry
Post:
Hi Joe, from the links to your other projects it looks like you're still crashing tasks, even though no CPDN tasks are on your machine. Is the partition containing the BOINC data directory mounted with restrictive options or ACLs? Can you run 2 threads of mprime for 24 hours?
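One way to answer the mount-options question: `findmnt` (util-linux) reports the options of the filesystem holding a path, and `noexec` would stop the science apps from running while `ro` would stop results and checkpoints from being written. A sketch; the helper name and the example data directory path are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: given a comma-separated mount option string, print
# the options likely to break BOINC (noexec, ro), one per line.
# Returns 0 if any were found, 1 otherwise.
find_bad_mount_opts() {
    local opt found=1
    local IFS=,          # split the option string on commas
    for opt in $1; do
        case $opt in
            noexec|ro) echo "$opt"; found=0 ;;
        esac
    done
    return $found
}

# Usage (use your own BOINC data directory):
#   find_bad_mount_opts "$(findmnt -n -o OPTIONS -T /var/lib/boinc-client)"
```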

You can try these commands (suspend BOINC network activity, shut down BOINC and back up the data directory first; run as root and replace path-to-boinc-data-directory and boinc-user with the particulars from your install):

cd path-to-boinc-data-directory
chown -R boinc-user:boinc-user ./
chmod 755 ./
find . -type d -exec chmod 0771 {} \;


If this doesn't help then you might try detaching and reattaching to each project.

Good luck.
59) Message boards : Number crunching : hadcm3n affecting other projects, computer crash if running a long time (Message 45418)
Posted 7 Jan 2013 by Belfry
Post:
I don't mean that two projects running at the same time would cause the problem; rather, look at the BOINC memory settings and/or work fetch queuing.

Math in one project won't interfere with that of another--unless said math is overheating your processor. I looked at some of your other projects' tasks, and they seem to error on file accesses. You should see if the user running BOINC has full read/write file permissions throughout the BOINC data directory.
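A sketch of that permission check: list everything under the data directory that is not owned by the BOINC user or whose owner lacks read/write permission (mode bits 0600). The function name is hypothetical, and the defaults (`/var/lib/boinc-client`, user `boinc`) are assumptions based on Debian/Ubuntu packaging.

```shell
#!/bin/bash
# Hypothetical helper: print paths under a BOINC data directory that the
# BOINC user may not be able to read/write: wrong owner, or the owner
# rw bits (0600) not both set. Empty output means nothing suspicious.
audit_boinc_perms() {
    local dir=${1:-/var/lib/boinc-client} user=${2:-boinc}
    find "$dir" \( ! -user "$user" -o ! -perm -600 \) -print
}

# Usage (as root): audit_boinc_perms /var/lib/boinc-client boinc | head
```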
60) Message boards : Number crunching : hadcm3n affecting other projects, computer crash if running a long time (Message 45408)
Posted 1 Jan 2013 by Belfry
Post:
Hi Joe, my hadcm3n's run fine alongside WCG and Rosetta on Ubuntu. Since your machine runs two tasks at a time, you should check the "leave applications in memory while suspended" box in the advanced memory settings of BOINC Manager. Another thing that might be happening is you're downloading too many of the other projects' tasks and they end up running high priority, which can end up competing with file managers and other applications for disk access. I turn off work fetching for CPDN and set my work buffer to two days when I fetch work for WCG and Rosetta.
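The same settings can also be applied without the Manager. A hedged sketch that writes a `global_prefs_override.xml` with the leave-in-memory flag and a two-day buffer; the tag names `leave_apps_in_memory` and `work_buf_min_days` are assumed from BOINC's global preferences format and should be checked against the documentation for your client version. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write global_prefs_override.xml into a BOINC data
# directory. Tag names assumed from BOINC's global preferences format.
write_prefs_override() {
    local dir=${1:-/var/lib/boinc-client} days=${2:-2.0}
    cat > "$dir/global_prefs_override.xml" <<EOF
<global_preferences>
   <leave_apps_in_memory>1</leave_apps_in_memory>
   <work_buf_min_days>$days</work_buf_min_days>
</global_preferences>
EOF
}

# Usage (as the BOINC user or root, with the client running):
#   write_prefs_override /var/lib/boinc-client 2.0
#   boinccmd --read_global_prefs_override
#   boinccmd --project <CPDN URL from boinccmd --get_project_status> nomorework
```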



©2024 climateprediction.net