Posts by old_user596405

21) Message boards : Number crunching : HadCM3 Full Resolution model low credits (Message 41933)
Posted 8 Apr 2011 by old_user596405
Post:
Have been aware of this since downloading 5 of these 40 year models so this thread prompted me to analyse my own stats data.

Results are derived from a sample of 493 completed and active models in 3 closely performing systems (i7 950, i7 920, Q6600 - all @ 3.20 GHz) which run 24/7.

  1. HadAM3P (2 years): average credit per day per model = 908, average CPU days = 2.6, completed credits = 2,082
  2. FAMOUS (200 years): average credit per day per model = 823, average CPU days = 7.8, completed credits = 6,176
  3. HadAM3P (1 year regional): average credit per day per model = 637, average CPU days = 4.0, completed credits = 2,244 to 3,006
  4. HadCM3L (80 years): average credit per day per model = 570, average CPU days = 45.0, completed credits = 26,127
  5. HadSM3 (45 years): average credit per day per model = 245, average CPU days = 9.8, completed credits = 7,146
  6. HadSM3MH (60 years): average credit per day per model = 202, average CPU days = 12.4, completed credits = 9,527
  7. HadCM3N (40 years): average credit per day per model = 163, average CPU days = 20.0, completed credits = 3,250


Clearly credits awarded per model type are inconsistent. People who "like" credits may ignore low payers! HadCM3N is obviously way out of line.

Currently, I'm running 4 HadCM3L models in one machine and 5 of the HadCM3N models in another. The effect on RAC is obvious.
Compare HadCM3N with HadAM3P (regional). In my i7 920, it will take at least 4 times longer to yield a similar credit! :)

Looking at the above table I would suggest that the HadCM3L application delivers a fair credit return. Perhaps other model credits could be adjusted?
There is a precedent. In the past, both the original HadCM3 160 year types as well as HadSM3 (I think) had credits uplifted across the board including historical records.
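For anyone who wants to redo the comparison, here is a minimal Python sketch that expresses each model's average credit per day relative to HadCM3L, using only the figures in the table above. The function names and the choice of HadCM3L as the baseline constant are illustrative assumptions, not anything the project itself uses.

```python
# Average credit per day per model, copied from the table above.
CREDIT_PER_DAY = {
    "HadAM3P (2 years)": 908,
    "FAMOUS (200 years)": 823,
    "HadAM3P (1 year regional)": 637,
    "HadCM3L (80 years)": 570,
    "HadSM3 (45 years)": 245,
    "HadSM3MH (60 years)": 202,
    "HadCM3N (40 years)": 163,
}

# Assumption for illustration: HadCM3L is treated as the "fair" baseline.
BASELINE = "HadCM3L (80 years)"


def relative_rates(rates, baseline=BASELINE):
    """Express each model's credit rate as a multiple of the baseline rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


def time_ratio(rates, slow, fast):
    """How many times longer `slow` must run to earn what `fast` earns."""
    return rates[fast] / rates[slow]


if __name__ == "__main__":
    for name, multiple in relative_rates(CREDIT_PER_DAY).items():
        print(f"{name:<28} {multiple:.2f} x HadCM3L rate")
    ratio = time_ratio(
        CREDIT_PER_DAY, "HadCM3N (40 years)", "HadAM3P (1 year regional)"
    )
    print(f"HadCM3N needs about {ratio:.1f} times as long as regional HadAM3P")
```

On these averaged figures the HadCM3N versus regional HadAM3P ratio comes out at about 3.9, which is close to the roughly fourfold difference seen on the i7 920.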

Footnote: Just noticed that there are no HadCM3N models in the pool. Next up?

22) Message boards : Number crunching : TRICKLE CANNOT UPLOAD, BUT, SERVER SHOWS IT ALREADY HAS (Message 41551)
Posted 28 Jan 2011 by old_user596405
Post:
Thanks for all the hard work, Milo

Seconded :)

But, although my backlog of regional models has completely cleared, there are 16 zip files from Famous models (in 3 machines)
which will not budge, despite trying to force them (with "Retry now") or closing and restarting BOINC, which sometimes helps shift sticky files.

These are numbered from 14 to 20. Assume that these will eventually upload?
23) Message boards : Cafe CPDN : Scotland team (Message 41268)
Posted 12 Dec 2010 by old_user596405
Post:
Claire started on Friday and is using a high-end mobile CPU but no trickles yet from 3 FAMOUS and 1 regional model.
24) Message boards : Cafe CPDN : Scotland team (Message 41111)
Posted 20 Nov 2010 by old_user596405
Post:
iansm wrote:
But did spot David who joined on the 14th.

We also had this David who joined in August.

Wonder if he is the same David starting all over again as the first attempt never got going?
25) Message boards : Cafe CPDN : Scotland team (Message 41108)
Posted 20 Nov 2010 by old_user596405
Post:
Latest new member is Gerry and another today - pinky1 !

Our team leader must be on a recruitment drive. Three in the last few days. :)
26) Message boards : Cafe CPDN : Scotland team (Message 41076)
Posted 18 Nov 2010 by old_user596405
Post:
Hey, while I wasn't paying attention (sorry, just a bit busy!) we've not only passed 50,000,000 credits in CPDN but also acquired a new team member - does anyone know who?


Am very much in hands-off mode these days myself, just letting projects run...and run...! But did spot David who joined on the 14th.
27) Message boards : Number crunching : several crash right after initialization (Message 41007)
Posted 10 Nov 2010 by old_user596405
Post:
Milo had just removed HadCM from the work queue.

Good. Just wasted time watching three going down the pan.
CM3 version 6.05 is also crashing right after starting in Beta.
May be a while before CM3 gets sorted. The scientists will have to wait.
28) Message boards : Number crunching : Beta credits duplicated in two CPDN accounts (Message 40990)
Posted 8 Nov 2010 by old_user596405
Post:
Beta testing credits are being posted to my two CPDN accounts.

Account #1 - active - accidentally created a year ago - current CPDN work + 9 other active projects.
Account #2 - dormant - earlier CPDN work.

This double crediting was triggered as a consequence of restarting Beta testing after an absence of two years.
The earlier credit total was 126,962. Not only has this been duplicated, but so have all new credits, on a daily basis.
Obviously there are different email addresses for the two accounts.

Very good - 2 for the price of 1. :) Not bothered about credits but my team leader may be happy
that the team is receiving extra free credits!

What did concern me is the following side effect also noticed the day after restarting Beta testing (last Thursday).

All my 10 active projects (ignoring BBC and SAP) were somehow split into two BOINC groups.
A new cross-project id was created and about 5 or 6 projects changed ids, leaving the original for the remainder.
It has taken me 2 or 3 days to merge them all back into one cross-project id. WCG is a bit of an oddball as
it seems to keep swapping from the designated id to another one, like on a daily basis.

Ok, two issues as a consequence of returning to Beta testing!

Would be concerned if the double credit posting is fixed but cross-project ids get messed up again. :(

Incidentally, although it was possible to change BBC's CP Id to the new shared one (in that project's account page),
it remains detached from the rest (in BOINC and the CPDN list). This was determined earlier this year when my fellow team member, Strathpeffer, had attempted to do similar. Apparently, BBC data is no longer transferred to CPDN.

However, we can no longer access SAP account pages. A few pages are still available (e.g. About). Just wondered if
the SAP database could be accessed again even for a short period? Strangely, SAP is still listed in BOINC as "live"
rather than "retired".
29) Message boards : Number crunching : Credits not updating (Message 40760)
Posted 24 Sep 2010 by old_user596405
Post:
I suppose I have a different take on this credit business. As long as my units run, and whether they complete or crash is immaterial. I look upon it as my tiny contribution to help further the understanding of the underlying science. For me credits are meaningless (you can have my air miles too).

Tend to agree.

Ban credits! If that happened, the projects would lose many participants (i.e. credit hunters) but, at least, those who remained would be revealed as being 100% committed to the project's objectives.

My team has custom stats which count completed models. Over 4k to date. Am very happy with my own contribution in contrast to the controversial measure of credits!

Also, imo, another more meaningful number is equivalent SM3 years.

Credits are even more controversial when considering combined BOINC stats. Apart from the obvious variations between CPU and GPU projects, there is not even a level playing field between CPU projects. CPDN does not "pay" as well as many other projects. So what?

Anyway, Milo certainly deserves his holiday. The only concern with the lack of cover since Tolu's departure is what would happen if there was another major (or even minor) server problem?



30) Message boards : Number crunching : Iceworld Appeal (Message 40694)
Posted 17 Sep 2010 by old_user596405
Post:
The server maintains a queue of 100 available models. There were probably no slabs in this queue. Milo has, I think, run the transitioner and the problem of no available slabs should now be fixed.

Good show. Now got a batch. Thanks!

Have to ask Iain if he will accept new iceworlds? There will be a few with over 120k available :)
31) Message boards : Number crunching : Iceworld Appeal (Message 40690)
Posted 17 Sep 2010 by old_user596405
Post:
The slab is dead! Long live the slab!

Keep 'em coming.

Three cheers ... but they are not being delivered here.
One machine has one core free and two will be within next 12-18 hours so tried to grab 3 slabs. 2 days buffer. No joy.
Don't wish to try for a Famous as I want a slab!!!

The usual...
17/09/2010 13:08:16	climateprediction.net	Message from server: No work sent
17/09/2010 13:08:16	climateprediction.net	Message from server: No work available for the applications you have selected.  Please check your settings on the web site.

32) Message boards : Number crunching : Upload problem (Message 40653)
Posted 10 Sep 2010 by old_user596405
Post:

Good, learned something new - remote servers (outside Oxford) may cause hold-ups (as servers do) but not reported on the status page.


I've added that one to the status page, although I can't guarantee that the result will always be accurate.

Thanks, Milo.
33) Message boards : Number crunching : Upload problem (Message 40641)
Posted 9 Sep 2010 by old_user596405
Post:
Thanks, Les.

Now located the upload server URL lines in every zip file (in the file_info blocks). Should have known this.

The URL in the "file_upload_handler" line (for the stuck task) does not appear to be included in the server status page.

viz. <url> http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_handler </url>

Did further searches using the "file_upload_handler" string. Examples found:

<url>http://cpdn-upload1.comlab.ox.ac.uk/cgi-bin/file_upload_handler</url>
<url>http://uploader.oerc.ox.ac.uk/cpdn_cgi/file_upload_handler</url>
<url>http://uploader1.atm.ox.ac.uk/cpdn_cgi/file_upload_handler</url>
<url>http://climateapps1.oucs.ox.ac.uk/cgi-bin/file_upload_handler</url>

That's 4 from the 7 upload servers listed. All Oxford servers.

So zip files may go elsewhere then. Directly to the customer.
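The manual search above could be sketched in Python roughly as follows. This is only an illustration: it scans whatever text you paste in for <url> tags ending in file_upload_handler and lists the distinct hosts. Which BOINC file to feed it is not stated here, and the function names and the ".ox.ac.uk" test for "Oxford" are my own assumptions.

```python
import re
from urllib.parse import urlparse

# Matches <url> ... </url>, tolerating spaces inside the tags as in the example above.
URL_TAG = re.compile(r"<url>\s*(.*?)\s*</url>", re.IGNORECASE | re.DOTALL)

SAMPLE = """
<url> http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_handler </url>
<url>http://cpdn-upload1.comlab.ox.ac.uk/cgi-bin/file_upload_handler</url>
<url>http://uploader.oerc.ox.ac.uk/cpdn_cgi/file_upload_handler</url>
<url>http://uploader1.atm.ox.ac.uk/cpdn_cgi/file_upload_handler</url>
<url>http://climateapps1.oucs.ox.ac.uk/cgi-bin/file_upload_handler</url>
"""


def upload_hosts(text):
    """Return the sorted, distinct host names of file_upload_handler URLs in text."""
    hosts = set()
    for url in URL_TAG.findall(text):
        if "file_upload_handler" in url:
            hosts.add(urlparse(url).hostname)
    return sorted(hosts)


def non_oxford(hosts):
    """Hosts outside ox.ac.uk (assumption: that suffix identifies Oxford servers)."""
    return [h for h in hosts if not h.endswith(".ox.ac.uk")]


if __name__ == "__main__":
    hosts = upload_hosts(SAMPLE)
    print("Upload hosts:", hosts)
    print("Outside Oxford:", non_oxford(hosts))
```

With the five URLs quoted above, the only host outside Oxford is the Oregon State one used by the stuck task.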

Next, I entered the entire URL in the browser and got this
<data_server_reply>
    <status>1</status>
    <message>no command</message>
</data_server_reply>

which may suggest the server is "awake"?
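As a rough sketch, the reply shown above can be read with Python's standard XML parser. The element names are taken from that reply. What a status of 1 actually means is not established here, so the sketch only assumes that a well-formed reply shows the handler answered. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The reply exactly as it appeared in the browser.
REPLY = """<data_server_reply>
    <status>1</status>
    <message>no command</message>
</data_server_reply>"""


def parse_reply(xml_text):
    """Return (status, message) from a data_server_reply document.

    Raises ValueError if the text is not a well-formed data_server_reply.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag != "data_server_reply":
        raise ValueError(f"unexpected root element: {root.tag}")
    status = int(root.findtext("status", default="-1"))
    message = root.findtext("message", default="").strip()
    return status, message


if __name__ == "__main__":
    status, message = parse_reply(REPLY)
    # A parsed reply only shows that something answered at that URL.
    print(f"handler answered: status={status}, message={message!r}")
```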

Then forced a file transfer...and it's going again...now finished. 3 cheers. Another one bites the dust.

Good, learned something new - remote servers (outside Oxford) may cause hold-ups (as servers do) but not reported on the status page.

Thanks again for pointing me to the server url lines.
34) Message boards : Number crunching : Upload problem (Message 40638)
Posted 9 Sep 2010 by old_user596405
Post:
A server is out of disk space - when trying to upload final zip file #13 for task "v5q2". Error first appeared 7 hours ago.

But note that an intermediate zip file (#3) uploaded ok for task "v5me" presumably to a different server.

09/09/2010 07:29:04	climateprediction.net	Sending scheduler request: To send trickle-up message.
09/09/2010 07:29:04	climateprediction.net	Not reporting or requesting tasks
09/09/2010 07:29:05	climateprediction.net	Started upload of hadam3p_pnw_v5me_1998_1_006723343_0_3.zip
09/09/2010 07:29:05	climateprediction.net	Scheduler request completed
09/09/2010 07:30:04	climateprediction.net	Finished upload of hadam3p_pnw_v5me_1998_1_006723343_0_3.zip
09/09/2010 07:31:42	climateprediction.net	Started upload of hadam3p_pnw_v5q2_1985_1_006681048_0_13.zip
09/09/2010 07:31:43	climateprediction.net	[error] Error reported by file upload server: Server is out of disk space
09/09/2010 07:31:43	climateprediction.net	Temporarily failed upload of hadam3p_pnw_v5q2_1985_1_006681048_0_13.zip: transient upload error
09/09/2010 07:31:43	climateprediction.net	Backing off 2 hr 25 min 55 sec on upload of hadam3p_pnw_v5q2_1985_1_006681048_0_13.zip
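To pick stuck uploads out of a longer log, something like the Python sketch below might do. The patterns are inferred only from the lines shown above, so other wordings of the back-off message would not match, and the function names are hypothetical.

```python
import re

# Inferred from the single "Backing off ..." line above; each unit is optional.
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)
# The zip number is the last numeric field of the file name, e.g. ..._0_13.zip
ZIP_NUMBER_RE = re.compile(r"_(\d+)\.zip$")

SAMPLE_LINE = (
    "09/09/2010 07:31:43\tclimateprediction.net\t"
    "Backing off 2 hr 25 min 55 sec on upload of "
    "hadam3p_pnw_v5q2_1985_1_006681048_0_13.zip"
)


def parse_backoff(line):
    """Return (file_name, backoff_seconds), or None if the line is not a back-off."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, name = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds or 0)
    return name, total


def zip_number(file_name):
    """Return the trailing zip number of an upload file name, or None."""
    match = ZIP_NUMBER_RE.search(file_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    name, seconds = parse_backoff(SAMPLE_LINE)
    print(f"zip #{zip_number(name)} of {name} retries in {seconds} s")
```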

Is the target upload server name recorded somewhere in the BOINC files?

Pity there is no way to suspend transfers only without having to disable the network. Even better, suspending individual files so active servers can still receive other files (as in above example).
Server status all green (otherwise would have known which server is down!).


35) Message boards : Number crunching : Regional HADAM3P Failures (Message 40622)
Posted 8 Sep 2010 by old_user596405
Post:
These regional models don't generate the same THETA and PRESSURE errors as some FAMOUS and should almost always complete successfully. Unfortunately yours is the only model given out from that workunit so you can't see whether another computer with Intel and Windows has completed it. (Usually one can only compare directly with machines that have the same CPU type and OS.)

-161 appears to be a Boinc error. Here's what it means. And here's Signal 11. In your case I don't think those explanations help much; we only know that something went wrong. But it does look as if something happened.

The computer is processing pretty fast. 1.7 sec/TS for that model. My quad 6600 has two European models running alongside two FAMOUS; they're running at 2.51 sec/TS. If you do get more regional crashes I'd do the stability tests and, as you suggest, start taking backups. If you restore a backup because the task crashed and the restore gets past the failure point you do know for sure that the problem lies within the computer.


Many thanks for info. Ok, so we should not expect model issues with these regional ones. Thought so (else we would have been advised!).

As stated, this machine has a good track record. The OC level has been well tuned and tested some time ago but I suppose it is worth doing occasional checks. Even recently dusted out the system when replacing the PSU. Temps are fine - same as always. GPUGRID is running in a modest, cool, low power card and for many months without any hiccups.

Am puzzled about the 1.7 s/TS though? Including the failure, this machine has completed or is still running 8 other models. s/TS ranges from 1.86 to 2.26 (the latter a pair of PNW versions). As seen in trickle pages. My other quad running at the stock 2.4 has just started its first regional model. Running at 2.8 s/TS.

Back to backups then. Hate losing models. At least, unlike FAMOUS, it will be worth taking backups for regional models.

Thanks again.
36) Message boards : Number crunching : Regional HADAM3P Failures (Message 40620)
Posted 8 Sep 2010 by old_user596405
Post:
Are we expecting random crashes with the new models - like FAMOUS negative thetas?

Got first failure last night. This one (11871074) crashed after 4 trickles (> 33%).

Failed on a slightly overclocked Q6600 / 4 GB / Win 7 64 system (probably with the best record for completions) that had already finished 4 assorted regional models.

It would have been interesting to see if a restore would have been successful but I no longer bother taking backups for today's shorter model types. If any more fail will start backing up again just to find out.

Apart from anticipated FAMOUS thetas and SM3 iceballs, the only other failures experienced over past year or so would have been caused by occasional power supply problems.

Just curious.
37) Message boards : Number crunching : Computer wasting multiple models (Message 40429)
Posted 26 Aug 2010 by old_user596405
Post:
This member is in our team - we like to see newcomers getting smoothly off the mark! When posting the computer id earlier, had forgotten that our leader gets email addresses. Subsequently the member has now been emailed with the BOINC exit warning as a first suggestion.

Hopefully we shall get a response from the newcomer soon but will keep monitoring anyway. Will update this thread if we can help the member resolve the issue.

Meantime, thanks anyway for your and the team's support!
38) Message boards : Number crunching : Computer wasting multiple models (Message 40426)
Posted 26 Aug 2010 by old_user596405
Post:
It's a laptop, and I think that error 1 is: turned off the computer without first exiting from BOINC. Possibly: closed the lid and hibernated.


Maybe depends on hibernation settings. Just tested closing the lid. No issue (Win XP). But pulling the plug would almost certainly mess up the models.

Noted that the member appears to be running Malaria in rotation (being a single core CPU) and has also lost WUs there with exit 1. Did complete two WUs though!

Times of last trickle for each of the 4 crashed CPDN tasks are at random times during the day.




39) Message boards : Number crunching : Computer wasting multiple models (Message 40422)
Posted 26 Aug 2010 by old_user596405
Post:
This newcomer may need assistance. Started on the 21st but has lost the first 4 models after just a few trickles. All with the general exit code 1.

Computer Id = 1095430
40) Message boards : Number crunching : Lost BBC Credits (Message 40373)
Posted 15 Aug 2010 by old_user596405
Post:
When experimenting with various ways of trying to reattach my own BBC account to our team (now with some success),
discovered something else about BOINC combined stats that I'd never noticed in the past.

A member who belongs to two or more teams in different projects will only appear in ONE team's combined table.

We have 11 from 81 CPDN members (with credits) who are excluded from our team's BOINC combined stats
but do appear in one other team's stats. Conversely, multi-team members included in our stats don't appear elsewhere.

Our leading CPDN member (with over 5 million credits) is the most significant example.
His one other project is SETI and is listed under another team's combined stats. Also confusing is the fact that
a member's combined total appears in the designated team page - i.e. in this example, the total credits for
both CPDN (our team) + SETI (other team) rather than what is actually "owned" by the team.

Perhaps the first project that gets attached to a team remains as the only one for combined stats? e.g. SETI is older than CPDN.

Another example of erroneous / misleading stats is with CPDN itself. For over 6 weeks, we have had a shortfall of over 2 million
(the exact number never varies) in the team total (as seen in the team account home page and top teams)
- i.e. adding up members credits comes to 2 million more. This has happened in the past but eventually got sorted (a lot
quicker than in 6 weeks). Thinking the variation was related to member movements, checked this as we had a few in June
/ early July but got no conclusion. One member rejoined us on 5th July with 2.9m. His old team's CPDN total now has a
surplus of 1.6m. With only 4 members with credits, it is blatantly obvious!

