Posts by charles

1) Questions and Answers : Unix/Linux : help debugging (Message 63982)
Posted 23 May 2021 by charles
There's nothing wrong with the models.
It's the way that you're trying to run them.

So you'll just have to keep letting them fail.

Well, agree to disagree. That's always the same response from devs who can't debug their app correctly. As a dev myself I know that pretty well, since I need the devs on my team to turn their tongue seven times in their mouth before saying something that stupid. I'm sorry to say this undiplomatically, but this is just the reality of the field, and there are plenty of articles about it from people who have actually done some thinking about it, including research papers such as those from Edwin Zaccai.

Anyway since,

have all successfully completed. And as they've each run for more than 10 days, in the same environment, with the same workload, the same CPU pausing, and various upgrades along the way, this is actual proof that there was at least something wrong with the earlier tasks, and so maybe something wrong with the model if the tasks are all created the same way. This means there are some inconsistencies that the model doesn't handle correctly and that end in an error.
2) Questions and Answers : Unix/Linux : help debugging (Message 63957)
Posted 7 May 2021 by charles
1. If it was libs, then they'd fail at about 6 seconds.

2. You have far too many interruptions to each model.
In the "computing prefs" on your account, set "Suspend when non-BOINC CPU usage is above" to 100%.

3. The n216 models like LOTS of L3 cache. This was discussed early last year somewhere.
Minimum seems to be 4 Megs per model.

I can't do this, unfortunately. I'm not using my hosts for amusement or for debugging BOINC models. As I've discussed a zillion times in the past on other projects, working around it like this doesn't solve anything; it just shows a lack of programming skill in the model.
3) Questions and Answers : Unix/Linux : help debugging (Message 63956)
Posted 7 May 2021 by charles
I'm sure you would. But as you've seen and as you've stated, the tasks didn't just crash in the first 5 seconds; they ran much longer than that and then ended in a computational error all of a sudden.
What I meant by saying the team should do a run is that they should run the task through an industrial debugger and let it run, so that we would have the information we need.
From my experience, we usually don't have enough information in the logs of the tasks themselves or in the BOINC logs. It's the same problem with LHC@home and other projects that have run into computational errors. It needs an industrial debugger used like in this. Then we would know exactly which function the task was running before dying and which variables it hit. That's what I meant.
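As a rough illustration of what "running it through a debugger" could look like, here is a minimal sketch that only builds a gdb command line. The flags `--batch`, `-ex` and `--args` are standard gdb options; the binary name `./model_binary` and its flag are placeholders I made up, not the project's real executable, and a real BOINC task would also need its working directory and input files set up.

```python
import shlex


def gdb_backtrace_command(program, args=()):
    """Build a gdb command that runs `program` and prints a backtrace when it stops.

    In batch mode gdb executes the -ex commands in order and then exits, so
    "run" starts the program and "bt full" prints the stack (with local
    variables) if the program was stopped by a signal such as SIGSEGV.
    """
    return [
        "gdb", "--batch",
        "-ex", "run",
        "-ex", "bt full",
        "--args", program, *args,
    ]


if __name__ == "__main__":
    # Hypothetical binary and flag, used only to show the resulting command line.
    cmd = gdb_backtrace_command("./model_binary", ["--some-flag"])
    print(shlex.join(cmd))
```

If the program exits normally, gdb has no stack to print; the backtrace is only useful when the run actually dies under the debugger.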
4) Questions and Answers : Unix/Linux : help debugging (Message 63953)
Posted 7 May 2021 by charles
Well, it's the only thing that fails, and I run a lot of other tasks that take a lot of CPU too, and I don't have any complaints from those.
If I'm not mistaken, those tasks have never used the GPU, so that can't be it.
If I had RAM errors, I would already have a lot of problems with other projects like worldcommunitygrid, LHC or einstein.
Same statement for the CPU.
And the task that has now been running for 4 days (hadam4h_21f0_209905_5_903_012081295) would not be able to continue under the same conditions.
So if it were hardware, I'm confident I would have other symptoms.
So even if the other tasks lacked the libraries, I would be more inclined to believe that there is a problem with the task itself.
To be certain, the team should do a run of the same tasks.
5) Questions and Answers : Unix/Linux : help debugging (Message 63938)
Posted 5 May 2021 by charles
Thanks for the tip.
Your question is going to be a tough one to answer because I always have virtual machines turned on, or some browsers, or maybe other stuff, so it really depends. From the little I know, a segmentation fault is often due to a pointer not pointing to a place in memory owned by the program itself, so I was going to ask more about a compatibility issue with other software or anything similar. Also, I always have 3 BOINC tasks running on this machine.
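To make the pointer explanation concrete, here is a small sketch, assuming Linux and a standard CPython with `ctypes`. A child interpreter reads from address 0, which the process does not own, and the parent reports which signal killed it. The helper name `run_and_report` is made up for this example and has nothing to do with the models themselves.

```python
import signal
import subprocess
import sys

# Child program: read a C string from address 0, memory the process does not own.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_and_report(code):
    """Run `code` in a child interpreter and name the signal that killed it, if any."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code means the child was killed by that signal.
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return "exit %d" % proc.returncode


if __name__ == "__main__":
    print(run_and_report(CHILD_CODE))
```

On Linux the child should be reported as killed by SIGSEGV. The crash happens inside the faulting process itself, so another program running on the same host cannot cause it by touching that process's memory.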
6) Questions and Answers : Unix/Linux : help debugging (Message 63935)
Posted 5 May 2021 by charles

I've recently been happily surprised by the update of your website platform, which is finally up to date; that was not the case in 2018. So I wanted to start some tasks on my Fedora 34, but apparently they all end in error at some point, generally after several days of computation:

Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
SIGSEGV: segmentation violation
Stack trace (21 frames):

I've installed the libraries recommended for Fedora, of course. So tell me what I should look for.
Best regards
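For anyone reading the log excerpt above, the signal numbers can be decoded with Python's standard `signal` module. This is only a sketch, assuming Linux numbering, and `signal_name` is a helper invented for this example.

```python
import signal


def signal_name(signum):
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(signum).name


if __name__ == "__main__":
    # 15 is the number printed in the log; 11 is the usual Linux number for SIGSEGV.
    for number in (15, 11):
        print(number, signal_name(number))
```

On Linux, 15 is SIGTERM (a termination request, for example from `kill`), while SIGSEGV is the segmentation violation itself, so the SIGSEGV line and its stack trace are the part of the log to look at first.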