Thread 'CPDN website issues due to webcrawlers from AI LLM development'

Glenn Carver

Joined: 29 Oct 17
Posts: 1074
Credit: 17,020,946
RAC: 5,160
Message 72008 - Posted: 28 Apr 2025, 11:24:15 UTC

Some people have noticed that the CPDN website has repeatedly gone down over recent months. This is being caused by excessive requests from companies and sites crawling CPDN webpages and accessing task results. We have traced some of the requests to sites building Large Language Models. The problem has been ongoing since last year, and although CPDN have taken steps to reduce it, it persists, partly because of the open nature of the BOINC server and partly because of the very large number of batches in the CPDN database.

The database gets overloaded because the BOINC server allows users who are not logged in to access task results. The URL route is via Community -> Teams -> Members; if a team member on that list has their computers set as 'public', anyone can then access the tasks they have completed. Since viewing a task result page involves a database query to retrieve the information, excessive requests can overload the database. This is the same on the other BOINC projects we've looked at; it's not particular to CPDN.

Unfortunately these crawlers do not respect the 'robots.txt' file which is what websites put in place to limit what crawlers can access.
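
For anyone unfamiliar with how 'robots.txt' is meant to work, here is a rough sketch of the check a well-behaved crawler performs before fetching a page. The rules and URLs below are made up for illustration (they are not the actual CPDN robots.txt), and `is_allowed` is just a helper name I've chosen. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real CPDN robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /result.php
Disallow: /results.php
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.org stands in for a project site.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/result.php?resultid=1"))
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/forum_index.php"))
```

The important point is that this check runs on the crawler's side. Nothing on the server enforces it, so a crawler that simply skips the check can fetch anything the server will serve.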

Blocking IP ranges only partially worked as the requests are using random IP addresses from a very large range.
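
To illustrate why range blocking only goes so far, here is a minimal sketch using Python's standard `ipaddress` module. The blocked ranges are reserved documentation ranges picked purely as an example (they are not the ranges CPDN blocked), and `is_blocked` is a hypothetical helper, not anything in the BOINC server.

```python
import ipaddress

# Hypothetical blocklist using reserved documentation ranges;
# these are NOT the ranges actually blocked by CPDN.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(address: str) -> bool:
    """Return True if address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("192.0.2.17"))   # inside a blocked range
    print(is_blocked("203.0.113.5"))  # outside every blocked range
```

A request from inside a listed range is rejected, but one from any address outside the list passes. When the requests come from addresses scattered across a very large space, the list can never keep up.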

CPDN will archive old batches to reduce the size of the database, which will help mitigate this, but the problem is essentially a BOINC one.

There's an ongoing conversation with BOINC developers about tackling this.
---
CPDN Visiting Scientist


©2025 cpdn.org