Blocking Bots For Development Instances – MDOQ

Some crawlers ignore robots.txt, and the growing number of AI crawlers often seem to follow their own rules. As a result, it is becoming more common to need to block specific crawlers or bots from development or staging instances.

On MDOQ this is relatively simple.

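Create the file: mdoq/nginx/generated/blocking.conf
```
# 1 = production host, 0 = MDOQ development/staging host
map $http_host $isProduction {
    default 1;
    ~*(?i)(mdoq\.io) 0;
    ~*(?i)(mdoq\.dev) 0;
}

# 1 = user agent matches a bot we want to block
map $http_user_agent $badBot {
    default 0;
    ~*(?i)(claudebot) 1;
}

# Concatenate the two flags; "01" means non-production host and bad bot
map $isProduction$badBot $block {
    default 0;
    01 1;
}
```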

This configuration blocks only ClaudeBot. To block more bots, add further lines to the $badBot map, underneath the claudebot entry.
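As a rough sketch of what an extended list might look like, the $badBot map below adds three more user-agent patterns. The extra names (gptbot, bytespider, ccbot) are illustrative assumptions and not a recommended list, so replace them with whichever user agents actually appear in your own logs.

```
map $http_user_agent $badBot {
    default 0;
    ~*(?i)(claudebot) 1;
    # Assumption: example patterns only, adjust to the bots you see in your logs
    ~*(?i)(gptbot) 1;
    ~*(?i)(bytespider) 1;
    ~*(?i)(ccbot) 1;
}
```

Each pattern is a case-insensitive match against the User-Agent header, so any request whose user agent contains one of these strings sets $badBot to 1.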

Create the file: mdoq/nginx/templates/default_https.conf
```
# 444 is an Nginx-specific code: the connection is closed without sending a response
if ($block = 1) {
    return 444 "Blocked";
}
```
Add both files to source control. You may need to use git add -f mdoq/nginx/generated/blocking.conf, as this is an excluded directory.
Then sync the Nginx component.

This will apply the block to the development instance you are on. You can check it's working by tailing the Nginx logs and looking for a 444 response code.

Once you're happy, you can follow the normal deployment process (zero downtime).

Once this code is in your "main" source control branch, all new instances will inherit it.

NB: although this configuration should not affect production, it is worth syncing Nginx on production after your release to ensure there are no conflicts. Please reach out to support if you are unsure.
