Some crawlers ignore robots.txt, and the growing number of AI crawlers often seem to follow their own rules. As a result, it is becoming more common to need to block specific crawlers or bots from development or staging instances.
On MDOQ this is relatively simple.
Create the file:
mdoq/nginx/generated/blocking.conf

    map $http_host $isProduction {
        default 1;
        ~*(?i)(mdoq\.io) 0;
        ~*(?i)(mdoq\.dev) 0;
    }

    map $http_user_agent $badBot {
        default 0;
        ~*(?i)(claudebot) 1;
    }

    map $isProduction$badBot $block {
        default 0;
        01 1;
    }

In this configuration we are blocking just claudebot. You can extend this by adding more lines underneath.
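As a rough sketch of what extending the list might look like, the fragment below adds two more user agents to the $badBot map. The gptbot and bytespider entries are illustrative assumptions and are not part of the original configuration; substitute whichever user agent strings you actually see in your own logs.

```nginx
# Sketch only: the gptbot and bytespider lines are assumed examples,
# not part of the original blocking.conf.
map $http_user_agent $badBot {
    default 0;
    ~*(?i)(claudebot) 1;
    ~*(?i)(gptbot) 1;       # assumed example entry
    ~*(?i)(bytespider) 1;   # assumed example entry
}
```

Because the $block map only matches the combined value 01 (a host matching mdoq.io or mdoq.dev, plus a bad bot), any user agent added here should be blocked on those hosts only.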
Create the file:
mdoq/nginx/templates/default_https.conf

    if ($block = 1) {
        return 444 "Blocked";
    }

- Add both files to source control. You may need to use

      git add -f mdoq/nginx/generated/blocking.conf

  as this is an excluded directory.
- Sync the Nginx component.
This will fix the development instance you are on. You can check it's working by tailing the Nginx logs and looking for a 444 response code.
Once you're happy, you can follow the normal deployment process (zero downtime).
Once this code is in your "main" source control branch, all new instances will inherit it.
NB: this configuration only blocks on hosts matching mdoq.io or mdoq.dev, so it should not affect production. Even so, after doing your release it would be worth syncing Nginx on production to ensure there are no conflicts. Please reach out to support if you are unsure.