Analyse your HTTP response headers
Pour vérifier la politique de sécurité des entêtes d'un site.
— Permalink
Pour vérifier la politique de sécurité des entêtes d'un site.
— Permalink
A list of AI agents and robots to block. Contribute to ai-robots-txt/ai.robots.txt development by creating an account on GitHub.
Pour ma part, j'ai rajouté cela dans le conf-enabled/security.conf de mon apache :
# https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/refs/heads/main/.htaccess
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC]
RewriteRule .* - [F,L]
After comparing some notes around the Internet I've came up with this "catch-all" for in Apache. So not just one vhost, anything will catch it. Keeping the configs clean and simple.
Put the following in the conf-enabled directory (Debian based systems):
Alias /.well-known/acme-challenge/ "/var/www/html/.well-known/acme-challenge/"
<Directory "/var/www/html/">
Options None
AllowOverride None
ForceType text/plain
RedirectMatch 404 "^(?!/\.well-known/acme-challenge/[\w-]{43}$)"
</Directory>
Enable it with a2enconf, reload the Apache service. Make sure the directory /var/www/html/.well-known/acme-challenge/ is created and owned by the Apache data user, e.g. www-data. It can be any directory, as long as you keep it consistent.
Then run this command:
certbot certonly --webroot --agree-tos --email youradmin@example.com --webroot-path /var/www/html/ --domain yoursite.example.com
Pour tester la charge supporté par un serveur.
— Permalink
Bon, c'est toujours mieux que rien : une conf Apache pour "tenter" de bloquer les Bots d'IA sur son/ses site/s. Je dis bien tenter, parce que beaucoup ignorent les robots.txt et/ou ne donnent pas leur véritable user agent…