Written on 12/02/2025
Recently I turned on Apache2 access logs to have a look at traffic to my web server (privacy-friendly, of course: no IP logging, etc.). It turns out a lot of it is bot traffic. Besides having to ban Amazonbot, GPTBot and ByteDance's bots for hammering the site or ignoring the robots.txt file, I noticed a lot of requests for .env files: 800 GET requests in the last week alone targeted some kind of .env file.
.env files usually contain configuration parameters for certain web apps and should always be made inaccessible; otherwise one risks leaking sensitive data such as database credentials or API keys. My web services do not use .env files.
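To illustrate what the scrapers are hoping to find, a typical .env file might look something like this (all values here are hypothetical):
# Illustrative example of typical .env contents (hypothetical values)
DB_HOST=localhost
DB_USER=app
DB_PASSWORD=s3cr3t
API_KEY=sk-hypothetical-key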
Instead of letting those bots run into a 404 or banning them when they try to open the file, I thought of something else: how about giving them a useless file? Why not make it huge? And can we throttle the download speed?
And so that's what I did: I created a 1 GB file with content from /dev/urandom (I didn't want to go larger and clog up my backups) and then added a rewrite for common subdirectories and variations of the .env file, together with a rate limit of 1 KB/s. This is what it looks like when trying to download it in Firefox.
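The decoy file itself can be generated straight from /dev/urandom; here is a minimal sketch using dd, where the target path /var/www/html/.env is an assumption about the document root:
# Create a 1 GiB decoy file from /dev/urandom
# (document root /var/www/html is an assumption)
dd if=/dev/urandom of=/var/www/html/.env bs=1M count=1024 status=progress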
Later I also redirected any file that lies in a .git subdirectory. These are the rules I implemented on the web server:
# Rewrite any request for a .env file (or variants like .env.bak or .env_prod)
# and anything inside a .git directory to the 1 GB decoy at /.env.
# Requires mod_rewrite (with RewriteEngine On), mod_ratelimit and mod_headers.
RewriteCond %{REQUEST_URI} ^.*/\.env[._]?[A-Za-z0-9]*$ [NC]
RewriteRule ^(.*)/\.env[._]?[A-Za-z0-9]*$ /.env [L]
RewriteCond %{REQUEST_URI} ^.*/\.git/.*$ [NC]
RewriteRule ^(.*)/\.git/.*$ /.env [L]

# Throttle matching responses to 1 KiB/s and serve them as plain text.
<Location ~ "/\.git/.*$">
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 1
    Header set Content-Type "text/plain"
</Location>
<Location ~ "/\.env[._]?[A-Za-z0-9]*$">
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 1
    Header set Content-Type "text/plain"
</Location>
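For completeness, the directives above rely on three Apache modules; on a Debian-style install (an assumption about the setup) they could be enabled like this:
# Enable the required modules (Debian/Ubuntu layout assumed)
a2enmod rewrite ratelimit headers
systemctl reload apache2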
Does it work? Yes! Looking at the apache2 status page the day after, I found multiple bots slowly downloading the .env file. Some disconnected after a while, but one host remains persistent, even with two connections:
Srv  PID     Acc        M  CPU    SS     Req  Dur      Conn  Child  Slot  Client         Protocol  VHost                Request
0-0  687110  1/126/126  W  8.98   18945  0    4220404  0.0   7.79   7.79  174.138.xx.xx  http/1.1  web2.ad.etlam.eu:80  GET /.env HTTP/1.1
1-0  687111  1/79/79    W  22.27  66418  0    25159    0.0   3.14   3.14  174.138.xx.xx  http/1.1  web2.ad.etlam.eu:80  GET /.env HTTP/1.1
1-0  687111  1/175/175  W  57.99  377    0    31681    0.0   4.57   4.57  93.237.xx.xx   http/1.1  etlam.eu:80          GET /.env HTTP/1.1
The connection duration can be seen in the SS column (seconds since the request began). Here, the longest so far has been 18 hours. At 1 KiB/s, the full gigabyte takes about 290 hours to transfer, so only 272 hours to go until you finish downloading 1 GB of random data, blocking one of your scraping threads for a while :-)
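The scoreboard above comes from mod_status. Assuming the module is enabled and /server-status is reachable from localhost (both assumptions about the setup), it can be inspected like this:
# Inspect the Apache scoreboard; the per-connection table additionally
# requires ExtendedStatus On (an assumption about the config)
curl -s "http://localhost/server-status"
# Machine-readable summary variant:
curl -s "http://localhost/server-status?auto"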