How to Prevent Downloading of Your Entire Website
Preventing Web Site Downloading Using robots.txt
The first step is to disallow the downloading programs in your robots.txt file. To do this, you will need to list the user agent name of each bad robot you wish to disallow.
Disallowing bad programs in robots.txt does not prevent all web site downloading, because many bad programs simply ignore the contents of robots.txt and do what they want to do.
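As a rough illustration of what such a file could look like and how a well-behaved robot would read it, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the user agent tokens (Httrack, WebCopier), and the build_parser helper are assumptions made for this example; check which token each program actually honors before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out two downloading programs entirely and
# leave the site open to everyone else. The tokens are assumptions.
ROBOTS_TXT = """\
User-agent: Httrack
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text the way a compliant robot would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A robot that honors robots.txt asks before fetching each URL.
httrack_allowed = parser.can_fetch("Httrack", "/index.html")
browser_allowed = parser.can_fetch("Mozilla", "/index.html")
```

The check only matters to robots that choose to perform it, which is why this step on its own cannot stop a program that ignores robots.txt.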
Preventing Web Site Downloading Using User Agent Blocking in httpd.conf
Another method is to block the downloading programs' user agents in httpd.conf.
Add every agent you wish to exclude to httpd.conf:
SetEnvIfNoCase User-Agent ^Httrack keep_away
SetEnvIfNoCase User-Agent "^Offline Explorer" keep_away
SetEnvIfNoCase User-Agent ^psbot keep_away
SetEnvIfNoCase User-Agent ^Teleport keep_away
SetEnvIfNoCase User-Agent ^WebCopier keep_away
SetEnvIfNoCase User-Agent ^WebReaper keep_away
SetEnvIfNoCase User-Agent ^Webstripper keep_away
Order Allow,Deny
Allow from all
Deny from env=keep_away
User agent blocking does not prevent all web site downloading either, because the user can remove the program's User-Agent header or spoof it so that the program appears to be Internet Explorer or another common browser.
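To make the limitation concrete, the following Python sketch approximates the anchored, case-insensitive matching that the SetEnvIfNoCase lines perform. It is an illustration only, not how Apache itself is implemented; the is_blocked helper and the sample User-Agent strings are made up for this example.

```python
import re

# Prefixes taken from the httpd.conf rules in this article.
BLOCKED_PREFIXES = [
    "Httrack",
    "Offline Explorer",
    "psbot",
    "Teleport",
    "WebCopier",
    "WebReaper",
    "Webstripper",
]

# One anchored, case-insensitive pattern, similar in spirit to the
# "^Name" patterns used with SetEnvIfNoCase.
BLOCK_PATTERN = re.compile(
    "^(?:" + "|".join(re.escape(p) for p in BLOCKED_PREFIXES) + ")",
    re.IGNORECASE,
)


def is_blocked(user_agent):
    """Return True if the User-Agent string starts with a blocked prefix."""
    return BLOCK_PATTERN.search(user_agent) is not None


# Hypothetical User-Agent strings, for illustration only.
honest = is_blocked("HTTrack 3.0x")
spoofed = is_blocked("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")
missing = is_blocked("")
```

In this sketch the honest string is caught, while the spoofed and empty strings pass through. Because the patterns are anchored with ^, a string that merely contains a blocked name somewhere after the start would pass as well.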
Preventing Web Site Downloading Using User Agent Blocking in PHP
If the content you are attempting to protect is in PHP, you may be interested in the user agent blocking technique described in Deny Spambots and Prevent Email Harvesting.