Deep Web
The terms Deep Web, Hidden Web, Invisible Web and Deep Net describe the portion of the World Wide Web that is not visible to the public or has not been indexed by search engines. Some portions of the deep web consist of dynamic pages accessible only via a form or submitted query. Web pages that no other page links to are also part of the deep web. They are, in effect, invisible: search engine crawlers cannot find them because they have no backlinks (inbound links).
Sites that require registration prior to access can also be considered part of the deep net. Such sites may also keep search engine spiders from browsing and indexing their web pages through protocols such as the Robots Exclusion Standard. Furthermore, pages created with Flash and JavaScript, other scripted content, non-text content, and non-HTML file formats such as PDF and DOC documents in Usenet archives are indexed only by some search engines. This makes them part of the Hidden Web.
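As a minimal sketch of how the Robots Exclusion Standard keeps a well-behaved crawler out of part of a site, the following uses Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" user agent and the example.com URLs are assumptions made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site: every crawler is
# asked to stay out of the members-only area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def can_crawl(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/about",
                "https://example.com/members/profile"):
        print(url, "allowed" if can_crawl(url) else "blocked")
```

A crawler that honors these rules never requests anything under /members/, so those pages stay out of its index even though they exist on the server.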
Crawler Limitations
A search engine’s web crawler follows hyperlinks to discover and index content found on the Web. This tactic is ineffective for deep web resources. For instance, search engine crawlers do not look for dynamic web pages that result from database queries, because the number of possible queries and results may be very large.
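The limitation can be illustrated with a small sketch of link-following discovery. It walks an in-memory link graph instead of fetching real pages; the page paths and the LINKS table are hypothetical and exist only to show which pages a hyperlink-driven crawler can and cannot reach.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/1"],
    "/products/1": [],
    "/orphan": ["/"],          # exists, but no page links to it
    "/search?q=widgets": [],   # produced only by submitting a form
}


def crawl(start, links):
    """Breadth-first discovery: visit every page reachable via hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


reachable = crawl("/", LINKS)
unreached = sorted(set(LINKS) - reachable)

if __name__ == "__main__":
    print("indexed:", sorted(reachable))
    print("deep web:", unreached)
```

The page without inbound links and the query result page are never discovered, which is the sense in which they belong to the deep web.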
New Innovations
These limitations are, however, being overcome by new search engine crawlers (like Pipl) being designed today. These crawlers are designed to identify, interact with and retrieve information from deep web resources and searchable databases. Mechanisms such as Google's Sitemap Protocol and mod_oai (a module for the Apache web server), for example, were developed to increase results from deep web searches of web servers. They allow web servers to automatically advertise to search engines the URLs that can be accessed on them.
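To make the idea of a server advertising its URLs concrete, here is a minimal sketch that builds a Sitemap Protocol document with Python's standard library. The build_sitemap helper and the example.com URLs are assumptions for illustration; only the required urlset, url and loc elements of the protocol are emitted, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical dynamic pages that no hyperlink points to.
    print(build_sitemap([
        "https://example.com/catalog?item=42",
        "https://example.com/reports/2008",
    ]))
```

A crawler that reads such a file learns about the listed pages directly and does not have to find a hyperlink leading to them.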
Another solution, pursued by several search engines like Alacra, Northern Light and CloserLookSearch, is the specialty search engine, which focuses only on particular topics or subject areas. This allows such engines to narrow their scope and search the deep web in more depth by querying password-protected and dynamic databases.
Deep Web or Surface Web
The challenge that researchers in this field face is the classification of resources. The boundary between the surface web and the deep web is a gray area. There are sites that appear to be indexed by search engines but were actually found not by conventional web crawlers but through OAIster, mod_oai or the Sitemap Protocol. Other examples are pages that belong to the surface web but have not yet been found by web crawlers.
The research being done in this field of computer science today aims to give Internet users more access to deep web data as well as more meaningful results for their searches. Researchers are currently looking for ways to classify and categorize search results by topic and according to users’ needs.




