geoseo.net

Search Engine

How search engines work

A search engine operates in the following order:

  1. Web crawling
  2. Indexing
  3. Searching

Web search engines work by storing information about a large number of web pages, which they retrieve from the World Wide Web itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser which follows every link it sees. Site owners can exclude pages from crawling by using robots.txt. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about Web pages is stored in an index database for use in later queries.

Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find. The cached page always holds the actual search text, since it is the copy that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. This problem might be considered a mild form of linkrot, and Google's handling of it increases usability by satisfying the principle of least astonishment: the user normally expects the search terms to be on the returned pages. Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that is no longer available elsewhere.
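The crawling and indexing steps described above can be illustrated with a minimal Python sketch. It is not how any particular search engine is implemented. The in-memory PAGES dictionary, the example.test URLs, the ROBOTS_TXT rules, and the names PageParser and crawl_and_index are all assumptions made for illustration; a real crawler would fetch pages and robots.txt over HTTP.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> HTML source (stands in for HTTP fetches).
PAGES = {
    "http://example.test/": '<title>Home</title><a href="/a.html">A</a> <a href="/private/x.html">X</a>',
    "http://example.test/a.html": "<title>Page A</title><h1>Search engines</h1><a href='/'>home</a>",
    "http://example.test/private/x.html": "<title>Secret</title>hidden text",
}
# Hypothetical robots.txt that excludes one directory from crawling.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


class PageParser(HTMLParser):
    """Collects outgoing links and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]

    def handle_data(self, data):
        self.text.append(data)


def crawl_and_index(start_url):
    """Breadth-first crawl from start_url; returns word -> set of URLs."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    index = defaultdict(set)
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        # Skip unknown pages and anything robots.txt disallows.
        if url not in PAGES or not robots.can_fetch("*", url):
            continue
        parser = PageParser()
        parser.feed(PAGES[url])
        # Indexing: record which pages contain each word.
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index[word].add(url)
        # Crawling: follow every link not seen before.
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


index = crawl_and_index("http://example.test/")
print(sorted(index["home"]))
```

The resulting mapping from words to pages is usually called an inverted index. The page under /private/ is linked from the home page but never indexed, because the robots.txt rule excludes it.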

When a user comes to the search engine and makes a query, typically by giving keywords, the engine looks up the index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. Most search engines support the Boolean operators AND, OR and NOT to further specify the search query. An advanced feature is proximity search, which allows users to define the distance between keywords.
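As a rough illustration of Boolean and proximity queries, the sketch below builds a small positional index and evaluates queries against it. The DOCS corpus and the helper names build_positional_index, docs_with, and near are hypothetical, and real engines use far more elaborate query parsers and index structures.

```python
import re

# Hypothetical toy corpus: document id -> text.
DOCS = {
    1: "web crawler follows every link",
    2: "the index stores every word of every page",
    3: "a crawler builds the index of the web",
}


def build_positional_index(docs):
    """Map each word to {doc_id: [word positions in that document]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


INDEX = build_positional_index(DOCS)


def docs_with(word):
    """Set of document ids that contain the word."""
    return set(INDEX.get(word, {}))


def near(word_a, word_b, max_distance):
    """Documents where the two words occur within max_distance positions."""
    hits = set()
    for doc_id in docs_with(word_a) & docs_with(word_b):
        pos_a, pos_b = INDEX[word_a][doc_id], INDEX[word_b][doc_id]
        if any(abs(i - j) <= max_distance for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits


# Boolean operators map directly onto set operations.
both = docs_with("crawler") & docs_with("index")     # crawler AND index
either = docs_with("crawler") | docs_with("page")    # crawler OR page
without = docs_with("index") - docs_with("crawler")  # index NOT crawler

print(both, either, without, near("crawler", "web", 2))
```

Storing word positions rather than only document ids is what makes the proximity check possible; a plain word-to-documents index is enough for AND, OR and NOT.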

The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of Web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.
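One classic way to put more relevant pages first is TF-IDF weighting, sketched below. This is only an illustrative technique, not the method of any engine named in this article, whose ranking algorithms are proprietary and also weigh signals such as popularity and authority. The DOCS corpus and the tokenize and rank functions are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "a": "search engine ranking ranking ranking",
    "b": "search engine crawler",
    "c": "cooking recipes",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return ids of documents matching any query term, best score first."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for term in tokenize(query):
        # Document frequency: how many documents contain the term.
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue
        # Rare terms get a higher weight; the +1 keeps common terms above zero.
        idf = math.log(n_docs / df) + 1.0
        for doc_id, c in counts.items():
            if term in c:
                tf = c[term] / sum(c.values())  # term frequency within the document
                scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Sort by descending score, breaking ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(rank("search ranking", DOCS))
```

Document "a" ranks ahead of "b" for this query because it matches both terms and uses the rarer term "ranking" heavily, while "c" matches nothing and is left out.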

Most Web search engines are commercial ventures supported by advertising revenue and, as a result, some employ the controversial practice of allowing advertisers to pay money to have their listings ranked higher in search results.

The vast majority of search engines are run by private companies using proprietary algorithms and closed databases, the most popular currently being Google, MSN Search, and Yahoo! Search. However, open-source search engine technology does exist, such as ht://Dig, Nutch, Senas, Egothor, OpenFTS, DataparkSearch and many others.

Challenges faced by search engines

  • The Web is growing much faster than any present-technology search engine can possibly index. In 2006, some users found that major search engines had become slower to index new Web pages.
  • Many Web pages are updated frequently, which forces the search engine to revisit them periodically.
  • The queries one can make are currently limited to searching for keywords, which may result in many false positives, especially when using the default page-wide search. Better results might be achieved by using a proximity-search option with a search bracket to limit matches to a paragraph or phrase, rather than matching random words scattered across large pages. Another alternative is to have human operators do the research for the user, as with organic search engines.
  • Dynamically generated sites may be slow or difficult to index, or may produce an excessive number of results, perhaps generating 500 times more Web pages than average. Example: for a dynamic Web page which changes content based on entries inserted from a database, a search engine might be requested to index 50,000 static Web pages for 50,000 different parameter values passed to that dynamic Web page.
  • Many dynamically generated Web sites are not indexable by search engines; this phenomenon is known as the invisible web.
  • Some search engines do not rank results by relevance, but by the amount of money the matching Web sites pay.
  • In 2006, hundreds of generated Web sites used tricks to manipulate search engines into displaying them higher in the results for numerous keywords. This can lead to some search results being polluted with linkspam or bait-and-switch pages which contain little or no information about the matching phrases. The more relevant Web pages are pushed further down in the results list, perhaps by 500 entries or more.
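The need to revisit frequently updated pages, listed among the challenges above, can be sketched as a simple adaptive scheduler: a page that changed since the last visit is revisited sooner, and an unchanged page is revisited later. This is one possible approach under stated assumptions, not a description of any real engine. The RevisitScheduler class, its interval limits, and the use of abstract time units are all hypothetical.

```python
import hashlib
import heapq


def content_hash(body):
    """Fingerprint of a page body, used to detect changes between visits."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class RevisitScheduler:
    """Toy scheduler: pages that change often get revisited sooner."""

    def __init__(self, min_interval=1, max_interval=64):
        self.min_interval, self.max_interval = min_interval, max_interval
        self.queue = []  # heap of (next_visit_time, url)
        self.state = {}  # url -> (last_hash, current_interval)

    def add(self, url, now=0, interval=8):
        self.state[url] = (None, interval)
        heapq.heappush(self.queue, (now, url))

    def due(self, now):
        """Pop every URL whose next visit time has arrived."""
        urls = []
        while self.queue and self.queue[0][0] <= now:
            urls.append(heapq.heappop(self.queue)[1])
        return urls

    def record_visit(self, url, body, now):
        """Store the new fingerprint, adapt the interval, and reschedule."""
        last_hash, interval = self.state[url]
        new_hash = content_hash(body)
        if last_hash is not None:
            if new_hash != last_hash:
                interval = max(self.min_interval, interval // 2)  # changed: come back sooner
            else:
                interval = min(self.max_interval, interval * 2)   # unchanged: back off
        self.state[url] = (new_hash, interval)
        heapq.heappush(self.queue, (now + interval, url))
        return interval


scheduler = RevisitScheduler()
scheduler.add("http://example.test/news")
for url in scheduler.due(0):
    scheduler.record_visit(url, "first version", now=0)
```

Halving and doubling the interval is a deliberately crude policy; it only shows how a crawler might spend its limited fetch budget on the pages most likely to have changed.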
 


Some images courtesy of morguefile.com. Text from wikipedia.org.