How do search engines operate and process?

All of the major search engines rely on a set of core functions that allow them to return relevant results when a user enters a search in the search box.

There are four major processes that the search engines have to carry out:

  • Crawl
  • Index
  • Processing Requests
  • Ranking

What is Crawl or Crawling?

The search engines run automated programs that follow your website's architecture and link structure to find out what information you have on your website. This process is called crawling. The automated programs are also known as ‘bots’ and ‘spiders’, so you may hear expressions such as ‘my website was spidered today’ or ‘the search engine bots visited my site recently’.

Bear in mind that although this sounds very simple, the scale is enormous. Imagine how many websites are out there. One of the recent additions to the search engine marketplace is ‘Cuil’, which claims to have over 200 billion web pages stored at its data centres. So can you imagine what Google may process? I would say a lot more than that.
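To make the idea of crawling a little more concrete, here is a minimal sketch in Python. It is not how any real search engine bot is written. The `SITE` dictionary is a made-up website held in memory (each page lists the pages it links to), and `crawl` is a hypothetical helper that simply follows links from the home page until it runs out of new pages.

```python
from collections import deque

# Hypothetical website: each page maps to the pages it links to.
SITE = {
    "/": ["/about", "/services"],
    "/about": ["/", "/contact"],
    "/services": ["/services/seo", "/"],
    "/services/seo": ["/contact"],
    "/contact": [],
    "/orphan": ["/"],  # no page links here, so the crawler never finds it
}


def crawl(site, start="/"):
    """Visit every page reachable from `start` by following links, breadth first."""
    seen = {start}
    queue = deque([start])
    visited_order = []
    while queue:
        page = queue.popleft()
        visited_order.append(page)
        for link in site.get(page, []):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited_order


print(crawl(SITE))
```

Notice that `/orphan` is never visited because nothing links to it. That is the practical point of the sketch: a page that cannot be reached through your site's link structure is hard for a spider to discover.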

What is Index or Indexing?

Once a web page has been crawled it moves on to being indexed, but only if it first passes a number of checks set by the search engines. Let's say Google finds a page on your website which has been copied from someone else's website. That page may not get indexed, as it will be classed as duplicate content. Why would a search engine need to rank the same page twice? This is why I said earlier in this guide that content is very important on a website.
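As a rough illustration of a duplicate content check, the sketch below fingerprints each page's text and skips any page whose fingerprint has already been seen. This is an assumption made for the example only: the function names are made up, and it only catches exact copies (ignoring case and spacing), whereas real search engines use far more sophisticated methods that are not public.

```python
import hashlib


def fingerprint(text):
    """Hash the text after lower-casing it and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def index_pages(pages):
    """Index each (url, text) pair unless its text duplicates an earlier page.

    Returns the indexed pages and a list of (skipped_url, original_url) pairs.
    """
    first_seen = {}  # fingerprint -> url of the first page with that text
    indexed = {}
    skipped = []
    for url, text in pages:
        fp = fingerprint(text)
        if fp in first_seen:
            skipped.append((url, first_seen[fp]))
        else:
            first_seen[fp] = url
            indexed[url] = text
    return indexed, skipped


pages = [
    ("original.example/guide", "How search engines work"),
    ("copycat.example/guide", "how  search engines WORK"),
    ("other.example/page", "Something completely different"),
]
indexed, skipped = index_pages(pages)
print(sorted(indexed))
print(skipped)
```

Here the copied page is skipped and the page that was crawled first is the one kept in the index.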

Once the pages on your website have been indexed, they are stored in one of the many data centres that the big search engines own. As you can see from the duplicate content example above, the data centres have to be well managed by very complex computer systems. These systems are probably among the most powerful in the world: they can process your query (the search term you enter into the search box) and in most cases bring back over 100 million results in under half a second.

What is Processing Requests?

When a user types a search term (for example, SEO Nottingham) or search phrase into one of the many search engines, a request is sent to the search engine's data centres to be processed. The data centres then send back the most relevant search results for that search term or phrase.

A match for your search is made if the words or phrases you searched for are found on a web page. If you did a search for ‘blue box’, for example, it wouldn’t return results aimed at yellow cars. The computer system works this out for you in order to bring back the best results.
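The ‘blue box’ example can be sketched in a few lines of Python. The code below builds a simple word-to-pages lookup (often called an inverted index) and returns only the pages that contain every word in the search. The page addresses, the text and the function names are all invented for the illustration, and real search engines match words in much smarter ways than this.

```python
from collections import defaultdict

# Hypothetical indexed pages: url -> page text.
PAGES = {
    "/boxes": "a blue box for storage",
    "/cars": "yellow cars for sale",
    "/paint": "blue paint in many shades",
}


def build_index(pages):
    """Map every word to the set of pages it appears on."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages that also have this word
    return results


index = build_index(PAGES)
print(search(index, "blue box"))
```

A search for ‘blue box’ matches only `/boxes`. The yellow cars page is never returned because it contains neither word, and the paint page is dropped because it is missing ‘box’.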

What is Ranking?

Once the search engines have crawled, indexed and processed the many millions of websites and web pages available, they need to rank all of the results relevant to your search into the best order possible. This is done through a mathematical process called an ‘algorithm’. Using Google as an example, the algorithm will look at over two hundred different factors based on your search term and the pages matched to that term. The algorithm then sorts these results by relevance, so that you, the user, can decide which web page or website best provides the information you require.
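To give a feel for what ‘ranking by factors’ means, here is a toy sketch that scores pages on just three factors and sorts them. Everything in it is an assumption made for the example: the three factors, the weights and the sample pages are invented, and the real factors and weightings used by Google are not public.

```python
# Hypothetical weights for three made-up ranking factors.
WEIGHTS = {"term_matches": 1.0, "title_match": 3.0, "inbound_links": 0.5}

# Hypothetical pages that all matched the search.
PAGES = [
    {"url": "/a", "title": "Blue box shop", "body": "blue box blue box for sale", "inbound_links": 2},
    {"url": "/b", "title": "Storage ideas", "body": "a blue box", "inbound_links": 4},
    {"url": "/c", "title": "Blue box", "body": "box", "inbound_links": 1},
]


def score(page, query_words):
    """Combine the factors into a single number using the weights above."""
    body_words = page["body"].lower().split()
    title_words = page["title"].lower().split()
    term_matches = sum(body_words.count(word) for word in query_words)
    title_match = 1 if all(word in title_words for word in query_words) else 0
    return (
        WEIGHTS["term_matches"] * term_matches
        + WEIGHTS["title_match"] * title_match
        + WEIGHTS["inbound_links"] * page["inbound_links"]
    )


def rank(pages, query):
    """Sort pages from highest to lowest score for the query."""
    query_words = query.lower().split()
    return sorted(pages, key=lambda page: score(page, query_words), reverse=True)


for page in rank(PAGES, "blue box"):
    print(page["url"], score(page, ["blue", "box"]))
```

With these made-up weights, `/a` comes first, then `/c`, then `/b`. Change a weight and the order can change, which is why small adjustments to an algorithm can move a page up or down the results.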

Overview of how Search Engines Operate and Process

Now that you have a basic understanding of how search engines operate and process, can you imagine how many searches are done each second of the day throughout the world? This is why Google, Yahoo and MSN are regarded as the best search engines in the world when it comes to performing all of the above tasks.

Credit: Binary image from SXC.hu



© Dave Cain