How Search Engines Work

May 3, 2009 by  
Filed under Search Engine

The Internet contains hundreds of millions of pages of information. The problem is how to find exactly the information you need. There was a time when people exchanged links to interesting sites at conferences. Such exchanges still happen, of course, but they are not an effective way to find specific information. It was the need to find information quickly (preferably from more than one source) that gave rise to search engine projects.
Search engines are web-based services on the Internet designed to help users find information stored on various sites.

Did you know that:
In English, such a system is called a search engine (SE). The closest Russian equivalent is a search “engine” (“dvizhok”), and the colloquial word “poiskovik” is also very common.

Different search engines work in different ways, but they all solve the same major tasks:
Crawl many sites on the network and build an index of the information they contain (the site index).
Allow users to search for words and combinations of words in that index.
Indexing information on the Internet

Before search engines can give you the information you request, they must first find it. Of course, they do not crawl the entire Internet every time you enter a query. That would be far too wasteful and slow.

Instead, search engines build a database of all the pages on the network and search that database. This is, of course, much faster than searching every site each time. So how do search engines fill their databases? (This database is usually called the search engine’s index, and adding a particular site to the index is called indexing the site.)
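Below is a minimal Python sketch of such an index. It assumes the simplest possible design, an inverted index that maps each word to the set of pages containing it. Once the index is built, a query is answered by looking words up rather than by visiting sites. The URLs and page texts are made up, and a real search engine stores far more per word than this.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every query word, using only the index."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, already fetched by the crawler.
pages = {
    "http://example.com/a": "Search engines build an index",
    "http://example.com/b": "A spider crawls the web",
    "http://example.com/c": "The index of a search engine",
}
index = build_index(pages)
print(sorted(search(index, "search index")))
# → ['http://example.com/a', 'http://example.com/c']
```

The query intersects the page sets of its words, so nothing is fetched from the network at query time.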

Each search engine has a special program, a robot, that indexes sites. This program is called a spider, and the indexing process is called spider crawling. Indeed, if you think about it, the process resembles a spider crawling across various sites and collecting information from them (indexing).

Did you know that:
Search spiders are also often called bots. Each search bot has its own name, so you can tell which search engine a bot belongs to. This name is usually sent in the User-Agent header of the request to the server. For example, Google’s bot is named Googlebot, and Yandex’s is Yandex. Webmasters can use these names, for example in the robots.txt file (I discuss it in another article), to prevent a specific search engine from indexing certain pages.
Some web programmers create different pages for different search bots. For example, the Yandex bot is served one version of a page, and Google’s bot another. Even worse is serving one page to a search bot and a different one to users. These are dishonest tricks, and if the search engines find out about them, the site will most likely be excluded from the search engine’s index. (Exclusion from the index is called a ban.)
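The robots.txt mechanism mentioned above can be tried out with Python’s standard `urllib.robotparser` module. The robots.txt content and URLs below are hypothetical. The file blocks only the bot calling itself Yandex from a `/private/` section and leaves everything open to all other bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Yandex may not crawl /private/, all other bots may crawl everything.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching a URL

for agent in ("Yandex", "Googlebot"):
    allowed = parser.can_fetch(agent, "http://example.com/private/page.html")
    print(agent, "allowed" if allowed else "blocked")
# → Yandex blocked
# → Googlebot allowed
```

robots.txt is a request, not an enforcement mechanism. A well-behaved bot checks it before fetching, but nothing technically stops a bot that ignores it.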

How do spiders begin their journey across the network?
Usually, search bots start with the most popular and frequently visited sites and pages on the web. They index the words on a page and then follow all the links from that page and from other pages on the same site. In this way, the search bot quickly scans the most widely used resources on the network.
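The crawling strategy just described can be sketched as a breadth-first traversal. Crawling starts from a few popular “seed” pages and keeps a queue of links still to visit. The in-memory `LINKS` graph and the page names are hypothetical stand-ins for real HTTP fetching. Breadth-first order is an assumption made for the sketch, since real crawlers prioritize pages in more complex ways.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
LINKS = {
    "popular.example/": ["popular.example/news", "small.example/"],
    "popular.example/news": ["popular.example/", "blog.example/post"],
    "small.example/": [],
    "blog.example/post": ["small.example/"],
    "orphan.example/": ["popular.example/"],  # nothing links here, so it is never found
}


def crawl(seeds: list[str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit the seed pages first, then follow their links."""
    visited: list[str] = []
    seen = set(seeds)          # pages already queued, so none is visited twice
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)   # a real spider would fetch and index the page here
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl(["popular.example/"]))
# → ['popular.example/', 'popular.example/news', 'small.example/', 'blog.example/post']
```

Note that `orphan.example/` is never reached. A page that no crawled page links to stays invisible to the spider.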
Crawling a page

Now let’s consider what happens when a search bot arrives at a particular page and starts scanning it.

The search engine makes a list of the words on the page and stores them in its database, each with a certain weight. These weights later influence the page’s position in the search results for that word, or for a phrase that contains it.

Different search engines use different systems for weighting the words on a page. In addition, search engines usually do not disclose how they assign “weights”, so that webmasters cannot artificially inflate their sites’ rankings.

However, there are a few general principles that are probably shared by all search engines when they calculate the “weight” of a word.
A word in the page title (the title tag) receives more weight than the same word in the body text.
A word in the meta tags adds weight to the page. However, because the contents of these tags are not shown to the user, there is a temptation to “stuff” them with as many different words as possible. As a result, modern search engines are believed to pay less and less attention to these tags.
A word in headings and subheadings (tags H1, H2, etc.) carries more weight.
A word highlighted in some way, for example in bold (tag B) or italics (tag I), is probably more “valuable” to a search engine (after all, you emphasized it for a reason).
It used to be widely believed that a word in the first 20 lines of a page was more “valuable” to a search engine. I do not think this matters anymore. Then again, who knows?
A word whose form matches the one entered in the query obviously carries more weight than other forms of that word. For example, if a user enters “elephant”, the word “elephant” on the page will be weighted more than “elephants”.
There is also the notion of a word’s “weight on the page”: the ratio of the number of times the word occurs on the page to the total number of words on the page. This used to be a significant factor for search engines. Today, however, they pay less attention to it, because you can create a page filled with the same word, whose “weight” will obviously be close to 1. In other words, dishonest webmasters can easily exploit this parameter.
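Two of the ideas above can be illustrated with the sketch below: tag-dependent word weights and the “weight of a word on a page” (occurrences divided by total words). The numbers in `TAG_WEIGHTS` are hypothetical, because, as the article says, real engines keep their values secret. Meta tags and exact word forms are not modeled. The HTML parsing uses Python’s standard `html.parser` module.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: real search engines keep their values secret.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.5, "b": 1.5, "i": 1.5}
BODY_WEIGHT = 1.0
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}  # tags that have no closing tag


class WeightedWordParser(HTMLParser):
    """Collects the words of an HTML page and an accumulated weight for each word."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.words: list[str] = []
        self.weights: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching start tag, tolerating unclosed inner tags.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Each word gets the strongest weight among the tags enclosing it.
        weight = max((TAG_WEIGHTS.get(t, BODY_WEIGHT) for t in self.open_tags),
                     default=BODY_WEIGHT)
        for word in re.findall(r"\w+", data.lower()):
            self.words.append(word)
            self.weights[word] += weight


def analyze(html: str) -> WeightedWordParser:
    parser = WeightedWordParser()
    parser.feed(html)
    parser.close()
    return parser


def keyword_density(words: list[str], word: str) -> float:
    """Occurrences of `word` divided by the total number of words on the page."""
    return words.count(word) / len(words) if words else 0.0


page = analyze("<html><head><title>Elephant facts</title></head><body>"
               "<h1>Elephant</h1><p>The <b>African</b> elephant is large.</p></body></html>")
print(page.weights["elephant"])                 # 5 (title) + 3 (h1) + 1 (body) → 9.0
print(keyword_density(page.words, "elephant"))  # 3 of 8 words → 0.375

stuffed = analyze("<p>elephant elephant elephant elephant</p>")
print(keyword_density(stuffed.words, "elephant"))  # → 1.0, which is why it is easy to game
```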

There are many techniques for raising the “weight” of words on a page, but nobody outside the search engines knows exactly how much each parameter counts.

Rather, each search engine has its own system for calculating the “weight” of words on a page. It is based both on the parameters listed above and on others unique to each search engine. As I said, the way this “weight” is calculated is kept strictly secret. Search engines also change these systems: they periodically adjust the calculation strategy, introduce new parameters and change old ones.
Serving search results

Now it is not difficult to imagine what happens when a user types a query into the search engine’s search bar.

The search engine searches its database, finds the pages that match the user’s query, and displays them in order of relevance. How well a page matches the query is called its relevance (page relevancy). Relevance is determined by certain algorithms, which are partially described above. This is where all the “weights” and coefficients that the search bot assigned while indexing the page come into play.
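Here is a minimal sketch of the ranking step. It assumes each indexed word already carries a per-page weight from the indexing stage. It also assumes, for simplicity, that relevance is just the sum of the weights of the query words found on a page. The index contents are hypothetical, and real engines combine many more factors, as the article explains.

```python
# Hypothetical index built at crawl time: word -> {page URL: weight of the word on that page}.
INDEX = {
    "elephant": {"zoo.example/": 9.0, "news.example/": 1.0, "blog.example/": 4.0},
    "africa": {"zoo.example/": 2.0, "travel.example/": 6.0},
}


def rank(query: str) -> list[tuple[str, float]]:
    """Score each page by summing the weights of the query words it contains,
    then return the pages sorted from most to least relevant."""
    scores: dict[str, float] = {}
    for word in query.lower().split():
        for url, weight in INDEX.get(word, {}).items():
            scores[url] = scores.get(url, 0.0) + weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank("elephant Africa"))
# → [('zoo.example/', 11.0), ('travel.example/', 6.0), ('blog.example/', 4.0), ('news.example/', 1.0)]
```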

Things become more complicated if the user enters two or more words instead of one. Here other factors come into play, for example how close the words are to each other on the page. Obviously, the closer together the search terms appear in the text, the more relevant the page is (the better it matches the query).
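One simple way to measure how close the query words are is the length of the shortest stretch of text that contains all of them. It can be found with a sliding window over the page’s words. This measure is chosen for illustration only. The article does not say which proximity measure any particular search engine uses, and the example texts are made up.

```python
from collections import Counter
from typing import Optional


def smallest_window(words: list[str], terms: set[str]) -> Optional[int]:
    """Length (in words) of the shortest stretch of text containing every term,
    or None if some term is missing. Smaller means the terms are closer together."""
    if not terms:
        return None
    counts: Counter = Counter()  # occurrences of each term inside the current window
    have = 0                     # number of distinct terms currently in the window
    best: Optional[int] = None
    left = 0
    for right, word in enumerate(words):
        if word in terms:
            counts[word] += 1
            if counts[word] == 1:
                have += 1
        # Shrink the window from the left while it still contains every term.
        while have == len(terms):
            length = right - left + 1
            if best is None or length < best:
                best = length
            left_word = words[left]
            if left_word in terms:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    have -= 1
            left += 1
    return best


terms = {"elephant", "africa"}
page_a = "elephant herds roam the plains of east africa".split()
page_b = "a guide to africa elephant safaris".split()
print(smallest_window(page_a, terms))  # → 8
print(smallest_window(page_b, terms))  # → 2, so page_b would count as more relevant
```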

SEO specialists and others experienced in website promotion who have read this far are probably indignant: “How so? What about the external factors that affect search results?” Of course, I will not leave them out. Up to this point, I have described only how internal factors affect a page’s position in search results. Now it is time to talk about page popularity.

Everything described above falls under internal factors, which affect a page’s position in a search engine’s results. There are also external factors, which are no less (and often even more) important than internal ones.

Each search engine has its own parameter that measures how popular a page is with other participants on the Internet. Different search engines define it in different ways, but its role is quite simple:
The more popular a site is with other members of the network, the better its chances of appearing in the search engine’s results.

This is quite logical. If many other sites link to a site, it most likely has quality content. That is not always the case, though. Projects with large budgets often win here: to promote a site, they buy many links from other sites, raising its popularity and its ranking.
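The article deliberately leaves the popularity calculation for another time. The sketch below only illustrates the basic intuition that a site many other sites refer to is probably popular. It counts how many distinct other sites link to each site and ignores links within the same site. The link list is hypothetical, and this is not how any particular search engine computes popularity. As the paragraph notes, raw link counts like this are easy to inflate with bought links.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical link graph: (page containing the link, page it points to).
LINKS = [
    ("http://a.example/1", "http://target.example/"),
    ("http://a.example/2", "http://target.example/"),           # same site as above: counted once
    ("http://b.example/", "http://target.example/"),
    ("http://c.example/", "http://other.example/"),
    ("http://target.example/about", "http://target.example/"),  # internal link: ignored
]


def popularity(links: list[tuple[str, str]]) -> dict[str, int]:
    """Number of distinct other sites linking to each site."""
    referrers: dict[str, set[str]] = defaultdict(set)
    for source, target in links:
        source_site = urlparse(source).netloc
        target_site = urlparse(target).netloc
        if source_site != target_site:
            referrers[target_site].add(source_site)
    return {site: len(sites) for site, sites in referrers.items()}


print(popularity(LINKS))
# → {'target.example': 2, 'other.example': 1}
```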

In this article I will not go into how search engines determine a site’s popularity. That is a separate topic, which I will discuss another time.
Conclusions

This article has described the basic principles of how search engines work and the various factors that affect a site’s position in search results. I hope it will encourage readers to explore search engine optimization further.
