
How Exactly Do Search Engines Work?

"Search Engine" is the term is often used generically to describe both crawler-based search engines and human-powered directories. These two kinds of search engines collect data in radically different ways.
 
Crawler-Based Search Engines:

Crawler-based search engines, like Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found.

If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.


Human-Powered Directories:

A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your web site to the directory, or editors write one for sites they review. A search looks for matches only in those descriptions.

Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.


"Hybrid Search Engines" Or Mixed Results

It used to be that a search engine presented either crawler-based results or human-powered listings. Now it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (provided by Inktomi), especially for more obscure queries.


The Parts of a Crawler-Based Search Engine

Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
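
To make that idea concrete, here is a minimal sketch, in Python, of what a spider does: fetch a page, extract its links, and queue them for a visit. The function and variable names are illustrative, and a production crawler would also respect robots.txt, throttle its requests, and handle errors far more carefully.

    # A minimal sketch of the "spider" idea described above: fetch a page,
    # pull out its links, and visit them in turn.
    from urllib.parse import urljoin
    from urllib.request import urlopen
    from html.parser import HTMLParser


    class LinkParser(HTMLParser):
        """Collects the href attribute of every <a> tag on a page."""

        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)


    def crawl(seed_url, max_pages=10):
        """Breadth-first crawl starting at seed_url; returns {url: html}."""
        to_visit = [seed_url]
        fetched = {}
        while to_visit and len(fetched) < max_pages:
            url = to_visit.pop(0)
            if url in fetched:
                continue
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
            except Exception:
                continue  # skip pages that fail to load
            fetched[url] = html
            parser = LinkParser()
            parser.feed(html)
            # Follow links, resolving relative URLs against the current page.
            to_visit.extend(urljoin(url, link) for link in parser.links)
        return fetched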

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
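
The index is commonly implemented as an inverted index: a mapping from each word to the pages that contain it. Below is a minimal sketch, assuming a fetched dictionary of {url: html} pairs like the one produced by the crawl sketch above; a real index stores much more (word positions, titles, link text) and is updated incrementally.

    # A minimal sketch of the "index" (catalog): an inverted index mapping each
    # word to the set of page URLs that contain it.
    import re
    from collections import defaultdict


    def build_index(fetched):
        index = defaultdict(set)
        for url, html in fetched.items():
            # Crude tokenization: strip tags, lowercase, split on non-letters.
            text = re.sub(r"<[^>]+>", " ", html).lower()
            for word in re.findall(r"[a-z]+", text):
                index[word].add(url)
        return index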

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly-named How Search Engines Rank Web Pages page.
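
Here is a toy version of that third part. It looks each query word up in the inverted index and ranks pages by how many query words they contain; real ranking software weighs many more signals, so treat this purely as an illustration of the lookup-and-rank step.

    # A toy version of the search software: look each query word up in the
    # inverted index and rank pages by how many query words they match.
    def search(query, index):
        scores = {}
        for word in query.lower().split():
            for url in index.get(word, ()):
                scores[url] = scores.get(url, 0) + 1
        # Highest-scoring pages first.
        return sorted(scores, key=scores.get, reverse=True)


    # Example usage, assuming the crawl() and build_index() sketches above:
    # pages = crawl("https://example.com")
    # index = build_index(pages)
    # print(search("web search", index))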


Major Search Engines: Same But Different

Crawler-based search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results. Some of the significant differences between the major crawler-based search engines are summarized on the Search Engine Features page. Information on this page has been drawn from the help pages of each search engine, along with knowledge gained from articles, reviews, books, independent research, tips from others and additional information received directly from the various search engines.

Now let's look more closely at how crawler-based search engines rank the listings they gather.

How Do the Top Search Engines' Ranking Criteria Differ?

The most recent Internet size estimate is 1 billion pages - and growing. 85-90% of all Internet users rely on search engines to locate sites, but only 7% of them look past the first three pages of search results. Those top slots are valuable and competition for them is intense.

Let's look at several ways to move your site to the top.

Content

You've heard it before, but we can't stress it enough: the three most important factors in your search engine rank are: content, content, content.

Good content is critical to a good search engine score because many elements of search engine algorithms rely on page content to score Web sites. It also increases the probability that Yahoo or other popular directories will list your site.

Good content is vital. It's fundamental to every legitimate search engine strategy.

Link Popularity

Common optimization techniques (TITLE tags, META tags, and keyword frequency) are important because most search engines rely on them to score pages. Automated tools can simplify this task and leave you free to focus on another increasingly popular ranking strategy: link popularity (the total number of Web sites that link to yours). This technique requires no additional coding - just old-fashioned networking. Content is still critical, since your site must contain valuable information that other sites want to share with their visitors.
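
As a rough illustration of those on-page checks, the sketch below inspects a page's TITLE tag, META description, and keyword frequency for a given keyword. The parsing is deliberately crude, the function name is made up, and the output is only a report; no real engine's scoring rules are implied.

    # A rough sketch of the on-page checks mentioned above: does the keyword
    # appear in the TITLE tag and META description, and how often in the body?
    import re


    def on_page_report(html, keyword):
        keyword = keyword.lower()
        title = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
        meta = re.search(
            r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']',
            html, re.I | re.S)
        body_text = re.sub(r"<[^>]+>", " ", html).lower()
        words = re.findall(r"[a-z]+", body_text)
        return {
            "keyword_in_title": keyword in (title.group(1).lower() if title else ""),
            "keyword_in_meta": keyword in (meta.group(1).lower() if meta else ""),
            "keyword_frequency": words.count(keyword) / len(words) if words else 0.0,
        }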

Search engines determine your link popularity score by counting the number of outside links to your site (your internal page links don't count). Some use more complex algorithms that consider link importance: they rank the importance of each linking site and calculate a weighted link popularity score, so sites that receive links from "important" sites are more likely to rank higher. For instance, if Web Developer's Journal were to link to your site, that link could be worth more than 20 links from your friends' personal Web pages. In fact, it may be worth even more, since some search engines refuse to count links from free sites (like GeoCities home pages) because spammers can use them to set up bogus links.
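
The weighted approach can be sketched as an iterative calculation in which a page's score depends on the scores of the pages linking to it (the idea popularized by Google's PageRank). The code below is a simplified illustration with a made-up link graph, not any engine's actual formula.

    # A simplified sketch of weighted link popularity: a page's score depends
    # on the scores of the pages that link to it, computed iteratively.
    def weighted_link_popularity(links, damping=0.85, iterations=30):
        """links maps each page to the list of pages it links out to."""
        pages = set(links) | {p for targets in links.values() for p in targets}
        score = {page: 1.0 / len(pages) for page in pages}
        for _ in range(iterations):
            new = {page: (1 - damping) / len(pages) for page in pages}
            for page, targets in links.items():
                if not targets:
                    continue
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            score = new
        return score


    # Example usage with a made-up link graph:
    graph = {
        "journal": ["yoursite"],
        "friend1": ["yoursite"],
        "friend2": ["friend1"],
        "yoursite": ["journal"],
    }
    print(weighted_link_popularity(graph))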

Many search engines are giving link popularity greater weight in their algorithms because they believe it indicates quality. After all, other sites are most likely to link to a site that displays good content, design, and usability. Google relies heavily on link popularity to rank sites. Other search engines factor it into their algorithms.

Look at how some of the largest search engines use link popularity:

Search Engine: Link Popularity

AltaVista: Uses link analysis and ranks sites based on "good" link popularity. Tends to ignore links generated through "link exchange" programs.

Excite: Uses link popularity and quality data to determine relevancy.

Inktomi: Link popularity is one ranking criterion.

GO: Link popularity is one ranking criterion.

Google: Uses weighted link popularity and analyzes link content almost exclusively to determine site rankings. Recently partnered with Yahoo - the largest directory.

Infoseek: Link popularity is considered in its new retrieval algorithm.

Site rankings based on link popularity impose huge penalties on new sites that haven't accumulated many links. This is where schmoozing counts. When you contact webmasters, offer to link to their site in return for a link and remind them how important link popularity can be to their overall ranking. While you're building links, remember to pay close attention to your HTML tags, keywords, and content. Until you have a large number of "good" links, those basic techniques are your best bet to improve your ranking.

Avoid Spam and HTML "Tricks"

As part of their continuing battle against spammers, many search engines have tightened their site eligibility policies. AltaVista recently instituted one of the most restrictive policies in the industry, banning sites for one or more of the following reasons (a rough sketch of how a few of these signals can be detected automatically follows the list):

· Using a hosting service that also hosts adult sites or documented spammers.

· Improper use of Gateway pages - also called Doorway or Jump pages.

· Submitting the same URL repeatedly or a large number of URLs from the same site.

· Excessive keyword repetition.

· Inserting keywords unrelated to the page's content.

· Hidden text.

· META refresh commands set to less than 30 seconds.
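
A few of these signals lend themselves to automated checking. The sketch below looks for a fast META refresh, excessive repetition of a single word, and one crude form of hidden text; the thresholds and function name are illustrative guesses, not any engine's published rules.

    # A rough sketch of automated checks for a few of the spam signals listed
    # above. The thresholds are illustrative only.
    import re
    from collections import Counter


    def spam_signals(html, repetition_threshold=0.05, refresh_seconds=30):
        signals = []

        # META refresh set to fewer than `refresh_seconds` seconds.
        refresh = re.search(
            r'<meta\s+http-equiv=["\']refresh["\']\s+content=["\'](\d+)',
            html, re.I)
        if refresh and int(refresh.group(1)) < refresh_seconds:
            signals.append("fast META refresh")

        # Excessive keyword repetition: one word dominates the body text.
        words = re.findall(r"[a-z]+", re.sub(r"<[^>]+>", " ", html).lower())
        if words:
            word, count = Counter(words).most_common(1)[0]
            if count / len(words) > repetition_threshold:
                signals.append(f"excessive repetition of '{word}'")

        # One crude hidden-text pattern: text styled with display:none.
        if re.search(r"display\s*:\s*none", html, re.I):
            signals.append("possible hidden text (display:none)")

        return signals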

The first two items may surprise you. Most beginning webmasters look for a Web host based on cost first, then speed and reliability, but their provider's policy on adult sites may be just as important. AltaVista sometimes retaliates against adult sites' spam techniques by blocking those sites' underlying IP addresses entirely (as does GO.com). If you host with the same provider, your site may share a banned IP address. Choose your host carefully: you are judged by the company you keep!

Ask the Experts

You spend months tuning your Web site to achieve high rankings, and then have it drop. Or no matter what you try, your site never gets a good ranking. Do you know why? If you don't have the time or expertise to ferret out the reasons yourself, consider paying for expert advice.

Thousands of consultants are eager to advise you about all aspects of the Internet. You can even hire a good consultant to advise you on which consultant to hire!

Good consultants supply focused, personalized service, but many businesses can't afford the expense. Webmasters for smaller sites often find less expensive automated search engine tools to be an efficient way to tune their Web sites. This requires a more do-it-yourself approach. While a consultant might personally optimize your page code and content, most automated tools require you to make the changes yourself.

You can purchase or subscribe to tools that provide a full suite of search engine and page optimization services. The tools are simple to use and give you advice that is easy to understand and implement on your own. Some companies sell software packages for your PC that analyze your pages and monitor your search engine rankings, while others offer similar tools online. Net Mechanic’s Search Engine Power Pack is an online tool that tracks your site's ranking and provides keyword assistance to improve your position.
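
The core of what such a rank tracker automates can be sketched as follows: given the HTML of a results page for a keyword, find the position of the first link to your domain. How you obtain that results page, and whether automated querying is permitted, varies by search engine, so this is illustration only and not how any particular product works.

    # A hypothetical sketch of a rank check: scan the links on a results page
    # and report the position of the first link pointing to your domain.
    import re
    from urllib.parse import urlparse


    def position_of_domain(results_html, your_domain):
        links = re.findall(r'<a\s+[^>]*href=["\'](https?://[^"\']+)',
                           results_html, re.I)
        for position, link in enumerate(links, start=1):
            if urlparse(link).netloc.endswith(your_domain):
                return position
        return None  # not found on this results page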

Expert advice can come at a high price, but it doesn't have to. Get the best value for your money by carefully researching your options and evaluating your requirements. If you need immediate, reliable advice, a well-designed online tool may be your most cost-effective investment.

Constant Monitoring Is Crucial

Search engine ranking strategy is an ongoing process that begins during design and never stops. You may spend more time tweaking your site than you spent designing it! At a minimum, you must:

· Monitor your site's rankings by keyword on a weekly basis.

· Experiment with keyword and page content modifications.

· Be alert to changes in search engine policies and requirements. Today's legal design technique may be spam tomorrow if search engine policies change.

If you depend on search engines to deliver traffic to your Web site, then a high search engine ranking is critical to your success. Think you can't afford to spend the time and effort it takes to get there? You can't afford not to.