Anatomy of a Search Engine

By now you probably have a fuzzy picture of how a search engine works, but there’s much more to it than the basic overview you’ve seen so far. A search engine is made up of several parts. Explanations of how those parts fit together are rare, yet that information is vitally important to succeeding with search engine optimization (SEO).

Query interface
The query interface is what most people are familiar with, and it’s probably what comes to mind when you hear the term “search engine.” The query interface is the page that users see when they navigate to a search engine to enter a search term.

There was a time when every search engine interface looked very much like the Google.com home page: a simple page with a search box and a button to activate the search.

Today, many search engines on the Web have added much more personalized content in an attempt to capitalize on the real estate available to them. For example, Yahoo! Search allows users to personalize their pages with a free e-mail account, weather information, news, sports, and many other elements designed to make users want to return to that site to conduct their web searches.

One other option users have for customizing the interfaces of their search engines is a capability like the one Google offers. The Google search engine has a customizable interface to which users can add different gadgets. These gadgets allow users to add features to their customized Google search home that meet their own personal needs or tastes.

When it comes to search engine optimization, Google’s user interface offers you the greatest ability to reach your target audience, because it lets you do more than just optimize your site for search. If there is a useful tool or feature available on your site, you can give users access to it through the Application Programming Interface (API) made available by Google. This puts your name in front of users on a daily basis.

For example, a company called PDF24.org has a Google gadget that allows users to turn their documents into PDF files, right from their Google home page once the gadget has been added. If the point of search engine optimization is ultimately to get your name in front of as many people as possible, as often as possible, then making a gadget available for addition to Google’s personalized home page can only further that goal.

Crawlers, spiders, and robots

The query interface is the only part of a search engine that the user ever sees. Every other part of the search engine is behind the scenes, out of view of the people who use it every day. That doesn’t mean it’s not important, however. In fact, what’s in the back end is the most important part of the search engine.

If you’ve spent any time on the Internet, you may have heard a little about spiders, crawlers, and robots. These little creatures are programs that literally crawl around the Web, cataloging data so that it can be searched. In the most basic sense all three programs — crawlers, spiders, and robots — are essentially the same. They all “collect” information about each and every web URL.

This information is then cataloged according to the URL at which it is located and is stored in a database. Then, when a user uses a search engine to locate something on the Web, the references in the database are searched and the search results are returned.
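
To make the idea concrete, here is a minimal sketch in Python of a crawler cataloging pages by URL. The tiny in-memory `TOY_WEB` dictionary, its example.com URLs, and the `crawl` function are all assumptions made for illustration. A real crawler fetches pages over the network and is far more elaborate.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
TOY_WEB = {
    "http://example.com/": (
        "welcome to our puppies site",
        ["http://example.com/care", "http://example.com/food"],
    ),
    "http://example.com/care": (
        "caring for puppies and kittens",
        ["http://example.com/"],
    ),
    "http://example.com/food": ("food for growing dogs", []),
}


def crawl(start_url, web):
    """Visit every page reachable from start_url and catalog its words by URL."""
    catalog = {}                # URL -> list of words found on that page
    queue = deque([start_url])  # pages waiting to be visited
    while queue:
        url = queue.popleft()
        if url in catalog or url not in web:
            continue            # already cataloged, or a dead link
        text, links = web[url]
        catalog[url] = text.split()
        queue.extend(links)     # follow the links found on this page
    return catalog


catalog = crawl("http://example.com/", TOY_WEB)
```

The `catalog` dictionary plays the role of the database here: each URL is stored once, together with the words collected from it.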

Databases
Every search engine contains or is connected to a system of databases, where data about each URL on the Web (collected by crawlers, spiders, or robots) is stored. These databases are massive storage areas that contain multiple data points about each URL.

The data might be arranged in any number of ways, and it is ranked and retrieved according to a method that is usually proprietary to the company that owns the search engine.

Search algorithms

All of the parts of the search engine are important, but the search algorithm is the cog that makes everything work. It might be more accurate to say that the search algorithm is the foundation on which everything else is built. How a search engine works is based on the search algorithm, or the way that data is discovered by the user.

In very general terms, a search algorithm is a problem-solving procedure that takes a problem, evaluates a number of possible answers, and then returns the solution to that problem. A search algorithm for a search engine takes the problem (the word or phrase being searched for), sifts through a database that contains cataloged keywords and the URLs those words are related to, and then returns pages that contain the word or phrase that was searched for, either in the body of the page or in a URL that points to the page.
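
The lookup described above can be sketched as a keyword-to-URL catalog, often called an inverted index. This is only a sketch under stated assumptions: the `PAGES` data, the example.com URLs, and the `build_index` and `search` functions are hypothetical, and it matches words in the page body only, not in URLs that point to the page.

```python
from collections import defaultdict

# Hypothetical cataloged pages: URL -> page text.
PAGES = {
    "http://example.com/a": "puppies for sale",
    "http://example.com/b": "training puppies at home",
    "http://example.com/c": "kittens for sale",
}


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())  # keep only pages with this word too
    return sorted(matches)


index = build_index(PAGES)
results = search(index, "puppies sale")
```

With this toy data, a search for "puppies sale" matches only the first page, because it is the only one whose text contains both words.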

This neat little trick is accomplished differently according to the algorithm that’s being used. There are several classifications of search algorithms, and each search engine uses algorithms that are slightly different. That’s why a search for one word or phrase will yield different results from different search engines. Some of the most common types of search algorithms include the following:

  • List search: A list search algorithm searches through specified data looking for a single key. The data is searched in a very linear, list-style method, one item after another. Searching through billions of web sites this way could be very time-consuming, and because the result of a list search is usually a single element, it would yield a small search result.
  • Tree search: Envision a tree in your mind. Now, examine that tree either from the roots out or from the leaves in. This is how a tree search algorithm works. The algorithm searches a data set from the broadest to the most narrow, or from the most narrow to the broadest. Data sets are like trees; a single piece of data can branch to many other pieces of data, and this is very much how the Web is set up. Tree searches, then, are more useful when conducting searches on the Web, although they are not the only searches that can be successful.
  • SQL search: One of the difficulties with a tree search is that it’s conducted in a hierarchical manner, meaning it’s conducted from one point to another, according to the ranking of the data being searched. A SQL (Structured Query Language, often pronounced “sequel”) search allows data to be searched in a non-hierarchical manner, which means that data can be searched from any subset of data.
  • Informed search: An informed search algorithm looks for a specific answer to a specific problem in a tree-like data set. The informed search, despite its name, is not always the best choice for web searches because of the general nature of the answers being sought. Instead, informed search is better used for specific queries in specific data sets.
  • Adversarial search: An adversarial search algorithm looks for all possible solutions to a problem, much like finding all the possible solutions in a game. This algorithm is difficult to use with web searches, because the number of possible solutions to a word or phrase search is nearly infinite on the Web.
  • Constraint satisfaction search: When you think of searching the Web for a word or phrase, the constraint satisfaction search algorithm is most likely to satisfy your desire to find something. In this type of search algorithm, the solution is discovered by meeting a set of constraints, and the data set can be searched in a variety of different ways that do not have to be linear. Constraint satisfaction searches can be very useful for searching the Web.

These are only a few of the various types of search algorithms that are used when creating search engines. Very often, more than one type of search algorithm is used, or, as happens in most cases, some proprietary search algorithm is created. The key to maximizing your search engine results is to understand a little about how each search engine you’re targeting works. Only then can you tailor your site to meet the search requirements of that search engine and maximize your exposure.
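
As a rough illustration of the first two algorithm types in the list above, the following Python sketch contrasts a list search with a tree search. The function names and the `TOPIC_TREE` data are hypothetical, chosen only for this example. No search engine is known to work exactly this way.

```python
def list_search(items, key):
    """Scan the items one by one and return the position of the first match."""
    for position, item in enumerate(items):
        if item == key:
            return position
    return None


# Hypothetical topic tree: each node maps to its child topics.
TOPIC_TREE = {
    "pets": ["dogs", "cats"],
    "dogs": ["puppies", "training"],
    "cats": ["kittens"],
    "puppies": [],
    "training": [],
    "kittens": [],
}


def tree_search(tree, root, key):
    """Walk from the root toward the leaves; return the path to key, or None."""
    stack = [[root]]  # each entry is the path taken to reach a node
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == key:
            return path
        # Push children in reverse so the leftmost child is explored first.
        for child in reversed(tree.get(node, [])):
            stack.append(path + [child])
    return None


position = list_search(["kittens", "training", "puppies"], "puppies")
path = tree_search(TOPIC_TREE, "pets", "puppies")
```

The list search returns a single position, while the tree search returns the route from the broadest topic down to the narrowest one, which mirrors the root-to-leaf description of a tree search.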

Retrieval and ranking

For a web search engine, the retrieval of data is a combined activity of the crawler (or spider or robot), the database, and the search algorithm. Those three elements work in concert to retrieve the pages that match the word or phrase a user enters into the search engine’s user interface. And as noted earlier, how that works can be a proprietary combination of technologies, theories, and coding whizbangery.

The really tricky part comes in the results ranking. Ranking is also what you’ll spend the most time and effort trying to affect. Your ranking in a search engine determines how often people see your page, which affects everything from revenue to your advertising budget. Unfortunately, how a search engine ranks your page or pages is a tough science to pin down.

The most that you can hope for, in most cases, is to make an educated guess as to how a search engine ranks its results, and then try to tailor your page to meet those criteria. But keep in mind that, although retrieval and ranking are listed as separate subjects here, they’re actually part of the search algorithm. The separation is to help you better understand how search engines work.

Ranking plays such a large part in search engine optimization that you’ll see it frequently in this book. You’ll look at ranking from every possible facet before you reach the last page. But for now, let’s look at just what affects ranking. Keep in mind, however, that different search engines use different ranking criteria, so the weight each of these elements carries will vary.

  • Location: Here, location doesn’t refer to the location (as in the URL) of a web page. Instead, it refers to where key words and phrases appear on a web page. For example, if a user searches for “puppies,” some search engines will rank the results according to where on the page the word “puppies” appears. Generally, the higher the word appears on the page, the higher the rank might be. So a web site that contains the word “puppies” in the title tag will likely appear higher than a web site that is about puppies but does not contain the word in the title tag. This means that a web site that’s not designed with SEO in mind will likely not rank where you would expect it to rank. The site www.puppies.com is a good example. At the time of writing, it ranked fifth rather than first in a Google search, potentially because it does not contain the key word in the title tag.
  • Frequency: The frequency with which the search term appears on the page may also affect how a page is ranked in search results. For example, among pages about puppies, one that uses the word five times might be ranked higher than one that uses the word only two or three times. When word frequency became a factor, some web site designers began hiding words hundreds of times on pages, trying to artificially boost their page rankings. Most search engines now recognize this as keyword spamming and ignore or even refuse to list pages that use this technique.
  • Links: One of the more recent ranking factors is the type and number of links on a web page. Links that come into the site, links that lead out of the site, and links within the site are all taken into consideration. It would follow, then, that the more links you have on your page or leading to your page the higher your rank would be, right? It doesn’t necessarily work that way. More accurately, the number of relevant links coming into your page, versus the number of relevant links within the page, versus the number of relevant links leading off the page will have a bearing on the rank that your page gets in the search results.
  • Click-throughs: One last element that might determine how your site ranks against others in a search is the number of click-throughs your site has versus click-throughs for other pages that are shown in the search results. Because a search engine cannot monitor site traffic for every site on the Web, some monitor the number of clicks each search result receives. The rankings may then be repositioned in a future search, based on this interaction with the users.

Within each search engine, page ranking is a very precise science, but it differs from search engine to search engine, and the details are rarely published. To create the best possible SEO for your site, it’s necessary to understand as much as you can about how these page rankings are made for the search engines you plan to target. Those factors can then be taken into consideration and used to your advantage when it’s time to create, change, or update the web site that you want to optimize.
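
The four factors above can be combined into a toy scoring function, sketched here in Python. Everything in it is an assumption for illustration: the field names, the weights, the cap on frequency, and the example.com pages are invented, and real search engines keep their actual formulas proprietary.

```python
def score(page, term, weights=(3.0, 1.0, 0.5, 0.1)):
    """Combine the four factors into one number; higher means ranked earlier."""
    w_location, w_frequency, w_links, w_clicks = weights
    term = term.lower()
    # Location: does the term appear in the title?
    in_title = 1 if term in page["title"].lower().split() else 0
    # Frequency: capped so keyword stuffing stops paying off after a few uses.
    frequency = min(page["body"].lower().split().count(term), 5)
    return (
        w_location * in_title
        + w_frequency * frequency
        + w_links * page.get("inbound_links", 0)
        + w_clicks * page.get("clicks", 0)
    )


def rank(pages, term):
    """Return the pages ordered from highest score to lowest."""
    return sorted(pages, key=lambda page: score(page, term), reverse=True)


pages = [
    {
        "url": "http://example.com/stuffed",
        "title": "Welcome",
        "body": "puppies " * 200,  # keyword stuffing
    },
    {
        "url": "http://example.com/guide",
        "title": "Puppies care guide",
        "body": "how to raise puppies",
        "inbound_links": 4,
        "clicks": 10,
    },
]
ranked = rank(pages, "puppies")
```

In this sketch the guide page outranks the keyword-stuffed page, because the title match, inbound links, and click-throughs outweigh the capped frequency count. Changing the hypothetical weights changes the order, which is one way to picture why the same page can rank differently on different search engines.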
