Search Engines and Directories

Search Engines and Directories are a critical factor in the design of a web site for a Small Business. This page covers how to optimise your site for the most important Search Engines and Directories and how to submit it to them. There are basically only three ways that anyone can reach your site: they have followed a link, they have read the address somewhere and entered it into the browser, or they have used one of the many Searches available and found your site. Searches bring new visitors and business to your site; unless you are properly registered with the major Search Engines you are invisible to potential customers searching the web. This article covers the difference between a Search Engine and a Directory, which Searches are important, how to optimise a site for Search Engines, how to register a site, and where to find extra information.


What is the difference between a Search Engine and a Directory

Consolidation. Most of the major Search Sites now have both Search Engines and Directories available, although the emphasis is usually on one or the other. The effort of indexing the whole WWW is enormous, as is that of reviewing sites for a general purpose Directory, so a large amount of consolidation has taken place. Whilst there are thousands of Directories and Search Engines, only a handful are of major importance, particularly in the case of the Search Engines. In both cases the same base directories and indexes are consolidated and used by a number of apparently different services, sometimes with different algorithms for ranking the results or with additional information such as popularity included. This makes submission easier but optimising the site potentially more difficult.

Which Searches are important?

There are various sources of estimates of "audience reach" and other criteria for ranking the importance of the various Search "services" and the underpinning Engines and Directories. Audience reach is usually defined as the percentage of web surfers estimated to have used each Search during the month. The various sources produce different numbers but the general order is consistent. The Audience Reach numbers below are based on Jupiter Media Metrix's estimates for mid 2002, using a group of about 50,000 surfers who all had special monitoring software installed. Because a web surfer may visit more than one service, the combined totals exceed 100 percent.

MSN Search (default for Internet Explorer) 37%
Yahoo 33%
Google 29%
Lycos N/A
AOL 22%
Ask Jeeves 16%
LookSmart 9%
Infospace (Excite) 9%
Netscape Search (default for Netscape Browsers) 8%
Overture (Goto) 7%
Altavista <7%
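As a quick check on the point that the combined totals exceed 100 percent, the exact figures in the list can simply be added up. The sketch below uses only the numbers quoted above; Lycos (N/A) and Altavista (<7%) are left out because no exact figure is given, so the real total would be somewhat higher.

```python
# Audience reach figures (percent) copied from the list above.
# Lycos (N/A) and Altavista (<7%) are omitted: no exact figure is quoted.
reach_percent = {
    "MSN Search": 37,
    "Yahoo": 33,
    "Google": 29,
    "AOL": 22,
    "Ask Jeeves": 16,
    "LookSmart": 9,
    "Infospace (Excite)": 9,
    "Netscape Search": 8,
    "Overture (Goto)": 7,
}

combined = sum(reach_percent.values())
print(f"Combined reach of the listed services: {combined}%")  # 170%
```

A total of 170% means that the average surfer in the sample used more than one of these services during the month.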

A rather different perspective is obtained by measuring the time spent at each Search service by surfers, again using Jupiter Media Metrix's estimates for mid 2002, given to the nearest minute.

Google 24
Altavista 18
Ask Jeeves 16
Yahoo 11
LookSmart 8
Netscape Search (default for Netscape Browsers) 7
Infospace (Excite) 7
Lycos N/A
MSN Search 6
Overture (Goto) 3

Perhaps the most meaningful figures of all are the total time spent on Searches at each site, given below in millions of hours per month for the main Search Engines (figures from Jupiter Media Metrix's estimates for mid 2002).

Google 12
Yahoo 7
Ask Jeeves 5
MSN Search 4
Altavista 2

The situation is quite different to a couple of years ago, with the big winner being Google and the big loser being Altavista. The defaults for browsers still have a big reach, with AOL coming up the lists. Lycos withholds access to its figures but is still significant. The figures above do not give the whole picture, as some of the underlying services for the big portals, such as Inktomi, do not feature and consolidation again masks some of the trends.

Optimising a site for Search Engines


It is easiest and best to design your site, and its most important pages, to be friendly to Search Engines from the start. This has an important additional advantage in that it will also make the site more friendly to users. Why is this? A Search Engine's agent (robot/spider) has to be fairly simple and fast, so you have to make sure that the important facts (keywords) are readily available and that navigation is clear and easy. Most of the Fancy Bits which slow a site down and annoy visitors will be ignored by Search Engines. Good design for a Search Engine means you are forced time and time again to examine and refine your content, and "Content is King" when it comes to the visitors to your site. It forces you to define the message you are getting over on a page and make it clear, consistent and concise in every place from the Title of the page to the Alt statement on an image.

Where do the Keywords come from and how is a page Ranked?

Keywords. Keywords are collected from the whole of the displayed text by all the Search Engines and also from the Title. Many Engines also use the text in the Description and Keyword META Tags. Some Engines also use the text in the ALT statements in Image Tags. Inktomi also indexes the text in comments.

Ranking for relevancy. I will now briefly summarize how a Search Engine ranks a site for relevancy. They all follow a set of rules, with the main rules involving the location and frequency of keywords on a web page. For example, pages with keywords appearing in the title are considered to be more relevant than others when ranking the results. Search Engines will also check to see if the keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words right from the beginning. Some indexing software gives preference to complete, punctuated sentences. Frequency is another very important factor in how Search Engines determine relevancy. A Search Engine will analyze how often keywords appear in relation to other words in a web page. Pages with a higher keyword frequency are often deemed more relevant than other web pages. A note of caution however - if a word appears too often, or in very small print, or with no contrast to the background, most Search Engines consider it to be spamming and may well reject the page.
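The location and frequency rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the algorithm of any real Search Engine: the function name keyword_report, the cut-off of 50 words for "near the top" and the sample page are all assumptions made for the example, and it handles single-word keywords only.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_report(title, body, keyword, top_words=50):
    """Return simple location and frequency signals for one keyword.

    top_words is a hypothetical cut-off for "near the top of the page".
    """
    keyword = keyword.lower()
    title_words = tokenize(title)
    body_words = tokenize(body)
    count = body_words.count(keyword)
    return {
        "in_title": keyword in title_words,            # location: title
        "near_top": keyword in body_words[:top_words],  # location: top of page
        "count": count,
        # frequency: share of all words on the page that are the keyword
        "frequency": count / len(body_words) if body_words else 0.0,
    }


# Hypothetical page for an imaginary small business.
report = keyword_report(
    title="Hand Made Pottery from Cornwall",
    body="Our pottery is hand made in Cornwall. Each piece of pottery is unique.",
    keyword="pottery",
)
print(report)
```

Here the keyword is in the title, appears near the top, and makes up 2 of the 13 words of text. A very high frequency would be a warning sign rather than a benefit, for the spamming reasons given above.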

Boost to Ranking. Search Engines may also boost the ranking using various other criteria. For example, many engines now use link popularity - they can tell which of the pages in their index have a lot of links pointing at them. These pages have an improved ranking because it is a reasonable assumption that a popular site is more relevant. Some Search Engines go further and only index pages with high link popularity. Some Search Engines boost the ranking of pages to which people have followed links presented in previous searches. Search Engines with associated directories may boost sites they have reviewed. Keywords in the Meta Tags may also provide a slight boost, but not all the Search Engines even use the Meta Tags so one should not depend solely on them.
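Link popularity, as described above, amounts to counting how many other pages in the index point at each page. A minimal sketch follows, assuming the index is available as a list of (source, target) pairs; the page addresses and the function name link_popularity are invented for the example, and real engines weigh links in ways not covered here.

```python
from collections import Counter


def link_popularity(links):
    """Count inbound links per page from (source, target) pairs.

    Links from a page to itself are ignored, and a given source page
    is counted at most once per target.
    """
    unique_links = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique_links)


# Hypothetical link graph: each pair means "source page links to target page".
links = [
    ("a.example/index.html", "shop.example/index.html"),
    ("b.example/links.html", "shop.example/index.html"),
    ("b.example/links.html", "shop.example/index.html"),     # duplicate
    ("shop.example/index.html", "shop.example/index.html"),  # self link
    ("a.example/index.html", "b.example/links.html"),
]

popularity = link_popularity(links)
for page, inbound in popularity.most_common():
    print(page, inbound)
```

In this example the shop page has two inbound links and so would receive the larger boost.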

The Importance of META Tags

Using <META> tags allows you to provide more detail about your Web pages and to gain some control over how your pages are indexed. You should use META tags on every page that you expect to be indexed, i.e. every page. They are not a magic bullet which will automatically improve your rankings, but they do make the text which is displayed, and the keywords which are given importance, more predictable. Not all Search Engines, however, make use of <META> tags. Google and Northern Light do not use META Statements.

<META> tag codes are inserted between the <HEAD> and </HEAD> tags. Two tags are important: the Description tag and the Keywords tag.

Are Meta Tags essential? Not in most cases as the robots default to the text in the page - usually that at the top for the description. It is however very important to use <META> tags if one has Netscape Frames, Javascript or lots of links at the top of the page.
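To make the placement of the Title and the Description and Keywords META tags concrete, the sketch below holds a small page head in a Python string and reads it back with the standard html.parser module, roughly as a simple robot might. The page content is invented, and the HeadReader class is an illustration rather than how any particular Search Engine works.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Collect the Title text and the named META tags of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page head for an imaginary small business.
PAGE = """
<HTML>
<HEAD>
<TITLE>Hand Made Pottery from Cornwall</TITLE>
<META NAME="description" CONTENT="Hand made pottery, mugs and bowls from a small Cornish studio.">
<META NAME="keywords" CONTENT="hand made pottery, Cornwall pottery, mugs, bowls">
</HEAD>
<BODY><H1>Hand Made Pottery from Cornwall</H1></BODY>
</HTML>
"""

reader = HeadReader()
reader.feed(PAGE)
print("Title:      ", reader.title)
print("Description:", reader.meta["description"])
print("Keywords:   ", reader.meta["keywords"])
```

The same reader can be pointed at the saved source of any page to see what text a robot that honours META tags would be offered.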

Tips for Optimising a Page for Search Engines

Minimise the use of Frames. It is important to start in the correct way, so I will begin with a few features of HTML to avoid so we can get them out of the way. Frames should be avoided where possible - they are superficially attractive but most experienced users vary between dislike and hate, and many Search Engines give up when presented with a framed site. Both people and Search Engine agents have the same problems in knowing where they are and where they came from. Even when an agent does get in and indexes a real page with content rather than the empty frame, which is rare, it still loses the context. The address delivered as the search result is often frame-less, so the searcher is left with a small part of the story and no way to navigate other than to guess the URL of the site and start again. If you have a framed site already there are some tricks which will help, but it is best to keep clear of frames. Think of the difficulty most browsers still have in printing or bookmarking a framed site, or even displaying the URL of the page you are actually looking at - how many times have you printed an empty page or a page with only buttons?

Use conventional hyperlinks as well as Image Maps and Java. Clickable image maps are an HTML feature which many agents handle poorly or not at all. This prevents the spider following the links to your other pages. The same applies to some of the clever Java based navigation with buttons that change when you hover over them etc. The solution is to add ordinary links as well, which is why you see tiny lists of links duplicated at the bottom of many web pages when they have fancy navigation. Most users prefer a standard hyperlink, not least because you can immediately see the links you have already visited. Again, keeping it simple, or adding a standard alternative, keeps the site fast and user friendly as well as satisfying the needs of the Search Engines - how often have you given up waiting for a fancy map to download or a set of buttons to appear and tried to guess which link to try, or given up and gone elsewhere?
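The effect of relying on an image map alone can be illustrated with a deliberately simple spider that only follows conventional <A HREF> links. This is a sketch under that assumption; real agents differ in what they can follow, and the file names and the SimpleSpider class are invented for the example.

```python
from html.parser import HTMLParser


class SimpleSpider(HTMLParser):
    """Collect only the links a very basic agent could follow: <A HREF=...>."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def followable_links(html):
    """Return the conventional hyperlinks found in the HTML, in page order."""
    spider = SimpleSpider()
    spider.feed(html)
    return spider.links


# Hypothetical navigation built from an image map alone.
IMAGE_MAP_ONLY = """
<IMG SRC="nav.gif" USEMAP="#nav" ALT="Navigation">
<MAP NAME="nav">
<AREA SHAPE="rect" COORDS="0,0,80,20" HREF="products.html" ALT="Products">
<AREA SHAPE="rect" COORDS="80,0,160,20" HREF="contact.html" ALT="Contact">
</MAP>
"""

# The same navigation with ordinary text links added underneath.
WITH_TEXT_LINKS = IMAGE_MAP_ONLY + """
<P><A HREF="products.html">Products</A> | <A HREF="contact.html">Contact</A></P>
"""

print(followable_links(IMAGE_MAP_ONLY))   # []
print(followable_links(WITH_TEXT_LINKS))  # ['products.html', 'contact.html']
```

With the image map alone this spider finds nothing to follow; the duplicated text links make both pages reachable.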

Select your Strategic Keywords. Think about how people will search for your web page. The words you imagine them typing into the search box are your strategic keywords. Each page in your web site will have different strategic keywords that reflect the page's content. Some of your strategic keywords should always be two or more words long. Usually, too many sites will be relevant for a single word and the "competition" means the chances of a good ranking are reduced, so pick phrases of two or more words and you will have a better chance of success.

Look at the Keywords used by highly ranked competitors. You can view the source of an HTML page in your Browser using View -> Source, or from the Right Click menu, and see what they have in their META statements.

Choose the page Title carefully. The Title referred to here is not the first HTML heading that shows up on your page but the text located between the <TITLE> and </TITLE> tags - this is what a browser will display at the top of its window. Titles are the most important element that a Search Engine indexes, as the robots go first to the <TITLE> Tag, so every page must have a Title and it should be as descriptive as possible. It is normal to use around six words, and most Search Engines only display the first 70 characters of the Title. Use important Keywords at the start of the Title and do not repeat them. Failure to put one's strategic keywords in the page title is arguably the most common cause of a relevant web page being poorly ranked.
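The Title guidelines above lend themselves to a small automatic check. The sketch below is one possible reading of them: the 70 character limit comes from the text, while the function name check_title and the rule that the main keyword should fall within the first three words are assumptions made for the example. It handles single-word keywords only.

```python
import re

MAX_DISPLAYED_CHARS = 70  # display limit quoted in the text above
LEADING_WORDS = 3         # hypothetical meaning of "at the start"


def check_title(title, keywords):
    """Return a list of warnings for a page Title.

    keywords holds single-word strategic keywords, most important first.
    """
    words = re.findall(r"[a-z0-9']+", title.lower())
    if not words:
        return ["Title is empty"]
    warnings = []
    if len(title) > MAX_DISPLAYED_CHARS:
        warnings.append(
            f"Title is {len(title)} characters; only the first "
            f"{MAX_DISPLAYED_CHARS} may be displayed"
        )
    for keyword in keywords:
        occurrences = words.count(keyword.lower())
        if occurrences == 0:
            warnings.append(f"Keyword '{keyword}' is missing from the Title")
        elif occurrences > 1:
            warnings.append(f"Keyword '{keyword}' is repeated in the Title")
    if keywords and keywords[0].lower() not in words[:LEADING_WORDS]:
        warnings.append("The most important keyword is not near the start")
    return warnings


# Hypothetical Titles for an imaginary pottery business.
print(check_title("Pottery Hand Made in Cornwall", ["pottery", "cornwall"]))
print(check_title("Welcome to our Home Page", ["pottery"]))
```

The first Title passes with no warnings; the second is flagged because the strategic keyword does not appear in it at all.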

Position Your Keywords. Make sure your strategic keywords appear in the crucial locations on your web pages. Search Engines give higher ranking to keywords that appear "high" on the page, so use strategic keywords in your page heading - I often use the same text in the Title and the Heading for the page. You then need to repeat them in the first paragraphs of the web page. The first words of the first sentence are the most important of all and are also the words which have to get and hold the attention of the visitor.

Have Some Relevant Content. This may sound obvious, but carefully setting up page titles and adding meta tags is not necessarily going to help your page do well if the page has nothing to do with them - Keywords need to be reflected in the page's content. This also means that you need ordinary text on the page - Search Engines can't read graphics. Some of the Search Engines will index ALT text and comment information, along with meta tags, but to be safe use ordinary text whenever possible. Visitors also like to have text to read - it loads quickly and keeps them busy whilst your graphics are appearing.

Take care with Tables and JavaScript. Tables and JavaScript can "push" your main text further down the page, making keywords less relevant because they appear lower on the page. This is because tables break apart when most Search Engines read them. For example, it is very common to have a two-column page, where the first column has navigational links and the second column has the main text. This moves the keywords down the page. I still think Tables are a much better option than Frames but one should consider carefully how they affect your pages - you may be able to add an empty cell in the first row and column and start the navigation links in the second row. Large sections of JavaScript have the same effect - the Search Engine reads them first causing the visible text to appear lower on the page. If possible place scripts low down on the page.
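The way a navigation column pushes keywords down the page can be seen by reading the text of a table in source order, which is roughly what a simple agent does. The sketch below compares a conventional two-column layout with the empty-cell arrangement suggested above. The ROWSPAN attribute is my assumption about how that arrangement would be built, and the class and function names are invented for the example.

```python
import re
from html.parser import HTMLParser


class TextInSourceOrder(HTMLParser):
    """Collect the words of a page in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9']+", data.lower()))


def first_position(html, keyword):
    """Return the 1-based word position of keyword in source order, or None."""
    reader = TextInSourceOrder()
    reader.feed(html)
    try:
        return reader.words.index(keyword.lower()) + 1
    except ValueError:
        return None


# Conventional layout: navigation column first, main text second.
NAV_FIRST = """
<TABLE><TR>
<TD><A HREF="index.html">Home</A><BR>
    <A HREF="products.html">Products</A><BR>
    <A HREF="contact.html">Contact</A></TD>
<TD><H1>Hand Made Pottery from Cornwall</H1></TD>
</TR></TABLE>
"""

# Empty first cell, main text spanning two rows, navigation in the second row.
EMPTY_CELL_FIRST = """
<TABLE>
<TR><TD></TD>
    <TD ROWSPAN="2"><H1>Hand Made Pottery from Cornwall</H1></TD></TR>
<TR><TD><A HREF="index.html">Home</A><BR>
        <A HREF="products.html">Products</A><BR>
        <A HREF="contact.html">Contact</A></TD></TR>
</TABLE>
"""

print(first_position(NAV_FIRST, "pottery"))         # 6
print(first_position(EMPTY_CELL_FIRST, "pottery"))  # 3
```

With only three navigation links the difference is small, but a long list of links can push the main text a long way down in the order an agent reads it.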

Avoid Spamming. Be sure that your text is "visible." It is tempting to try to fool the Search Engines by repeating keywords in a tiny font, or in the same colour as the background colour to make the text invisible in browsers. Search Engines are catching on to these and other tricks. Expect that if the text is not visible in a browser, then it won't be indexed by a Search Engine.

Registering a site

Services for Registering sites

A simple approach is to use a free service such as that provided at Netmechanic to do an initial registration with some of the more popular Search Engines. This service submits to both the classic robot agent/crawler type Search Engines and also to some Directories so you need to have a list of key words and descriptions of various lengths prepared in advance. This service is better than most as you get an emailed report back stating which registrations were accepted or rejected and estimates of how long before the site will be indexed by each Search.

Detailed instructions for selected Search Engines

I generally favour a more direct approach to the most important Search Engines - it is usually quite quick as all they need is the URL of the site, although they usually request an email address and sometimes a broad category or country from a drop down list. The home page of the Search site often has an "Add URL" link; if it is not there, try under "Advanced Search" or Help to find it. Note that several Indexes and Directories are now used at multiple search sites, so not every site even has an "Add URL" link. The "true" Search Engines that crawl the web, with which I registered both my own sites and the sites for which I was Webmaster in 2000, are:
  1. Google, which is also used for parts of a number of primarily Directory-based sites and portals including Yahoo and Netscape
  2. Altavista
  3. Excite - now Infospace
  4. HotBot, which is used as the route into Inktomi. Inktomi cannot be accessed directly, but it not only powers HotBot but also underpins a number of other important Searches including AOL and MSN (Add URL is at http://hotbot.lycos.com/addurl.asp or http://www.hotbot.lycos.co.uk/submit.html )
  5. Go / Infoseek
  6. FAST Search, which is also used by parts of Lycos (Add URL page is at http://www.alltheweb.com/add_url.php )
  7. Northern Light (Add URL at http://www.northernlight.com/docs/regurl_help.html )
  8. WebCrawler and Magellan, which are now owned by Excite, which in turn is used by Infospace; it looks as if they now share the same Index

The first seven probably cover over 90% of the true crawler based Searches and an even higher percentage of the secondary searches on Search Sites which are primarily Directory based, such as Yahoo, Lycos, AOL, Netscape and MSN. Several of the services now have a fast track paid service, typically costing $39, which guarantees inclusion and indexing within a couple of days - it may soon be the only way to get a deep indexing of all pages, although the home page should be indexed by all major searches if it has links to it.

Important Directories

The most important Directories are probably:
  1. Yahoo - very important but also very difficult to get an entry. Yahoo is probably the Directory most people first think of and has a vast following. It is very difficult to get a site reviewed and accepted and there is no feedback unless you use the premium service where you pay a significant sum. Just keep trying every six months as the site develops and matures.
  2. The Open Directory - used by Google's Directory section, Netscape, and a growing number of other Search sites. The Open Directory is edited by volunteers who check every submission and may well change the category you suggest. It has recently been gaining a more regional emphasis and sites are being moved to new categories. I have had no difficulty in getting sites accepted and have had feedback from the reviewer when I asked for a category change to be revised.
  3. LookSmart - largely because it is used by MSN. LookSmart reviews all the sites and again pushes the premium service where you pay $199 to ensure the site is reviewed within 48 hours. I found it much more difficult to find suitable categories and to use LookSmart than Yahoo or the Open Directory. When I tried to register a site recently I went round in circles and it seems that it is now a pay only service.
Coverage of Global Directories. If you get into all three you have probably covered over 85% of Global Directory searches relevant to a Small Business - it may not be worth paying for LookSmart if you are in the others.

National Directories. There will also be National Directories such as Scoot, Yell and Ask Alex where you can add a link to your web site, but usually only when subscribing to a premium service - these will not be considered here as they can be very expensive. Scoot could cost you £1400 a year if you get a lot of referrals.

Registering with a Directory

This is much more difficult and time consuming than registering with a Search Engine and is best done in a number of stages.

Read all the Help Information for Registration. Find the Add URL link and read all the help files so you know what is being looked for. Successful Directories have to adapt and develop so you need the latest information. Your submission and site will be reviewed by a skilled person whose time is short so make sure your site is finished and you know what is required.

Select a category. This is a very important step, as it is quite difficult to get the category changed once the site has been reviewed, and the review is not going to be favourable if you have not picked the right place for the site. Spend some time using the catalog for each directory you are going to register with - you can follow the links down through the subcategories, or in many cases there will be a Search Box which helps identify suitable categories and find sites based on the small number of keywords each site has provided and/or keywords associated with categories. Many Directories are now adopting a regional basis so you may need to start from several top levels. When you have a feel for the Directory, check you are on the right lines by finding all your competitors. Bookmark the selected category in your favourites list.

Compile the information which the Directory needs on the Registration Form. Much of this can be based on the Title and META tags on the page but needs to be adapted for each Directory. The Help Information should tell you exactly how long the Title and Description may be and how many Keywords are allowed, or whether the Directory creates its own list. I note this down, then spend a while producing all the information in a Word Processor or Notepad/Write, carefully checking lengths etc. When it is all complete, sleep on it overnight. The following are the lowest common denominator of the Directories above when I last checked:

Register. This document with all the prepared inputs for the form can then be opened and the information cut and pasted into the form. Check that drop down boxes for region etc are correct. I print what I have entered before pressing the submit button.

Check frequently whether you are registered. It may take several weeks or even months to appear in the Directory, and when you do you will need to check that you are in the correct place. Directories evolve and your category may be changed at any time, so check every week or two and contact the Directory if the category is not appropriate. If you do not appear after 3 months, resubmit and continue to do so every 3 or 4 months.

Extra Sources of Information

Search Engine Watch is an excellent site which has more information than I can ever convey - you should visit if you are serious about getting a good ranking with Search Engines. A good entry is at http://searchenginewatch.internet.com/sitemap.html which is the site map with links to about 100 internal pages split into the main areas. Each page has links to in depth information on other sites - a formidable database. The site is referred to at many of the search sites and seems to be well regarded - I have used the information extensively in producing a strategy for optimising the sites I look after and selecting and submitting them to the various Search Engines and Directories.

