W5H: Search Engine Tutorial






Who is the best?

  • Size is important
  • Specialty Features
  • Retrieval Algorithm
    • Boolean Logic
    • Proximity Searching
    • Field Searching
    • Truncation *
    • Case Sensitivity
  • Ranking Algorithm
    • Frequency of words
    • Proximity of words
    • Location of words (Field placement)
  • Interface
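
The retrieval features listed above can be illustrated with a minimal Python sketch. It is not how any particular search engine works: the three-document collection and the helper names (tokens, matches_term, docs_with, near) are invented for illustration. It shows Boolean logic as set operations, truncation with a trailing *, proximity as a maximum word gap, and case sensitivity as an optional switch. Field searching is left out because it needs documents with separate fields such as title and body.

```python
import re

# Hypothetical three-document collection, invented for illustration only.
DOCS = {
    1: "Search engines index the web",
    2: "Searching the library catalog by subject",
    3: "The web index of a small engine",
}

def tokens(text, case_sensitive=False):
    """Split text into word tokens, lowercasing unless case matters."""
    words = re.findall(r"[A-Za-z]+", text)
    return words if case_sensitive else [w.lower() for w in words]

def matches_term(word, term):
    """Truncation: a trailing * matches any word starting with the stem."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def docs_with(term):
    """Return the set of document ids that contain the term."""
    return {d for d, text in DOCS.items()
            if any(matches_term(w, term) for w in tokens(text))}

def near(term_a, term_b, max_gap):
    """Proximity: both terms occur within max_gap words of each other."""
    hits = set()
    for d, text in DOCS.items():
        words = tokens(text)
        pos_a = [i for i, w in enumerate(words) if matches_term(w, term_a)]
        pos_b = [i for i, w in enumerate(words) if matches_term(w, term_b)]
        if any(abs(i - j) <= max_gap for i in pos_a for j in pos_b):
            hits.add(d)
    return hits

# Boolean logic maps onto set operations.
both = docs_with("web") & docs_with("index")       # web AND index
either = docs_with("web") | docs_with("catalog")   # web OR catalog
without = docs_with("web") - docs_with("engine")   # web NOT engine
truncated = docs_with("search*")                   # search, searching, ...
close = near("web", "index", 1)                    # web NEAR index (adjacent)
```

In this toy collection, "engine" without truncation does not match "engines", so the NOT query keeps document 1; searching for "engine*" would match both documents 1 and 3.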
Comparison Survey

"If the bars are close together, it suggests that the reported sizes given by the search engine are indeed correct.

If the "Reported" bar is higher than the "Estimated" bar, it suggests that the reported size for that search engine is inflated. It could also indicate that the research was not correct for that search engine.

Where the "Estimated" bar is higher than the "Reported" bar, it suggests that the research may not have been correct for that search engine, or that the search engine itself underrepresented its index."

Sullivan, D. Search Engine Watch. 2000. www.searchenginewatch.com
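
The three readings of the bars described in the quotation can be written as a small decision rule. This is only a sketch: the function name interpret_sizes and the 10% tolerance for "close together" are assumptions, since the source gives no numeric cutoff.

```python
def interpret_sizes(reported, estimated, tolerance=0.10):
    """Classify a reported index size against an independent estimate.

    tolerance is an assumed cutoff for "close together"; the source
    does not define one.
    """
    if reported <= 0 or estimated <= 0:
        raise ValueError("sizes must be positive")
    # Relative gap between the two bars, measured against the estimate.
    gap = (reported - estimated) / estimated
    if abs(gap) <= tolerance:
        return "consistent: reported size looks correct"
    if gap > 0:
        return "reported size may be inflated, or the estimate is off"
    return "estimate may be off, or the engine underreported its index"
```

For example, with made-up sizes in millions of pages, interpret_sizes(150, 100) falls in the "inflated" case and interpret_sizes(80, 100) in the "underreported" case.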

Coverage Comparison

Sullivan, D. (January 30, 1999) "Search Engine Sizes" http://www.searchenginewatch.com

Statistics for search-engine coverage and recency
Search engine    Coverage vs.   95% conf.   Coverage vs.   Invalid     Mean age of new   Median age of new
                 combined       interval    estimated      links (%)   matching docs     matching docs
                 coverage (%)               web size (%)               (days)            (days)
----------------------------------------------------------------------------------------------------------
Northern Light   38.3           ± 0.82      16.0           9.8         141               84
Snap             37.1           ± 0.75      15.5           2.8         240               91
AltaVista        37.1           ± 0.77      15.5           6.7         166               33
HotBot           27.1           ± 0.64      11.3           2.2         192               51
Microsoft        20.3           ± 0.51      8.5            2.6         194               57
InfoSeek         19.2           ± 0.82      8.0            5.5         148               60
Google           18.6           ± 0.69      7.8            7.0         n/a               n/a
Yahoo            17.6           ± 0.61      7.4            2.9         235               76
Excite           13.5           ± 0.46      5.6            2.7         206               47
Lycos            5.9            ± 0.30      2.5            14.0        174               174
EuroSeek         5.2            ± 0.40      2.2            2.6         n/a               n/a
Average          n/a            n/a         n/a            5.3         186               57

Lawrence, S. & Giles, C.L. (July 8 1999). "Accessibility of Information on the Web" Nature. Vol. 400
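
The two coverage measures in the statistics above differ only in their denominator, so dividing one by the other should give roughly the same number for every engine: the share of the estimated web that the engines cover when combined. The sketch below checks this for the first four engines, using values copied from the table. The variable names are illustrative, and because the inputs are rounded the ratios agree only approximately.

```python
# Coverage with respect to combined coverage (%), first four engines.
coverage_combined = {
    "Northern Light": 38.3,
    "Snap": 37.1,
    "AltaVista": 37.1,
    "HotBot": 27.1,
}

# Coverage with respect to estimated web size (%), same engines.
coverage_web = {
    "Northern Light": 16.0,
    "Snap": 15.5,
    "AltaVista": 15.5,
    "HotBot": 11.3,
}

# (size / web) / (size / combined) = combined / web for every engine.
ratios = {name: coverage_web[name] / coverage_combined[name]
          for name in coverage_combined}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
print(f"mean: {mean_ratio:.3f}")
```

Each ratio comes out near 0.42, which suggests that the engines taken together reached a little over 40% of the estimated web.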
