How to Do an Effective Search on the Web

Searching the Web is a lot like life. Some days you find all the answers, and some days you can't buy a clue with a million dollars. I can't change your life, but I hope to change the quality of your searching with a few tips.

First of all, search sites come in two basic models: catalogs, such as Magellan and Yahoo, and search engines, such as AltaVista, Excite, Infoseek, Lycos, Open Text, and WebCrawler. Catalogs are ideal for broad category searches of established sites. For narrow, nitty-gritty searching of all Web sites, a search engine is your tool of choice. A search engine is based on an abstract of the contents of millions of Web sites. Some, like Infoseek, AltaVista, and Excite, cover the full text of many pages. Others are based only on headers and page titles.

Want to peruse all that's available? The Internet Sleuth links to hundreds of searchable databases, from the standard engines to Usenet searchers, plus extensive categorized topics (http://www.isleuth.com). Also, Wired Source offers descriptions of several engines and allows you to search them from one page (http://www.wiredsource.com/wiredsource).

Get to Know Your Sources
 
Just because you know how to navigate Yahoo like a pro doesn't mean you know how to use all the other Web catalogs and search engines. When approaching a new search site, take the time to study any instructions and FAQs to find out what kind of search strings it accepts. Minutes spent in the beginning will save you hours of searching in the future.

In particular, you want to note which search method is the default for the system and whether you have any options in how the engine presents its results to you. For instance, it's usually better to have results ranked by their relevance to your keywords than by the date the page was updated or indexed.
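To see how much the ranking option can change what you read first, here is a minimal Python sketch. The sample hits, their relevance scores, and their dates are invented for illustration; real engines compute relevance in their own ways.

```python
from datetime import date

# Hypothetical search hits: (title, relevance score from 0 to 1, date indexed).
hits = [
    ("Ethernet FAQ", 0.92, date(1996, 3, 1)),
    ("Company news: new office opens", 0.15, date(1997, 2, 10)),
    ("100-Mbps Ethernet buyer's guide", 0.81, date(1996, 11, 5)),
]

def by_relevance(results):
    """Return the hits with the most relevant first."""
    return sorted(results, key=lambda hit: hit[1], reverse=True)

def by_date(results):
    """Return the hits with the most recently indexed first."""
    return sorted(results, key=lambda hit: hit[2], reverse=True)

print([title for title, _, _ in by_relevance(hits)])
print([title for title, _, _ in by_date(hits)])
```

Sorted by date, the barely related news item jumps to the top of the list; sorted by relevance, it drops to the bottom.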

Know Your Subject
 
Understand exactly what it is that you're looking for. If you start out with an ill-defined query, you're sure to end up with an excess of unwanted information. You can always learn as you go, and sometimes that's the only option; but things will go far more smoothly when you already know the terminology of your target subject. For example, in looking for information about Ethernet, you'll fare better if you know that the two main varieties are 10-Mbps and 100-Mbps Ethernet.

Expect Limitations
 
No Web search tool is perfect in terms of accuracy or comprehensiveness. Many people, at least at first, believe that search engines enable them to find the most recent available information on the subject du jour. Nothing could be further from the truth. Catalogs, because they require intelligent intervention to fit sites into their matrix, are always months behind. Even when a Webmaster makes a point of submitting a site to a catalog, it can take months before the site appears. In my own case, my company's site took four months from the date of my initial request to appear in Yahoo.

Search engines are more up to date, but even they take months to sweep the Internet for new material. By the time they've completed one sweep, Web content has already changed significantly. Even the best search engines can give you only a fragmentary view of the endlessly evolving Web.

Some dynamic sites, by their very nature, are impossible to index correctly. News sites such as CNN and daily updated corporate sites are some examples. The bottom line is that no search engine can find very recent material.

Cut Through Data Overload
 
Keep in mind that search sites are not in any way comprehensive maps of the Net. The World Wide Web is simply too vast for even the most advanced search engine to cover adequately.

Compounding this problem is the fact that all search engines have flaws. You might think, for instance, that a simple keyword search would produce identical results from each, with the main difference being their performance speeds. Not by a long shot. Identical searches, even on systems that share the same search syntax, will produce radically different results.

Ask a Question
 
Some engines, such as Infoseek, allow you to input your query as a question or phrase. If this doesn't yield the desired results, try recasting your question to something more specific. Remember to capitalize proper names in your queries.

Use Boolean Operators
 
Most data hunts require you to fine-tune your search requests. All of the top sites permit Booleans, or logical search operators, for this purpose. Boolean Web systems tend to use four different operators to help you narrow down your search.

The most important of these is "AND." When you use AND in a query, you're telling the search engine to find one word AND another word in a document. For example, the best way to find Web pages about Java working with databases is to search for Java AND database. (Some engines like AltaVista allow you to simulate the AND search with a semicolon, such as Java;database.)
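As a rough illustration of what an AND search does, the Python sketch below checks three made-up page texts against the query Java AND database. The helper names (words, matches_and) and the sample pages are hypothetical; this is not how any particular engine is implemented.

```python
import re

def words(text):
    """Lowercase a text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_and(text, *terms):
    """True only if every term appears somewhere in the text."""
    found = words(text)
    return all(term.lower() in found for term in terms)

# Invented sample pages.
pages = {
    "page1": "Connecting Java applets to a database server",
    "page2": "Java coffee beans shipped worldwide",
    "page3": "Choosing a relational database",
}

hits = [name for name, text in pages.items()
        if matches_and(text, "Java", "database")]
print(hits)  # ['page1']
```

Only the page that contains both words survives, which is exactly the narrowing effect you want from AND.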

AND isn't a cure-all, and it's easy to misuse. One common mistake is to use it in place of a phrase. Say you wanted to find sites concerning Java Database Connectivity (JDBC). Most sites will let you search with a phrase. But if you use Java AND database AND connectivity, you're going to get not only pages about JDBC, but also ones that cover Java and Open Database Connectivity (ODBC), an entirely different way of linking databases to the Web.
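The difference between an AND search and a phrase search can be sketched in a few lines of Python. The two sample sentences and the helper names are invented for illustration, and the sample text spells JDBC out as "Java Database Connectivity"; real engines index and match phrases in their own ways.

```python
import re

def tokens(text):
    """Lowercase a text and return its words in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_and(text, *terms):
    """True if every term appears anywhere in the text."""
    found = set(tokens(text))
    return all(term.lower() in found for term in terms)

def matches_phrase(text, phrase):
    """True if the words of the phrase appear consecutively, in order."""
    doc, want = tokens(text), tokens(phrase)
    n = len(want)
    return any(doc[i:i + n] == want for i in range(len(doc) - n + 1))

jdbc_page = "An introduction to Java Database Connectivity drivers"
odbc_page = "Using Open Database Connectivity from a Java program"

# The AND search cannot tell the two pages apart...
print(matches_and(jdbc_page, "Java", "database", "connectivity"))  # True
print(matches_and(odbc_page, "Java", "database", "connectivity"))  # True

# ...but the phrase search can.
print(matches_phrase(jdbc_page, "Java Database Connectivity"))  # True
print(matches_phrase(odbc_page, "Java Database Connectivity"))  # False
```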

Many sites default to using AND between words, but not all do so. Lycos, for example, uses "OR" as its default. This is why you should know exactly how each site works before you use it.

OR is the least useful of the Boolean operators. When you key in a search string, such as Internet OR World, you end up not only with references to Internet World, but also references to all documents containing either Internet or World anywhere in their text. How helpful is this? Many search servers won't even process this request because it would generate hundreds of thousands of hits. Terms like this--words that are too broad to be worth searching on, such as "computer"--are called stopwords. Each server has its own list, which you may want to check. If you don't, you could run a search that yields wildly wrong answers.
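To picture what a stopword list does to your query, consider this small Python sketch. The list itself is made up; each engine keeps its own, and some silently drop the words rather than reject the search.

```python
# Hypothetical stopword list; check each engine's documentation for the real one.
STOPWORDS = {"computer", "internet", "web", "the", "and", "of"}

def usable_terms(query):
    """Split a query into the terms that are searched and the ones ignored."""
    kept, dropped = [], []
    for term in query.lower().split():
        (dropped if term in STOPWORDS else kept).append(term)
    return kept, dropped

kept, dropped = usable_terms("Internet faxing software")
print("searched:", kept)    # searched: ['faxing', 'software']
print("ignored:", dropped)  # ignored: ['internet']
```

If every word in your query lands in the ignored pile, the engine has nothing left to search on, which is one way a search ends up with wildly wrong answers.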

The only place OR belongs is in a very narrow search in which one of your key terms may appear in two different ways. With our Java-database example above, you'd be smart to search for Java Database Connectivity or its abbreviation, JDBC.
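If you run this kind of search often, a tiny helper can assemble the query string for you. The Python sketch below is only an illustration: the function name any_of is invented, and the quoting convention assumes an engine that accepts quoted phrases and an uppercase OR.

```python
def any_of(*variants):
    """Join alternative spellings with OR, quoting multi-word phrases."""
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)

query = any_of("Java Database Connectivity", "JDBC")
print(query)  # "Java Database Connectivity" OR JDBC
```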

Far more helpful for efficient information retrieval is the "NOT" command. With NOT, you're telling the search engine to find pages that contain one term but not another. If I wanted to read about Oracle's Universal Server, but not Informix's product of the same name, I'd hunt the information down with: Oracle AND "Universal Server" NOT Informix. The quotes are there to make sure that the search engine understands that I'm looking for the phrase, not the words Universal or Server separately. Without NOT, I'd be deluged with false hits of the countless sites that contain these common words.
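Here is a minimal Python sketch of how the Oracle query above filters pages. The three page texts and the helper names are hypothetical, and the matching logic is a simplification, not any engine's actual algorithm.

```python
import re

def tokens(text):
    """Lowercase a text and return its words in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(text, phrase):
    """True if the word or phrase appears with its words side by side."""
    doc, want = tokens(text), tokens(phrase)
    n = len(want)
    return any(doc[i:i + n] == want for i in range(len(doc) - n + 1))

def matches(text, required, excluded):
    """Every required term must appear, and no excluded term may."""
    return (all(has_phrase(text, term) for term in required)
            and not any(has_phrase(text, term) for term in excluded))

# Invented sample pages.
pages = [
    "Oracle ships Universal Server for Windows NT",
    "Informix and Oracle both sell a Universal Server",
    "Oracle opens a universal parts server",
]

# Oracle AND "Universal Server" NOT Informix
hits = [p for p in pages
        if matches(p, ["Oracle", "Universal Server"], ["Informix"])]
print(hits)  # ['Oracle ships Universal Server for Windows NT']
```

The second page is thrown out by NOT, and the third by the quotes, because its "universal" and "server" are not adjacent.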

Some search servers, such as Yahoo and Lycos, can't process the Boolean NOT. Others, like AltaVista, allow the oddball AND NOT combination. Still others require that you enter Boolean terms only in special advanced search forms and won't accept them for simple searches (although they won't give you an error message to warn you).

Another, less common, Boolean operator is NEAR. When you use NEAR, you're asking the engine to find words that are not only in a document, but also are within a few words of each other. Say you want to find documents about Internet faxing and telephony. With a mere AND search, you'll get hundreds of useless hits from sites concerning telephony that also happen to contain the all-too-common "fax" (perhaps mentioned in their "contact us" section). To avoid a myriad of false hits, use "fax NEAR telephony." This method can go a long way in eliminating search clutter.
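A proximity test like NEAR can be sketched in Python as follows. The window of 10 words is an assumption chosen for the example (each engine defines "near" for itself), and the two sample texts are invented.

```python
import re

def tokens(text):
    """Lowercase a text and return its words in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_near(text, first, second, window=10):
    """True if the two words occur within `window` words of each other."""
    doc = tokens(text)
    first_at = [i for i, word in enumerate(doc) if word == first.lower()]
    second_at = [i for i, word in enumerate(doc) if word == second.lower()]
    return any(abs(i - j) <= window for i in first_at for j in second_at)

focused = "Send a fax over your Internet telephony gateway"
incidental = ("Our telephony products lead the market in voice quality and "
              "reliability across every region we serve today. "
              "Contact us by phone or fax")

print(matches_near(focused, "fax", "telephony"))     # True
print(matches_near(incidental, "fax", "telephony"))  # False
```

Both texts would pass a plain AND search, but only the first one is really about faxing and telephony together.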

AltaVista's New Approach
 
LiveTopics is a variation on Boolean searching employed by AltaVista. To try it out, enter your keywords as usual into the AltaVista engine. When your results are returned, you'll find a LiveTopics option at the top of the page. Click on it and AltaVista will deliver a page of possible related keywords in a checkable-box format. By clicking to highlight some and reject others, you can greatly narrow your search. It's a simple way to imitate Boolean AND and NOT searching via checkmarked boxes. For example, a search on banking yields a LiveTopics page of subcategories such as mergers, financing, customers, cash, and insurance. You'll see how your new search string is built at the top of the page as you click on subtopics.
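Conceptually, each checked box adds an AND term and each rejected box adds an AND NOT term. The Python sketch below mimics that idea; the function name refine is invented, and this is an illustration of the concept rather than a description of how AltaVista builds its queries internally.

```python
def refine(base, include=(), exclude=()):
    """Build a Boolean query from a base keyword plus checked and rejected topics."""
    parts = [base]
    parts += [f"AND {topic}" for topic in include]
    parts += [f"AND NOT {topic}" for topic in exclude]
    return " ".join(parts)

query = refine("banking", include=["mergers"], exclude=["insurance", "cash"])
print(query)  # banking AND mergers AND NOT insurance AND NOT cash
```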

Specialized Engines
 
For more precise searching, you can step up to specialized search engines and indexes, which are dedicated to collecting all relevant sites for a particular subject. These sites allow you to zero in on a subject quickly and easily. Some examples are FindLaw, for targeting legal resources on the Net (http://www.findlaw.com), and HealthAtoZ, which indexes health and medical sites (http://www.healthatoz.com). When you want to know the latest dirt on ISDN, go straight to Dan Kegel's ISDN site (http://www.alumni.caltech.edu/~dank/isdn).

As you become more adept at hunting down specific topics on a regular basis, you'll come to know the best specialized indexes for those subjects. In short, when someone has already sifted through a subject area, build on their results rather than start from scratch.

For concise telephone, Net phone, and street and e-mail address searches, try Four11, which is one of many sites offering online white-page directories (http://www.four11.com). And sites such as Big Yellow provide electronic yellow pages (http://www.bigyellow.com).

Graphics Searching
 
If graphics are vital to your job, an efficient search method for them is an absolute necessity. While Web search engines haven't mastered image indexing, they have reached the point where they can offer some assistance to design professionals.

Some of the general-purpose search engines have added image-searching capabilities to their bag of tricks. AltaVista, for example, lets you search for images by using the "image:" prefix for a search, while Yahoo has entire sections devoted to graphics. There are image-specific search engines but, to be honest, they're disappointing. Image Surfer, a Yahoo subsidiary, combines a basic search mechanism with a small category-oriented guide to image-based Web sites. If you search for pictures of modems, for example, you'll only get pictures that have the word "modem" in the image tag. Because of minimal tools and a tiny database, Image Surfer isn't terribly helpful (http://isurf.interpix.com).

WebSeer, on the other hand, has a lot going for it. You can search not only by subject but also by dimensions, colors, and image origin. Combine this with a 700,000-plus image database and you have a solid beginning for the future of image searching (http://webseer.cs.uchicago.edu). That being said, we're still a long, long way from having a comprehensive guide to graphics on the Web.

By combining these tips with persistence, you'll go a long way toward finding what you need, right when you need it. To be truly proficient, you'll have to practice your search strategies, but the rewards will be more than worth the effort. (For an overview of search engines, read the May 1996 IW Labs review, "Search Engine Showdown." You can view it online at http://www.iw.com/1996/05/showdown.html.)


Rohit Bafna is the CEO of CyberAds Studio. He has developed several search engines on various sites in cyberspace.

 