WebClust Clustering Search Engine
Frequently Asked Questions
 
Is WebClust a search engine?
Not exactly. WebClust does not crawl or index the web. It organizes the outputs of other search engines: URLs, titles, and short descriptions. Moreover, WebClust is more than a simple metasearch engine: it is a clustering engine.
What does WebClust do?
WebClust is based on a technology called document clustering: the automatic organization of documents into meaningful groups. WebClust queries one or more web search engines, parses their result pages to extract the documents (titles, URLs, and short descriptions), and groups the documents based on this information. The resulting groups are shown as a topic tree. All of this takes only seconds.
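The parsing step can be pictured with a minimal sketch. It assumes a hypothetical result-page markup in which each hit is a `<div class="result">` containing an `<a href>` title and a `<p>` snippet; real search engines use different (and changing) markup, and WebClust's own parser is not shown here. The `Document` class and `parse_results` function are illustrative names, not part of any WebClust API. Only Python's standard `html.parser` module is used.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Document:
    """One search hit: the only information a clustering engine sees."""
    url: str
    title: str
    snippet: str


class ResultPageParser(HTMLParser):
    """Extracts hits from a HYPOTHETICAL markup:
    <div class="result"><a href="URL">Title</a><p>Snippet</p></div>
    """

    def __init__(self):
        super().__init__()
        self.documents = []
        self._in_result = False
        self._field = None  # "title" inside <a>, "snippet" inside <p>
        self._current = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "result":
            self._in_result = True
            self._current = {"url": "", "title": "", "snippet": ""}
        elif self._in_result and tag == "a":
            self._current["url"] = attrs.get("href") or ""
            self._field = "title"
        elif self._in_result and tag == "p":
            self._field = "snippet"

    def handle_endtag(self, tag):
        if tag in ("a", "p"):
            self._field = None
        elif tag == "div" and self._in_result:
            # Collapse whitespace left over from inline tags such as <b>
            self.documents.append(Document(
                url=self._current["url"],
                title=" ".join(self._current["title"].split()),
                snippet=" ".join(self._current["snippet"].split()),
            ))
            self._in_result = False

    def handle_data(self, data):
        if self._in_result and self._field:
            self._current[self._field] += data


def parse_results(html: str) -> list[Document]:
    parser = ResultPageParser()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Because text inside inline tags such as `<b>` is still routed to the current field, highlighted query words in a snippet are kept as plain text.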
What is the mission of WebClust?
The mission of WebClust is to use data mining techniques to make sense of large amounts of textual information extracted from the Internet, intranets, or digital libraries.
What is document clustering?
Document clustering is the automatic organization of documents into groups, or clusters. It differs from other techniques (e.g. classification) in that it is fully automated, with no human intervention at any point. Clustering is performed just in time, right before the user sees the search results. There is no need to prepare anything beforehand, let alone pre-process the entire document collection from which the results came. Because clustering requires no preparation steps, it also requires no maintenance. By contrast, classification requires pre-specifying categories (typically broad and hence rather bland) and updating these categories as new documents are added to the collection.
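To make "no pre-specified categories" concrete, here is a deliberately simple sketch that groups snippets purely by the terms they share. It is not WebClust's algorithm, which this FAQ does not describe; the stop-word list, the `min_docs` and `max_clusters` thresholds, and the function names are all illustrative assumptions. Query terms are ignored because every hit contains them.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real systems use longer ones)
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def cluster_by_shared_terms(snippets, query_terms=(), min_docs=2, max_clusters=10):
    """Group snippets that share a term; no categories are defined in advance.

    Returns {label: [snippet indices]}. Clusters may overlap, and snippets
    covered by no cluster are collected under "Other topics".
    """
    ignore = {t.lower() for t in query_terms}  # query terms match every hit
    postings = defaultdict(set)  # term -> indices of snippets containing it
    for i, text in enumerate(snippets):
        for term in set(tokenize(text)) - ignore:
            postings[term].add(i)

    # Candidate clusters: terms shared by enough snippets, largest first
    candidates = [(t, d) for t, d in postings.items() if len(d) >= min_docs]
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))
    clusters = {t: sorted(d) for t, d in candidates[:max_clusters]}

    covered = set().union(*clusters.values())
    others = [i for i in range(len(snippets)) if i not in covered]
    if others:
        clusters["Other topics"] = others
    return clusters


hits = [
    "Jaguar car dealers and prices",
    "Used Jaguar car for sale",
    "Jaguar cat habitat in the rainforest",
    "Jaguar cat diet and habitat",
    "Jacksonville Jaguars football schedule",
]
print(cluster_by_shared_terms(hits, query_terms=["jaguar"]))
```

On these five hits the sketch puts the two car snippets under `car`, the two animal snippets under both `cat` and `habitat`, and the football snippet under `Other topics`. The duplicated `cat`/`habitat` cluster shows why practical engines also merge overlapping clusters, a step omitted here for brevity.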
Is WebClust useful?
Yes. The key difficulty in searching lies in determining what is relevant to the user. The same set of keywords may express different user needs, which may also vary over time according to the context in which the user formulates the query. Clustering search engines like WebClust can help users in their searches: WebClust lets users choose a topic of interest and then sift through the documents in that group, greatly reducing the burden of going through the numerous documents returned by a typical search engine.
Why use WebClust?
Clustering tools such as WebClust return results in clusters, which can be a more useful way to view your results. This is especially helpful when your topic is complex or unfamiliar to you, or when you are doing in-depth topical research and want to see how your topic breaks down into its component areas.
Is WebClust.com a commercial site?
No. The WebClust.com site is only a showcase for our clustering engine. The search-result feeds are provided through commercial APIs, and the JavaScript GUI includes software developed by the Carrot2 Project.
How is WebClust different from its competitors?

WebClust is light: the core clustering engine is written in C++, and the other components of the system are developed using XML, Perl, and JavaScript. You can build a clustering search engine on a small Linux machine: clustering 200 document summaries (about three lines each) takes about 200 ms on a 1 GHz Pentium-class Linux machine. The core clustering engine of WebClust is also available in a Windows version.

WebClust is simple: the system works without any pre-defined organization into categories, such as taxonomies or databases of terms. The labels for each document group are generated automatically by the software using text-processing techniques. By contrast, other commercial clustering engines crawl the Web (e.g. web directories, in an offline process) to extract statistically significant 2-word and 3-word phrases.
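As an illustration of building labels from the snippets at query time, rather than from an offline crawl, the sketch below ranks 2-word and 3-word phrases by how many snippets of a group contain them. This is a generic n-gram technique chosen for illustration, not WebClust's published method; the stop-word list, the `min_docs`/`top_k` parameters, and the `candidate_labels` name are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption)
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def candidate_labels(snippets, n_values=(2, 3), min_docs=2, top_k=3):
    """Rank 2- and 3-word phrases by the number of snippets containing them.

    A phrase may not start or end with a stop word, so "sedan for" is
    skipped while "sedan for sale" is kept.
    """
    doc_freq = Counter()
    for text in snippets:
        words = re.findall(r"[a-z0-9]+", text.lower())
        seen = set()
        for n in n_values:
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                seen.add(" ".join(gram))
        doc_freq.update(seen)  # count each phrase at most once per snippet

    ranked = [(p, c) for p, c in doc_freq.items() if c >= min_docs]
    # Prefer phrases found in more snippets, then longer (more specific) ones
    ranked.sort(key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))
    return [p for p, _ in ranked[:top_k]]


group = [
    "Used Jaguar XJ sedan for sale",
    "Jaguar XJ sedan review and prices",
    "Jaguar XJ owners club",
]
print(candidate_labels(group))
```

Counting document frequency instead of raw occurrences keeps one repetitive snippet from dominating the label; preferring longer phrases on ties favors more specific labels.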

How does WebClust work?
WebClust processes information in three steps:
  • Draws the web results from one or several Web search engines.
  • Builds the clusters on the fly, without adopting any pre-defined organization into categories.
  • Labels the clusters with variable-length phrases or relevant keywords drawn from the snippets; a label may omit some of the snippet's terms.
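The first step, drawing results from several engines, implies combining their ranked lists and removing pages that more than one engine returned. The FAQ does not say how WebClust does this. The sketch below assumes round-robin interleaving by rank and a crude URL normalization (scheme, `www.` prefix, and trailing slash ignored); both choices, and the `(url, title, snippet)` tuple format, are illustrative assumptions.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Crude canonical key so the same page from two engines matches.

    Ignores the scheme, a leading "www.", and a trailing slash (assumptions).
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path + (f"?{parts.query}" if parts.query else "")


def merge_results(result_lists):
    """Merge ranked hit lists from several engines, dropping duplicates.

    Each hit is a (url, title, snippet) tuple. Hits are interleaved
    round-robin by rank, and the first occurrence of a page wins.
    """
    merged, seen = [], set()
    longest = max((len(hits) for hits in result_lists), default=0)
    for rank in range(longest):
        for hits in result_lists:
            if rank < len(hits):
                key = normalize_url(hits[rank][0])
                if key not in seen:
                    seen.add(key)
                    merged.append(hits[rank])
    return merged


engine_a = [("http://www.example.com/jaguar/", "Jaguar", "Cars"),
            ("http://a.example/x", "A", "a")]
engine_b = [("https://example.com/jaguar", "Jaguar cars", "Cars"),
            ("http://b.example/y", "B", "b")]
print(merge_results([engine_a, engine_b]))
```

Round-robin interleaving gives each engine's top hits equal weight; a production system might instead weight engines or combine rank scores.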
The following diagram shows a classic WebClust application in more detail. Beyond this classic application, the WebClust Clustering Engine can also serve as the core of more innovative services or products.




© 2006 WebClust. All rights reserved.