WebClust Clustering Search Engine
 


About WebClust  

WebClust is a meta search engine based on a technology called "Document clustering": the automatic organization of documents into meaningful groups. WebClust queries one or more web search engines, parses their result pages to extract the documents (titles, URLs, and short descriptions) and groups the documents based on this information. This process presents the best results of the web in a "horizontal" topical arrangement in addition to a single vertical list.
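As a rough illustration of the grouping step described above, here is a minimal Python sketch. It is not WebClust's actual algorithm, which this page does not describe (and whose core engine is written in C++). It assumes the results have already been fetched and parsed into records with hypothetical "title", "url", and "snippet" fields, and it swaps in a very simple technique: any term shared by at least two results becomes a cluster label, so a document can land in more than one cluster.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword alphabetic terms."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def cluster_results(results, query, min_size=2):
    """Group results under terms shared by at least `min_size` documents.

    `results` is a list of dicts with hypothetical keys "title", "url", "snippet".
    Returns (clusters, others): `clusters` maps a label term to a list of URLs,
    `others` lists the URLs that matched no label.
    """
    # Query terms appear in nearly every result, so they make poor labels.
    query_terms = tokenize(query)
    doc_terms = [
        tokenize(r["title"] + " " + r["snippet"]) - query_terms for r in results
    ]

    # Document frequency of each term across the result set.
    df = Counter(term for terms in doc_terms for term in terms)
    labels = {term for term, count in df.items() if count >= min_size}

    clusters = defaultdict(list)
    others = []
    for result, terms in zip(results, doc_terms):
        matched = sorted(terms & labels)
        if not matched:
            others.append(result["url"])
        for label in matched:
            clusters[label].append(result["url"])
    return dict(clusters), others


if __name__ == "__main__":
    # Made-up sample data; the URLs are placeholders, not real search results.
    sample = [
        {"title": "Jaguar Cars", "url": "http://example.com/a",
         "snippet": "Official site for luxury cars"},
        {"title": "Used Jaguar cars", "url": "http://example.com/b",
         "snippet": "Prices and reviews"},
        {"title": "Jaguar the animal", "url": "http://example.com/c",
         "snippet": "Big cat facts"},
        {"title": "Jaguar habitat", "url": "http://example.com/d",
         "snippet": "Where the wild cat lives"},
        {"title": "Jaguar guitar", "url": "http://example.com/e",
         "snippet": "Fender electric model"},
    ]
    clusters, others = cluster_results(sample, "jaguar")
    for label, urls in sorted(clusters.items()):
        print(label, urls)
    print("other", others)
```

With the sample data, the two car pages are grouped under "cars", the two animal pages under "cat", and the guitar page falls into the leftover group. A production engine would add stemming, phrase labels, and cluster ranking; those are left out here to keep the sketch short.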

WebClust offers a service similar to Vivisimo, but it is simpler, more immediate, and more lightweight: the core clustering engine is written in C++, and clustering 200 documents takes about 200 ms on a 1 GHz Pentium-class Linux machine. A small Linux box is sufficient to serve dozens of queries per second.

The mission of WebClust is to use this data mining technique to make sense of large amounts of textual information extracted from the internet, intranets, or digital libraries.

 
About the founder  

Vincenzo Bacarella received a BS in Computer Science from the University of Pisa, Italy, in 2003. In 2004 he was a Research Associate at the Knowledge Discovery and Delivery Laboratory (KDD), a joint research group of ISTI (an institute of the Italian National Research Council) and the Computer Science Department of the University of Pisa. His academic research interests include search engines, information extraction from unstructured sources, and data mining of large text collections and operational data.

He currently works in Milan in the "Business Optimization and Analytics" function of a major Italian company.

 



© 2006 WebClust. All rights reserved.