About WebClust
WebClust is a meta search engine based on document clustering: the automatic organization of documents into meaningful groups. WebClust queries one or more web search engines, parses their result pages to extract the documents (titles, URLs, and short descriptions), and groups the documents based on this information. The result is a "horizontal" arrangement of the best web results by topic, offered in addition to the usual single vertical list. WebClust offers a service similar to Vivisimo but is simpler, faster, and more lightweight: the core clustering engine is written in C++, and clustering 200 documents takes about 200 ms on a 1 GHz Pentium-class Linux machine. A small Linux box is sufficient to serve dozens of queries per second. The mission of WebClust is to use this data mining technique to make sense of large amounts of textual information extracted from the internet, intranets, or digital libraries.
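To give a concrete feel for grouping search results by their titles and short descriptions, here is a minimal Python sketch. It is not WebClust's algorithm, which is not described on this page (and whose engine is written in C++). It assumes a simple greedy scheme: each result joins the existing cluster whose vocabulary it overlaps most, measured by Jaccard similarity, or starts a new cluster. The names `Document`, `tokens`, `jaccard`, `cluster`, the stopword list, and the `threshold` value are all hypothetical, chosen for illustration only.

```python
import re
from dataclasses import dataclass

# Tiny illustrative stopword list (an assumption, not taken from WebClust).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is"}


@dataclass
class Document:
    """One search result: the three fields a result page typically exposes."""
    title: str
    url: str
    snippet: str


def tokens(doc):
    """Lowercased set of words from the title and snippet, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", f"{doc.title} {doc.snippet}".lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets: |a & b| / |a | b|."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.2):
    """Greedy single-pass clustering of search results.

    Each document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster. Returns a list of
    clusters, each a list of Document objects in input order.
    """
    clusters = []  # each entry: {"terms": set of words, "docs": list of Document}
    for doc in docs:
        terms = tokens(doc)
        best, best_score = None, threshold
        for candidate in clusters:
            score = jaccard(terms, candidate["terms"])
            if score >= best_score:
                best, best_score = candidate, score
        if best is None:
            clusters.append({"terms": set(terms), "docs": [doc]})
        else:
            best["terms"] |= terms
            best["docs"].append(doc)
    return [c["docs"] for c in clusters]
```

Calling `cluster(results)` on a list of `Document` objects returns the topical groups, which a front end could show side by side with the original ranked list. A production engine would likely add stemming, phrase-based cluster labels, and a more robust similarity measure.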
About the founder
Vincenzo Bacarella received a BS in Computer Science from the
University of Pisa, Italy, in 2003. In 2004 he was a Research Associate at
the Knowledge Discovery and Delivery Laboratory (KDD), a joint research
group of ISTI (an institute of the Italian National Research Council) and the
Computer Science Department of the University of Pisa. His academic research
interests include search engines, information extraction from unstructured
sources, and data mining of large text collections and operational data.
© 2006 WebClust. All rights reserved.