[Xapian-devel] [GSoC 2014] About "Clustering of Search Results"

Olly Betts olly at survex.com
Tue Mar 11 00:47:52 GMT 2014


On Mon, Mar 10, 2014 at 08:50:14PM +0800, Chi Liu wrote:
> The topic of "Clustering of Search Results" looks interesting and I think
> it suits me. I have been involved in a project that aims to cluster
> tweets based on text similarity and user profiles. I noticed that the
> "Clustering of Search Results" description mentions disappointing
> performance. I am unsure whether this project is only concerned with
> improving the performance of the old code, or also with trying new
> approaches.

Most applications of Xapian are interactive, so to be practically
useful, clustering needs to complete in a reasonable amount of time
(ideally a fraction of a second).  I think that needs to be a key
aim of the project.

But if that aim is addressed, exactly what else the project consists of
is largely up to you.

If by "find new approaches" you mean a different approach from the one
used by the existing clustering branch, then sure.  If you're talking
about doing original research, I'd be a little cautious about that, as
clustering is a relatively mature field, and I'm a bit dubious that a
student could develop and implement a new approach in the GSoC timescale.

> Besides clustering speed, how should clustering effectiveness be evaluated?

That's a good question - I'm not sure how clustering effectiveness is
typically measured.  But if we're implementing known approaches,
a formal evaluation of effectiveness is probably less necessary.

Cheers,
    Olly
