[Xapian-devel] [GSoC 2014] About "Clustering of Search Results"

Olly Betts olly at survex.com
Tue Mar 11 13:33:56 GMT 2014


On Tue, Mar 11, 2014 at 10:11:31AM +0800, Chi Liu wrote:
> Thank you for your patient explanation of the project. My
> understanding of the project "Clustering of Search Results" is that
> we mainly focus on the processing speed of the existing code.

We need something which can cluster larger result sets faster than the
current code.  Speeding up the existing code might be the best way to do
that, but we could start again.  If we start again, I'd suggest it would
be prudent to try to understand why the previous attempt didn't succeed.
We don't want to end up repeating that.

> By "find new approaches" I mean trying other known clustering algorithms.

OK - that's fine then.

> What concerns me is whether the low efficiency is caused by an
> unsuitable algorithm. I am reading the existing clustering branch code
> and have not finished it yet. I may be able to say more about the
> existing code in my GSoC application, but for now I really cannot
> comment before fully understanding the existing code.

Sure.

> My idea for measuring clustering effectiveness is that, when we try
> other known clustering algorithms, we can use the old clustering
> results as a baseline. If the difference between the clustering
> results is acceptable and the new algorithm is more efficient, we may
> have found a better approach. I will give more details about this in
> my GSoC application.

Great.
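One way to put a number on "the difference of clustering results" is a pair-counting agreement score. The sketch below uses the Rand index, a standard measure that the thread itself does not name; it is a swapped-in choice, not the project's method. It assumes each clustering is given as a hypothetical list of cluster ids, one per document in the same result set, and counts the fraction of document pairs on which the baseline and the candidate agree (both group the pair together, or both keep it apart).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index between two clusterings of the same documents.

    labels_a[i] and labels_b[i] are (hypothetical) cluster ids for
    document i under the baseline and candidate clusterings. Returns the
    fraction of document pairs on which the two clusterings agree.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same documents")
    n = len(labels_a)
    if n < 2:
        # With fewer than two documents there are no pairs to disagree on.
        return 1.0
    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # Agreement: both put the pair together, or both keep it apart.
        agree += same_a == same_b
        total += 1
    return agree / total


baseline = [0, 0, 1, 1]
print(rand_index(baseline, [1, 1, 0, 0]))  # same grouping, renamed ids
print(rand_index(baseline, [0, 0, 0, 1]))  # partly different grouping
```

The score ignores how clusters are numbered, so a renamed but otherwise identical clustering scores 1.0, while the second candidate above scores 0.5. Looping over all pairs is O(n^2), which is fine for an offline comparison against the old code's output but would itself be slow on very large result sets.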

Cheers,
    Olly


