[Xapian-devel] Document clustering module?

☼ 林永忠 ☼ (Yung-chung Lin) henearkrxern at gmail.com
Tue Sep 18 05:46:30 BST 2007


I have written a new class for calculating document similarity (patch
attached). Please review it. The class should probably be an internal
one, since in my plan it will only be called by Xapian::Cluster.
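
For readers who cannot open the attachment: the patch is not reproduced
in this message, so the following is only a minimal sketch of one common
way to compute document similarity, namely cosine similarity over
term-frequency vectors. The names TermVector, vector_length and
cosine_similarity are illustrative assumptions; they are not taken from
the patch and are not part of the Xapian API.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical type for illustration: term -> within-document frequency.
// This is NOT a Xapian type; the actual patch may represent documents differently.
using TermVector = std::map<std::string, double>;

// Euclidean length of a term vector.
double vector_length(const TermVector& v) {
    double sum = 0.0;
    for (const auto& entry : v) {
        sum += entry.second * entry.second;
    }
    return std::sqrt(sum);
}

// Cosine similarity of two term vectors.
// Returns a value in [0, 1] for non-negative weights, and 0 if either
// vector is empty or has zero length.
double cosine_similarity(const TermVector& a, const TermVector& b) {
    const double len_a = vector_length(a);
    const double len_b = vector_length(b);
    if (len_a == 0.0 || len_b == 0.0) {
        return 0.0;
    }
    double dot = 0.0;
    for (const auto& entry : a) {
        // Only terms present in both documents contribute to the dot product.
        const auto it = b.find(entry.first);
        if (it != b.end()) {
            dot += entry.second * it->second;
        }
    }
    return dot / (len_a * len_b);
}
```

A clustering step could call something like cosine_similarity() on pairs
of documents to decide which ones belong together, which is why keeping
such a helper internal would be a reasonable choice.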

Thanks.

Best,
Yung-chung Lin

On 9/17/07, ☼ 林永忠 ☼ (Yung-chung Lin) <henearkrxern at gmail.com> wrote:
> Yes, it's true. Focusing on a simple and plain design is the way it should be.
>
> Best,
> Yung-chung Lin
>
> On 9/17/07, Olly Betts <olly at survex.com> wrote:
> > If you're talking about grouping collapsed documents, that should
> > probably happen during the match process, like collapse does.  Don't
> > worry too much about that idea - let's focus on the clustering part
> > for now, and just bear in mind how it might be reused for this (or
> > perhaps this problem is too different).
> >
> > If you're not talking about that, there needs to be a clustering
> > algorithm specified for this to work.
> >
> > I wouldn't get too fancy initially - we don't want to produce an
> > elaborate API which we think does everything conceivable, only to
> > discover a better approach or something it can't nicely do, and then
> > have to choose between keeping the sub-optimal API we have, or the pain
> > of deprecation and transition.
> >
> > Let's just go with tagging each MSet entry with a cluster id for now.
> > That seems a good starting point, and everything which has been
> > suggested so far can either be built on top of that, or provide that as
> > a side-effect.
> >
> > And that should allow us to get clustering functionality into a release
> > sooner.
> >
> > Cheers,
> >     Olly
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: docsim-patch.txt.gz
Type: application/x-gzip
Size: 3594 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-devel/attachments/20070918/5e9ecaa9/docsim-patch.txt.bin

