[Xapian-devel] Proposed changes to omindex

James Aylett james-xapian at tartarus.org
Tue Aug 29 10:54:11 BST 2006

On Tue, Aug 29, 2006 at 12:06:53AM +0100, Olly Betts wrote:

[links to related projects]
> For example, Wikipedia manages to maintain a vast amount of what is
> essentially documentation with relatively few problems because there are
> enough people who care going round and keeping things tidy and
> consistent.

The difference is that Wikipedia doesn't have a separate website, i.e.
Wikipedia is using a wiki as a content management system.

> I wonder if a CMS isn't overkill for what we need, but perhaps CMS
> conjures up a different image to me than to you...

If you think it's overkill, then yes, I'm thinking of something a
little different to you :)

Nothing hardcore, just an easy way of editing the web page content,
and maybe creating new pages.

> But 4-byte and 8-byte strings won't compare correctly, so you can't
> suddenly start adding 8-byte strings to a database full of 4-byte ones.
> So you need to convert all the 4-byte values first.  Or implement custom
> sort orders in the matcher.

That's what I meant.

Of course, you can make 4-byte and 8-byte strings reverse sort correctly.
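
To make the comparison problem concrete, here is a minimal sketch. It assumes the values are unsigned integers packed big-endian into fixed-width byte strings and compared byte by byte; the helper names pack32 and pack64 are made up for illustration and are not part of Xapian or omindex.

```python
import struct

def pack32(n):
    # Hypothetical helper: 4-byte big-endian unsigned value.
    return struct.pack(">I", n)

def pack64(n):
    # Hypothetical helper: 8-byte big-endian unsigned value.
    return struct.pack(">Q", n)

numbers = [1, 300, 70000]

# A database where one value was stored 8 bytes wide and the rest 4 bytes wide.
mixed = {pack32(1): 1, pack64(300): 300, pack32(70000): 70000}
mixed_order = [mixed[key] for key in sorted(mixed)]

# The same values after converting everything to 8 bytes.
widened = {pack64(n): n for n in numbers}
widened_order = [widened[key] for key in sorted(widened)]

print(mixed_order)    # [300, 1, 70000]: byte-wise order no longer matches numeric order
print(widened_order)  # [1, 300, 70000]
```

The 8-byte encoding of 300 starts with more zero bytes than the 4-byte encoding of 1, so it sorts first. That is why the existing values would need converting, or the matcher would need a custom sort order.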

> > Okay, but if omindex added the file path as the source identifier, I
> > can see how that would be useful. In particular, if you (for some
> > reason) batch delete files, it's an awful lot quicker than using
> > omindex to reindex the entire system to get rid of them from xapian.
> But you really don't want a field in the document data for that, since
> you'll have to read the document data for every document in the database
> which will be really slow.  It could well be faster to just rerun
> omindex once it checks last modified times (actually we could add a
> "purge" mode which just removed any documents which no longer exist).

But not easier if you have lots of different subsites; although we've
discussed having a config file for omindex, which would resolve this.

I'm not particularly arguing for this, on reflection. It might be good
to have a defined field name for this, for consistency, and not have
omindex use it. But not essential.


  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
