[Xapian-devel] Proposed changes to omindex
James Aylett
james-xapian at tartarus.org
Tue Aug 29 10:54:11 BST 2006
On Tue, Aug 29, 2006 at 12:06:53AM +0100, Olly Betts wrote:
[links to related projects]
> For example, Wikipedia manages to maintain a vast amount of what is
> essentially documentation with relatively few problems because there are
> enough people who care going round and keeping things tidy and
> consistent.
The difference is that Wikipedia doesn't have a separate website, i.e.
Wikipedia is using a wiki as a content management system.
> I wonder if a CMS isn't overkill for what we need, but perhaps CMS
> conjures up a different image to me than to you...
If you think it's overkill, then yes, I'm thinking of something a
little different to you :)
Nothing hardcore, just an easy way of editing the web page content,
and maybe creating new pages.
> But 4-byte and 8-byte strings won't compare correctly, so you can't
> suddenly start adding 8-byte strings to a database full of 4-byte ones.
> So you need to convert all the 4-byte values first. Or implement custom
> sort orders in the matcher.
That's what I meant.
Of course, you can make 4-byte and 8-byte strings reverse sort correctly.
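
To make the mixed-width problem concrete, here is a minimal Python sketch.
It is not Xapian code: it assumes the values are unsigned integers packed
big-endian and compared bytewise as strings, and pack32, pack64 and widen
are made-up helper names used only for illustration.

```python
import struct


def pack32(n):
    """Hypothetical helper: 4-byte big-endian encoding of n."""
    return struct.pack(">I", n)


def pack64(n):
    """Hypothetical helper: 8-byte big-endian encoding of n."""
    return struct.pack(">Q", n)


# Within a single width, bytewise order matches numeric order.
same_width_ok = pack32(1) < pack32(2) and pack64(1) < pack64(2)

# Mixed widths break: the 8-byte form of 2 starts with seven zero bytes,
# so it sorts before the 4-byte form of 1 (00 00 00 01).
mixed_ok = pack32(1) < pack64(2)


def widen(value):
    """Convert a stored 4-byte value to the 8-byte form; leave others alone."""
    if len(value) == 4:
        return pack64(struct.unpack(">I", value)[0])
    return value


# After converting the old values, comparisons agree with numeric order again.
converted_ok = widen(pack32(1)) < pack64(2)
```

Under those assumptions, converting every existing 4-byte value before
adding any 8-byte ones is enough to keep a plain string comparison working.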
> > Okay, but if omindex added the file path as the source identifier, I
> > can see how that would be useful. In particular, if you (for some
> > reason) batch delete files, it's an awful lot quicker than using
> > omindex to reindex the entire system to get rid of them from xapian.
>
> But you really don't want a field in the document data for that, since
> you'll have to read the document data for every document in the database
> which will be really slow. It could well be faster to just rerun
> omindex once it checks last modified times (actually we could add a
> "purge" mode which just removed any documents which no longer exist).
But that's not easier if you have lots of different subsites, although
we've discussed having a config file for omindex, which would resolve this.
I'm not particularly arguing for this, on reflection. It might be good
to have a defined field name for this, for consistency, and not have
omindex use it. But not essential.
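
For what it's worth, the "purge" idea can be sketched roughly as below.
This is only an illustration, not omindex code: the index is modelled as a
plain dict from a source identifier (here the file path) to a document id,
and find_stale and purge are hypothetical names.

```python
import os


def find_stale(index, exists=os.path.exists):
    """Return the source paths in `index` whose files no longer exist."""
    return sorted(path for path in index if not exists(path))


def purge(index, exists=os.path.exists):
    """Remove entries for missing files from `index`; return removed paths."""
    stale = find_stale(index, exists)
    for path in stale:
        del index[path]
    return stale
```

The exists argument is there so the check can be swapped out, e.g.
purge(index, exists=lambda p: p in known_paths) when the list of surviving
files is already known from a batch delete.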
James
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org