[Xapian-discuss] Omega: partial searches

Frank Bruzzaniti frank.bruzzaniti at gmail.com
Thu Aug 14 05:28:13 BST 2008


Thanks heaps for answering all my questions. I'll have a look at tweaking
omindex for my install to do partial string searches.

I'm in a situation where I need to index more than 1 directory/share, but
I only have 1 search page.  Is the "best practice" to build separate
databases for each directory/share I am indexing, then use xapian-compact -m
DB1 DB2 DefaultDB ?

Thanks again.

2008/8/14 Olly Betts <olly at survex.com>

> On Thu, Aug 14, 2008 at 03:47:56AM +0930, Frank Bruzzaniti wrote:
> > Partial searches. In omega if I enter "red" I get a match with the
> > word "red" but not "redhead".  How would I search for the pattern within a
> > string, e.g. so that when I enter "red" it returns "red" & "redhead"?
>
> You can't do that as such.  Stemming will mean that different forms of
> the same word should match each other (e.g. "red" would match "reds")
> but red and redhead aren't the same word (though clearly there's a
> relationship between them).
>
> Trailing wildcards (e.g. red*) are supported by Xapian, but you
> currently need to tweak Omega's source code to add the appropriate flag
> to the call to QueryParser::parse_query() if you want Omega to enable
> this feature.
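
To make the difference concrete, here is a rough sketch of what a trailing
wildcard means: "red*" expands to every indexed term that starts with "red".
This is only an illustration in plain Python, not Xapian's implementation and
not its API. The function name and the term list are made up for the example.
In the real C++ API the flag in question is, as far as I know,
QueryParser::FLAG_WILDCARD, and the parser also needs a database set so it
can look up the terms to expand to.

```python
from bisect import bisect_left


def expand_trailing_wildcard(sorted_terms, pattern):
    """Return the terms matched by a pattern such as "red" or "red*".

    sorted_terms is a sorted list of indexed terms (a hypothetical stand-in
    for the term list a real index holds).
    """
    if not pattern.endswith("*"):
        # No wildcard: only an exact term match counts.
        i = bisect_left(sorted_terms, pattern)
        found = i < len(sorted_terms) and sorted_terms[i] == pattern
        return [pattern] if found else []

    prefix = pattern[:-1]
    matches = []
    # In a sorted list, all terms sharing a prefix form one contiguous run.
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


# Made-up term list for illustration.
terms = sorted(["read", "red", "redhead", "reds", "reed"])
print(expand_trailing_wildcard(terms, "red"))   # exact term only
print(expand_trailing_wildcard(terms, "red*"))  # red, redhead, reds
```

Note that this only covers prefixes. A pattern in the middle of a word
("head" matching "redhead") is a different problem and is not what a trailing
wildcard gives you.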
>
> > Also I made a simple script that runs omindex then xapian-compact, is
> > there any issue with this?  I thought I might as well compress the
> > database at the end of each omindex run.
>
> Note that "compress" isn't really the right term - xapian-compact just
> shuffles data around to eliminate as much dead space as it can.  The
> downside of this is that having a bit of dead space is how a B-tree achieves
> its amortised cost of updates, so in simple terms, updates to a
> compacted database are slower until the dead space reemerges.  In
> practice, this probably isn't an issue.
>
> > Also I scheduled the script in crontab, but I'm thinking it might be a bad
> > idea if the script's run time is longer than the cron interval.
> > E.g. if the script takes 1 hour to run but I've set the crontab job to
> > run every 15 mins.  Would I be better to create a script that runs on
> > boot and keeps running with maybe a wait/sleep at the end?
>
> omindex will just fail to get a lock if another omindex is already
> running.  You'd probably want some lock around the combined
> omindex+xapian-compact though.
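
One way to put a lock around the combined job is a small wrapper that takes
an exclusive, non-blocking lock on a lock file and simply gives up if a
previous run still holds it. The following is a minimal sketch of that idea,
assuming a Unix system (it uses fcntl.flock). The lock path, database paths
and directory names are hypothetical placeholders, and the compact step here
writes to a separate output directory.

```python
import fcntl
import subprocess


def run_exclusive(lock_path, commands, runner=subprocess.run):
    """Run each command in order while holding an exclusive lock.

    Returns False without running anything if the lock is already held
    (for example by a previous cron run that has not finished yet).
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        for argv in commands:
            runner(argv, check=True)  # stop at the first failing command
        return True  # the lock is released when the file is closed


# Hypothetical paths, adjust to your own setup.
EXAMPLE_COMMANDS = [
    ["omindex", "--db", "/var/lib/omega/data/work", "--url", "/", "/srv/share"],
    ["xapian-compact", "/var/lib/omega/data/work", "/var/lib/omega/data/compacted"],
]

# From cron you would call something like:
# run_exclusive("/var/lock/omindex-job.lock", EXAMPLE_COMMANDS)
```

With this in place a 15 minute cron interval is harmless even if one run
takes an hour: the overlapping invocations return straight away.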
>
> > Also I noted that when the script runs it says that it updates files even
> > though I haven't altered them, is this normal?  What's also suspicious is
> > that every time I run omindex it takes almost the exact same amount of
> > time to run even though no files have changed.  Is there an easy way to
> > only "scan" what's changed?
>
> Currently the file modification times are stored but not used to decide
> to avoid indexing unmodified stuff.  There's a patch around for that,
> but it's not been merged yet.  It's more of an issue if you're running
> expensive external filters on files.
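
For illustration, the general idea of using stored modification times to skip
unmodified files looks roughly like the sketch below. This is not the patch
mentioned above and not how omindex works internally. The function name and
the dictionary of previously recorded mtimes are assumptions made up for the
example.

```python
import os


def files_needing_reindex(root, last_indexed):
    """Return the files under root whose mtime differs from the stored one.

    last_indexed maps a file path to the mtime recorded when that file was
    last indexed. Files missing from the mapping count as new.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            # Unchanged mtime: assume the content is unchanged and skip it.
            if last_indexed.get(path) != mtime:
                changed.append(path)
    return sorted(changed)
```

The directory walk still touches every file, but the expensive part (running
external filters and re-indexing the text) is only needed for the paths this
returns.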
>
> > I guess at the end of the day I'm trying to keep an index that's able to
> > update files as they are added or best effort.
>
> The more efficient approach than regular polling (on systems which
> support it at least) is to use FAM or similar to notify you when
> files/directories of interest change:
> http://en.wikipedia.org/wiki/File_alteration_monitor
>
> Omega doesn't support that currently, though it would be nice to have as
> an option.
>
> > Also is the database searchable while indexing is occurring?  My tests
> > say yes, just wanna double check
>
> Yes.
>
> Cheers,
>     Olly
>

