[Xapian-discuss] Omega: partial searches

Frank Bruzzaniti frank.bruzzaniti at gmail.com
Thu Aug 14 10:07:39 BST 2008


Great. I've been looking through the source, and I was wondering if you could
point me in the right direction regarding which files I need to "tweak".

Thanks

Frank


2008/8/14 Frank Bruzzaniti <frank.bruzzaniti at gmail.com>

> Thanks heaps for answering all my questions. I'll have a look at tweaking
> omindex for my install to do partial string searches.
>
> I'm in a situation where I need to index more than one directory/share,
> but I only have one search page.  Is the "best practice" to build
> separate databases for each directory/share I am indexing and then use
> xapian-compact -m DB1 DB2 DefaultDB?
>
> Thanks again.
>
> 2008/8/14 Olly Betts <olly at survex.com>
>
> On Thu, Aug 14, 2008 at 03:47:56AM +0930, Frank Bruzzaniti wrote:
>> > Partial searches. In Omega, if I enter "red" I get a match with the
>> > word "red" but not "redhead". How would I search for the pattern within
>> > a string? E.g. so that when I enter "red" it returns "red" & "redhead".
>>
>> You can't do that as such.  Stemming will mean that different forms of
>> the same word should match each other (e.g. "red" would match "reds")
>> but red and redhead aren't the same word (though clearly there's a
>> relationship between them).
>>
>> Trailing wildcards (e.g. red*) are supported by Xapian, but you
>> currently need to tweak Omega's source code to add the appropriate flag
>> to the call to QueryParser::parse_query() if you want Omega to enable
>> this feature.
>>
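To illustrate what a trailing wildcard such as red* does, here is a minimal
pure-Python sketch. It does not use Xapian at all: the function name
expand_trailing_wildcard and the sample term list are hypothetical, and it
only mimics the general idea of expanding a prefix to every indexed term
that starts with it. In Xapian's own API the flag in question is
QueryParser::FLAG_WILDCARD, and the parser also needs a database set via
QueryParser::set_database() so it has terms to expand against.

```python
from bisect import bisect_left


def expand_trailing_wildcard(pattern, sorted_terms):
    """Return every term matching a pattern such as 'red*'.

    sorted_terms must be a sorted list of indexed terms (an assumption of
    this sketch, standing in for the term list of a real database).
    """
    if not pattern.endswith("*"):
        # No wildcard: only the exact term can match.
        return [pattern] if pattern in sorted_terms else []

    prefix = pattern[:-1]
    # Jump to the first term that is >= the prefix, then walk forward
    # until the terms stop sharing the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


terms = sorted(["read", "red", "redhead", "reds", "reed", "blue"])
print(expand_trailing_wildcard("red*", terms))  # ['red', 'redhead', 'reds']
```

Note that this only covers the trailing case: "red*" finds "redhead", but
nothing here would find "red" in the middle or at the end of a word.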
>> > Also I made a simple script that runs omindex then xapian-compact. Is
>> > there any issue with this?  I thought I might as well compress the
>> > database at the end of each omindex run.
>>
>> Note that "compress" isn't really the right term - xapian-compact just
>> shuffles data around to eliminate as much dead space as it can.  The
>> downside of this is that having a bit of dead space is how a B-tree achieves
>> its amortised cost of updates, so in simple terms, updates to a
>> compacted database are slower until the dead space reemerges.  In
>> practice, this probably isn't an issue.
>>
>> > Also I scheduled the script in crontab, but I'm thinking it might be a
>> > bad idea if the script's run time is longer than the cron interval.
>> > E.g. if the script takes 1 hour to run but I've set the crontab job to
>> > run every 15 mins.  Would I be better off creating a script that runs
>> > on boot and keeps running, with maybe a wait/sleep at the end?
>>
>> omindex will just fail to get a lock if another omindex is already
>> running.  You'd probably want some lock around the combined
>> omindex+xapian-compact though.
>>
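The lock around the combined omindex+xapian-compact run that Olly suggests
could be sketched roughly as below, assuming a Unix system where Python's
fcntl module is available. The helper name run_exclusively, the lock file
location and all paths in the usage comment are hypothetical; this is one
possible approach, not something Omega provides.

```python
import fcntl
import subprocess


def run_exclusively(lock_path, commands):
    """Run each command in order while holding an exclusive file lock.

    Returns False without running anything if another process (for
    example an earlier cron invocation) still holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking: fail immediately instead of queueing up runs.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        for command in commands:
            # check=True stops the sequence if a step fails.
            subprocess.run(command, check=True)
        return True
    # The lock is released when the file is closed on leaving the block.


# Hypothetical usage from a cron-driven script (paths are placeholders):
# run_exclusively("/tmp/reindex.lock", [
#     ["omindex", "--db", "/path/to/db", "--url", "/", "/path/to/files"],
#     ["xapian-compact", "/path/to/db", "/path/to/db-compacted"],
# ])
```

With this in place, a cron job that fires every 15 minutes while a one-hour
run is still in progress simply returns False and exits.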
>> > Also I noted that when the script runs it says that it updates files
>> > even though I haven't altered them. Is this normal?  What's also
>> > suspicious is that every time I run omindex it takes almost the exact
>> > same amount of time to run even though no files have changed.  Is there
>> > an easy way to only "scan" what's changed?
>>
>> Currently the file modification times are stored but aren't used to skip
>> reindexing of unmodified files.  There's a patch around for that,
>> but it's not been merged yet.  It's more of an issue if you're running
>> expensive external filters on files.
>>
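The idea of using modification times to skip unmodified files can be
sketched as follows. This is not the unmerged omindex patch mentioned above,
which is not shown in this thread; it is a generic illustration, and the
function name changed_files and the in-memory last_seen dictionary are
assumptions made for the example.

```python
import os


def changed_files(root, last_seen):
    """Yield paths under root whose mtime differs from the recorded one.

    last_seen maps path -> mtime from the previous run and is updated in
    place, so a second call with no changes on disk yields nothing.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            if last_seen.get(path) != mtime:
                # New file, or modified since we last looked.
                last_seen[path] = mtime
                yield path
```

Only the paths yielded here would need to be passed through any expensive
external filters; everything else could be left alone.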
>> > I guess at the end of the day I'm trying to keep an index that updates
>> > as files are added, or on a best-effort basis.
>>
>> A more efficient approach than regular polling (on systems which
>> support it at least) is to use FAM or similar to notify you when
>> files/directories of interest change:
>> http://en.wikipedia.org/wiki/File_alteration_monitor
>>
>> Omega doesn't support that currently, though it would be nice to have as
>> an option.
>>
>> > Also is the database searchable while indexing is occurring? My tests
>> > say yes, just wanna double check.
>>
>> Yes.
>>
>> Cheers,
>>     Olly
>>
>
>

