[Xapian-discuss] [q] phrase replacement in thousands of text files

V S P toreason at fastmail.fm
Tue May 22 01:40:00 BST 2012


Thanks Olly.
Can Xapian at least return the line number where the matches were
found? (The text files are few, but big.)
Also, is there a way to 'batch up' a search to send multiple queries in
one shot?

I did not quite understand the 'pass each chunk of text when
indexing' part -- but I could probably figure this out as long as the
above functionality is present in some form.

thank you again
vlad

On Mon, May 21, 2012, at 06:46 AM, Olly Betts wrote:
> On Sun, May 20, 2012 at 10:59:11PM -0400, V S P wrote:
> > Obviously a couple of problems: searching 17GB worth of text 3 million times
> 
> I'm not sure I see why this is a problem unless the run time of this
> is highly sensitive.
> 
> > Second -- I do not understand how (if at all possible) to get the
> > start/end offset of the found phrase within the source file
> 
> Xapian doesn't store the byte offsets (only word offsets), so this isn't
> possible.  It can narrow down the number of files you need to go and
> look at for each replacement though, which could make quite a difference
> if many of the replacements are rarely done.
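
Since Xapian would only narrow down the candidate files here, line numbers and byte offsets have to come from a second pass over each candidate file. A minimal Python sketch of that pass follows. The helper name `find_phrase` is hypothetical and not part of Xapian, and the sketch assumes the file fits in memory and that phrase words are separated only by spaces or tabs (so a period or newline between them is not a match).

```python
import re


def find_phrase(data: bytes, phrase: str):
    """Yield (line_number, start_byte, end_byte) for each phrase occurrence.

    Assumption: words of the phrase may be separated by any run of spaces
    or tabs, but not by a period or a newline. Case folding is ASCII-only
    because the pattern operates on bytes.
    """
    words = [re.escape(w.encode("utf-8")) for w in phrase.split()]
    pattern = re.compile(rb"\b" + rb"[ \t]+".join(words) + rb"\b", re.IGNORECASE)

    line_number = 1
    last = 0
    for m in pattern.finditer(data):
        # Count only the newlines since the previous match, so the whole
        # file is scanned for newlines at most once.
        line_number += data.count(b"\n", last, m.start())
        last = m.start()
        yield line_number, m.start(), m.end()


if __name__ == "__main__":
    sample = b"first line\nthe quick brown fox.\nquick. brown\n"
    for line, start, end in find_phrase(sample, "quick brown"):
        print(line, start, end, sample[start:end])
```

For a replacement job, the offsets can be applied from the last match backwards so that earlier offsets stay valid while the buffer is edited.
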
> 
> > Third, how do I ensure that the phrase words are together (and the one
> > with a period between them is not considered a find)?
> 
> When indexing, pass each chunk of text between periods to
> TermGenerator::index_text(), calling increase_termpos() after each
> index_text() call.  Then phrases can't span a period.
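
To make the idea concrete, here is a small pure-Python model of what the position gap achieves. It does not call Xapian. `index_positions` and `phrase_matches` are hypothetical helpers, and the gap of 100 is only an illustrative value chosen for this sketch. Each chunk between periods gets consecutive word positions, and the counter jumps after every chunk, which plays the role of the increase_termpos() call described above.

```python
import re

GAP = 100  # illustrative gap between chunks (an assumption of this model)


def index_positions(text, gap=GAP):
    """Map each lowercased word to the list of word positions where it occurs.

    The text is split on periods. After each chunk the position counter
    jumps by `gap`, mimicking a call to increase_termpos().
    """
    positions = {}
    pos = 0
    for chunk in text.split("."):
        for word in re.findall(r"\w+", chunk.lower()):
            pos += 1
            positions.setdefault(word, []).append(pos)
        pos += gap
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase words occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    for start in positions.get(words[0], []):
        if all(start + i in positions.get(w, []) for i, w in enumerate(words)):
            return True
    return False


if __name__ == "__main__":
    idx = index_positions("The quick brown fox. Brown fox jumps.")
    print(phrase_matches(idx, "quick brown"))  # words adjacent in one chunk
    print(phrase_matches(idx, "fox brown"))    # words on opposite sides of a period
```

With a gap of 0 the words "fox" and "brown" would sit at adjacent positions and the second query would match, which is exactly the false hit the gap is meant to prevent.
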
> 
> Cheers,
>     Olly





