[Xapian-discuss] [q] phrase replacement in thousands of text files
V S P
toreason at fastmail.fm
Tue May 22 01:40:00 BST 2012
Thanks Olly
Can Xapian at least return the line numbers where the matches were
found? (The text files are few, but big.)
Also, is there a way to 'batch up' a search to send multiple queries in
one shot?
I did not quite understand the 'pass each chunk of text when
indexing' part -- but I could probably figure this out as long as the above
functionality is present in some form
thank you again
vlad
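
Olly's reply quoted below notes that Xapian stores word offsets rather than byte offsets, and that it can narrow down which files need to be looked at. One possible way to get line numbers, then, is to scan only those candidate files directly. The following is a minimal sketch in Python, not anything Xapian provides: `phrase_line_numbers` is a hypothetical helper, and it assumes an exact, case-sensitive match that does not span a line break.

```python
def phrase_line_numbers(path, phrase, encoding="utf-8"):
    """Return the 1-based numbers of the lines in `path` that contain `phrase`.

    Hypothetical helper (not part of Xapian). Assumes the phrase appears
    verbatim on a single line; a phrase split across two lines is missed.
    """
    hits = []
    # Stream the file line by line so that big files are never fully in memory.
    with open(path, encoding=encoding, errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if phrase in line:
                hits.append(lineno)
    return hits
```

Calling this only on the files a Xapian phrase search returns keeps the expensive line-by-line scan limited to files that are likely to contain the phrase.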
On Mon, May 21, 2012, at 06:46 AM, Olly Betts wrote:
> On Sun, May 20, 2012 at 10:59:11PM -0400, V S P wrote:
> > Obviously a couple of problems: searching 17GB worth of text 3 million times
>
> I'm not sure I see why this is a problem unless the run time of this
> is highly sensitive.
>
> > Second -- I do not understand how (if at all possible) to get the
> > start/end offset of the found phrase within the source file
>
> Xapian doesn't store the byte offsets (only word offsets), so this isn't
> possible. It can narrow down the number of files you need to go and
> look at for each replacement though, which could make quite a difference
> if many of the replacements are rarely done.
>
> > Third, how do I ensure that the phrase words are together (and the one
> > with a period between them is not considered a find).
>
> When indexing, pass each chunk of text between periods to
> TermGenerator::index_text(), calling increase_termpos() after each
> index_text() call. Then phrases can't span a period.
>
> Cheers,
> Olly
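
The indexing approach Olly describes can be sketched roughly as follows, in Python. This is an illustration under stated assumptions rather than a definitive implementation: `sentence_chunks` and `index_sentences` are hypothetical helper names, and splitting on every "." is a simplification that also splits decimals and abbreviations. The `termgenerator` argument is expected to be a `xapian.TermGenerator` (or any object with `index_text()` and `increase_termpos()` methods).

```python
def sentence_chunks(text):
    """Split text on periods and drop empty chunks (hypothetical helper)."""
    return [chunk.strip() for chunk in text.split(".") if chunk.strip()]


def index_sentences(termgenerator, text):
    """Index each period-delimited chunk with a position gap after it.

    The gap left by increase_termpos() means the last word of one chunk
    and the first word of the next are never adjacent, so a phrase
    search cannot match across a period.
    """
    for chunk in sentence_chunks(text):
        termgenerator.index_text(chunk)
        termgenerator.increase_termpos()


# Possible usage with the Xapian Python bindings (assumed to be installed):
#   import xapian
#   doc = xapian.Document()
#   tg = xapian.TermGenerator()
#   tg.set_document(doc)
#   index_sentences(tg, "First sentence. Second sentence.")
```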