[Xapian-discuss] check for blacklisted words (and thanks)
James Aylett
james-xapian at tartarus.org
Wed May 21 10:17:37 BST 2008
On Wed, May 21, 2008 at 09:23:28AM +0200, Alessandro Pasotti wrote:
> Now the question: I must check if a particular document contains
> blacklisted words (which are in a textfile, unstemmed one per line),
> is there a way to restrict a query to a single document and return a
> boolean value if one of the terms in the query are contained in the
> checked document?
If you want the blacklist to work unstemmed, and are using the
QueryParser, you can construct a new Query from the unstemmed words
given by QueryParser::unstem_begin() and QueryParser::unstem_end(),
OP_OR them all together, and then OP_FILTER the result with a special
(probably prefixed) term that occurs only in the document you are
checking. You'll get back either nothing or that one document.
If you want to employ stemming, instead use Query::get_terms_begin()
to get out the stemmed terms.
There are other ways of doing this, possibly more efficient ones (for
instance, if you're not already using a stopper, you could write a
custom one and check whether it fired on any of your words). However, I
suspect the approach above will scale better to lots of blacklisted
words, if that's an issue for you.
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org