[Xapian-discuss] I'm having problems using queryparser's wild cards
with Python
Matti Heinonen
matti.heinonen at uta.fi
Fri Jun 9 09:06:48 BST 2006
Hello,
I'm having trouble with the QueryParser when using the Python bindings.
Using wildcards yields an empty query although there are matching terms
in the database.
I'm running
* xapian 0.9.6
  with the utf-8 patch
  http://search.gmane.org/~xapian/xapian-qp-utf8-0.9.2.patch
  and the transliteration patch
  http://article.gmane.org/gmane.comp.search.xapian.general/1927
* python 2.4.2
* the data is in Finnish and in Swedish
Running a small test programme yields:
$python test.py "terveyspalvelut"
Query string is terveyspalvelut
TEST QUERYPARSER
Parsed query to Xapian::Query(terveyspalvelut:(pos=1))
Found these docs: [48, 143, 74, 150, 31, 11, 20, 103, 92, 36]
TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut', 'terveyspalvelutoiminnan']
$python test.py "terveyspalvelut*"
Query string is terveyspalvelut*
TEST QUERYPARSER
Parsed query to Xapian::Query()
Found these docs: []
TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut', 'terveyspalvelutoiminnan']
Here's my test programme
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import xapian
# Querystring is taken from shell. Encode to utf-8.
query = sys.argv[1].encode("utf-8")
print "Query string is %s" % (query,)
print
DB = xapian.Database("xapian")
### Test queryparser
print "TEST QUERYPARSER"
# Set up query
qp = xapian.QueryParser()
qp.set_stemming_strategy(xapian.QueryParser.STEM_NONE)
flags = (xapian.QueryParser.FLAG_BOOLEAN |
         xapian.QueryParser.FLAG_PHRASE |
         xapian.QueryParser.FLAG_LOVEHATE |
         xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
         xapian.QueryParser.FLAG_WILDCARD)
parsed_query = qp.parse_query(query, flags)
print "Parsed query to %s" % (parsed_query.get_description(),)
# Do query, print out results
enquire = xapian.Enquire(DB)
enquire.set_query(parsed_query)
mset = enquire.get_mset(0,10)
print "Found these docs: %s" % ([ data[0] for data in mset ],)
print
### Truncate "by hand" to check if they are present
print "TRUNCATE"
# Set up term for iteration (ie. drop "*" at the end if present)
if query[-1] == "*":
    term = query[:-1]
else:
    term = query
print "Term is %s" % (term,)
# Iterate over matching terms
term_iterator = DB.allterms_begin()
term_iterator.skip_to(term)
matching_terms = []; cut_point = len(term)
while True:
    candidate_term = term_iterator.get_term()
    if candidate_term[:cut_point] != term:
        break
    matching_terms.append(candidate_term)
    term_iterator.next()
print "Found these terms: %s" % (matching_terms,)
print
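To show what I mean by right truncation independently of Xapian, here is a
minimal sketch of the same prefix walk over a plain sorted list. The list
`terms` and the helper names `right_truncate` and `strip_wildcard` are made up
for illustration and are not part of the Xapian API; the only assumption is
that the terms are available in ascending sorted order.

```python
import bisect


def right_truncate(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix.

    sorted_terms is a hypothetical stand-in for the database's term list
    and is assumed to be sorted in ascending order.
    """
    matches = []
    # Jump to the first term >= prefix, like skip_to() in the programme above.
    start = bisect.bisect_left(sorted_terms, prefix)
    for term in sorted_terms[start:]:
        # Matching terms are contiguous in sorted order: stop at the first miss.
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def strip_wildcard(query):
    """Drop one trailing "*" if present; also safe for an empty string."""
    return query[:-1] if query.endswith("*") else query


# Made-up sample terms, not taken from the real database.
terms = sorted(["terveys", "terveyspalvelut",
                "terveyspalvelutoiminnan", "tutkimus"])
print(right_truncate(terms, strip_wildcard("terveyspalvelut*")))
```

With the sample list this prints the same two terms as the TRUNCATE step
above. Unlike my loop, it also stops cleanly when the list runs out before a
non-matching term is seen.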
Am I missing something? I'd rather avoid writing my own queryparser as
Xapian's queryparser seems to have all the features I need (and more!).
However, right truncation is a necessity for my project.
Yours,
Matti Heinonen
--
Matti Heinonen | email: matti.heinonen at uta.fi
Atk-erikoistutkija | tel: +358 3 215 8523
Yhteiskuntatieteellinen tietoarkisto FSD | fax: +358 3 215 8519
FIN-33014 TAMPEREEN YLIOPISTO | WWW: http://www.fsd.uta.fi/