[Xapian-discuss] I'm having problems using queryparser's wild cards with Python

Matti Heinonen matti.heinonen at uta.fi
Fri Jun 9 09:06:48 BST 2006


Hello,

I'm having trouble with the QueryParser in the Python bindings. A query 
containing a wildcard parses to an empty query, although there are 
matching terms in the database.

I'm running
* xapian 0.9.6
   with the utf-8 patch
     http://search.gmane.org/~xapian/xapian-qp-utf8-0.9.2.patch
   and the transliteration patch
     http://article.gmane.org/gmane.comp.search.xapian.general/1927
* python 2.4.2
* the data is in Finnish and in Swedish

Running a small test programme yields:

$python test.py "terveyspalvelut"
Query string is terveyspalvelut

TEST QUERYPARSER
Parsed query to Xapian::Query(terveyspalvelut:(pos=1))
Found these docs: [48, 143, 74, 150, 31, 11, 20, 103, 92, 36]

TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut', 'terveyspalvelutoiminnan']

$python test.py "terveyspalvelut*"
Query string is terveyspalvelut*

TEST QUERYPARSER
Parsed query to Xapian::Query()
Found these docs: []

TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut', 'terveyspalvelutoiminnan']


Here's my test programme


#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import xapian

# Querystring is taken from shell. Encode to utf-8.
query = sys.argv[1].encode("utf-8")
print "Query string is %s" % (query,)
print

DB = xapian.Database("xapian")

### Test queryparser
print "TEST QUERYPARSER"

# Set up query
qp = xapian.QueryParser()
qp.set_stemming_strategy(xapian.QueryParser.STEM_NONE)
parsed_query = qp.parse_query(
     query,
     xapian.QueryParser.FLAG_BOOLEAN
     | xapian.QueryParser.FLAG_PHRASE
     | xapian.QueryParser.FLAG_LOVEHATE
     | xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE
     | xapian.QueryParser.FLAG_WILDCARD)
print "Parsed query to %s" % (parsed_query.get_description(),)

# Do query, print out results
enquire = xapian.Enquire(DB)
enquire.set_query(parsed_query)
mset = enquire.get_mset(0,10)
print "Found these docs: %s" % ([ data[0] for data in mset ],)
print


### Truncate "by hand" to check if they are present
print "TRUNCATE"

# Set up term for iteration (i.e. drop "*" at the end if present)
if query.endswith("*"):
     term = query[:-1]
else:
     term = query
print "Term is %s" % (term,)

# Iterate over matching terms
term_iterator = DB.allterms_begin()
term_iterator.skip_to(term)
matching_terms = []
cut_point = len(term)
while True:
     candidate_term = term_iterator.get_term()
     if candidate_term[:cut_point] != term:
         break
     matching_terms.append(candidate_term)
     term_iterator.next()
print "Found these terms: %s" % (matching_terms,)
print
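
For anyone who wants to check the idea without my database: the hand 
truncation above is just a prefix scan over a sorted term list. Below is 
a minimal standalone sketch of that, assuming a plain Python list stands 
in for the Xapian term iterator. The function name prefix_matches and 
the sample term list are made up for illustration and are not part of 
Xapian; only the two "terveyspalvelut" terms come from my real output.

```python
import bisect


def prefix_matches(sorted_terms, prefix):
    """Return every term in the sorted list that starts with prefix."""
    # Jump to the first term >= prefix, much like skip_to() on a term iterator.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for candidate in sorted_terms[start:]:
        # The list is sorted, so the first non-matching term ends the run.
        if not candidate.startswith(prefix):
            break
        matches.append(candidate)
    return matches


# Hypothetical term list standing in for the database's term list.
terms = sorted(["terveys", "terveyspalvelut",
                "terveyspalvelutoiminnan", "tervo"])
result = prefix_matches(terms, "terveyspalvelut")
print(result)
```

Unlike the loop in my test programme, this sketch also stops cleanly 
when the prefix run reaches the end of the list.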


Am I missing something? I'd rather avoid writing my own queryparser as 
Xapian's queryparser seems to have all the features I need (and more!). 
However, right truncation is a necessity for my project.

Yours,
Matti Heinonen

-- 
Matti Heinonen                           | email: matti.heinonen at uta.fi
Atk-erikoistutkija                       | tel: +358 3 215 8523
Yhteiskuntatieteellinen tietoarkisto FSD | fax: +358 3 215 8519
FIN-33014 TAMPEREEN YLIOPISTO            | WWW: http://www.fsd.uta.fi/


