[Xapian-tickets] [Xapian] #636: get_docid() and multiple databases
Xapian
nobody at xapian.org
Mon Mar 17 13:42:51 GMT 2014
#636: get_docid() and multiple databases
-----------------------------+--------------------
Reporter: jeffrand | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version: 1.2.12
Severity: normal | Keywords:
Blocked By: | Blocking:
Operating System: Linux |
-----------------------------+--------------------
I'm using the Python bindings for Xapian 1.2.12 and I'm getting some
unexpected behavior which I believe is a bug. While searching multiple
databases I get inconsistent values from doc.get_docid() when using a
KeyMaker subclass for custom sorting. The 'id' value stored in each
document's data is the same as the docid that was set for that document.
When searching only one database the behavior is as expected:
doc.get_docid() == int(json.loads(doc.get_data())['id']).
When searching more than one database, doc.get_docid() returns a
value that is not the same as int(json.loads(doc.get_data())['id']).
According to the docs:
docid Xapian::Document::get_docid ( ) const
Get the document id which is associated with this document (if any).
NB If multiple databases are being searched together, then this will be
the document id in the individual database, not the merged database!
Here's my sample code and some output:
import xapian as x
import simplejson as json
db = x.Database()
db.add_database(x.Database('/var/xapian/db1.db')) #has XTYPA
q = x.Query('XTYPA')
q = x.Query(x.Query.OP_OR, q, x.Query('XTYPB'))
class WhatsTheId(x.KeyMaker):
    def __init__(self):
        return super(WhatsTheId, self).__init__()
    def __call__(self, doc):
        my_doc_id = json.loads(doc.get_data())['id']
        if my_doc_id <= 10:
            print doc.get_docid(), my_doc_id, json.loads(doc.get_data())['type']
        return x.sortable_serialise(1)
e = x.Enquire(db)
e.set_query(q)
e.set_sort_by_key(WhatsTheId())
e.get_mset(0, 1000000000, 0, None)
# Expected results
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 8 A
9 9 A
10 10 A
db.add_database(x.Database('/var/xapian/db2.db')) #has XTYPB
e = x.Enquire(db)
e.set_query(q)
e.set_sort_by_key(WhatsTheId())
r = e.get_mset(0, 1000000000, 0, None)
# After adding a second database: unexpected results
3 2 A
5 3 A
7 4 A
9 5 A
11 6 A
13 7 A
15 8 A
17 9 A
19 10 A
2 1 B
4 2 B
# The value returned by get_docid() shifts in a consistent way as more
# databases are added:
q = x.Query(x.Query.OP_OR, q, x.Query('XTYPC'))
db.add_database(x.Database('/var/xapian/db3.db')) #has XTYPC
e = x.Enquire(db)
e.set_query(q)
e.set_sort_by_key(WhatsTheId())
r = e.get_mset(0, 1000000000, 0, None)
4 2 A
7 3 A
10 4 A
13 5 A
16 6 A
19 7 A
22 8 A
25 9 A
28 10 A
2 1 B
5 2 B
3 1 C
6 2 C
9 3 C
12 4 C
15 5 C
18 6 C
21 7 C
24 8 C
27 9 C
30 10 C
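For what it's worth, every pair printed above fits a simple interleaving pattern: with n databases added, the document with id d in the k-th database (counting from 0) shows up with get_docid() == (d - 1) * n + k + 1. The sketch below only restates that observation from the output; merged_docid and split_docid are hypothetical helper names of mine, not part of the xapian module, and I am assuming the pattern holds beyond the ids shown here.

```python
def merged_docid(local_docid, db_index, n_dbs):
    """Docid observed on the combined database, assuming plain interleaving.

    local_docid: id of the document in its own database (1-based)
    db_index:    position of that database in add_database() order (0-based)
    n_dbs:       number of databases added to the combined Database
    """
    return (local_docid - 1) * n_dbs + db_index + 1


def split_docid(merged, n_dbs):
    """Inverse of merged_docid(): return (local_docid, db_index)."""
    return (merged - 1) // n_dbs + 1, (merged - 1) % n_dbs
```

With this, the per-database id could be recovered inside the KeyMaker as split_docid(doc.get_docid(), 3)[0] for the three-database case, which matches the 'id' stored in the document data in all of the rows above.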
--
Ticket URL: <http://trac.xapian.org/ticket/636>
Xapian <http://xapian.org/>