How to use omindex-produced Xapian database with xapian-haystack?

Thu Oct 7 10:21:57 BST 2021

On 06/10/2021 02:54, Olly Betts wrote:
> On Tue, Oct 05, 2021 at 11:12:44AM +0530, Charles wrote:
>> I am stuck on making xapian-haystack read our existing omindex-produced
>> Xapian database.
>>
>> The documentation linked above includes ...
>>
>> HAYSTACK_CONNECTIONS = {
>> 'default': {
>> 'ENGINE': 'xapian_backend.XapianEngine',
>> 'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index'),
>> },
>> }
>>
>> ... which is a single file but our omindex-produced Xapian database is
>> several files
> I don't know anything specifically about xapian-haystack, but almost
> certainly this expects to be pointed at the directory containing those
> files (since that's what Xapian itself expects).
>
> Cheers,
> Olly
Thanks Olly

Setting the HAYSTACK_CONNECTIONS PATH to the directory that omindex
populates got further, but raised "UnpicklingError: could not find
MARK", suggesting xapian-haystack expected differently formatted data.
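
A possible explanation for that exact message, as a sketch rather than a
diagnosis: the xapian-haystack docstring quoted below says the document data
field holds a pickled object, while omindex (as I understand Omega's
documentation) stores plain name=value lines beginning with "url=". The byte
"u" is pickle's SETITEMS opcode, which needs an earlier MARK. The sample
bytes and the helper name try_unpickle are invented for illustration.

```python
import pickle


def try_unpickle(data: bytes):
    """Return the unpickled object, or the exception raised while trying."""
    try:
        return pickle.loads(data)
    except Exception as exc:  # keep the exception so it can be inspected
        return exc


# Hypothetical omindex-style document data: plain text, not a pickle.
omindex_like = b"url=/docs/index.html\nsample=Some text\ncaption=Title\n"

# Hypothetical stand-in for what a pickling backend would have stored.
haystack_like = pickle.dumps({"id": "foo.bar.1"})

result = try_unpickle(omindex_like)
print(type(result).__name__, result)
print(try_unpickle(haystack_like))
```

On CPython the first call is expected to fail with an UnpicklingError, and
the second to return the original dictionary.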

I suspect xapian-haystack cannot be used with a Xapian database created
by omindex, as these lines from xapian-haystack's only script,
xapian_backend.py, suggest:

# this maps the different reserved fields to prefixes used to
# create the database:
# id str: unique document id.
# django_id int: id of the django model instance.
# django_ct str: of the content type of the django model.
# field str: name of the field of the index.
TERM_PREFIXES = {
     ID: 'Q',
     DJANGO_ID: 'QQ',
     DJANGO_CT: 'CONTENTTYPE',
     'field': 'X'
}
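
To make the prefix table concrete: the update() docstring quoted in this
message gives the identifier formats `Q<app_name>.<model_name>.<pk>` and
`XCONTENTTYPE<app_name>.<model_name>`. A minimal sketch of building those two
terms follows. It is not xapian-haystack's code, and identifier_terms is a
hypothetical helper.

```python
def identifier_terms(app_name, model_name, pk):
    """Build the two identifying terms described in the update() docstring.

    Hypothetical helper for illustration only; not part of xapian-haystack.
    """
    content_type = "{}.{}".format(app_name, model_name)
    unique_id_term = "Q{}.{}".format(content_type, pk)
    content_type_term = "XCONTENTTYPE{}".format(content_type)
    return [unique_id_term, content_type_term]


# The docstring's own example: foo.bar with pk=1.
print(identifier_terms("foo", "bar", 1))
```

A database that was not written with this scheme would have no terms of
this shape for the backend to look up.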
...
     def update(self, index, iterable, commit=True):
         """
         Updates the `index` with any objects in `iterable` by adding/updating
         the database as needed.

         Required arguments:
             `index` -- The `SearchIndex` to process
             `iterable` -- An iterable of model instances to index
         Optional arguments:
             `commit` -- ignored

         For each object in `iterable`, a document is created containing all
         of the terms extracted from `index.full_prepare(obj)` with field prefixes,
         and 'as-is' as needed.  Also, if the field type is 'text' it will be
         stemmed and stored with the 'Z' prefix as well.

         eg. `content:Testing` ==> `testing, Ztest, ZXCONTENTtest, XCONTENTtest`

         Each document also contains an extra term in the format:

         `XCONTENTTYPE<app_name>.<model_name>`

         As well as a unique identifier in the format:

         `Q<app_name>.<model_name>.<pk>`

         eg.: foo.bar (pk=1) ==> `Qfoo.bar.1`, `XCONTENTTYPEfoo.bar`

         This is useful for querying for a specific document corresponding to
         a model instance.

         The document also contains a pickled version of the object itself and
         the document ID in the document data field.

         Finally, we also store field values to be used for sorting data.  We
         store these in the document value slots (position zero is reserved
         for the document ID).  All values are stored as unicode strings with
         conversion of float, int, double, values being done by Xapian itself
         through the use of the :method:xapian.sortable_serialise method.
         """