[Xapian-discuss] Omindex python - first version.

Srijon Biswas srijon.biswas at googlemail.com
Thu May 21 19:49:24 BST 2009


Hi all.

Here is a basic version of omindex in Python that works, at least for the
limited amount that I have tested. Standard caveats apply.
Please let me know if it proves useful to you, and about any
problems or improvements.

This needs BeautifulSoup for the HTML parsing. I am sure there are
better/faster alternatives (ElementTree??), but I have not really tried
them out.
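
For anyone wanting to try one of those alternatives, here is a minimal sketch of HTML text extraction using only the standard library's html.parser. It is not the code in xapian_omindex.py, which uses BeautifulSoup; the names TextExtractor and extract_text are made up for illustration, and the choice to skip script and style content is an assumption about what an indexer would want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping script/style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self._skip_depth = 0      # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return                # ignore script/style content
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.body_parts.append(data)


def extract_text(html):
    """Return (title, body_text) with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join(" ".join(parser.title_parts).split())
    body = " ".join(" ".join(parser.body_parts).split())
    return title, body
```

The title and body are kept separate because an indexer would typically want to store or weight them differently; how xapian_omindex.py actually treats them is not shown here.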

Working:
 - HTML parsing and indexing.
 - text parsing and indexing.
 - PDF.
 - basic support for extension, even for scriptindex-style extension
(read the code to see how).
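
The script's real extension mechanism is only documented in its code, so the following is just one hypothetical way such a hook could look: a registry that maps file extensions to text-extraction functions. HANDLERS, register, read_plain_text and text_for are all illustrative names, not taken from xapian_omindex.py.

```python
import os

# Hypothetical registry: maps a lowercase file extension to a function
# that takes a path and returns the text to index.
HANDLERS = {}


def register(*extensions):
    """Decorator that registers a text-extraction function for extensions."""
    def decorator(func):
        for ext in extensions:
            HANDLERS[ext.lower()] = func
        return func
    return decorator


@register(".txt", ".text")
def read_plain_text(path):
    # Undecodable bytes are replaced rather than aborting the whole run.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def text_for(path):
    """Return extracted text for path, or None if no handler is registered."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(ext)
    return handler(path) if handler else None
```

With a layout like this, supporting a new file type would mean writing one function and decorating it, without touching the directory-walking code.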

There will likely be edge cases that I have not tested and that don't work,
but basic indexing matches omindex.

To run it:
1) xapian_omindex.py compare db1 db2
 - compares two dbs (for testing) to see where they differ.

2) xapian_omindex.py omindex <omindex options>
 - generates the index as omindex does.
 - NOT supported:
   - the -M option.
   - indexing a subdirectory (i.e. omindex --db x --url y dir1 subdir2)
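
To illustrate the idea behind the compare mode in 1), here is a rough sketch that works on plain Python data rather than real Xapian databases. It assumes each index has already been read into a mapping from some document key (for example its URL) to the set of terms indexed for it; that representation and the name compare_indexes are assumptions for illustration, and the script's actual comparison may look at other things as well.

```python
def compare_indexes(left, right):
    """Compare two {doc_key: set_of_terms} mappings.

    Returns a report listing keys present on only one side and, for
    shared keys, the terms found on one side but not the other.
    """
    left_keys, right_keys = set(left), set(right)
    report = {
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
        "term_diffs": {},
    }
    for key in sorted(left_keys & right_keys):
        missing = set(left[key]) - set(right[key])
        extra = set(right[key]) - set(left[key])
        if missing or extra:
            # Only documents that actually differ appear in the report.
            report["term_diffs"][key] = {
                "only_in_left": sorted(missing),
                "only_in_right": sorted(extra),
            }
    return report
```

An empty report (no keys on either side only, no term differences) would mean the two indexes agree on documents and terms, which is the kind of check that is useful when testing against an index built by omindex itself.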

Srijon.

