[Xapian-discuss] Omindex python - first version.
Srijon Biswas
srijon.biswas at googlemail.com
Thu May 21 19:49:24 BST 2009
Hi all.
Here is a basic version of omindex in Python that works, at least for the
limited amount that I have tested. Standard caveats apply.
Please let me know if this proves useful to you, and about any problems or
possible improvements.
This needs BeautifulSoup for the HTML parsing. I am sure there are
better/faster alternatives (ElementTree?), but I have not really tried
them out.
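
For comparison, here is a minimal sketch of one such alternative, using only the standard library's html.parser instead of BeautifulSoup. It is not the code from the script: the names TextExtractor and extract_text are made up for illustration, and a real indexer would also need to handle encodings and meta tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping script/style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>: not indexable text
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def extract_text(html):
    """Return (title, body_text) with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    # Joining with a space keeps words in adjacent block elements apart,
    # at the cost of splitting a word that spans an inline tag.
    body = " ".join(" ".join(parser.text_parts).split())
    return title, body
```

The returned title and body text could then be passed to whatever does the term generation.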
Working:
- HTML parsing and indexing.
- text parsing and indexing.
- PDF.
- basic support for being extended, even with scriptindex-style extensions
(read the code for how).
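
The extension mechanism itself is only described in the code, so the following is just a rough sketch of how per-format handlers are commonly wired up, not the script's actual design. HANDLERS, handler, read_plain_text and extract are all hypothetical names.

```python
import os

# Hypothetical registry: maps a lowercase file extension to a function
# that takes a path and returns the text to index.
HANDLERS = {}


def handler(*extensions):
    """Decorator that registers a text extractor for the given extensions."""
    def register(func):
        for ext in extensions:
            HANDLERS[ext.lower()] = func
        return func
    return register


@handler(".txt", ".text")
def read_plain_text(path):
    # Undecodable bytes are replaced rather than aborting the whole run.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


def extract(path):
    """Return the text for path, or None if no handler is registered."""
    ext = os.path.splitext(path)[1].lower()
    func = HANDLERS.get(ext)
    return func(path) if func else None
```

Under this assumption, supporting a new format means writing one function and decorating it with the extensions it handles.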
There are likely edge cases that I have not tested and that don't work,
but basic indexing matches omindex.
To run it:
1) xapian_omindex.py compare db1 db2
   - compares two databases (for testing) to see where they differ.
2) xapian_omindex.py omindex <omindex options>
   - generates the index the same way omindex does.
   - NOT supported:
     - the -M option.
     - indexing a subdirectory (i.e. omindex --db x --url y dir1 subdir2).
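
To give an idea of what such a comparison can look like, here is a minimal sketch that compares only per-term frequencies between two databases. It is an assumption about one reasonable approach, not a description of what the compare mode in the script actually checks. It uses xapian.Database and its allterms() iterator from the Python bindings; the function names are made up.

```python
def diff_term_freqs(freqs_a, freqs_b):
    """Return {term: (freq_a, freq_b)} for terms whose frequencies differ.

    A frequency of 0 means the term is absent from that database.
    """
    diffs = {}
    for term in set(freqs_a) | set(freqs_b):
        a = freqs_a.get(term, 0)
        b = freqs_b.get(term, 0)
        if a != b:
            diffs[term] = (a, b)
    return diffs


def load_term_freqs(db_path):
    """Read {term: termfreq} from a Xapian database."""
    import xapian  # imported here so diff_term_freqs works without the bindings
    db = xapian.Database(db_path)
    return {item.term: item.termfreq for item in db.allterms()}


def compare(db_path_a, db_path_b):
    """Print differing terms; return True if the term frequencies all match."""
    diffs = diff_term_freqs(load_term_freqs(db_path_a),
                            load_term_freqs(db_path_b))
    for term in sorted(diffs):
        print(term, *diffs[term])
    return not diffs
```

Matching term frequencies do not prove two databases are identical (document data, values and positions are not checked), but a mismatch is a quick sign that the two indexers diverged.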
Srijon.