[Xapian-devel] Plan for indexing XML files

Peter Karman peter at peknet.com
Sun May 18 17:53:22 BST 2014


On 5/18/14 10:24 AM, Aarsh Shah wrote:
> Hello,
> 
> I have added an entry to my journal containing a link to a sample XML
> file and my ideas on how to index the entire IMDb movie database:
> 
> http://trac.xapian.org/wiki/GSoC2014/Performance%20and%20Optimisation/Journal
> 
> Please do let me know what you think.
> 

I'm not sure if you're looking for input from non-mentors, so I
apologize in advance if this reply is inappropriate.

The plan looks sane to me.

I offer this as an example of prior art along the same lines:

https://github.com/karpet/libswish3/blob/master/src/xapian/swish_xapian.cpp

It uses the libxml2 library (the same library that Python's lxml
bindings wrap) to parse XML files and build Xapian indexes. It may
serve as a baseline for your own efforts.
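
As a rough illustration of that parse-then-index flow, here is a minimal
sketch of the parsing half. It is not taken from swish_xapian.cpp: it uses
Python's standard xml.etree.ElementTree instead of libxml2 or lxml, and the
sample document, the element names (movie, title, year) and the helper
movie_terms are all hypothetical. The S and Y prefixes follow the usual
Xapian convention for title and year terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are assumptions, not the real IMDb layout.
SAMPLE = """<movies>
  <movie id="1"><title>The Third Man</title><year>1949</year></movie>
  <movie id="2"><title>Brazil</title><year>1985</year></movie>
</movies>"""


def movie_terms(xml_text):
    """Yield (doc_id, terms) for each <movie> element in xml_text."""
    root = ET.fromstring(xml_text)
    for movie in root.iter("movie"):
        title = movie.findtext("title", default="")
        year = movie.findtext("year", default="")
        # Prefix each lowercased title word with S and the year with Y.
        terms = ["S" + word.lower() for word in title.split()]
        if year:
            terms.append("Y" + year)
        yield movie.get("id"), terms


records = list(movie_terms(SAMPLE))
for doc_id, terms in records:
    print(doc_id, terms)
```

With the Xapian Python bindings installed, each term list could then be
fed to a xapian.Document via add_term() and stored with
WritableDatabase.add_document(); that step is left out here so the sketch
runs without Xapian.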


-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


