[Xapian-discuss] docx support

Olly Betts olly at survex.com
Thu Jul 24 15:48:02 BST 2008


On Thu, Jul 24, 2008 at 12:29:09PM +0100, Colin Bell wrote:
> I use my own indexer (which is very customised) and not Omega.  
> Essentially you would have to integrate the example code I gave you  
> into the Omega source and compile it. Otherwise you could use the code  
> in your own indexer.
> 
> I'm not sure if the Xapian mega coders responsible for Omega might  
> find it worthy of official inclusion?

Support for new formats is welcome, with the usual provisos for code
contributions to almost any open source project.  It needs to be tested
and working, follow the coding style of the existing code, etc.  The
HACKING document in xapian-core covers the details, but most of it
shouldn't be surprising.

It really needs to be a complete patch (rather than several code
snippets) against vanilla Xapian.  The code you sent seems to refer
to variables and functions which aren't in omindex.cc, such as fileData,
fFileData, parseWordXMetaData() and mstdout_to_string().

It would also be better to use a subclass of HtmlParser (like the
existing XmlParser class) rather than adding a second XML parser to
omindex along with an external dependency.  Although the base class
is called HtmlParser, it's not really HTML-specific and should be
capable of parsing XML well enough for this...
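To illustrate the subclassing approach, here is a rough sketch under some
stated assumptions.  MiniMarkupParser is a hypothetical stand-in for
omindex's HtmlParser: its callback names (process_text, opening_tag,
closing_tag) are loosely modelled on HtmlParser's callback style, but the
signatures here are illustrative and are not Xapian's actual API.
DocxBodyParser is likewise a hypothetical subclass.  The sketch assumes
word/document.xml has already been extracted from the .docx (which is a ZIP
archive), keeps only text inside <w:t> elements, and turns the end of each
<w:p> paragraph into a newline.  It does no entity decoding and does not
cope with '>' inside attribute values, both of which a real importer would
need to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for omindex's HtmlParser (NOT Xapian's class): a tiny
// tag/text tokenizer that reports events through virtual callbacks, so a
// subclass only has to override the callbacks it cares about.
class MiniMarkupParser {
  public:
    virtual ~MiniMarkupParser() = default;

    // Walk the input, splitting it into tags and the text between them.
    void parse(const std::string &text) {
        std::size_t pos = 0;
        while (pos < text.size()) {
            std::size_t lt = text.find('<', pos);
            if (lt == std::string::npos) lt = text.size();
            if (lt > pos) process_text(text.substr(pos, lt - pos));
            if (lt == text.size()) break;
            std::size_t gt = text.find('>', lt);
            if (gt == std::string::npos) break;  // unterminated tag: stop
            std::string tag = text.substr(lt + 1, gt - lt - 1);
            pos = gt + 1;
            // Skip <?xml ...?> declarations and <!-- comments -->.
            if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;
            if (tag[0] == '/') {
                closing_tag(name_of(tag.substr(1)));
            } else {
                bool self_closing = tag.back() == '/';
                opening_tag(name_of(tag));
                if (self_closing) closing_tag(name_of(tag));
            }
        }
    }

  protected:
    virtual void process_text(const std::string &) {}
    virtual void opening_tag(const std::string &) {}
    virtual void closing_tag(const std::string &) {}

  private:
    // Element name = everything up to the first whitespace or '/'.
    static std::string name_of(const std::string &tag) {
        return tag.substr(0, tag.find_first_of(" \t\r\n/"));
    }
};

// Hypothetical docx body extractor: keeps only text inside <w:t> runs,
// maps <w:tab/> and <w:br/> to '\t' and '\n', and ends each <w:p> with '\n'.
class DocxBodyParser : public MiniMarkupParser {
  public:
    std::string dump;  // extracted plain text, ready to feed to an indexer

  protected:
    void process_text(const std::string &text) override {
        if (in_text_run) dump += text;
    }
    void opening_tag(const std::string &tag) override {
        if (tag == "w:t") in_text_run = true;
        else if (tag == "w:tab") dump += '\t';
        else if (tag == "w:br") dump += '\n';
    }
    void closing_tag(const std::string &tag) override {
        if (tag == "w:t") in_text_run = false;
        else if (tag == "w:p") dump += '\n';
    }

  private:
    bool in_text_run = false;
};
```

Usage would be to construct a DocxBodyParser, call parse() on the contents
of word/document.xml, and then index the resulting dump string.  The point
of the design is that all the tokenizing lives in the base class, so a new
format only needs a handful of small callback overrides rather than a
second XML parser and an extra library dependency.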

While with enough effort I could probably use what you've sent to write
a suitable importer, that's not likely to happen any time soon I'm
afraid.

Cheers,
    Olly


