[Xapian-discuss] Xapian and "document filters"

Charlie Hull charlie at juggler.net
Mon Apr 27 12:16:59 BST 2009


Gavin Whitehead wrote:
> Please forgive what I suspect is a newbie question - but I'm a newbie!
> 
> I have been looking around http://xapian.org/ for information about what 
> document types (.pdf, .doc etc) Xapian supports.  All I have found so far 
> is the section on the Omega page 
> (http://xapian.org/docs/omega/overview.html) which talks about external 
> filter programs (pdftotext etc).
> 
> Does Xapian rely entirely on external filter programs?  Is there any 
> built-in support for indexing non-plain-text 'documents'?
> 
Xapian itself (the core library) only handles plain text - it's intended 
to implement the part of a search system that comes after text has been 
extracted from documents.

Omega handles many file formats, but relies on the use of external 
filters for everything except HTML, some XML formats, and plain text.
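
To make the division of labour concrete, here is a rough sketch of the
"external filter" idea in Python. It is not Omega's actual code: the
FILTERS table, the PLAIN_TEXT set and both function names are made up for
illustration, and it assumes pdftotext is installed and on the PATH. The
only thing taken from the thread is that pdftotext is one such filter;
passing "-" as the output file asks it to write the text to stdout.

```python
import subprocess
from pathlib import Path

# Hypothetical table: file extension -> function building the filter command.
# pdftotext is the filter named in the thread; "-" means "write to stdout".
FILTERS = {
    ".pdf": lambda path: ["pdftotext", path, "-"],
}

# Extensions read directly, with no external program involved (illustrative).
PLAIN_TEXT = {".txt"}


def filter_command(path):
    """Return the external command for `path`, or None if it is plain text."""
    suffix = Path(path).suffix.lower()
    if suffix in PLAIN_TEXT:
        return None
    if suffix not in FILTERS:
        raise ValueError(f"no filter configured for {suffix!r}")
    return FILTERS[suffix](path)


def extract_text(path):
    """Return the plain text of `path`, running an external filter if needed."""
    command = filter_command(path)
    if command is None:
        return Path(path).read_text(encoding="utf-8", errors="replace")
    # check=True raises CalledProcessError if the filter exits with an error.
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Whatever extract_text() returns is the plain text that would then be
handed to the Xapian library for indexing; supporting another format in
this sketch just means adding another entry to the table.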

Regards

Charlie
> 
> Regards,
> 
> Gavin Whitehead



