[Xapian-discuss] php5 binding

Bill Crawford bill.crawford at wcn.co.uk
Fri Mar 16 17:39:50 GMT 2007


On Friday 16 Mar 2007, iX Gamerz wrote:

> I have a repository of about 200,000 files (principally PDF, doc, xls),
> representing 20 GB of data, and I want to index all my files to have a
> full-text search.
>
> My first question is:
> - Can we do that, or can we index only HTML and JPEG?

Personally I would recommend converting the files to plain text before
indexing. For PDFs, look at e.g. pdftotext; that is what I'm using here (the
PDFs are produced by a conversion process from a variety of formats including
Word). There are various options for converting Word and other formats,
depending on whether you mind looking at commercial software (we use one here,
but I'm not going to throw around recommendations).
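
As a rough illustration of the pdftotext step, here is a minimal Python sketch
(not what we run here). It assumes the pdftotext command-line tool is on the
PATH; the helper names pdftotext_command, extract_text and find_pdfs are made
up for this example. Passing "-" as the output file asks pdftotext to write the
text to standard output.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path):
    """Build the argument list; "-" sends the extracted text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"]


def extract_text(pdf_path):
    """Return the text of one PDF, or None if pdftotext is missing or fails."""
    if shutil.which("pdftotext") is None:
        return None  # assumption: the tool must be installed separately
    result = subprocess.run(pdftotext_command(pdf_path), capture_output=True)
    if result.returncode != 0:
        return None  # unreadable, missing or damaged PDF
    return result.stdout.decode("utf-8", errors="replace")


def find_pdfs(root):
    """Yield every file below root whose suffix is .pdf (any letter case)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path
```

Looping over find_pdfs("/path/to/repository") and calling extract_text on each
result would give you the plain text to hand to whatever indexer you use; other
formats would need their own converters in place of pdftotext.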

-- 
http://www.lost.eu/175db


