[Xapian-discuss] php5 binding
Bill Crawford
bill.crawford at wcn.co.uk
Fri Mar 16 17:39:50 GMT 2007
On Friday 16 Mar 2007, iX Gamerz wrote:
> I have a repository with about 200,000 files (principally PDF, doc, xls),
> representing 20 GB of data, and I want to index all my files to have
> full-text search.
>
> My first question is:
> - Can we do that, or can we index only HTML and JPEG?
Personally I would recommend you look at e.g. pdftotext for the PDFs. There
are various options for converting Word and other formats, depending on
whether you mind looking at commercial software (we use one here, but I'm not
going to throw around recommendations). I'm using pdftotext here (the PDFs are
produced by a conversion process from a variety of formats including Word).
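
As a rough illustration of the pdftotext route, here is a minimal Python
sketch that walks a directory tree and extracts the text of each PDF by
calling the pdftotext command-line tool. It assumes pdftotext is installed
and on the PATH; the helper names (pdftotext_command, extract_pdf_text,
find_pdfs) are made up for this example, and the step that hands the text to
an indexer is left out.

```python
import os
import subprocess

# Only PDF is handled in this sketch; other formats need their own converter.
PDF_EXTENSIONS = {".pdf"}


def pdftotext_command(pdf_path):
    """Build the argument list for pdftotext; '-' sends the text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_pdf_text(pdf_path):
    """Run pdftotext on one file and return the extracted text as a str.

    Raises subprocess.CalledProcessError if pdftotext exits with an error.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_pdfs(root):
    """Yield paths of PDF files under root (case-insensitive extension)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in PDF_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Something like `for path in find_pdfs("/path/to/repository"): text =
extract_pdf_text(path)` would then give you plain text for each file, ready
to be passed to whatever indexing step you use. With 200,000 files you would
probably also want to catch the error for a damaged PDF and carry on rather
than stop the whole run.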
--
http://www.lost.eu/175db