[Xapian-devel] GSoC - Text Extraction Libraries

Zongwei Li zli2009 at hotmail.com
Tue Mar 22 05:37:17 GMT 2011


Hello,
My name is Zongwei, and I'm a second-year computer science major at UCLA.  I am interested in the text extraction library project, since I have almost 2 years of experience with C++ and half a year with Linux/Unix.  Looking at the formats that Omega already supports, I see that a lot of them only work if a certain external program is installed.  What would be the most important formats to support first?  Based on the ideas page, it seems that .zip, .pdf, and .doc would be the most helpful to have.  Which formats should be implemented after those?  Roughly speaking, how many formats would be feasible in 12 weeks?
Pleasure to meet everyone,
Zongwei

