[Xapian-discuss] GSoC 2012

Liam xapian at networkimprov.net
Thu Feb 16 06:37:16 GMT 2012


On Wed, Feb 15, 2012 at 9:38 PM, Olly Betts <olly at survex.com> wrote:

> Google have announced their "Summer of Code" for this year - for
> background info see:
>
> http://code.google.com/soc/
>
> We took part last year with great success, and after a brief discussion
> with those who mentored last year, we concluded it was worthwhile
> applying to take part again.
>
> I'm happy to act as admin again and submit the application.
>
> I've updated the list of project ideas for students on the wiki from
> last year, removing those tackled last year, and updating those
> where work has been done outside GSoC:
>
> http://trac.xapian.org/wiki/GSoCProjectIdeas
>

Re: Text-Extraction Libraries, starting a new process isn't expensive (on
the order of 40usec on Linux, I believe), and it keeps a crashing extractor
from taking down the main program. So the benefit of libraries over separate
programs would be saving any extractor-specific initialization time, which
I'd guess is pretty low. If init time is a factor for some extractors, one
could revise those programs (if source is available) to accept a sequence of
filenames via stdin or another input stream.
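
To make the stdin idea concrete, here is a rough sketch of a long-lived
extractor worker in Python. Everything in it is an assumption for
illustration: the `serve` and `extract_text` names, and the "OK <n>" / "ERR
<message>" reply framing are made up here, not an existing Xapian or omindex
protocol. The point is only that the extractor pays its init cost once and
then handles one filename per input line.

```python
def extract_text(path):
    """Hypothetical extractor: here it just reads the file as text."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()


def serve(in_stream, out_stream, extract=extract_text):
    """Read one filename per line and write one framed reply per file.

    Reply framing (an assumption, not an existing protocol):
      "OK <n>\n" followed by exactly n characters of extracted text, or
      "ERR <message>\n" if extraction failed for that file.
    """
    for line in in_stream:
        path = line.rstrip("\n")
        if not path:
            continue  # ignore blank lines
        try:
            text = extract(path)
        except Exception as exc:
            # One bad file is reported and skipped; the worker keeps running.
            message = str(exc).replace("\n", " ")
            out_stream.write(f"ERR {message}\n")
        else:
            out_stream.write(f"OK {len(text)}\n{text}")
        out_stream.flush()
```

A real worker would call `serve(sys.stdin, sys.stdout)` and be started once
by the indexer as a child process. A hard crash (e.g. a segfault in an
underlying library) still kills only the worker, which the parent can
restart.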

Wouldn't handling archive files (tar, zip) be the more pressing need in
this area?

Re: Support Another Language, you might mention the Node.js binding I've
been working on? It still needs to cover a LOT more of Xapian's features.
I'd be glad to mentor for that. https://github.com/networkimprov/node-xapian

Liam

