[Xapian-devel] Results GSOC 2011

Olly Betts olly at survex.com
Mon May 2 12:34:27 BST 2011


On Mon, May 02, 2011 at 01:05:11PM +0400, Julia Medvedeva wrote:
> 2011/5/1 Olly Betts <olly at survex.com>:
> 
> > And note that although GSoC prohibits students from collaborating on a
> > project, there's no reason why you can't work together if you're not
> > in GSoC.
> 
> I think it will be a bit difficult. The parser implementation
> consists of three parts: writing the grammar, the scanner and the
> parser. Each part has to be implemented sequentially, and I don't see
> any way to split this task into two independent parts. But perhaps
> somebody has an idea of how it could be done.

Yes, the queryparser project is probably a bit tricky to split up.
"You write the scanner, I'll write the parser" probably isn't going
to work well as it is hard to test the parser at all without the
scanner.  Any collaboration would likely need to be at the level of
smaller sub-tasks, which might get a bit frustrating.

But it might be better to focus on alternative projects with a smaller
scope anyway, unless you have about 3 months free to work on the project
even without the stipend from Google.  Getting a smaller project fully
completed is better all-round than partly completing a more ambitious
project, and you can always go on to work on a second or third smaller project.

> > It would be, though it takes time to come up with a list of interesting
> > project ideas with suitable scope, so the list is likely to grow fairly
> > slowly.  If there's an area someone is interested in, please say as we
> > might be able to come up with some relevant ideas.
> 
> I am interested in parsing technology and data extraction.

The "FilterProcessor" concept might make a good project in this area.

The idea is to be able to specify an object to be used by the
QueryParser to handle filters with a particular prefix, in a similar way
to how ValueRangeProcessor objects are used for value ranges.

So for example with gmane, group filters are indexed with a G prefix
and without the "gmane." part, so gmane.comp.search.xapian.devel is
indexed as Gcomp.search.xapian.devel.  It would be great to allow users
to include a group: filter in their query string which worked for both
group:comp.search.xapian.devel and group:gmane.comp.search.xapian.devel,
but there's currently no nice way to achieve this; only the first
example works.

If I could say "process 'group:' prefixed filters with this object"
then the object could remove any "gmane." prefix from the group name
when building the filter term.
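To make the idea concrete, here is a minimal sketch in plain Python of what such an object might do. FilterProcessor is only a proposed concept, so GroupFilterProcessor and parse_filter are hypothetical names, and the dispatch on the "group:" prefix stands in for whatever hook the QueryParser would actually provide. No real Xapian API is used.

```python
# Hypothetical sketch: "FilterProcessor" is a proposed concept, not an
# existing Xapian class. This models only the term-building step in plain
# Python; a real implementation would hook into Xapian::QueryParser.


class GroupFilterProcessor:
    """Turns the text after 'group:' into a G-prefixed filter term."""

    TERM_PREFIX = "G"   # gmane indexes groups with a G prefix
    STRIP = "gmane."    # part of the group name not stored in the index

    def __call__(self, value: str) -> str:
        # Accept both "gmane.comp.search.xapian.devel" and
        # "comp.search.xapian.devel" by dropping an optional "gmane." prefix.
        if value.startswith(self.STRIP):
            value = value[len(self.STRIP):]
        return self.TERM_PREFIX + value


def parse_filter(text: str, processors: dict) -> str:
    """Hypothetical dispatch: hand the value to the processor for its prefix."""
    prefix, sep, value = text.partition(":")
    if not sep or prefix not in processors:
        raise ValueError(f"no processor registered for {text!r}")
    return processors[prefix](value)


processors = {"group": GroupFilterProcessor()}
print(parse_filter("group:comp.search.xapian.devel", processors))
print(parse_filter("group:gmane.comp.search.xapian.devel", processors))
# both lines print: Gcomp.search.xapian.devel
```

Because the processor sees the raw value before any term is built, the normalisation lives in one place instead of having to be duplicated at index time.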

And if the object returns a subquery, it can be used to perform wildcard
expansion and all sorts of other neat tricks - for example, this could
be turned into a suitable value range:

  date:"last year"
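A rough sketch of how a date: processor could turn that phrase into a value range follows. Several things here are assumptions and do not come from the post: the date_filter name, the value slot number (DATE_SLOT), storing dates as YYYYMMDD strings, reading "last year" as the previous calendar year, and modelling the returned subquery as a simple (slot, begin, end) tuple rather than a real Xapian query object.

```python
from datetime import date

# Hypothetical sketch: DATE_SLOT, the YYYYMMDD value encoding and the
# "previous calendar year" reading of "last year" are all assumptions.
DATE_SLOT = 0  # hypothetical value slot holding each message's date


def date_filter(phrase: str, today: date) -> tuple:
    """Map a date phrase to a (slot, begin, end) inclusive value range."""
    phrase = phrase.strip().strip('"').lower()
    if phrase != "last year":
        raise ValueError(f"unrecognised date phrase: {phrase!r}")
    year = today.year - 1
    begin, end = date(year, 1, 1), date(year, 12, 31)
    # YYYYMMDD strings sort in date order, so an inclusive string range
    # covers every day of the period.
    return (DATE_SLOT, begin.strftime("%Y%m%d"), end.strftime("%Y%m%d"))


print(date_filter('"last year"', date(2011, 5, 2)))
# prints: (0, '20100101', '20101231')
```

Passing "today" in explicitly keeps the processor deterministic and easy to test; a real implementation would presumably default to the current date.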

There's a bit more discussion in this ticket:

http://trac.xapian.org/ticket/128

Cheers,
    Olly
