[Xapian-discuss] Fwd: Re: Another use for Recoll/Xapian? - AI/Eliza

Fri Jan 18 03:22:45 GMT 2013

On Sun, Dec 23, 2012 at 11:50:23PM +1100, Philip Rhoades wrote:
> > I have been using Recoll happily for some time now but I also have a
> > need for an AI/Eliza-like facility and I thought Recoll's fast text
> > searching may allow for fast semantic parsing for such a thing - has
> > anyone considered this sort of thing before?
>
> I know nothing about this kind of thing. I know that the people from
> Xapian have been working with different statistical models with students
> from the Google Summer of Code this last summer. Semantic models etc.
> are probably more of their domain of knowledge than of mine. Any code
> dealing with this (apart from text extraction) will be very close to
> the Xapian layer in any case.
>
> I think that there has been work on improving search engines with
> semantic models ("concepts") for as long as search has existed (long before
> the internet existed). As far as I know, nothing really convincing has ever
> emerged. Maybe things are more mature now, but it seems that the general
> tendency is more towards using sophisticated statistics than explicit
> semantic models, of which humans are apparently still the exclusive
> masters.

"Semantic" is one of those words that's been rather abused over the
years.  I think we've a long way to go before machines can "understand"
in the general case, but for some better defined tasks machines can now
do a pretty good job - for example, Named Entity Recognition:

http://en.wikipedia.org/wiki/Named-entity_recognition

The techniques Xapian currently uses are statistical, though if you
treat the system as a black box, you might think it "understands" at
some level from looking at the output it produces in relation to the
input you gave it.

The GSoC project jf refers to was probably the one implementing document
weighting based on Language Modelling, which is also used in tasks like
speech recognition and machine translation, though it's in essence a
statistical technique.
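
To give a rough idea of what "statistical" means here, the following is a
minimal sketch of query-likelihood scoring with Jelinek-Mercer smoothing,
which is one common form of language-model weighting.  It is not Xapian's
implementation: the names lm_score and rank, the toy documents, and the
smoothing value lam=0.5 are all assumptions made up for illustration.

```python
import math
from collections import Counter


def lm_score(query_terms, doc_terms, collection_counts, collection_len, lam=0.5):
    """Log-likelihood of the query under a smoothed unigram document model.

    Each term's probability is a mix of its frequency in the document and
    its frequency in the whole collection (Jelinek-Mercer smoothing).
    `lam` is an illustrative value, not a tuned or recommended setting.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term never occurs anywhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (hypothetical data, whitespace-tokenised).
docs = {
    "d1": "xapian is a search engine library".split(),
    "d2": "eliza is an early chat program".split(),
}
collection_counts = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_counts.values())


def rank(query):
    """Return document ids ordered from best to worst score for the query."""
    terms = query.split()
    return sorted(
        docs,
        key=lambda d: lm_score(terms, docs[d], collection_counts, collection_len),
        reverse=True,
    )


print(rank("search engine"))
```

Nothing in this sketch knows what any word means.  The ranking comes
purely from counting how often terms occur in each document and in the
collection as a whole.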

So I'm not really sure what the most useful answer is.  I don't think
I'd describe anything we do as "semantic", but you can certainly build
systems using Xapian that you could apply that term to.  There are also
other libraries which do part-of-speech tagging, entity extraction, etc.,
which might be more useful to you (or useful as a source of terms to
index with Xapian).

Cheers,
    Olly