[Xapian-discuss] Fwd: Re: Another use for Recoll/Xapian? - AI/Eliza

Fri Jan 18 16:32:09 GMT 2013

Olly,

On 2013-01-18 14:22, Olly Betts wrote:
> On Sun, Dec 23, 2012 at 11:50:23PM +1100, Philip Rhoades wrote:
>>> I have been using Recoll happily for some time now but I also have a
>>> need for an AI/Eliza-like facility and I thought Recoll's fast text
>>> searching may allow for fast semantic parsing for such a thing - has
>>> anyone considered this sort of thing before?
>> 
>> I know nothing about this kind of thing. I know that the people from
>> Xapian have been working with different statistical models with
>> students from the Google Summer of Code this last summer. Semantic
>> models etc. are probably more their domain of knowledge than mine.
>> Any code dealing with this (apart from text extraction) will be very
>> close to the Xapian layer in any case.
>> 
>> I think that there has been work on improving search engines with
>> semantic models ("concepts") for as long as search has existed (long
>> before the internet existed). As far as I know, nothing really
>> convincing has ever emerged. Maybe things are more mature now, but it
>> seems that the general tendency is more towards using sophisticated
>> statistics than explicit semantic models, of which humans are
>> apparently still the exclusive masters.
> 
> "Semantic" is one of those words that's been rather abused over the
> years.  I think we've a long way to go before machines can
> "understand" in the general case, but for some better-defined tasks
> machines can now do a pretty good job - for example, Named Entity
> Recognition:
> 
> http://en.wikipedia.org/wiki/Named-entity_recognition
> 
> The techniques Xapian currently uses are statistical, though if you
> treat the system as a black box, you might think it "understands" at
> some level from looking at the output it produces in relation to the
> input you gave it.
> 
> The GSoC project jf refers to was probably the one implementing
> document weighting based on Language Modelling, which is also used in
> tasks like speech recognition and machine translation, though it's in
> essence a statistical technique.
> 
> So I'm not really sure what the most useful answer is.  I don't think
> I'd describe anything we do as "semantic", but you can certainly build
> systems using Xapian that you could apply that term to.  There are
> also other libraries which do part-of-speech tagging, entity
> extraction, etc. which might be more useful to you (or useful as a
> source of terms to index with Xapian).
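
As a rough illustration of the Language Modelling weighting mentioned
above: the sketch below is not Xapian's implementation, only a minimal
Python version of the general query-likelihood idea with
Jelinek-Mercer smoothing. The function names (lm_score, rank), the
smoothing weight lam and the whitespace tokenisation are all
assumptions made up for the example.

```python
import math
from collections import Counter


def lm_score(query_terms, doc_terms, collection_counts, collection_len, lam=0.5):
    """Log-probability of the query under a smoothed document language model.

    Each term's probability mixes its frequency in the document with its
    frequency in the whole collection (Jelinek-Mercer smoothing).
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term never seen anywhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document names ordered from best to worst match."""
    # Hypothetical tokeniser: lowercase and split on whitespace.
    tokenised = {name: text.lower().split() for name, text in docs.items()}
    collection_counts = Counter(t for terms in tokenised.values() for t in terms)
    collection_len = sum(collection_counts.values())
    query_terms = query.lower().split()
    scored = [
        (lm_score(query_terms, terms, collection_counts, collection_len, lam), name)
        for name, terms in tokenised.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]
```

Calling rank("xapian search", docs) on a small dict mapping names to
text orders the documents purely by term statistics, which is the
sense in which such a system can look as though it "understands"
without any explicit semantic model.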

Thanks for that info - very useful and much appreciated!  I will see
how things develop, but I think Xapian might still be useful, so I
will probably return with more questions at some stage.

Regards,

Phil.
-- 
Philip Rhoades

GPO Box 3411
Sydney NSW	2001
Australia
E-mail:  phil at pricom.com.au