[Xapian-discuss] Fwd: Re: Another use for Recoll/Xapian? - AI/Eliza
Philip Rhoades
phil at pricom.com.au
Fri Jan 18 16:32:09 GMT 2013
Olly,
On 2013-01-18 14:22, Olly Betts wrote:
> On Sun, Dec 23, 2012 at 11:50:23PM +1100, Philip Rhoades wrote:
>>> I have been using Recoll happily for some time now but I also have a
>>> need for an AI/Eliza-like facility and I thought Recoll's fast text
>>> searching may allow for fast semantic parsing for such a thing - has
>>> anyone considered this sort of thing before?
>>
>> I know nothing about this kind of thing. I know that the people from
>> Xapian have been working with different statistical models with students
>> from the Google Summer of Code this last summer. Semantic models etc.
>> are probably more of their domain of knowledge than of mine. Any code
>> dealing with this (apart from text extraction) will be very close to
>> the Xapian layer in any case.
>>
>> I think that there has been work on improving search engines with
>> semantic models ("concepts") for as long as search has existed (long
>> before the internet existed). As far as I know, nothing really
>> convincing has ever emerged. Maybe things are more mature now, but it
>> seems that the general tendency is more towards using sophisticated
>> statistics than explicit semantic models, of which humans are
>> apparently still the exclusive masters.
>
> "Semantic" is one of those words that's been rather abused over the
> years. I think we've a long way to go before machines can "understand"
> in the general case, but for some better defined tasks machines can now
> do a pretty good job - for example, Named Entity Recognition:
>
> http://en.wikipedia.org/wiki/Named-entity_recognition
>
> The techniques Xapian currently uses are statistical, though if you
> treat the system as a black box, you might think it "understands" at
> some level from looking at the output it produces in relation to the
> input you gave it.
>
> The GSoC project jf refers to was probably the one implementing document
> weighting based on Language Modelling, which is also used in tasks like
> speech recognition and machine translation, though it's in essence a
> statistical technique.
>
> So I'm not really sure what the most useful answer is. I don't think
> I'd describe anything we do as "semantic", but you can certainly build
> systems using Xapian that you could apply that term to. There are also
> other libraries which do part-of-speech tagging, entity extraction, etc.
> which might be more useful to you (or useful as a source of terms to
> index with Xapian).
Thanks for that info - very useful and much appreciated! I will see
how things develop, but I think Xapian might still be useful, so I will
probably return with more questions at some stage.
Regards,
Phil.
--
Philip Rhoades
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil at pricom.com.au