[Xapian-discuss] How to beat Google aka Xapian & Natural Language Processing.

Kevin Duraj kevin.softdev at gmail.com
Tue Oct 9 19:02:08 BST 2007


On 10/1/07, David P. Novakovic <davidnovakovic at gmail.com> wrote:
> disclaimer: I work in NLP research, so I'm a believer. I'm also not a
> Xapian dev, so I could be wrong :)
>
> While I do think that NLP will play a big role in the future of
> search, what makes you think that Google doesn't have the resources to
> do it better? :P
>
> Anyway, you have mentioned two techniques from NLP there: part of
> speech tagging and question asking. Including stop words, which tend
> to overshadow other, more meaningful relationships in the text, and
> manually tagging every term in every context in the system becomes
> unwieldy very quickly. This would lead to large overheads in the
> core engine. Question asking is an area of research that is still
> getting a lot of attention, and, as in most other areas of NLP, it
> is accepted that there is no single way of doing things. It depends
> highly on the data you are indexing/querying.
>
> I believe one of the wonderful things about Xapian is that it's fast,
> simple and does the job better than a simple keyword search all of the
> time, just as many other search engines do.
>
> All natural language search companies (except Powerset) have
> acknowledged they stand little chance against Google, and instead
> address a particular niche.

David,

Throughout history, the majority of people were told many times that
they stand little chance against knowledge, freedom and progress. Do
you know what happened each time to those who told the majority that
it stood little chance? They were humiliated or are no longer here.
That is what we, the majority, learn from history.

But I am puzzled about something. What makes you think that any
corporation can compete with open source and not fail in time? Or what
makes you think that any corporation can hire more programmers than
the open source community?

-- 
Cheers
  Kevin Duraj
  http://MyHealthcare.com
  Los Angeles, California



> While I hate to be a buzz kill, this is a very very large area of
> research, not something we can dive head first into and just implement
> straight away.
>
> my 2c.
>
> David
>
> On 10/2/07, Kevin Duraj <kevin.softdev at gmail.com> wrote:
> > Xapians!
> >
> > If tomorrow the Xapian search engine achieved the same performance
> > and search results as Google, we would still not beat Google,
> > because we would only have created a copy of the searches that
> > already exist in Google's search engine. However, there is a way to
> > beat anyone, and there is a way to beat Google as well; just do not
> > give up. Some see it as implementing Ajax, or some cool interface,
> > marketing or some other nonsense. As I see it, however, the one way
> > to beat Google is to implement Natural Language Processing, so that
> > a user can ask a question as a natural human sentence and receive
> > different results depending on how the question was phrased.
> >
> > What is interesting is that the simplest thing for a human to do is
> > the most difficult for a computer: recognizing the meaning of a
> > sentence. You have an easier time recognizing my misspellings and
> > malformed sentences than a computer has recognizing the meaning of a
> > perfect sentence. Natural Language Processing is not a new thing, and
> > a lot of work has been done that yielded inconsistent results.
> >
> > What I am trying to point out is that we need to start thinking
> > about natural language processing when laying down infrastructure
> > for Xapian. So far we have the following search operators: OP_AND,
> > OP_AND_MAYBE, OP_AND_NOT, OP_ELITE_SET, OP_FILTER, OP_NEAR, OP_OR,
> > OP_PHRASE, OP_VALUE_RANGE and OP_XOR, and we could add one more:
> > OP_NLP.
> >
> > What we can do now is implement OP_NLP to tag nouns, adjectives,
> > adverbs, punctuation, foreign words etc., calculate the relations
> > between them, and assign a boost value to the part of speech that
> > occurs most often in the query, for example nouns.
> >
> > Search query example: What is Kevin Duraj doing?
> > OP_NLP would analyze the sentence as follows:
> > [what=pronoun, question|is=verb|kevin=noun|duraj=noun|doing=verb|
> > ?=punctuation]
> >
> > We have nouns dominating the question. Therefore, in the Xapian
> > search engine we look first for the dominant nouns, in this case my
> > name, Kevin Duraj, and then within those results we search for the
> > next most dominant verb and pronoun.
> >
> > PS: Can you see the future?
> >
> > --
> > Cheers
> >   Kevin Duraj
> >   http://MyHealthcare.com
> >   Los Angeles, California
> >
> > _______________________________________________
> > Xapian-discuss mailing list
> > Xapian-discuss at lists.xapian.org
> > http://lists.xapian.org/mailman/listinfo/xapian-discuss
> >
>
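
For illustration only, the OP_NLP idea quoted above (tag each query
term, boost by part of speech, look for the nouns first and then rank
within those results) could be sketched roughly as below. This is a toy
in plain Python, not Xapian code: OP_NLP does not exist in Xapian, and
LEXICON, BOOST, tag_query, weighted_terms and search are all made-up
names with made-up values. A real tagger would be trained rather than
a lookup table, and in Xapian itself the boosts would presumably have
to be mapped onto per-term query weights (perhaps via something like
OP_SCALE_WEIGHT, but that mapping is my assumption).

```python
import re

# Hypothetical hand-written lexicon; a real system would use a trained tagger.
LEXICON = {
    "what": "pronoun",
    "is": "verb",
    "doing": "verb",
    "kevin": "noun",
    "duraj": "noun",
}

# Assumed boost per part of speech (illustrative values, not Xapian defaults).
BOOST = {"noun": 3.0, "verb": 2.0, "pronoun": 1.0}


def tag_query(query):
    """Split the query into lowercase tokens and attach a part-of-speech tag."""
    tagged = []
    for tok in re.findall(r"\w+|[^\w\s]", query.lower()):
        if re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, "punctuation"))
        else:
            tagged.append((tok, LEXICON.get(tok, "unknown")))
    return tagged


def weighted_terms(tagged):
    """Map each searchable term to its boost; punctuation and unknowns are dropped."""
    return {tok: BOOST[pos] for tok, pos in tagged if pos in BOOST}


def search(query, documents):
    """Keep documents containing every noun, then rank by summed boost of matched terms."""
    tagged = tag_query(query)
    weights = weighted_terms(tagged)
    nouns = {tok for tok, pos in tagged if pos == "noun"}
    results = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"\w+", text.lower()))
        if not nouns <= words:
            continue  # first stage: every noun must be present
        score = sum(w for tok, w in weights.items() if tok in words)
        results.append((doc_id, score))
    # Highest score first; ties broken by document id.
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    1: "Kevin Duraj is doing search research",
    2: "Kevin Duraj lives in Los Angeles",
    3: "What is Xapian doing",
}
print(tag_query("What is Kevin Duraj doing?"))
print(search("What is Kevin Duraj doing?", docs))
```

With these toy documents, document 3 is dropped because it lacks the
nouns, and document 1 ranks above document 2 because it also matches
the verbs. The fixed BOOST table stands in for "calculate the relation
between them"; how to derive those values is exactly the open research
question discussed in this thread.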


