[Xapian-discuss] How to beat Google aka Xapian & Natural Language Processing.

Kevin Duraj kevin.softdev at gmail.com
Mon Oct 1 22:01:50 BST 2007


Xapians!

If the Xapian search engine matched Google's performance and result
quality tomorrow, we still would not beat Google, because we would
only have created a copy of what Google already offers. However, there
is a way to beat anyone, and there is a way to beat Google as well, if
we do not give up. Some think the answer is Ajax, a cool interface,
marketing, or some other nonsense. As I see it, the one way to beat
Google is to implement Natural Language Processing, so that a user can
ask a question as a natural human sentence and receive different
results depending on how that question was phrased.

Interestingly, the simplest thing for a human to do is the most
difficult for a computer: recognizing the meaning of a sentence. You
have an easier time making sense of my misspellings and malformed
sentences than a computer has recognizing the meaning of a perfect
sentence. Natural Language Processing is not new, and a lot of work
has been done on it, with inconsistent results.

My point is that we need to start thinking about natural language
processing when laying out the infrastructure for Xapian. So far we
have the following search operators: OP_AND, OP_AND_MAYBE, OP_AND_NOT,
OP_ELITE_SET, OP_FILTER, OP_NEAR, OP_OR, OP_PHRASE, OP_VALUE_RANGE and
OP_XOR. We could add one more: OP_NLP.

What we can do now is implement OP_NLP to tag nouns, adjectives,
adverbs, punctuation, foreign words, etc., calculate the relations
between them, and assign a boost value to the terms whose part of
speech occurs most often in the query, for example nouns.

Search query example: What is Kevin Duraj doing?
OP_NLP would analyze the sentence as follows:
[what=pronoun, question|is=verb|kevin=noun|duraj=noun|doing=verb|?=punctuation]
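
The tagging and boosting step can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not part of
Xapian: OP_NLP does not exist, and the names LEXICON, PRIORITY, tag,
dominant_tag and boosts are hypothetical. The lexicon is hand-written
for this one query, where a real system would use a trained
part-of-speech tagger. Note that in the example nouns and verbs each
occur twice, so the sketch assumes a tie-break order that prefers nouns.

```python
import re
from collections import Counter

# Hypothetical toy lexicon covering only the example query.
LEXICON = {"what": "pronoun", "is": "verb", "kevin": "noun",
           "duraj": "noun", "doing": "verb"}

# Assumed tie-break order: earlier entries win when counts are equal.
PRIORITY = ["noun", "verb", "pronoun"]

def tag(query):
    """Split a query into lower-cased tokens and attach a tag to each."""
    tokens = re.findall(r"\w+|[^\w\s]", query.lower())
    tagged = []
    for tok in tokens:
        if tok in LEXICON:
            tagged.append((tok, LEXICON[tok]))
        elif tok[0].isalnum():
            tagged.append((tok, "unknown"))
        else:
            tagged.append((tok, "punctuation"))
    return tagged

def dominant_tag(tagged):
    """Return the most frequent tag among those listed in PRIORITY, or None."""
    counts = Counter(t for _, t in tagged if t in PRIORITY)
    if not counts:
        return None
    return max(counts, key=lambda t: (counts[t], -PRIORITY.index(t)))

def boosts(tagged, boost=2.0):
    """Map each word to `boost` if it carries the dominant tag, else 1.0."""
    top = dominant_tag(tagged)
    return {tok: (boost if t == top else 1.0)
            for tok, t in tagged if t != "punctuation"}
```

Calling boosts(tag("What is Kevin Duraj doing?")) gives the two name
terms a higher weight than the rest of the query. The boost value of 2.0
is arbitrary.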

Nouns dominate the question. Therefore, in the Xapian search engine we
look first for the dominant nouns, in this case my name, Kevin Duraj,
and then within those results we search for the next most dominant
parts of speech, the verbs and the pronoun.
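
The two-stage lookup described above might look like the following
Python sketch. It runs over a plain in-memory dictionary rather than a
Xapian database, and the function name staged_search and the sample
documents are made up for illustration. The dominant terms act as a
hard filter, and the remaining terms only influence the ordering, which
is similar in spirit to how the existing OP_AND_MAYBE operator treats
its two sides.

```python
def staged_search(docs, stages):
    """Filter by the dominant terms, then rank survivors by the other terms.

    docs:   dict mapping a document id to its text
    stages: list of term lists, most dominant part of speech first
    """
    def words(text):
        return set(text.lower().split())

    primary = set(stages[0])
    # Stage 1: keep only documents that contain every dominant term.
    candidates = [d for d, text in docs.items() if primary <= words(text)]

    # Stage 2: order the survivors by how many secondary terms they contain.
    secondary = {term for stage in stages[1:] for term in stage}

    def score(doc_id):
        return len(secondary & words(docs[doc_id]))

    return sorted(candidates, key=score, reverse=True)


# Made-up sample data for the example query.
docs = {
    1: "Kevin Duraj lives in Los Angeles",
    2: "Kevin Duraj is doing research",
    3: "What is Xapian doing",
}
ranked = staged_search(docs, [["kevin", "duraj"], ["is", "doing"], ["what"]])
```

Document 3 is dropped because it lacks the nouns, and document 2 is
ranked ahead of document 1 because it also contains the verbs.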

PS: Can you see the future?

-- 
Cheers
  Kevin Duraj
  http://MyHealthcare.com
  Los Angeles, California


