Search Algorithm Used for Keyword Search

Dhiraj R dhirajr57 at yahoo.com
Sun Apr 9 08:43:08 BST 2017


I'm sorry I could not figure out how the document answers my question. My question is about the 'term' search in the dictionary of terms (term list) maintained by Xapian. I assume that Xapian maintains an alphabetical list of terms. When a user types a search term like "integration", the system first looks up this dictionary of terms for this word "integration". Is this search done using binary search or some other method ? 

Once the term is located in the dictionary, then the posting list can be accessed. But my question is not related to the posting list. It is only about the initial search in the dictionary of terms for the word entered by the user.

The document mentions "....Some systems represent each term by a number internally, so the term list is then also a list of numbers. Xapian doesn't - it uses the terms themselves, and uses prefix compression to store them compactly". If this is the case, my question is : How is the search for the user-entered word done on this prefix compressed store? 

Thank you,

Dhiraj R
--------------------------------------------
On Sat, 8/4/17, James Aylett <james at tartarus.org> wrote:

 Subject: Re: Search Algorithm Used for Keyword Search
 To: "Dhiraj R" <dhirajr57 at yahoo.com>
 Cc: "Xapian Development" <xapian-devel at lists.xapian.org>
 Date: Saturday, 8 April, 2017, 5:34 PM
 
 On 8 Apr 2017, at 18:03, Dhiraj R
 <dhirajr57 at yahoo.com>
 wrote:
 
 > Dear Mr. Aylett,
 
 Please keep replies on the mailing list so others can answer and everyone can benefit.
 
 > Thank you for your immediate response to my query. Actually, my question is not about the document (text) part. My question is confined to method of searching the list of keywords for a search term entered by the user. Assuming we have already created a huge list of keywords by looking up large number
 of documents. If this list of keywords is a sorted list, then binary search for a search term entered by the user is generally ideal.
 
 I don't think you've understood that document, which does answer your question. Xapian isn't doing binary search; it implements a probabilistic information retrieval system, so it's generally working off inverted indexes (ie terms index documents when we're running a search). By default Xapian uses BM25 as the weighting scheme, which is used to determine the ranking of documents in search results.
 
 J
 
 -- 
  James Aylett
  devfort.com — spacelog.org —
 tartarus.org/james/
 



More information about the Xapian-devel mailing list