[Xapian-discuss] Best way of showing what matched the search

Tue Aug 19 01:15:17 BST 2008

Hi,

We have an application which has "main" records (a person, for
example), each of which may have several "dependent" records such as an
attachment, an address, an email, a phone number, or a note left by a
user.

I want the results to look like:

john AND smart

----------------------

2 results:

- [John] Smith
Attachment: I am very [smart].
Note: He is not as [smart] as he said.

- Peter Pérez
Phone number: 555-[john]-is-[smart]

What I've done is:

People and other main records are each indexed as one document which
contains all of its own data and all the data of its dependent records.
These documents also carry the prefixed term Igrouped. In the data of
the document I keep the type and the id of the record.

Notes, attachments, etc. are also indexed separately, one document per
record. Each of them also has an IdependentOf<type><id> term naming the
main record it belongs to.
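
To make the layout concrete, here is a minimal sketch of which terms each
kind of document ends up with. It is plain Python with no Xapian calls;
words_of, grouped_document and dependent_document are hypothetical names,
and storing the parent type and id in the dependent's data is an
assumption about what the data field holds.

```python
import re

def words_of(text):
    """Lowercased word terms; a stand-in for a real term generator."""
    return set(re.findall(r"\w+", text.lower()))

def grouped_document(rtype, rid, own_text, dependent_texts):
    """Main record: its own words plus all dependents' words, plus Igrouped."""
    terms = words_of(own_text)
    for text in dependent_texts:
        terms |= words_of(text)
    terms.add("Igrouped")
    return {"terms": terms, "data": {"type": rtype, "id": rid}}

def dependent_document(parent_type, parent_id, text):
    """Dependent record: its own words plus one IdependentOf<type><id> term."""
    terms = words_of(text)
    terms.add(f"IdependentOf{parent_type}{parent_id}")
    # Assumption: the data field points back at the main record.
    return {"terms": terms,
            "data": {"parent_type": parent_type, "parent_id": parent_id}}

main = grouped_document("person", 1, "John Smith", ["I am very smart."])
note = dependent_document("person", 1, "He is not as smart as he said.")
```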

To do a search:

1. I run a query that combines the "Igrouped" prefixed term with whatever
the query parser gave me.

2. I collect all the ids and types of the previous results in a list
which looks like this:
[IdependentOfperson1, IdependentOfcompany5, ...]

3. I remove all operators from the search the user entered (AND, OR,
parentheses) and get all the search terms in a list:
[john, smart]

4. I build a second query in which Igrouped is negated, all the
IdependentOf terms are ORed together, and all the search terms are ANDed
with that. For example:
(IdependentOfperson1 OR IdependentOfcompany5) AND john AND smart AND NOT
Igrouped

5. I search for it and use the dependent record's data to locate the
master records and group them together in the results.

6. I use my relational db to get the full text of the results which
matched the search, then use a simple algorithm which looks for search
words that are written close to each other and shows that excerpt to
the user (which looks more or less like Google's results).
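
Steps 2 to 4 and step 6 can be sketched roughly as follows. This is plain
Python that only builds the query as a string; in the real application
the same shape would be composed from query objects. The function names
are hypothetical, only the AND and OR operators from step 3 are handled,
and best_window is one possible "words close to each other" heuristic
(the shortest run of words containing every term), not necessarily the
algorithm described in step 6.

```python
import re

# Only the operators named in step 3; NOT and phrases are not handled here.
OPERATORS = {"AND", "OR"}

def extract_terms(user_query):
    """Step 3: drop AND, OR and parentheses, keep the remaining words."""
    words = re.findall(r"[^\s()]+", user_query)
    return [w for w in words if w not in OPERATORS]

def build_dependent_query(main_hits, terms):
    """Steps 2 and 4: (IdependentOf... OR ...) AND terms AND NOT Igrouped.

    main_hits is a list of (type, id) pairs from the first search; the
    caller is expected to skip the second search when it is empty.
    """
    parents = " OR ".join(f"IdependentOf{rtype}{rid}" for rtype, rid in main_hits)
    return f"({parents}) AND {' AND '.join(terms)} AND NOT Igrouped"

def best_window(text, terms):
    """Step 6 stand-in: shortest run of words containing every term, or None."""
    words = text.split()
    norm = [re.sub(r"\W+", "", w).lower() for w in words]
    wanted = {t.lower() for t in terms}
    best = None
    for start in range(len(words)):
        seen = set()
        for end in range(start, len(words)):
            if norm[end] in wanted:
                seen.add(norm[end])
            if seen == wanted:
                # First complete window from this start; keep it if shorter.
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break
    return None if best is None else " ".join(words[best[0]:best[1] + 1])

terms = extract_terms("john AND smart")
query = build_dependent_query([("person", 1), ("company", 5)], terms)
```

The window search is quadratic in the number of words, which should be
fine for short notes but may need replacing for long attachments.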

Considerations:

1. I index everything twice, which affects the weight all terms get. To
compensate for this I index not-grouped terms with a weight of 0.

2. I guess the index is much bigger than it should be.

3. I could probably have two separate dbs for grouped and ungrouped
items.

4. I probably should have used "collapse keys" but I think that they are
essentially filters and don't really achieve what I want (which would be
to logically consider all the terms of some documents as being part of
one virtual bigger document). With collapse keys alone, searching for
john AND smart wouldn't have found "John Smith", since the words are in
separate records.
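
The concern in point 4 can be shown with a toy example in plain Python
(no Xapian involved; the term sets and matches_all are made up for
illustration). Collapsing only merges documents that have already
matched, so if no single record contains both words there is nothing to
collapse.

```python
def matches_all(doc_terms, query_terms):
    """True when one document contains every query term (a plain AND)."""
    return set(query_terms) <= doc_terms

# The person record and its attachment, indexed as separate documents.
main_only = {"john", "smith"}
attachment = {"i", "am", "very", "smart"}
# The same data indexed as one grouped document.
grouped = main_only | attachment

query = ["john", "smart"]
separate_hits = [d for d in (main_only, attachment) if matches_all(d, query)]
grouped_hit = matches_all(grouped, query)
```

Here separate_hits is empty while grouped_hit is true, which is why the
grouped copy of the data is indexed at all.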

Am I doing something wrong? Is there any better way to do it?

. A .