[Xapian-discuss] Simulating Fields

Olly Betts olly at survex.com
Mon May 4 11:59:24 BST 2009


On Mon, May 04, 2009 at 12:26:43AM -0400, Luis Alberto Zarrabeitia Gomez wrote:
> Now, what would you recommend to match the document titled "sex and the city",
> but not "sex and the city 2: the return"?

I'm not sure I understand why the sequel isn't a relevant result (albeit
one which you would want to rank lower than the exact match).  Since I
don't really seem to understand the aim, I suspect I may be missing the
point of what you're trying to do.

> Adding a value to the document and
> then checking it for the documents in the result set?

That would avoid the limit on the length of a term.

But I think I'd try just setting a percentage cut-off at 100%.  With the
default BM25 parameters, that will only give you the shortest document
which matches all the terms in the query (or multiple documents if there
is a tie, such as the case of two documents with the same title).
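
A rough sketch of the cut-off idea, assuming the Python bindings and an
Enquire object whose query has already been set with set_query(). The
function name and the max_items parameter are made up for illustration;
set_cutoff() and get_mset() are the Enquire methods involved.

```python
def best_title_matches(enquire, max_items=10):
    """Return only the matches which score 100% for the current query.

    `enquire` is assumed to be a xapian.Enquire with its query already
    set; nothing here builds the query or opens the database.
    """
    # Percentage cut-off: drop every match scoring below 100%.
    enquire.set_cutoff(100)
    # Fetch up to max_items matches, starting from the top-ranked one.
    return enquire.get_mset(0, max_items)
```

With the default weighting scheme the returned MSet should then hold just
the top-scoring document, or several if they tie.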

That would return "sex and the city" for a query for 'sex city' (unless
there was a better match), but I'd think that was desirable.  You can
always vet the matching document to check if it was exact or not if you
want.
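
Vetting could look something like the sketch below. It assumes the full
title was stored in a document value at indexing time (slot 0 here is an
arbitrary choice), and that "exact" means equal after lower-casing and
collapsing whitespace, which is my assumption rather than anything Xapian
defines. The helper names are hypothetical.

```python
def normalise(title):
    """Lower-case and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def is_exact_title(stored_title, wanted_title):
    """Compare a stored title with the wanted one after normalising both."""
    # Document values may come back as bytes, so decode them if needed.
    if isinstance(stored_title, bytes):
        stored_title = stored_title.decode("utf-8")
    return normalise(stored_title) == normalise(wanted_title)


def exact_matches(mset, wanted_title, title_slot=0):
    """Return the docids of matches whose stored title is exactly wanted_title.

    `mset` is assumed to be an iterable of match items with `.docid` and
    `.document` attributes, as a xapian.MSet yields in the Python bindings.
    """
    return [
        match.docid
        for match in mset
        if is_exact_title(match.document.get_value(title_slot), wanted_title)
    ]
```

So a result titled "sex and the city 2: the return" would be dropped when
the wanted title is "sex and the city".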

Cheers,
    Olly


