[Xapian-discuss] Extremely high WDF's
David Morris-Oliveros
dmorris at sirca.org.au
Wed Jun 27 07:48:34 BST 2007
Hi, I'm thinking of using the WDF of terms for something other than
actual "frequency".
I have to index some pages over time. The actual content of the document
isn't really that important, but a search needs to find a term that may have
only appeared in a 1-minute interval over the 10-year life of the document.
So I've devised a way to just extract terms and the associated "life" of
each term. That life could be contiguous, or the term could be popping in
and out all the time.
Now I want to use the WDF to give more weight to terms that have
appeared on that page throughout the life of the document, as opposed to
terms that only appeared briefly. I thought of adding all the seconds
that the term has appeared on that page, and that could be its WDF.
However, this would give me WDFs well into the millions.
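
To make the scale concrete, here is a minimal sketch of the "sum of seconds" idea. The interval representation (a list of (start, end) pairs in seconds per term), the term names, and the helper total_seconds are all hypothetical, invented for illustration. Nothing here touches Xapian itself; the resulting number is what would be passed as the wdf increment when adding the term to a document.

```python
def total_seconds(intervals):
    """Sum the seconds covered by (start, end) intervals, merging overlaps."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical term lifetimes, in seconds since the page was first seen.
term_lifetimes = {
    "fleeting": [(0, 60)],                                  # one minute
    "recurring": [(0, 3600), (1800, 7200), (86400, 90000)],  # pops in and out
    "permanent": [(0, 10 * 365 * 24 * 3600)],               # whole 10 years
}

# Candidate WDF per term: total seconds the term was visible.
wdfs = {term: total_seconds(spans) for term, spans in term_lifetimes.items()}

for term, wdf in wdfs.items():
    print(term, wdf)
```

A term present for the full ten years ends up with a value of roughly 315 million, while a one-minute term gets 60, which is the spread being asked about.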
Since it's already been more than 24 hours since my last ludicrous idea,
I thought it would be time for another one.
Plan B: normalize the time to the range 1..N, where N is the number of
terms that have ever appeared on the page, and then just assign each term
its order in that range as its WDF.
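
A rough sketch of Plan B, assuming the per-term totals in seconds are already available in a dict. The function rank_wdfs and the sample data are hypothetical. How ties should be handled isn't specified above; this sketch assumes terms with equal lifetimes share a rank, so the highest rank can be lower than N when there are ties.

```python
def rank_wdfs(seconds_by_term):
    """Map each term to its rank (1 = shortest-lived) by total seconds.

    Assumption: terms with identical totals share the same rank.
    """
    distinct = sorted(set(seconds_by_term.values()))
    rank_of = {secs: rank for rank, secs in enumerate(distinct, start=1)}
    return {term: rank_of[secs] for term, secs in seconds_by_term.items()}


# Hypothetical totals: seconds each term was visible on the page.
seconds_by_term = {
    "fleeting": 60,
    "blip": 60,
    "recurring": 10800,
    "permanent": 315360000,
}

ranks = rank_wdfs(seconds_by_term)
for term, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(term, rank)
```

This keeps the WDFs small, but it throws away the size of the gaps: a term that lived ten years ranks only one step above a term that lived three hours.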