[Xapian-discuss] Getting custom field data from the page through crawling

Olly Betts olly at survex.com
Fri Feb 9 06:42:22 GMT 2007


On Wed, Feb 07, 2007 at 11:21:37PM -0800, Matt Barnicle wrote:
> Is there a better way to achieve this result?

I think you probably want to use a web crawling library which gives you
full access to the page text for each page.  I don't know such libraries
well enough to recommend a particular one though.

Another approach is to mirror the site locally (with wget for example)
and then index from this local mirror.
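
As a rough illustration of indexing from a local mirror, here is a minimal
Python sketch that walks a mirrored directory and pulls candidate field data
out of each page. It assumes the custom data lives in <meta name="..."
content="..."> tags, which may not match your pages. The names
MetaFieldParser, extract_meta_fields and scan_mirror are made up for this
example and are not part of Xapian or Omega.

```python
from html.parser import HTMLParser
from pathlib import Path


class MetaFieldParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = []  # (name, content) tuples, in page order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        # Skip tags such as <meta charset="..."> that carry no name/content pair.
        if name and content is not None:
            self.fields.append((name, content))


def extract_meta_fields(html_text):
    """Return the list of (name, content) pairs found in html_text."""
    parser = MetaFieldParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


def scan_mirror(mirror_root):
    """Yield (path, fields) for every .html file under a local mirror."""
    for path in sorted(Path(mirror_root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        yield path, extract_meta_fields(text)
```

Each (path, fields) pair from scan_mirror() could then be handed to whatever
indexing step you use.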

> A second question that goes along with that one..  Can I have multiple
> field datum with the same name?

Yes.  The documentation for Omega's $field{} command describes how this is
handled:

    If multiple instances of field exist the field values are returned
    tab separated

I've always thought this was a slightly odd feature though - if you
really want this, it seems better to just put the tab-separated values
into a single field and save yourself the bytes required to repeat
"FIELDNAME=" each time...
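
To make the size difference concrete, here is a small Python sketch comparing
the two layouts. It assumes the document data is stored as one "name=value"
line per field; the helper names and the "colour" field are hypothetical and
only for illustration.

```python
def _check(values):
    # Tabs and newlines are separators here, so values must not contain them.
    for value in values:
        if "\t" in value or "\n" in value:
            raise ValueError("field values must not contain tabs or newlines")


def repeated_fields(name, values):
    """One "name=value" line per value, repeating the field name each time."""
    _check(values)
    return "\n".join(f"{name}={value}" for value in values)


def single_tabbed_field(name, values):
    """A single "name=..." line holding all values, tab separated."""
    _check(values)
    return f"{name}=" + "\t".join(values)


values = ["red", "green", "blue"]
repeated = repeated_fields("colour", values)
single = single_tabbed_field("colour", values)

# Each repeat after the first costs an extra len("colour=") bytes.
saved = len(repeated) - len(single)
```

With three values the single-field form is shorter by twice the length of
"colour=", and the saving grows with each additional value.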

Cheers,
    Olly


