[Xapian-discuss] Getting custom field data from the page through crawling
Olly Betts
olly at survex.com
Fri Feb 9 06:42:22 GMT 2007
On Wed, Feb 07, 2007 at 11:21:37PM -0800, Matt Barnicle wrote:
> Is there a better way to achieve this result?
I think you probably want to use a web crawling library which gives you
full access to the page text for each page. I don't know such libraries
well enough to recommend a particular one though.
Another approach is to mirror the site locally (with wget for example)
and then index from this local mirror.
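
As a rough illustration of the mirror approach, here is a minimal Python sketch that walks a local mirror (made with something like "wget --mirror") and pulls custom field data out of each page. It assumes the custom data lives in <meta name="..." content="..."> tags, which is only a guess at how the pages are marked up; MetaFieldParser, extract_fields and scan_mirror are hypothetical names, not part of Xapian or Omega.

```python
import os
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = []  # (name, content) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.fields.append((name, content))


def extract_fields(html_text):
    """Return the list of (name, content) pairs found in html_text."""
    parser = MetaFieldParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


def scan_mirror(mirror_root):
    """Yield (relative_path, fields) for each .html/.htm file under mirror_root."""
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        for filename in sorted(filenames):
            if not filename.lower().endswith((".html", ".htm")):
                continue
            full_path = os.path.join(dirpath, filename)
            with open(full_path, encoding="utf-8", errors="replace") as handle:
                fields = extract_fields(handle.read())
            yield os.path.relpath(full_path, mirror_root), fields
```

The extracted pairs could then be handed to whatever indexer you use; how they map onto document fields depends on your setup.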
> A second question that goes along with that one.. Can I have multiple
> field datum with the same name?
Yes, Omega's $field{} command documents how it is handled:

    If multiple instances of field exist the field values are returned
    tab separated
I've always thought this was a slightly odd feature though - if you
really want this, it seems better to just put the tab-separated values
into a single field and save yourself the bytes required to repeat
"FIELDNAME=" each time...
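
To make the comparison concrete, here is a small Python sketch that mimics the behaviour quoted above. It assumes the document data is a set of newline-separated "name=value" lines; the field() function and the sample records are hypothetical and this is not Omega's actual implementation.

```python
def field(document_data, name):
    """Return all values stored under `name`, joined with tabs."""
    values = []
    for line in document_data.split("\n"):
        key, sep, value = line.partition("=")
        if sep and key == name:
            values.append(value)
    return "\t".join(values)


# The same three values stored two ways.
repeated = "url=/a.html\ntag=red\ntag=green\ntag=blue"
combined = "url=/a.html\ntag=red\tgreen\tblue"

# Both layouts give the same tab-separated result on lookup.
assert field(repeated, "tag") == field(combined, "tag")

# The single-field layout drops one "\ntag=" per extra value in favour of
# a single tab, so the saving grows with the length of the field name.
bytes_saved = len(repeated) - len(combined)
```

With these sample records the lookup result is identical either way, and the only difference is the size of the stored data.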
Cheers,
Olly