IR on full web
Sagar Acharya
sagaracharya at tutanota.com
Sat Jul 8 15:19:57 BST 2023
What I need is to index as much of the web as I can within a fixed limit on database size. How do I achieve that?
I want to crawl HTML web pages and, when the markup is reasonably clean, as with my own page linked below, parse each page and store it in a Xapian glass database.
How do I do this? I only want text, no images or videos.
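To make the question concrete, here is a rough Python sketch of the pipeline described above. It is only an illustration under assumptions, not a recommended design: it assumes the Xapian Python bindings are installed, and the names TextExtractor, extract_text, dir_size and index_page are hypothetical helpers made up for this sketch. The size limit is enforced crudely by measuring the database directory on disk before each add, which is approximate because changes are only fully written at commit.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes only; script/style content is skipped and
    media tags such as <img> or <video> contribute nothing."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the text of an HTML page as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def dir_size(path):
    """Total size in bytes of all files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def index_page(db_path, url, html, max_bytes):
    """Index one page into a glass database unless the size limit is reached.

    Returns True if the page was indexed, False if the limit stopped it.
    """
    import xapian  # assumption: Xapian Python bindings are available

    if dir_size(db_path) >= max_bytes:
        return False

    db = xapian.WritableDatabase(
        db_path, xapian.DB_CREATE_OR_OPEN | xapian.DB_BACKEND_GLASS
    )
    doc = xapian.Document()
    doc.set_data(url)  # store only the URL, not the page text, to save space

    termgen = xapian.TermGenerator()
    termgen.set_stemmer(xapian.Stem("en"))
    termgen.set_document(doc)
    termgen.index_text(extract_text(html))

    # Unique ID term so re-crawling a URL replaces the old entry.
    # Very long URLs would need hashing to fit Xapian's term length limit.
    idterm = "Q" + url
    doc.add_boolean_term(idterm)
    db.replace_document(idterm, doc)
    db.commit()
    return True
```

A crawler loop would call index_page(db_path, url, html, max_bytes) for each fetched page and stop once it returns False. Fetching pages and deciding whether a page's format is "reasonably good" are left out of the sketch.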
Thanking you
Sagar Acharya
https://humaaraartha.in