<div dir="ltr"><br>> Hi Shivang! Welcome to Xapian :)<br> <div>Hello James. Thank you :)<br><br>> It's worth pointing out that xapian-delve is not a shared library. Perhaps you mean you had difficulties with shared libraries while building or running it? </div><div><br></div><div>Yes, that's exactly what I meant: <a href="http://libxapian-1.5.so">libxapian-1.5.so</a>, to be specific.</div><div><br></div><div>> The fact that you're using xapian-delve-1.5 tells me that you're using the development work from git, which is not the recommended approach in the getting started guide, but is absolutely the right way when working on Xapian's codebase. (You probably need to look at the Xapian developer guide to ensure you're set up properly with the development codebase: <a href="https://xapian-developer-guide.readthedocs.io/en/latest/">https://xapian-developer-guide.readthedocs.io/en/latest/</a>)<br><br>Thanks for pointing that out; I take your point.</div><div><br>> You'll need to propose quite a lot of detail on how you're going to implement this using the Xapian backend database and weighting system. I suspect you'll have to extend it a fair amount to support TW-IDF, because we have no graph support at present. I haven't read the paper though, so it's possible you can do this using some preprocessing in some way.<br><br>I will put together a detailed plan for how the graph-of-words model could be implemented in Xapian. The fact that Xapian has no graph support at present could make this more difficult. However, I strongly believe it would be worthwhile, because this model weights terms according to their relationships and ordering within a document, which should improve the effectiveness of search results (please let me know if you think otherwise).</div><div><br>> It's also worth noting that we've sometimes seen quite different evaluation results to the academic research in the past. 
There's a module that implements some evaluation metrics (<a href="https://github.com/samuelharden/xapian-evaluation">https://github.com/samuelharden/xapian-evaluation</a>) which can be used to gauge how a new weighting scheme compares to the others we have, designed to run with TREC data. </div><div><br></div><div>This module will surely play a key role in the evaluation. I will look into it as well.</div><div><br>> Again, you'll need to propose how to fit TF-ATO into Xapian's database and weighting framework. Can the document centroid approach be done as a processing step during indexing? (Again, I haven't had time to read the paper yet.)<br><br>I didn't quite understand what you mean by 'during indexing'. My understanding is that it refers to the point after all terms have been indexed and a weight has been assigned to each index term. If that's what you mean, then yes: the document centroid can be computed at that point from the documents' vectors of term weights. Its basic aim is to reduce the size of the documents in the dataset through a discriminative approach, which simply removes the terms whose weight is lower than the document centroid. According to the paper, this approach gives an average reduction in size of
2.3% relative to the original dataset size. </div><div>I will come up with a detailed proposal for this scheme as well.</div><div><br></div><div>Thank you for your time.</div><div><br></div><div>Regards,</div><div>Shivang Bansal<br><br></div></div>