<div dir="ltr">Hello James,<div><br></div><div>Thanks for the suggestions! I've tried to answer some of your questions. <span style="color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px">This is a rough idea of how the autocomplete can be implemented. I know this would be quite slow, and I'm trying to figure out how it can be improved. This is what I think can be done:</span></div><div><ul style="margin:0px 0px 1em 30px;padding:0px;border:0px;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px"><li style="margin:0px 0px 0.5em;padding:0px;border:0px;word-wrap:break-word"><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both">Construct a weighted undirected graph with words as nodes, where the weight of an edge is the number of times its two words have been searched together or found together in the documents.</p></li><li style="margin:0px 0px 0.5em;padding:0px;border:0px;word-wrap:break-word"><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both">Each node also keeps track of the 5 words most commonly used with it (redundant, but it avoids sorting the node's neighbours on every lookup).</p></li><li style="margin:0px 0px 0.5em;padding:0px;border:0px;word-wrap:break-word"><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both">Whenever a user starts typing a query, this graph is queried and the first word is predicted by prefix matching. 
The matches are shown in order of how frequently they have been searched.</p></li><li style="margin:0px;padding:0px;border:0px;word-wrap:break-word"><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both">When the user starts typing the next word, the graph is visited and the neighbours of the first word (and of any other words already typed) are retrieved and shown to the user in order of frequency (edge weight).</p></li></ul><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px">This is a rough overview of what I think can be used as an algorithm for autocomplete. I am going to read some research papers and improve on this. </p><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px">Learning from user queries: The method I have suggested is pretty basic. Whenever two or more words are searched together, the weight of each edge between them is incremented. </p><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px">Stop Words: Since the prediction is quite basic right now, I don't think that stop words can be integrated into query prediction within the scope of this project. What are your thoughts? </p><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px">Another part of this project would be to implement the bindings for the currently supported languages.</p><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px"><span style="line-height:17.7273px">I thought about your suggestion to let autocomplete be a part of Xapian and use a separate database for it. 
What are your views on this?</span></p><p style="margin:0px 0px 1em;padding:0px;border:0px;clear:both;color:rgb(34,36,38);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;line-height:17.7273px"><span style="line-height:17.7273px">Ayush</span></p></div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 10, 2016 at 10:51 PM, James Aylett <span dir="ltr">&lt;<a href="mailto:james-xapian@tartarus.org" target="_blank">james-xapian@tartarus.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On Thu, Mar 10, 2016 at 05:59:38PM +0530, Ayush Gupta wrote:<br>
<br>
> Could you please expand on the project idea of integration of xapian in a<br>
> framework with an example. I did not fully understand the requirements of<br>
> this project.<br>
<br>
</span>It would be about adding or improving support for Xapian when used<br>
with some sort of programming framework, probably a web development<br>
framework like Rails, Django, Play, &amp;c. You'd need experience with the<br>
framework you wanted to work with, and to think about the use cases<br>
that need to be supported, and design an API to allow them to be<br>
solved.<br>
<br>
Trac (which is mentioned in the project description) effectively<br>
contains its own framework, which is why that's under the same<br>
project.<br>
<br>
It's a little difficult to provide concrete requirements, because<br>
that's very dependent on the framework you choose and what you want to<br>
make possible. For instance, Django has a search abstraction system<br>
called Haystack, so for that it might make sense to improve the Xapian<br>
support there. For others, there may be no support at all and you'd<br>
have to start from scratch.<br>
<span class=""><br>
> Also I want to discuss an idea of my own. Xapian doesn't have an auto<br>
> complete feature. It is quite common for a search engine to have an auto<br>
> complete feature. What I propose is an API that is totally separate from<br>
> xapian core, has its own indexing and learns from user queries as well as<br>
> documents. I know this is a very rough idea, please help me refine it. What<br>
> are your views on this?<br>
<br>
</span>It sounds like an interesting project, but it's difficult to evaluate<br>
further until you've put some more detail around it. That's the big<br>
difference between a project we've listed and one you come up with<br>
yourself. If you can get it to a draft proposal in some form, then we<br>
can provide feedback on that. At this point, beyond saying that it<br>
sounds like a good idea I can't offer any concrete suggestions, as I<br>
mostly have questions :-)<br>
<br>
Some things that I think you should consider in the proposal include<br>
what API you're suggesting for it (try starting by writing the code<br>
you'd want to write to use it, and see what feels natural). Another<br>
key aspect would be to explain what it is about autocomplete that<br>
couldn't be achieved by directly using Xapian (perhaps with a separate<br>
database for autocomplete). Other things to consider include how<br>
synonyms and spelling correction would apply, if at all. What sort of<br>
ranking model makes sense for autocompletion? What sort of data are<br>
you dealing with? You suggest it could learn from user queries; your<br>
proposal should explain how, and what benefit that would provide (over<br>
just the initially-indexed data).<br>
<span class=""><font color="#888888"><br>
J<br>
<br>
--<br>
James Aylett, occasional trouble-maker<br>
<a href="http://xapian.org" rel="noreferrer" target="_blank">xapian.org</a><br>
</font></span></blockquote></div><br></div></div>
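<div dir="ltr">To make the word-graph idea at the top of this message more concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the class <code>AutocompleteGraph</code> and its methods <code>add_query</code>, <code>complete_first</code> and <code>complete_next</code> are hypothetical names, not part of any Xapian API. It also simplifies two points: prefix matching is a linear scan over all known words, and neighbours are sorted on demand rather than cached as a per-node top-5 list.</div>

```python
from collections import defaultdict
from itertools import combinations


class AutocompleteGraph:
    """Hypothetical sketch of the weighted word graph; not a Xapian API."""

    def __init__(self, top_k=5):
        self.top_k = top_k
        # word -> number of times the word has been seen in queries
        self.word_freq = defaultdict(int)
        # word -> neighbour -> number of times the two were seen together
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_query(self, text):
        """Learn from one query (or document text): bump word and edge counts."""
        words = text.lower().split()
        for word in words:
            self.word_freq[word] += 1
        # The graph is undirected, so each pair is stored in both directions.
        for a, b in combinations(sorted(set(words)), 2):
            self.edges[a][b] += 1
            self.edges[b][a] += 1

    def complete_first(self, prefix):
        """Suggest a first word by prefix match, most frequently searched first."""
        prefix = prefix.lower()
        matches = [w for w in self.word_freq if w.startswith(prefix)]
        # Ties are broken alphabetically so the order is deterministic.
        return sorted(matches, key=lambda w: (-self.word_freq[w], w))[:self.top_k]

    def complete_next(self, previous_words, prefix=""):
        """Suggest the next word from the neighbours of the words already typed."""
        prefix = prefix.lower()
        seen = {w.lower() for w in previous_words}
        scores = defaultdict(int)
        for prev in seen:
            for neighbour, weight in self.edges.get(prev, {}).items():
                if neighbour not in seen and neighbour.startswith(prefix):
                    # Sum edge weights over every word typed so far.
                    scores[neighbour] += weight
        return sorted(scores, key=lambda w: (-scores[w], w))[:self.top_k]


graph = AutocompleteGraph()
for query in ("open source search", "open source license",
              "open source search engine"):
    graph.add_query(query)

print(graph.complete_first("s"))
print(graph.complete_next(["open", "source"], "s"))
```

<div dir="ltr">Summing edge weights over all words typed so far is only one possible reading of "neighbours of this word and any other word"; a real ranking model would need more thought.</div>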