[Xapian-discuss] What are the separators that scriptindex uses?

Jim Lynch jwl at sgi.com
Wed Sep 15 20:48:54 BST 2004


I've been asked to find out what scriptindex considers to be separators. 
Whitespace, obviously, but what is done with special characters?  The 
reason for the question is that my data contains some strange stuff, 
like output from core dumps, source code in various programming 
languages such as assembly, part numbers (not just numbers, of course) 
and other weird collections of funny characters.  Fortunately, no 
Unicode just yet.  I'm trying to get a feel for how difficult it's going 
to be to search for this stuff and what the rules might be. 

Also, can I assume Omega uses the same set of separators? 

For instance, if I search for something like PARAM_DEV-445*Foggy, will it 
be found?  Will it be split into multiple terms? 


BTW, how well do phrase searches work these days? 

Thanks,
Jim.
