Indexing and Quick Search Tool Best Practice

 

This page provides guidelines for setting up and using indexes in Weave. It should be used in conjunction with the Indexing page, which goes into technical detail about indexes and how they can be configured. Each Weave implementation is unique, so the information provided below may not be relevant to all sites at all times.

The Quick Search engine is a fully integrated part of your core Weave installation; it is not a separate installation or server system.


 

The Quick Search tool (weave.indexcombo) in the Weave Client uses indexes that are created through configuration XML files.

Once created, the indexes can be updated manually or scheduled to update on a regular basis. Often this update will run after hours when no one is using the Weave server, and each index can have its own update schedule.

Indexes can be complex and need to be set up correctly in order to get the best results. If you do not set parameters appropriate to your index, your Quick Search will return the records you are after some of the time, but at other times it will return what seem to be totally unrelated records.

We encourage you to experiment with creating a new index, or to get a better understanding of your existing index(es), and the following notes will provide you with some handy tips. If you get stuck, you can also get advice from Cohga, as we have set up a variety of indexes for different purposes.

  • The Quick Search tool does not replace the Search Panel (main.searchView). When you have a number of different fields to be searched simultaneously, or want to find an exact match to the input criteria, the Search Panel is what you should be using. Its ability to use lists, convert text case, add wildcards, etc. makes it the best tool for finding exactly what you want. In short, the Search Panel is provided for more structured or forms-based queries.

  • Indexes used in the Quick Search are built with Lucene, an open-source text search engine library written in Java. There is a wealth of information on the internet about Lucene, but some of the characteristics of the library listed by the Apache Lucene project are:
    • Scalable, High-Performance Indexing

      • over 150GB/hour on modern hardware

      • small RAM requirements

      • index size roughly 20-30% the size of text indexed

    • Powerful, Accurate and Efficient Search Algorithms

      • ranked searching - best results returned first

      • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries

      • sorting by any field

      • multiple-index searching with merged results

      • allows simultaneous update and searching

      • fast, memory-efficient and typo-tolerant suggesters

    • Cross-Platform Solution

      • Available as Open Source software

[Source: https://lucene.apache.org/core/]

While Lucene is a valuable library for inclusion in Weave, without understanding how it works you may not get the search results you were expecting.
 

  • One of the advantages of Lucene is its ability to do a "sounds like" (fuzzy) query. A user can search for "tara" and the returned records will include "nara" and "sara", and a search for "bell" will return "jells", "bella", "wells", etc. This type of search is not possible in the Search Panel. Lucene indexing also lets you define a list of synonyms so that common alternative terms (e.g. drive, dr, dv, dve) are taken into account in the search without the user needing to be aware of them.

  • Weave indexes can do "sounds like" searches because they match terms that sound the same or have a similar sequence of characters. A side effect of this is that the Quick Search tool does not work well when searching for numbers, since many different numbers share a similar sequence of digits. A good way to let users search for numbers easily is to add a character prefix to the numbers in the keyword definition.

    In the configuration example below we have several numbers that can be searched on, namely land_no, property_no, and proclaim_link. We also have a property_address field that will contain a number at the start. 

    Original
    <keywords>
        <datadefinition>dd.xxxx</datadefinition>
        <level1>
            ${land_no}
            ${property_no}
            ${proclaim_link}
            ${plan}
            ${property_address}
            ${land_address}
            ${owners}
        </level1>
    </keywords>

    When a user types "15" into the search box looking for land_no "15", the index may return the land_no matching "15", but also the property_no matching "15" and the proclaim_link matching "15". It will also return all the addresses that start with "15".

    So if you want to allow the user to easily search for land_no in the above example, a simple trick is to add the prefix "ln" before the land number.

    Enhanced
    <keywords>
        <datadefinition>dd.xxxx</datadefinition>
        <level1>
            ln${land_no} ${land_no}
            ${property_no}
            ${proclaim_link}
            ${plan}
            ${property_address}
            ${land_address}
            ${owners}
        </level1>
    </keywords>

    Now if a user wants to search for land_no "15" they can simply type "ln15" into the search box and it will find the correct land number. You will notice in the example above that the land_no field is duplicated. This allows the user to still type in a land_no on its own, without the prefix, which works well for longer sequences of numbers that are usually unique or return only a few matches.
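
    The same prefix trick could be applied to the other numeric fields in this keyword definition. The fragment below is a sketch only: the prefixes "pn" and "pl" are hypothetical choices made for illustration (they are not Weave conventions), and dd.xxxx is the same placeholder data definition used in the examples above.

    ```xml
    <!-- Sketch only: "pn" and "pl" are illustrative prefixes, not Weave defaults -->
    <keywords>
        <datadefinition>dd.xxxx</datadefinition>
        <level1>
            ln${land_no} ${land_no}
            pn${property_no} ${property_no}
            pl${proclaim_link} ${proclaim_link}
            ${plan}
            ${property_address}
            ${land_address}
            ${owners}
        </level1>
    </keywords>
    ```

    With a definition along these lines, a user could type "pn15" to target property_no "15" or "pl15" to target proclaim_link "15", while each unprefixed number remains searchable as before. Whatever prefixes you choose, they need to be communicated to your users.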
     

  • Multiple indexes can be used together, so data from different tables and databases can be searched in a single search. This is not possible in the Search Panel, as each search in the Search Panel is linked to one Active Layer.

  • A Quick Search can be linked to the Active Layer, or it can be set up to search across all indexes so that it ignores the Active Layer. In this way, a Quick Search can be used in your Weave client to assist novice users. It provides a simple tool for users who are new to the concept of "active layers" or "selectable layers", and for those who, even once familiar with the concept, are likely to struggle to remember it because they use the Weave client infrequently. The Quick Search tool offers one place for users to type in their search term, whereas the Search Panel can present many options for text entry, which are sometimes confusing for the web mapping novice.

  • You can add weighting and sorting to your index results. In many cases it is not necessary to use either of these options, and if you are new to the process of creating indexes for Weave it might be best to avoid them. If used incorrectly, they can skew the results of your index and also slow down index creation and search time. If you want to experiment with these parameters, wait until your existing index(es) are running smoothly and you are confident in testing the index through the OSGi console or the Weave Admin Tool.
     
  • When setting up the Quick Search tool in your client XML file, you can specify an XML attribute named type, which dictates how the search will be performed (this has nothing to do with how the index is created, just how the index is used within the Weave Client). The possible type values are 'wildcard', 'fuzzy' or 'exact'. The search method you specify is not the only one used; it is better thought of as the "starting point" for the search. This is shown in Figure 1, with the type on the left, the search start point in the middle and the resulting search string on the right.
     
  • The search methods shown in Figure 1 are only used in the Quick Search tool (which is why they are specified in the client XML file rather than the index XML file). They do not form part of the indexing process. It is important to remember this when testing your index (through the OSGi console or the Index Tester in the Admin Tool) because, in order to mimic the Quick Search tool, you will need to add the wildcard (*) or fuzzy (~) characters to your search string yourself, for example bell* for a wildcard search or bell~ for a fuzzy search.
     
     



  • The Indexing page outlines all elements required to build and schedule the rebuilding of an index; Figure 2 summarises this.