The Weave quick search function lets you pre-index your data for faster searching and gives the user a single, simple input box in which to enter search criteria.
Because the data is indexed, the user can search across multiple types of entities from a single input box, which shows a drop-down list of the top matches to choose from. Note that the user chooses a single entity from these results. This differs from the existing search functionality, which allows multiple entities to be selected at once but only of a single type (point, line, polygon).
The indexing is set up in two parts: on the server, the indexes must be configured to tell Weave which tables to search for each entity, and on the client, the input box must be placed somewhere in the user interface.
Installation
To use the indexing functionality in Weave you need to install and start the com.cohga.server.index bundle, which at the time of writing was at version 1.0.28 (which added the scheduling).
Namespace
com.cohga.server.index
Client
Adding the index search input box to the client involves adding a new item, called 'weave.indexcombo', to the UI. Assuming a toolbar is already configured for the client and we want to add the index search box to it, we'd update the client configuration to something like:
<toolbar>
  <item component="weave.indexcombo"/>
</toolbar>
Once the client is reloaded, the index combo will appear in the user interface, and the user can start typing in the input box to receive a list of matching entities from the server. Since we have not yet set up any indexes on the server, nothing will be found. Setting up the indexes is covered later in this document.
By default, the index combo only searches for entities of the currently active entity type. To have it search across all entities, set a flag on the item telling the client to ask for all matches. If the user then chooses a result of a different type from the currently active entity, the active entity is changed to match. To do this, add an 'all' flag to the item tag and set its value to true:
<toolbar>
  <item component="weave.indexcombo" all="true"/>
</toolbar>
There are other flags that can be added to the item to alter the way the index combo works:
Name | Type | Description |
width | string | Sets the width of the input box. This should be a standard HTML width setting, for example "100%" would fill the whole toolbar, "250px" would be 250 pixels wide. |
minScale | number | When the user chooses a result from the list, the map is zoomed to the extent of the chosen entity. This setting ensures that the map won't be zoomed in beyond the given scale. |
pageSize | number | This value changes the number of items that will be presented to the user as the results of the search, for example "8" will tell the client to display the top 8 results for the search. |
doSelect | boolean | The Weave client can be instructed not to update the current selection when the user chooses a result from the list. This allows the index input box to be used as a simple find tool without interacting with the active selection. The default for this is |
clearOthers | boolean | If |
doMarker | boolean | If set to |
doGeometry | boolean | If set to |
doMarkers | boolean | If set to |
type | 'wildcard', 'fuzzy' or 'exact' | Change the type of search that's performed, default is 'wildcard'. 'exact' performs the search using the search term, 'wildcard' appends an * to the end of the search term and 'fuzzy' appends a ~ to the end of the search term. |
geometryFirst | boolean | If This may change the way an index works depending on the relationship between the geometry and the attributes. If there is a one-to-one relationship between the geometry records and the attribute records, this setting makes no difference. If this value is true, there will be one index entry for each record in the geometry source, regardless of how many attribute records map to the geometry (since only the first attribute match is used). If this value is false, there will be one index entry for each record in the attribute source, and the geometry information may be duplicated. For example, setting this value to false means that entities with multiple names can be found if any of their names are searched for, whereas with it set to true only one name associated with an entity would be searchable. |
all | boolean | Default is false. When set to true, all indexes are searched, not just the indexes associated with the active entity or the indexes listed in the index attribute. |
index | string | A comma separated list of index names that should be searched. When this value is set only the indexes listed will be searched, rather than all indexes (if Note, this can also be a list of entities, rather than indexes, in which case all indexes associated with the listed entities will be searched. |
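The following Python sketch makes the three search types in the table concrete. It appends the suffix described for each type to a search term before the term would be sent. The function name `prepare_term` is hypothetical, and this is an illustration rather than the client's code. It follows the table literally and appends the suffix once, to the end of the whole term.

```python
# Suffix appended to the search term for each search type (per the table above).
SUFFIXES = {"exact": "", "wildcard": "*", "fuzzy": "~"}


def prepare_term(term: str, search_type: str = "wildcard") -> str:
    """Return the term as it would be sent for the given search type (sketch)."""
    if search_type not in SUFFIXES:
        raise ValueError(f"unknown search type: {search_type!r}")
    return term + SUFFIXES[search_type]


print(prepare_term("cameo ct"))           # cameo ct*  (wildcard is the default)
print(prepare_term("cameo ct", "fuzzy"))  # cameo ct~
print(prepare_term("cameo ct", "exact"))  # cameo ct
```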
Beyond these settings it is also possible to change the stroke and fill of the geometry (if doGeometry is true) by adding a 'geometryStyle' tag to the configuration. The geometryStyle tag can set the fill and/or stroke used to draw the entity geometry when it is drawn on the client, but it will not change the style when drawing the selection.
<toolbar>
  <item component="weave.indexcombo" all="true">
    <geometryStyle>
      <strokeOpacity>0.75</strokeOpacity>
      <strokeColor>#0000ff</strokeColor>
      <strokeWidth>2</strokeWidth>
      <fillOpacity>0</fillOpacity>
    </geometryStyle>
  </item>
</toolbar>
In addition, you can alter the tooltips that are displayed when the user enters the input area, and when they hover over the search button.
<toolbar>
  <item component="weave.indexcombo" all="true">
    <geometryStyle>
      <strokeOpacity>0.75</strokeOpacity>
      <strokeColor>#0000ff</strokeColor>
      <strokeWidth>2</strokeWidth>
      <fillOpacity>0</fillOpacity>
    </geometryStyle>
    <tooltip>
      <title>Quick Search</title>
      <text>Type here to search</text>
    </tooltip>
    <tooltip2>
      <title>Quick Search</title>
      <text>Redisplay the last search results</text>
    </tooltip2>
  </item>
</toolbar>
Server
For the client to have something to search for, you must configure and index entities on the server. Interactive searching like this needs to be fast, so the indexes are pre-generated and searched instead of querying the database directly each time the user performs a search. To do this, Weave creates a free text index from information that you provide through at least one data definition.
To understand how the index needs to be set up, it helps to know how Weave performs the search. Once the user has typed some search terms into the search input box, the client waits a fraction of a second. It then sends the contents of the input box to the server and waits for the list of matches to be returned for display. For now, we're interested in what happens when the input reaches the server.
When the text arrives at the server, it is split into individual words on spaces. Weave uses these words, called "keywords", to search through the index. This leads to the first important part of building an index: generating a list of keywords for each entity.
As part of building the index, Weave creates a "document" for each and every entity that you want to be searchable (an index is just a collection of documents). Each document contains the unique id of the entity, so the server knows what's been found. It also contains a list of keywords associated with the entity. These keywords are matched against what the user types to determine what's returned as the results of the search.
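The keyword matching described above can be modelled with a toy in-memory index. This Python sketch is a simplification based on assumptions; Weave actually uses a Lucene free text index. `Document`, `split_keywords` and `search` are hypothetical names. Matching here is exact and case-insensitive, and every user keyword must appear in a document. That rule mirrors the required (+) clauses in the raw query printed by the 'it' console command. The keywords for road 45132 come from that command's sample output, and the second document is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    entity: str                                   # entity type, e.g. "roads"
    entity_id: str                                # unique id of the entity
    keywords: List[str] = field(default_factory=list)


def split_keywords(text: str) -> List[str]:
    # The server splits the search text into words on spaces.
    return text.split()


def search(index: List[Document], text: str) -> List[Document]:
    wanted = [word.upper() for word in split_keywords(text)]
    hits = []
    for doc in index:
        have = {k.upper() for k in doc.keywords}
        if all(word in have for word in wanted):  # every keyword must match
            hits.append(doc)
    return hits


index = [
    Document("roads", "45132", ["CAMEO", "CRT", "COURT", "CT", "BULLEEN"]),      # from the 'it' output
    Document("roads", "90001", ["CAMEO", "ST", "STREET", "STR", "DONCASTER"]),  # made up
]
print([d.entity_id for d in search(index, "cameo ct")])  # ['45132']
```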
Weave obtains the contents of the keyword fields in each document from a data definition that you set up. This data definition links each entity id to a list of columns in the database that contain the keywords to attach to the document. It's up to you to choose the columns that are appropriate for the search. For example, you could attach registration numbers to dogs, owner names to properties, names to roads or business names to local businesses. Weave and the index builder don't particularly care what content you include in the keywords (apart from some smarts that we'll look at later). The builder just adds the text to the documents. When performing a search, it uses its smarts to match the user-supplied keywords with those included in each document.
So the keywords the user supplies, matched against the keywords the index builder has associated with each document, tell Weave what you've searched for. After that, the results need to be displayed to the user, and for this we use 'display' fields from the database. These are set up in basically the same way as the keywords, but the contents of the display fields aren't searched. Instead, they're returned to the client and used in the input box drop-down list to show the results to the user. Again, this is done by associating a data definition with the index.
Before we look at an actual index definition, there's one last piece of information to supply: the entity the index is being generated for. This is done by adding an 'entity' tag to the index definition. Telling the index which entity it's associated with also gives it the other piece of information it needs: the geometry that goes with each document. Using the spatial mapper associated with the entity, the index obtains the extent and centroid of each entity. These are stored in the document along with the entity id, the list of keywords and the display fields (plus some other supporting fields). This way, most of the information the user needs is quickly available after the search.
As an example, let's create a simple index for roads where the user can search for a road in a suburb. First we need some supporting information: the entity we're going to be searching for, and the spatial mapper that provides its geometry. These would look something like:
<!-- Create our roads entity -->
<entity:entity id="roads">
  <label>Roads</label>
</entity:entity>

<!-- Link the roads entity to the ROADS layer in our spatial engine (not shown) -->
<mapper:mapper id="roads.mapper">
  <spatialEngine>spatialEngine</spatialEngine>
  <mapping>
    <entity>roads</entity>
    <table>ROADS</table>
    <key>ROAD_ID</key>
  </mapping>
</mapper:mapper>
Next we need a data definition that supplies the keywords and display fields. These could be separate data definitions, but here we'll use the same one. Our data definition might look like this:
<!-- Provide a road name, type and suburb based on ROAD_ID from the ROADS table -->
<data:datadefinition id="dd_index_roads">
  <datasourcedataconnection datasource="datasource" key="ROAD_ID">
    <prefix>DISTINCT</prefix>
    <from table="ROADS"/>
    <parameter name="name" column="NAME"/>
    <parameter name="type" column="TYPE"/>
    <parameter name="suburb" column="SUBURB"/>
  </datasourcedataconnection>
</data:datadefinition>
Using the above information a simple index definition would look something like this:
<index:entity id="index.roads">
  <entity>roads</entity>
  <display>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>Road: ${name} ${type}</level1>
    <level2>Suburb: ${suburb}</level2>
  </display>
  <keywords>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>${name} ${type}</level1>
    <level2>${suburb}</level2>
  </keywords>
</index:entity>
What we end up with is an index called 'index.roads' that indexes 'roads' entities by road 'name', 'type' and 'suburb', and displays the road and suburb details to the user.
Sorting
As of version 1.1.0 of the index bundle, the results can also be sorted.
To do this, add a <sort> tag to the index configuration to tell the index builder what information to attach to each indexed document to determine the sort order.
Sorting does not affect which results are returned to the user; the weighting of the individual documents still determines that. Sorting only determines the order in which the results are shown.
When adding sorting to your indexes it's best to either add sort information to all indexes or none if you're searching across all entities (if you've configured the search functionality on the client to only search the active entity then this doesn't apply).
This is because the search operation differs when sorting is involved. Searching twice (once for sorted indexes and once for non-sorted) would add overhead. So when searching over multiple indexes (or entities), Weave searches either the sorted indexes or the non-sorted indexes, never both. When performing a search across all entities, the server first looks for all indexes that have sorting configured and uses just those for the search. If no indexes are configured for sorting, it uses all of the indexes (and assumes that none are sorted). This means that if you're searching across all entities and only some of your indexes have sorting configured, only those indexes will be searched. None of the entities in your non-sorted indexes will be found.
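The index selection rule above can be written as a few lines of Python. This is a sketch of the behaviour as described, not Weave's code. `IndexInfo` and `indexes_to_search` are hypothetical, as are the index names other than index.roads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexInfo:
    name: str
    has_sort: bool  # True if the index definition contains a <sort> tag


def indexes_to_search(indexes: List[IndexInfo]) -> List[IndexInfo]:
    """Pick the indexes used for an 'all entities' search (sketch)."""
    sorted_only = [i for i in indexes if i.has_sort]
    # Any sorted index present: search only the sorted ones.
    # No sorted index at all: search everything, treated as unsorted.
    return sorted_only if sorted_only else list(indexes)


mixed = [IndexInfo("index.suburbs", True), IndexInfo("index.roads", True),
         IndexInfo("index.properties", False)]
print([i.name for i in indexes_to_search(mixed)])
# ['index.suburbs', 'index.roads']: index.properties is never searched
```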
Adding sorting to an index works the same way as adding display and keywords, but is limited to a single level. When you add a sort to an index, an extra processing step iterates over the data definition configured for the sort and constructs a sort field for each indexed document. That field is then used during searching to order the returned results (as opposed to the default ordering, which is based on the weighting of the found documents).
So if we wanted our roads sorted so they're ordered by the suburb they're in followed by the road name then we could do the following:
<index:entity id="index.roads">
  <entity>roads</entity>
  <display>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>Road: ${name} ${type}</level1>
    <level2>Suburb: ${suburb}</level2>
  </display>
  <keywords>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>${name} ${type}</level1>
    <level2>${suburb}</level2>
  </keywords>
  <sort>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>${suburb} ${name}</level1>
  </sort>
</index:entity>
The level in the sort works the same way as in display and keywords, so additional text can be added to the sort. You can use this to control the order in which different types of entities are shown to the user. For example, if suburbs, roads and property addresses are all search-enabled, the sort field can ensure that suburbs are always listed first, followed by roads and then properties (regardless of how "well" the individual results match the search). To do this with our previous example, we could change the level1 tag of the sort to:
<level1>0020 ${suburb} ${name}</level1>
and then make sure that the sort for suburbs uses the prefix 0010 and the sort for properties uses 0030. This ensures that suburbs, roads and properties are always returned in that order, and the original sort we specified (suburb then road name, in the previous example) is used within each group.
Here's a further example. Suppose your address data stores house numbers as a separate field. When creating the sort field for your property addresses, you would add the house number to the end of the sort field so that properties are returned in house number order. Remember to left-pad the field with 0's in the data definition, so the sort is performed numerically rather than alphabetically.
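The following Python sketch shows why the group prefixes and zero padding work. The sort field is compared as text, so "10" sorts before "9" unless numbers are padded to a fixed width. `sort_key` and the padding width of 5 are hypothetical. In Weave the padding itself would be done in the data definition.

```python
from typing import List, Optional


def sort_key(group: str, parts: List[str], house_number: Optional[int] = None) -> str:
    """Build a sort string: group prefix, then the fields, then a padded number."""
    key = " ".join([group, *parts])
    if house_number is not None:
        # Zero-pad so that numbers compare correctly as text.
        key += " " + str(house_number).zfill(5)
    return key


print(sorted(["10", "9"]))  # ['10', '9']: unpadded numbers sort as text

results = [
    sort_key("0030", ["BULLEEN", "CAMEO CRT"], house_number=10),  # property
    sort_key("0030", ["BULLEEN", "CAMEO CRT"], house_number=9),   # property
    sort_key("0020", ["BULLEEN", "CAMEO CRT"]),                   # road
    sort_key("0010", ["BULLEEN"]),                                # suburb
]
for key in sorted(results):
    print(key)
```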
Detail
In detail, when Weave builds the index, the index builder iterates over every feature returned by the spatial mapper associated with the index's entity. For each feature it creates a new document in the index and attaches the entity type, the entity id, the entity centroid and the entity extent to that document.
Once a document has been created for each available entity, the builder processes the display definition to add the fields that will be displayed to the user. It iterates over the data definition set in the display configuration and handles each level (level1 and level2) in turn. It replaces the ${} markers with the matching parameters from the data definition, and uses the resulting text as the content of the field stored in the document. Any text can be included in the display configuration, including HTML. In the example above, "Road: " and "Suburb: " are additional text that will be sent to the user, while the ${} markers are replaced by values from the data definition.
A display configuration is limited to two level tags, that is you can only specify level1 and level2.
The index builder then processes the keyword fields using the same process as for the display fields, but here up to 5 levels can be set (the example above uses only 2). Again, you can add your own text to the keywords (the example above doesn't), but there's no point including HTML, since this is the text that will be searched for a match. For example, suppose other indexes are set up for other entities. You could add the word 'ROAD' to the keywords, so the user can narrow the search to roads by including 'road' in the search field. (Admittedly that's a poor example, since the road type will include 'road' anyway, but you get the idea.)
Finally, if sorting is configured for the index, an additional pass is performed to build the sort field.
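The Python sketch below puts the build steps of this section together as four passes. It creates one document per feature, then adds display fields, keyword fields and, if configured, a sort field. All names are hypothetical, and the sketch makes several simplifying assumptions:
- geometry is a plain list of coordinates;
- each entity id has exactly one data definition row;
- the centroid is taken as the centre of the extent (the spatial mapper may compute it differently);
- Python's `string.Template`, which happens to use the same `${name}` marker syntax, stands in for Weave's substitution.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List, Optional, Tuple

Coords = List[Tuple[float, float]]


@dataclass
class IndexDocument:
    entity: str
    entity_id: str
    extent: Tuple[float, float, float, float]          # (minx, miny, maxx, maxy)
    centroid: Tuple[float, float]
    display: List[str] = field(default_factory=list)   # at most 2 levels
    keywords: List[str] = field(default_factory=list)  # at most 5 levels
    sort: Optional[str] = None


def fill(template: str, row: Dict[str, str]) -> str:
    """Replace ${param} markers with values from one data definition row."""
    return Template(template).safe_substitute(row)


def build_index(entity: str, features: Dict[str, Coords], rows: Dict[str, Dict[str, str]],
                display: List[str], keywords: List[str],
                sort: Optional[str] = None) -> List[IndexDocument]:
    docs: Dict[str, IndexDocument] = {}
    # Pass 1: one document per feature from the spatial mapper.
    for fid, coords in features.items():
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        extent = (min(xs), min(ys), max(xs), max(ys))
        # Simplification: use the centre of the extent as the centroid.
        centroid = ((extent[0] + extent[2]) / 2, (extent[1] + extent[3]) / 2)
        docs[fid] = IndexDocument(entity, fid, extent, centroid)
    # Pass 2: display fields (level1 and level2 only).
    for fid, row in rows.items():
        if fid in docs:
            docs[fid].display = [fill(t, row) for t in display[:2]]
    # Pass 3: keyword fields (up to five levels).
    for fid, row in rows.items():
        if fid in docs:
            docs[fid].keywords = [fill(t, row) for t in keywords[:5]]
    # Pass 4: the sort field, only if a sort is configured.
    if sort is not None:
        for fid, row in rows.items():
            if fid in docs:
                docs[fid].sort = fill(sort, row)
    return list(docs.values())


docs = build_index(
    "roads",
    features={"45132": [(0.0, 0.0), (4.0, 2.0)]},
    rows={"45132": {"name": "CAMEO", "type": "CRT", "suburb": "BULLEEN"}},
    display=["Road: ${name} ${type}", "Suburb: ${suburb}"],
    keywords=["${name} ${type}", "${suburb}"],
    sort="${suburb} ${name}",
)
print(docs[0].display)   # ['Road: CAMEO CRT', 'Suburb: BULLEEN']
print(docs[0].centroid)  # (2.0, 1.0)
```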
Weighting
The keyword levels introduce weighting in the index. Weighting lets you give some database fields higher priority than others: lower-numbered levels get a higher weighting than higher-numbered levels. That is, a document with a match in its level1 field will be returned higher in the list than another document with the same value in its level2 field. That document in turn will be returned before a document with the same value in its level3 field, and so on.
In our example above, road name and road type are given the same, highest, weighting, and suburb is given a slightly lower weighting. It's also possible to weight the index as a whole by adding a 'weight' tag to the index definition. The tag contains a number that multiplies the weighting, and the default is 1.0. Setting a weight of 2.0 makes the documents in the index twice as likely to be returned as those in another index with the same values but the default weighting of 1.0. Setting the weight to 0.5 halves the chance of those documents being returned.
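The toy scorer below gives a rough illustration of level and index weighting. It uses the per-level boosts printed in the raw query of the 'it' console command (21, 16, 11, 6 and 1 for levels 1 to 5). It multiplies the boost of the best matching level by the index weight. Lucene's real scoring formula is more involved, so treat the numbers as relative only. `toy_score` and the sample keywords are hypothetical.

```python
from typing import Dict, List

# Boosts per keyword level, as printed in the raw query by the 'it' command.
LEVEL_BOOST = {1: 21.0, 2: 16.0, 3: 11.0, 4: 6.0, 5: 1.0}


def toy_score(keywords: Dict[int, List[str]], term: str, index_weight: float = 1.0) -> float:
    """Boost of the best (lowest numbered) matching level, times the index weight."""
    term = term.upper()
    best = max((LEVEL_BOOST[level] for level, words in keywords.items()
                if term in (w.upper() for w in words)), default=0.0)
    return best * index_weight


road = {1: ["BULLEEN", "RD"], 2: ["DONCASTER"]}     # a road named Bulleen Rd
house = {1: ["12", "SMITH", "ST"], 2: ["BULLEEN"]}  # a property in Bulleen

print(toy_score(road, "bulleen"))                     # 21.0: level 1 match
print(toy_score(house, "bulleen"))                    # 16.0: level 2 match
print(toy_score(house, "bulleen", index_weight=2.0))  # 32.0: index weight 2.0
```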
As of version 1.8.17 of the com.cohga.server.index bundle, you can also specify weight values for individual records. As shown in the example below, the administrator can define a weights element inside the index, consisting of a datadefinition and a value. The value is taken from the data definition, in this case the 'weight' parameter. The data definition must therefore supply that parameter; the dd_index_roads definition shown earlier would need a weight parameter added. The weight is applied to the feature to increase its score when searching through the index.
<index:entity id="index.roads">
  <entity>roads</entity>
  <display>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>Road: ${name} ${type}</level1>
    <level2>Suburb: ${suburb}</level2>
  </display>
  <keywords>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>${name} ${type}</level1>
    <level2>${suburb}</level2>
  </keywords>
  <weights>
    <datadefinition>dd_index_roads</datadefinition>
    <value>${weight}</value>
  </weights>
  <sort>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>${suburb} ${name}</level1>
  </sort>
</index:entity>
Keyword Smarts
As mentioned before there are some smarts built into the index builder when processing the keywords, the first is synonyms and the second is number range expansion.
Synonyms
Synonyms let you specify a text file that provides alternate keywords for those found in the database. This covers simple cases such as including "STREET", "STR" and "ST" as keywords when the database provides just "ST" (or just "STR" or just "STREET"). The user can then use any of those values to search for streets. As part of the indexer installation, Weave supplies a text file listing common street type synonyms, which can be used with any index.
Synonyms can also provide completely different words, for example 'pharmacist' can be set as a synonym for 'chemist'. That's unlikely to be of much use in our roads example (unless you wanted to set up a synonyms list for street names). It's handy, though, if you want people to be able to search by business type and want to catch the different names a business type could go by.
There are two formats of synonym file. In the first, all the words in a group are synonyms of each other; for example, in our street abbreviation file, ST, STR and STREET all stand for the same word. The second is for words that are alternatives in one direction but not necessarily in the reverse direction. For example, 'revoke' and 'abandon' could both be synonyms for 'vacate', but 'revoke' shouldn't be a synonym for 'abandon'.
The first format has a single line for each group of synonyms with the alternatives separated by a comma, for example:
ST,STR,STREET
The second format has the original word followed by an equals sign and a space-separated list of alternatives, for example:
abandon=vacate
revoke=vacate
vacate=abandon revoke
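Here's a minimal Python sketch of reading both synonym formats and applying them to a document's keywords at index time. The parsing rules are inferred from the two examples above and are assumptions. Blank lines are skipped and matching is case-sensitive. Other details of Weave's synonym file handling (comments, escaping) aren't modelled. `parse_synonyms` and `expand_keywords` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set


def parse_synonyms(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word to the alternatives that should be added for it."""
    table: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            # One-way format: word=alt1 alt2 ...
            word, alternatives = line.split("=", 1)
            table.setdefault(word.strip(), set()).update(alternatives.split())
        else:
            # Group format: A,B,C (every word is a synonym of every other).
            group = [w.strip() for w in line.split(",") if w.strip()]
            for word in group:
                table.setdefault(word, set()).update(w for w in group if w != word)
    return table


def expand_keywords(keywords: List[str], table: Dict[str, Set[str]]) -> List[str]:
    """Append the synonyms of each keyword, skipping duplicates."""
    result = list(keywords)
    for keyword in keywords:
        for alternative in sorted(table.get(keyword, set())):
            if alternative not in result:
                result.append(alternative)
    return result


table = parse_synonyms(["ST,STR,STREET", "abandon=vacate", "revoke=vacate", "vacate=abandon revoke"])
print(expand_keywords(["CAMEO", "ST"], table))  # ['CAMEO', 'ST', 'STR', 'STREET']
print(sorted(table["vacate"]))                  # ['abandon', 'revoke']
print("abandon" in table["revoke"])             # False
```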
To add synonyms to an index you must add at least one 'synonyms' tag to the index and rebuild the index.
<index:entity id="index.roads">
  <entity>roads</entity>
  <display>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>Road: ${name} ${type}</level1>
    <level2>Suburb: ${suburb}</level2>
  </display>
  <keywords>
    <datadefinition>dd_index_roads</datadefinition>
    <level1>${name} ${type}</level1>
    <level2>${suburb}</level2>
  </keywords>
  <synonyms>street.txt</synonyms>
</index:entity>
Number Range Expansion
When adding keywords to a document, the index builder looks for keywords that look like number ranges and expands them to include the individual numbers within the range. If the database contains "11-14" in one of the fields, the index builder includes "11-14" as a keyword. It also includes "11", "12", "13" and "14" as separate keywords (at the same weighting as the original keyword). This helps the user find likely matches when they search for part of the number range or a number within it.
In fact, the range expansion is more sophisticated than that and can handle a wide range of formats, including the following examples:
Original | Additional |
12A | 12 |
1/12 | 12 |
1/12A | 12A 12 1/12 |
1A/12A | 1/12A 12A 12 1/12 |
1A/12 | 1/12 12 |
10-14 | 10 14 11 12 13 |
1/10-14 | 10-14 10 14 11 12 13 1/10 1/14 1/11 1/12 1/13 |
1A/10-14 | 1/10-14 10-14 10 14 11 12 13 1/10 1/14 1/11 1/12 1/13 1A/10 1A/14 1A/11 1A/12 1A/13 |
Number range expansion is automatic and does not require any changes to the index definition to be enabled, which also means that at the moment it can't be disabled, but that may change in the future.
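The rules in the table can be reconstructed as a small recursive function. The Python sketch below is reverse-engineered from the examples above, not taken from Weave. It only handles the forms shown there: plain numbers, a letter suffix, a unit/number prefix and a numeric range. `expand` is a hypothetical name, and it returns the additional keywords only.

```python
import re
from typing import Set

RANGE = re.compile(r"^(\d+)-(\d+)$")           # 10-14
SUFFIXED = re.compile(r"^(\d+)[A-Za-z]+$")     # 12A
UNIT = re.compile(r"^(\d+)([A-Za-z]*)/(.+)$")  # 1/12, 1A/12A, 1/10-14 ...


def expand(token: str) -> Set[str]:
    """Return the extra keywords generated for token (not including token itself)."""
    extra: Set[str] = set()
    unit = UNIT.match(token)
    if unit:
        number, letters, street = unit.groups()
        if letters:
            # 1A/... : also index the plain unit number form 1/... and expand that.
            plain = f"{number}/{street}"
            extra.add(plain)
            extra |= expand(plain)
        else:
            # 1/... : index the street part on its own and expand it.
            extra.add(street)
            extra |= expand(street)
            suffixed = SUFFIXED.match(street)
            if suffixed:
                extra.add(f"{number}/{suffixed.group(1)}")  # 1/12A -> 1/12
        street_range = RANGE.match(street)
        if street_range:
            # 1/10-14 -> 1/10 ... 1/14 (keeping any unit letter, e.g. 1A/10).
            low, high = int(street_range.group(1)), int(street_range.group(2))
            extra |= {f"{number}{letters}/{n}" for n in range(low, high + 1)}
        return extra
    whole_range = RANGE.match(token)
    if whole_range:
        low, high = int(whole_range.group(1)), int(whole_range.group(2))
        extra |= {str(n) for n in range(low, high + 1)}
    suffixed = SUFFIXED.match(token)
    if suffixed:
        extra.add(suffixed.group(1))  # 12A -> 12
    return extra


print(sorted(expand("1/12A")))  # ['1/12', '12', '12A']
```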
Scheduling Updates
Because the index is built from the data available at the time it's built, it can become stale over time and need rebuilding. You can rebuild it manually at the OSGi console (more on that later), or set up a schedule in Cron format in the index definition.
By adding a schedule tag, you can tell Weave when the index should be rebuilt, down to the millisecond. For example, it can be rebuilt at certain times each day, on certain days of the week, or a combination of these.
Schedule | Description |
0 0 30 2 | will run at 2:30am each day |
0 0 30 2,14 | will run at 2:30am and 2:30pm each day |
0 0 30 2,8,14,20 | will run at 2:30am, 8:30am, 2:30pm and 8:30pm each day |
0 0 30 2 1 | will run at 2:30am each Sunday |
0 0 30 2 * 1 | will run at 2:30am on the first of each month |
0 0 30 2 * 1 2 | will run at 2:30am on the first of February each year |
0 0 30 2 5 * 2 | will run at 2:30am on each Thursday of February each year |
0 0 30 2 5 1 2 | will run at 2:30am on each Thursday and on the first of February each year |
0 0 15,45 | will run every half hour at quarter past and quarter to |
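To check how a schedule string will be read, the Python sketch below tests whether a given time matches it. It rests on several assumptions inferred from the examples in the table:
- the field order is millisecond, second, minute, hour, day of week (Sunday = 1), day of month, month;
- omitted trailing fields match anything;
- when both day of week and day of month are restricted, either one is enough (as in the last Thursday example).

`schedule_matches` is a hypothetical name, and ranges and step values aren't handled.

```python
from datetime import datetime

# Assumed field order: millisecond second minute hour day-of-week day-of-month month
FIELD_COUNT = 7


def _matches(spec: str, value: int) -> bool:
    """'*' matches anything; otherwise a comma-separated list of numbers."""
    return spec == "*" or value in {int(part) for part in spec.split(",")}


def schedule_matches(schedule: str, when: datetime) -> bool:
    specs = schedule.split()
    specs += ["*"] * (FIELD_COUNT - len(specs))  # omitted trailing fields = any
    ms, sec, minute, hour, dow, dom, month = specs
    weekday = (when.weekday() + 1) % 7 + 1       # Sunday=1 ... Saturday=7
    time_ok = (_matches(ms, when.microsecond // 1000) and _matches(sec, when.second)
               and _matches(minute, when.minute) and _matches(hour, when.hour)
               and _matches(month, when.month))
    if not time_ok:
        return False
    if dow != "*" and dom != "*":
        # Both restricted: either is enough ("each Thursday and on the first").
        return _matches(dow, weekday) or _matches(dom, when.day)
    return _matches(dow, weekday) and _matches(dom, when.day)


print(schedule_matches("0 0 30 2", datetime(2024, 3, 5, 2, 30)))    # True
print(schedule_matches("0 0 30 2 1", datetime(2024, 3, 4, 2, 30)))  # False: a Monday
```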
Note that building indexes can be CPU intensive, so you should stagger the rebuilds to avoid rebuilding more than one index at a time.
Command Line
The indexing in Weave provides a number of commands that can be used at the OSGi prompt to work with the indexes.
Command | Parameters | Description |
is | | return a list of all indexes |
ib | [<index>|<id>] | rebuild an index |
ik | [<index>|<id>] | update keyword fields for an index |
id | [<index>|<id>] | update display fields for an index |
ig | [<index>|<id>] | update geometry field for an index |
io | [<index>|<id>] | update sort field for an index |
iu | [<index>|<id>] | unlock an index |
ir | [<index>|<id>] | remove an index |
it | "<search terms>"|id:<entityid> [<entity>|<index>] [<limit>] | test index |
<> = substitute, [] = optional, | = alternate
Update: The ib, ik, id, ig, iu, io and ir commands now also accept a space-separated list of index ids. They can also be given no index id at all, in which case the operation is performed on all indexes.
Also, if multiple commands are submitted at once, they are queued so that only one command is performed at a time (this also applies to commands triggered through a schedule). This ensures the server isn't overloaded with building indexes. Imagine having 10 indexes and typing ib in the console, which would trigger the concurrent build of all 10.
So at the OSGi console you can use 'is' to see what indexes are currently registered in Weave
osgi> is
Weave Index Service
Index  Id           Entity  Count  Locked  Modified
0      index.roads  roads   N/A    N/A     N/A
From this we can see that the 'index.roads' index has not been built yet. We can now use the 'ib' command to build it (assuming we haven't set up a schedule that would build the index for us):
osgi> ib 0
Building index for 0
Processing index index.roads
Indexing non-unique features from ROADS based on ROAD_ID
...
Total time to build index index.roads 32468ms
osgi> is
Weave Index Service
Index  Id           Entity  Count   Locked  Modified
0      index.roads  roads   166131  false   19/05/09 11:10
And then we can actually test our index without having to start the client
osgi> it "cameo ct" 1
Raw Query: +(keywords_1:cameo^21.0 keywords_2:cameo^16.0 keywords_3:cameo^11.0 keywords_4:cameo^6.0 keywords_5:cameo) +(keywords_1:ct^21.0 keywords_2:ct^16.0 keywords_3:ct^11.0 keywords_4:ct^6.0 keywords_5:ct)
Start search results for cameo ct
Start result 0
Score: 14.316717
Entity: roads
Id: 45132
Display 1: Road: CAMEO CRT
Display 2: Suburb: BULLEEN
Keyword 1: CAMEO CRT COURT CT
Keyword 2: BULLEEN
End result 0
End search results
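The raw query above follows a regular pattern. Each search word becomes a required (+) clause that can match any of the five keyword levels, with boosts falling by 5 per level. The Python sketch below rebuilds that string from the search terms. The boost formula and the lower-casing are inferred from this one sample output, and `raw_query` is a hypothetical name.

```python
def raw_query(search: str, levels: int = 5) -> str:
    """Rebuild the Lucene-style query shown by the 'it' command (inferred)."""
    clauses = []
    for word in search.lower().split():
        fields = []
        for level in range(1, levels + 1):
            boost = 1 + 5 * (levels - level)  # 21, 16, 11, 6, 1 for five levels
            term = f"keywords_{level}:{word}"
            fields.append(term if boost == 1 else f"{term}^{boost:.1f}")
        clauses.append("+(" + " ".join(fields) + ")")  # '+' marks a required clause
    return " ".join(clauses)


print(raw_query("cameo ct"))
```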
Manually updating indexes
The server status page contains links for initiating an index update, and the 'build' link in that page can be accessed by an external application to start an index build.
The link to start an index build would be http://hostname:8080/weave/server/index/build/<indexid>
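An external application could trigger a build by requesting that URL. The Python sketch below uses only the standard library. `build_url` and `trigger_build` are hypothetical names, `hostname` is a placeholder and a plain GET is assumed. Any authentication your server requires is not handled. The documentation doesn't say whether <indexid> expects the index name (as used here) or the number shown by the 'is' command, so compare against the link on your server status page.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_url(host: str, index_id: str, port: int = 8080) -> str:
    """URL of the index build link on the server status page."""
    return f"http://{host}:{port}/weave/server/index/build/{quote(index_id, safe='')}"


def trigger_build(host: str, index_id: str, timeout: float = 30.0) -> int:
    """Request an index build and return the HTTP status code."""
    with urlopen(build_url(host, index_id), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    print(build_url("hostname", "index.roads"))
    # http://hostname:8080/weave/server/index/build/index.roads
```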
Troubleshooting
The OSGi console can perform index searches using the 'it' (index test) command. This command shows more detail about the results and works at a slightly lower level than the client.
You should run any test you do through this command to see what's actually going on.
You may want to check out http://lucene.apache.org/core/old_versioned_docs/versions/2_9_4/queryparsersyntax.html for details on the search syntax when using the 'it' command.
Note: By default the Weave index search adds an * to the end of the search term, so to replicate what the client is doing you should also include an * at the end of the search term when using the it command.
There's also a standalone tool, Luke (http://www.getopt.org/luke/), that lets you open and inspect the index directly.
Note: If you run any searches in Luke, remember to change the analyzer from KeywordAnalyzer to StandardAnalyzer in the Analysis tab under the Search tab.
The Documents tab is handy because it allows you to cycle through the stored documents directly and see exactly what's stored for each one.
You need to make sure you're using synonyms if you're storing things like street types where the database contains 'RD' but the user may type in 'road' or 'rd'.
Also, punctuation in keyword fields can cause problems. A user typing 'road' won't necessarily match the keywords if they're derived from the value '123 Main Road, Smallville'.
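Whether a trailing comma actually blocks a match depends on how the keyword text is tokenized. The Python sketch below assumes a plain whitespace split, which is the case where it does, and shows one way to clean the value first. The cleanup regular expression is an assumption. It keeps '/' and '-' so that values such as 1/10-14 still reach number range expansion intact. In practice the cleaning would more likely be done in the data definition's query. Both function names are hypothetical.

```python
import re
from typing import List


def whitespace_keywords(value: str) -> List[str]:
    # Split on whitespace only: punctuation stays attached to words.
    return value.upper().split()


def cleaned_keywords(value: str) -> List[str]:
    # Replace anything that isn't a word character, whitespace, '/' or '-' with a space.
    return re.sub(r"[^\w\s/-]", " ", value).upper().split()


value = "123 Main Road, Smallville"
print(whitespace_keywords(value))  # ['123', 'MAIN', 'ROAD,', 'SMALLVILLE']
print(cleaned_keywords(value))     # ['123', 'MAIN', 'ROAD', 'SMALLVILLE']
print("ROAD" in whitespace_keywords(value), "ROAD" in cleaned_keywords(value))  # False True
```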
Updates for Weave 2.5
As of Weave 2.5 the way indexes are built has changed.
Previously indexes were generated geometry first; now they're generated attributes first, but only if the data definitions used for the keywords, display, sort and fields are the same.
To switch to the older method of generating indexes you can set
<geometryFirst>true</geometryFirst>
as an option within the index config (and ensure all data definitions are the same). This flag was introduced in Weave 2.5.4.
Additionally, since all the data definitions are the same, the data definition can be set once at the top level of the index config rather than duplicated within each section.
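The difference between the two build orders is described under geometryFirst in the client settings table above, and a small Python sketch makes it concrete. The names and sample data are hypothetical, and the sketch assumes attribute rows without matching geometry are skipped.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]              # (entity id, name) from the attribute source
Entry = Tuple[str, Optional[str]]  # one index entry: (entity id, name)


def geometry_first(geometry_ids: List[str], rows: List[Row]) -> List[Entry]:
    """One entry per geometry record, using only the first matching attribute row."""
    return [(gid, next((name for rid, name in rows if rid == gid), None))
            for gid in geometry_ids]


def attribute_first(geometry_ids: List[str], rows: List[Row]) -> List[Entry]:
    """One entry per attribute row that has geometry; an entity may appear twice."""
    known = set(geometry_ids)
    return [(rid, name) for rid, name in rows if rid in known]


geometry = ["100", "200"]
names = [("100", "OLD MILL PARK"), ("100", "MILL RESERVE"), ("200", "RIVER WALK")]
print(geometry_first(geometry, names))   # 'MILL RESERVE' is not indexed
print(attribute_first(geometry, names))  # both names of entity 100 are indexed
```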