Natural Language Geocoding

Organisations today fight an ever-increasing problem: information overload. How do you enable users to find the correct information quickly, easily and in a familiar way?

A core capability of Weave that is proving to be of significant value to Weave users is the Weave Natural Language Geocoding Engine.

This capability is unique to the Weave Business Integration Framework, and Cohga has found that sites such as Melbourne City Council, Melbourne Fire Brigade and North Shore City Council are already gaining in productivity through its use.

The Natural Language Geocoding Engine provides a more intuitive way for a user to query or search for useful data that may reside in a multitude of systems at a user site, without needing the training or experience to know where that data is held or how to gain access to it.

Typical geocoding engines rely on well-defined, structured data to allow geocoding to be undertaken on a dataset, e.g. House Number, Street, Street Type, Suburb and PostCode. Several disadvantages arise with this type of geocoding:

  1. What if your data is not structured in this way?
  2. What if you do not know the structure of the data and it is just unformatted text?
  3. What if you are not interested in address information?
  4. What if you need to search across multiple datasets, each with a different structure or from a different GIS dataset?

Typical geocoding engines require that the user types the address in its full format, e.g. “4 Smith Street, Yarraville”.

What if the user is unsure about the street name? It would be nice to allow the user to start with “Yarraville” then “Smith” then “4”.

Note that the word “street” was not entered, nor is a delimiter required to distinguish the suburb.
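The behaviour described above can be illustrated with a minimal sketch. It assumes a simple rule: every word the user types must be the start of some word in the record, in any order. The sample addresses and the function names `tokenize`, `matches` and `search` are invented for illustration and are not part of Weave; the actual matching rules used by the engine may differ.

```python
import re

# Hypothetical sample records, invented for illustration only.
ADDRESSES = [
    "4 Smith Street, Yarraville",
    "14 Smith Street, Yarraville",
    "4 Smith Road, Footscray",
    "9 Jones Avenue, Yarraville",
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, record):
    """Return True if every query token is a prefix of some record token.

    Token order is ignored and punctuation is discarded, so no delimiter
    is needed, and words left out of the query (such as "Street") do not
    prevent a match.
    """
    record_tokens = tokenize(record)
    return all(
        any(rt.startswith(qt) for rt in record_tokens)
        for qt in tokenize(query)
    )


def search(query):
    """Return all sample records that match the free-format query."""
    return [address for address in ADDRESSES if matches(query, address)]


# The result set narrows as the user keeps typing.
for partial in ("Yarraville", "Yarraville Smith", "Yarraville Smith 4"):
    print(partial, "->", search(partial))
```

Because matching is done per token, typing “Yarraville Smith 4” and “4 Smith Yarraville” gives the same result in this sketch.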

Weave provides the tools to harvest and create indexes of user data, and it is through these linked indexes that a user may search their data using free-format queries and receive dynamic feedback of results as the query is being entered.

The Natural Language Geocoding Engine can search text-based information much faster than database queries can. It employs an indexing scheme to pre-index the textual and spatial data associated with a feature and stores it in a binary format on disk. Generally an index is about one third the size of the original dataset, although this depends on the amount of information contained within the index. There are two steps to working with the Natural Language Geocoding Engine: harvesting and searching.
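To make the two steps concrete, the following sketch shows one common way to pre-index text so that it can be searched without scanning every record: an inverted index that maps each word to the features containing it. It is an illustration only. The index is held in memory rather than in a binary file, the features and their coordinates are invented, and the names `harvest` and `search` are hypothetical rather than Weave functions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def harvest(features):
    """Step 1: build the index from (label, x, y) features.

    Returns a pair: postings (token -> set of feature ids) and
    store (feature id -> original label and coordinates).
    """
    postings = defaultdict(set)
    store = {}
    for feature_id, (label, x, y) in enumerate(features):
        store[feature_id] = (label, x, y)
        for token in tokenize(label):
            postings[token].add(feature_id)
    return postings, store


def search(index, query):
    """Step 2: return the features that contain every query token."""
    postings, store = index
    tokens = tokenize(query)
    if not tokens:
        return []
    # Intersect the id sets of all query tokens; no record text is scanned.
    ids = set.intersection(*(postings.get(t, set()) for t in tokens))
    return [store[i] for i in sorted(ids)]


# Hypothetical features with made-up coordinates.
FEATURES = [
    ("4 Smith Street, Yarraville", 10.0, 20.0),
    ("Yarraville Gardens", 11.0, 21.0),
    ("Smith Reserve, Footscray", 12.0, 22.0),
]

index = harvest(FEATURES)
print(search(index, "yarraville smith"))
```

Each hit carries its coordinates, so in this sketch a text match can be turned into a map location without going back to the source system.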

The data harvesting and indexing process is entirely separate from the interactive searching of the index, where the user enters a query. Searching can also be performed on the indexes while they are being updated. Once the index is updated, the new version is used on the next search. This capability allows the service to remain up for long periods. Weave can also schedule the updating of indexes so that they are maintained automatically.
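One way to picture searching during an update is a snapshot swap: the new index is built to one side and then published in a single step, so searches already under way finish against the old version and later searches see the new one. The sketch below assumes this approach purely for illustration; the source does not describe how Weave implements it, and the class `SwappableIndex` is hypothetical.

```python
import threading


class SwappableIndex:
    """Holds the current index snapshot and lets a rebuilt one replace it."""

    def __init__(self, snapshot):
        self._snapshot = snapshot          # token -> set of feature ids
        self._lock = threading.Lock()      # serialises publishers only

    def search(self, token):
        # Take a reference to the current snapshot once. A publish that
        # happens during this search does not affect the result.
        snapshot = self._snapshot
        return sorted(snapshot.get(token, ()))

    def publish(self, new_snapshot):
        # Replace the whole snapshot in one assignment; searchers never
        # see a half-updated index.
        with self._lock:
            self._snapshot = new_snapshot


# Hypothetical data: the first harvest knows one "smith" feature.
live = SwappableIndex({"smith": {1}})
before = live.search("smith")

# A later harvest builds a complete new snapshot, then publishes it.
rebuilt = {"smith": {1, 2}, "yarraville": {2}}
live.publish(rebuilt)
after = live.search("smith")

print(before, after)
```

The design choice here is that readers never wait on a lock, which suits a service that is expected to stay up while its indexes are refreshed.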

Searches can be run across all indexes with merged results, or against specific indexes, and the user does not need to know where the data resides or how to gain access to it. Weave provides ranked searching, which enables the best results to be returned first.
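A small sketch may help show what merged, ranked results look like. It assumes a toy scoring rule (the fraction of query words found in a record), which stands in for whatever ranking Weave actually applies. The index names, records and the functions `score` and `search` are invented for illustration.

```python
def score(query_tokens, text):
    """Toy relevance score: fraction of query tokens present in the text."""
    words = set(text.lower().split())
    return sum(1 for t in query_tokens if t in words) / len(query_tokens)


def search(indexes, query, names=None):
    """Search every index (or only those in `names`) and merge the hits.

    Returns (score, index name, record) tuples, best score first.
    """
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = []
    for name, records in indexes.items():
        if names is not None and name not in names:
            continue
        for record in records:
            s = score(tokens, record)
            if s > 0:
                hits.append((s, name, record))
    # Highest score first; ties are ordered by index name, then record.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits


# Hypothetical indexes harvested from two different systems.
INDEXES = {
    "property": ["4 smith street yarraville", "9 jones avenue yarraville"],
    "assets": ["smith reserve playground", "yarraville library"],
}

for hit in search(INDEXES, "smith yarraville"):
    print(hit)
```

Passing `names={"assets"}` restricts the same query to a single index, which corresponds to searching specific indexes rather than all of them.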

The indexing process is rapid, with current users able to index more than 3 million features in approximately 9 minutes. This indexing can be performed as a background, batch or incremental process. Incremental harvesting is as fast as batch harvesting, and index size is roughly 20% to 30% of the size of the data being harvested.

By making use of the Natural Language Geocoding Engine that is provided with Weave, a user site can expect a typical search across more than 3 million records to take between 3 and 30 ms.

Like all data displays in Weave, search results can be requested in pages to help minimise bandwidth, e.g. only 10 or 20 results at a time.
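Paging of this kind can be sketched in a few lines. The function `get_page` and its 1-based page numbering are assumptions made for illustration and do not reflect the actual Weave request parameters.

```python
def get_page(results, page, page_size=10):
    """Return one page of an already ranked result list.

    `page` is 1-based. A page past the end of the results is empty.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    start = (page - 1) * page_size
    return results[start:start + page_size]


# 45 hypothetical result ids, requested 20 at a time.
all_results = list(range(45))
first = get_page(all_results, 1, page_size=20)
last = get_page(all_results, 3, page_size=20)
print(len(first), len(last))
```

Only the requested slice needs to be sent to the client, so a query with thousands of hits costs no more bandwidth per request than one with a handful.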

In summary, some key aspects of the Weave search capability are as follows:

  1. ranked searching with best results returned first
  2. many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  3. field-based searching (e.g. title, author, contents)
  4. date-range searching
  5. sorting by any field
  6. multiple-index searching with merged results
  7. simultaneous index update and searching