System Metrics

As of version 2.6.4, Weave supports metrics: counters, gauges and histograms that can be used to monitor the performance of the Weave server.

The metrics can be published to one of four metrics “databases” (Prometheus, InfluxDB, Datadog or JAMon), or not published at all, which removes the overhead of collecting metrics you don’t intend to use.

 

Long Term Monitoring

Prometheus, InfluxDB and Datadog provide long term monitoring because they store the metrics in a database, so the metrics are not lost after a service or system restart.

These tools provide basic graphing, but are primarily for collecting the metrics over a long period; another tool, for example Grafana (https://grafana.com/), would be used to display and monitor that information. A basic Grafana dashboard that uses Prometheus is available for import into Grafana.

Short Term Monitoring

JAMon provides short term monitoring only, not monitoring of your system over the long term. The monitoring is described as short term because the metrics are cleared, and therefore lost, whenever the Weave services are restarted.

The JAMon metrics registry is for sites that don't have one of the proper metrics registries but want to be able to at least see what the metrics are doing. If you want to make use of the metrics to their full potential, you should use one of the other registries.

Installation

If you upgrade to 2.6.4 from a previous 2.6 release, the metrics will not be installed.

To install them, do one of the following: perform a clean install of 2.6.4 and choose one of the metrics database providers from the available Extensions during installation; or, after upgrading to 2.6.4, re-run the 2.6.4 installer (not the updater), uncheck the Main Components and then choose the metrics database provider you wish to use from the available Extensions.

There may be an issue with the installer, so you should also click the System Metrics check box to ensure all the components are installed.

That is, make sure that the System Metrics box shows a check mark and not a green square before continuing; otherwise the required metrics plugins will not be installed.

You can only install a single registry provider (JAMon, Datadog, InfluxDB or Prometheus) at a time.

If you have previously installed one of the above registries, remove the corresponding plugin before installing another, or the metrics recording will not work.

The registry plugins will be named com.cohga.server.metric.registry.*_x.y.z.jar, where * is one of jamon, datadog, influxdb or prometheus.

If you do not see the metrics you expect, check whether there is more than one plugin matching the above name pattern.

Note that the com.cohga.server.metric.registry.api_x.y.z.jar and com.cohga.server.metric.registry.micrometer_x.y.z.jar plugins should always remain, along with the other com.cohga.server.metric.*_x.y.z.jar files.
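
The check for conflicting registry plugins can be sketched as a small script. This is only an illustration: the regular expression is derived from the naming pattern above, the version numbers in the usage example are made up, and the location of the Weave plugins directory is not stated here, so the function takes a list of file names rather than a path.

```python
import re

# Derived from the naming pattern com.cohga.server.metric.registry.*_x.y.z.jar;
# the exact shape of the version suffix is an assumption.
REGISTRY_JAR = re.compile(r"^com\.cohga\.server\.metric\.registry\.([a-z]+)_.+\.jar$")

# These two plugins must always remain installed, so they are not providers.
SHARED = {"api", "micrometer"}


def registry_plugins(filenames):
    """Return the sorted registry provider jars found among the file names."""
    found = []
    for name in filenames:
        match = REGISTRY_JAR.match(name)
        if match and match.group(1) not in SHARED:
            found.append(name)
    return sorted(found)


def has_conflict(filenames):
    """True when more than one registry provider jar is present."""
    return len(registry_plugins(filenames)) > 1
```

For example, passing the output of os.listdir() for your plugins directory to has_conflict() would report True if both a jamon and a prometheus registry jar were present, or if two versions of the same registry jar were left behind.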

Configuration

JAMon

JAMon is an internal metrics database that stores the metrics and makes them available via the existing Weave server status page (under the Timing Summary page). JAMon requires no configuration.

Prometheus

Out of the box, the Prometheus database requires no configuration to start working; it presents the available metrics for collection at the /weave/metrics URL. To use the metrics, point your Prometheus server at that URL so that it periodically collects the metrics from the Weave server.
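
Before configuring a Prometheus server, it can be useful to confirm that the endpoint responds. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format and is reachable without authentication; the function names are illustrative and are not part of Weave.

```python
from urllib.request import urlopen


def metrics_url(hostname, port, scheme="http"):
    """Build the URL of the Weave metrics endpoint."""
    return f"{scheme}://{hostname}:{port}/weave/metrics"


def sample_names(exposition_text):
    """Return the distinct metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or at whitespace (value).
        names.add(line.split("{", 1)[0].split()[0])
    return sorted(names)


def fetch_metric_names(hostname, port, scheme="http", timeout=10):
    """Fetch the endpoint once and list the metric names it exposes."""
    with urlopen(metrics_url(hostname, port, scheme), timeout=timeout) as response:
        return sample_names(response.read().decode("utf-8"))
```

Calling fetch_metric_names() with your Weave hostname and port should return a non-empty list if the Prometheus registry plugin is installed and the endpoint is accessible.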

The following is a working prometheus.yml configuration file that can be used with a Prometheus server; you just need to change <hostname> to the Weave server hostname and <port> to the port that Weave can be accessed at. If you access Weave over https rather than directly over http, you may also need to set the scheme attribute, which Prometheus defaults to http.
This example is enough to get Prometheus working with a Weave server and can serve as a complete Prometheus server configuration file, but a real configuration would likely include more; it is intended only to show what is required to add a connection to a Weave server.

scrape_configs:
  - job_name: 'weave'
    scrape_interval: 1m
    metrics_path: '/weave/metrics'
    static_configs:
      - targets:
          - <hostname>:<port>

If you’re going to use a security.xml file from a previous version of Weave, it may not include an entry for the Prometheus /metrics endpoint. If this is the case, add the following entry to the filterInvocationDefinitionSource property in the filterChainProxy, after PATTERN_TYPE_APACHE_ANT but before /**=httpSessionContextIntegrationFilter,logoutFilter,...

/metrics=#NONE#

The default

Other configuration options:

  • descriptions

    • Whether meter descriptions are sent to Prometheus. The default is true; set it to false to minimise the amount of data sent on each scrape.

  • step

    • The step size to use in computing windowed statistics like max. The default is 1 minute. To get the most out of these statistics, align the step interval to be close to your scrape interval.

    • The formats accepted are based on the ISO-8601 duration format PnDTnHnMn.nS, with days considered to be exactly 24 hours, e.g.

      • "PT20.345S" -- parses as "20.345 seconds"

      • "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)

      • "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)

      • "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)

      • "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"
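
To make the duration format concrete, here is a small sketch that converts such strings to seconds. It is an independent illustration, not the parser Weave uses, and it only handles the unsigned, upper-case PnDTnHnMn.nS form shown in the examples above.

```python
import re

# Days, hours and minutes are whole numbers; seconds may have a fraction.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_seconds(text):
    """Convert a PnDTnHnMn.nS duration string to a number of seconds."""
    match = _DURATION.match(text)
    # Reject strings with no components at all, such as "P", "PT" or "P1DT".
    if match is None or text.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"not a supported duration: {text!r}")
    parts = {key: float(value or 0) for key, value in match.groupdict().items()}
    return (
        parts["days"] * 86400  # a day is exactly 24 hours
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

With this sketch, duration_seconds("PT15M") gives 900.0 and duration_seconds("P2DT3H4M") gives 183840.0, matching the descriptions in the list above.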

Datadog

At a minimum, Datadog must be configured with an apiKey and an applicationKey; it also supports a number of other configuration options.

<config xmlns="urn:com.cohga.server.config#1.0" xmlns:datadog="urn:com.cohga.server.metric.datadog#1.0">
  <datadog:config>
    <apiKey>INCLUDE_YOUR_API_KEY_HERE</apiKey>
    <applicationKey>INCLUDE_YOUR_APPLICATION_KEY_HERE</applicationKey>
  </datadog:config>
</config>

Other configuration options:

  • hostTag

    • The tag that will be mapped to "host" when shipping metrics to Datadog. The default is no tag.

  • uri

    • The URL to push metrics to. The default is https://app.datadoghq.com

  • descriptions

    • Whether meter descriptions are sent to Datadog. The default is true; set it to false to minimise the amount of data sent on each publish.

  • step

    • The step size (reporting frequency) to use. The default is 1 minute.

    • The formats accepted are based on the ISO-8601 duration format PnDTnHnMn.nS, with days considered to be exactly 24 hours, e.g.

      • "PT20.345S" -- parses as "20.345 seconds"

      • "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)

      • "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)

      • "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)

      • "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"

  • enabled

    • Whether publishing is enabled. The default is true.

  • numThreads

    • The number of threads to use with the scheduler. The default is 2 threads.

  • connectTimeout

    • The connection timeout for requests to the backend. The default is 1 second.

  • readTimeout

    • The read timeout for requests to the backend. The default is 10 seconds.

  • batchSize

    • The number of measurements per request to use for the backend. If more measurements are found, then multiple requests will be made. The default is 10,000.
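
The effect of batchSize on the number of requests can be illustrated with a little arithmetic. This sketch only shows the general idea of splitting measurements into batches; exactly how the backend client divides them is an assumption.

```python
import math


def request_count(measurements, batch_size=10_000):
    """Number of requests needed to send the given number of measurements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(measurements / batch_size)


def batches(items, batch_size=10_000):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With the default batchSize of 10,000, publishing 25,000 measurements would take three requests under this scheme, while 10,000 or fewer would fit in one.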

InfluxDB

InfluxDB must be configured to at least provide a userName and password. InfluxDB also supports a number of other configuration options.

<config xmlns="urn:com.cohga.server.config#1.0" xmlns:influx="urn:com.cohga.server.metric.influxdb#1.0">
  <influx:config>
    <userName>INCLUDE_YOUR_USERNAME_KEY_HERE</userName>
    <password>INCLUDE_YOUR_PASSWORD_KEY_HERE</password>
  </influx:config>
</config>

Other configuration options:

  • db

    • The db to send metrics to. Defaults to "mydb".

  • consistency

    • Sets the write consistency for each point. The Influx default is 'one'. Must be one of 'any', 'one', 'quorum', or 'all'. Only available for InfluxEnterprise clusters.

  • retentionPolicy

    • Influx writes to the DEFAULT retention policy if one is not specified.

  • retentionDuration

    • Time period for which influx should retain data in the current database (e.g. 2h, 52w).

  • retentionReplicationFactor

    • How many copies of the data are stored in the cluster. Must be 1 for a single node instance.

  • retentionShardDuration

    • The time range covered by a shard group (e.g. 2h, 52w).

  • uri

    • The URI for the Influx backend. The default is http://localhost:8086

  • compressed

    • Whether metrics publish batches should be GZIP compressed. The default is true.

  • autoCreateDb

    • Whether Micrometer should check that the db exists before attempting to publish metrics to it, creating it if it does not exist.

  • step

    • The step size (reporting frequency) to use. The default is 1 minute.

    • The formats accepted are based on the ISO-8601 duration format PnDTnHnMn.nS, with days considered to be exactly 24 hours, e.g.

      • "PT20.345S" -- parses as "20.345 seconds"

      • "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)

      • "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)

      • "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)

      • "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"

  • enabled

    • Whether publishing is enabled. The default is true.

  • numThreads

    • The number of threads to use with the scheduler. The default is 2 threads.

  • connectTimeout

    • The connection timeout for requests to the backend. The default is 1 second.

  • readTimeout

    • The read timeout for requests to the backend. The default is 10 seconds.

  • batchSize

    • The number of measurements per request to use for the backend. If more measurements are found, then multiple requests will be made. The default is 10,000.