System Metrics
As of version 2.6.4, Weave supports metrics: counters, gauges and histograms that can be used to monitor the performance of the Weave server.
The metrics can be published to one of four metrics “databases” (Prometheus, InfluxDB, Datadog or JAMon) or not published at all, which avoids the overhead of collecting metrics you don’t intend to use.
Long Term Monitoring
Prometheus, InfluxDB and Datadog provide long term monitoring: the metrics are stored in a database, so they are not lost when a service or the system is restarted.
These tools provide basic graphing but are primarily intended for collecting metrics over a long period; a separate tool, for example Grafana (https://grafana.com/), is normally used to display and monitor that information. A basic Grafana dashboard that uses Prometheus is available for import from here.
Short Term Monitoring
JAMon provides short term monitoring only, not long term monitoring of your system. It is described as short term because the metrics are cleared, and therefore lost, whenever the Weave services are restarted.
The JAMon metrics registry is intended for sites that don't have one of the dedicated metrics databases but still want to see what the metrics are doing. To make full use of the metrics, use one of the other registries.
Installation
If you upgrade to 2.6.4 from a previous 2.6 release, the metrics components will not be installed.
To install them, either perform a clean install of 2.6.4 and select one of the metrics database providers from the available Extensions during installation, or, after upgrading to 2.6.4, re-run the 2.6.4 installer (not the updater), uncheck Main Components and select the metrics database provider you want from the available Extensions.
Because of a possible issue with the installer, you should also click the System Metrics check box to ensure all the components are installed.
Make sure the System Metrics box shows a check mark, not a green square, before continuing; otherwise the required metrics plugins will not be installed.
Only one registry provider (JAMon, Datadog, InfluxDB or Prometheus) can be installed at a time.
If you have previously installed one of these registries, remove the corresponding plugin before installing another, or metrics recording will not work.
The registry plugins are named com.cohga.server.metric.registry.*_x.y.z.jar, where * is one of jamon, datadog, influxdb or prometheus.
If you do not see the metrics you expect, check whether more than one plugin with these names is installed.
Note that the com.cohga.server.metric.registry.api_x.y.z.jar and com.cohga.server.metric.registry.micrometer_x.y.z.jar plugins should always remain, along with the other com.cohga.server.metric.*_x.y.z.jar files.
Configuration
JAMon
JAMon is an internal metrics database that stores the metrics and makes them available via the existing Weave server status page (under the Timing Summary page). JAMon requires no configuration.
Prometheus
Out of the box, the Prometheus registry requires no configuration; it presents the available metrics for collection at the /weave/metrics URL. To use the metrics, point your Prometheus server at that URL so that it periodically collects (scrapes) the metrics from the Weave server.
The following is a working prometheus.yml configuration file for a Prometheus server; change <hostname> to the Weave server hostname and <port> to the port on which Weave is accessed. If you access Weave over https rather than directly over http, you may also need to set the scheme attribute (scheme: https), which Prometheus defaults to http.
This example is enough to get Prometheus working with a Weave server and can serve as a complete Prometheus configuration file, although a real configuration would likely include more; it is intended only to show what is required to add a connection to a Weave server.
scrape_configs:
  - job_name: 'weave'
    scrape_interval: 1m
    metrics_path: '/weave/metrics'
    static_configs:
      - targets:
          - <hostname>:<port>
If you’re using a security.xml file from a previous version of Weave, it may not include an entry for the Prometheus /metrics endpoint. If so, add the following entry to the filterInvocationDefinitionSource property in the filterChainProxy, after PATTERN_TYPE_APACHE_ANT but before /**=httpSessionContextIntegrationFilter,logoutFilter,...
/metrics=#NONE#
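For orientation, a sketch of what the edited property might look like. It assumes your security.xml sets filterInvocationDefinitionSource as a <value> text block inside the filterChainProxy bean, as is common in Acegi-style Spring Security configurations; your file's layout may differ. Any pattern lines already present between PATTERN_TYPE_APACHE_ANT and /** stay where they are, and the trailing ... stands for the rest of your existing filter list, which is unchanged.

```xml
<!-- Inside the filterChainProxy bean definition in security.xml (layout assumed) -->
<property name="filterInvocationDefinitionSource">
  <value>
    PATTERN_TYPE_APACHE_ANT
    /metrics=#NONE#
    /**=httpSessionContextIntegrationFilter,logoutFilter,...
  </value>
</property>
```

The /metrics line must come before the catch-all /** line, because the patterns are checked in order and the first match wins; #NONE# means no security filters are applied to that URL, so Prometheus can scrape it.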
The default
Other configuration options:
descriptions
Should meter descriptions be sent to Prometheus? The default is true; set it to false to minimise the amount of data sent on each scrape.
step
The step size to use in computing windowed statistics like max. The default is 1 minute. To get the most out of these statistics, align the step interval to be close to your scrape interval.
The accepted formats are based on the ISO-8601 duration format PnDTnHnMn.nS, with days considered to be exactly 24 hours. For example:
"PT20.345S" parses as 20.345 seconds
"PT15M" parses as 15 minutes (where a minute is 60 seconds)
"PT10H" parses as 10 hours (where an hour is 3600 seconds)
"P2D" parses as 2 days (where a day is 24 hours or 86400 seconds)
"P2DT3H4M" parses as 2 days, 3 hours and 4 minutes
Datadog
At a minimum, Datadog must be configured with an apiKey and an applicationKey, but it also supports a number of other configuration options.
<config xmlns="urn:com.cohga.server.config#1.0"
        xmlns:datadog="urn:com.cohga.server.metric.datadog#1.0">
  <datadog:config>
    <apiKey>INCLUDE_YOUR_API_KEY_HERE</apiKey>
    <applicationKey>INCLUDE_YOUR_APPLICATION_KEY_HERE</applicationKey>
  </datadog:config>
</config>
Other configuration options:
hostTag
The tag that will be mapped to "host" when shipping metrics to Datadog. The default is no tag.
uri
The URL to push metrics to. The default is https://app.datadoghq.com.
descriptions
Should meter descriptions be sent to Datadog? The default is true; set it to false to minimise the amount of data sent each time metrics are published.
step
The step size (reporting frequency) to use. The default is 1 minute.
The accepted formats are based on the ISO-8601 duration format PnDTnHnMn.nS, with days considered to be exactly 24 hours. For example:
"PT20.345S" parses as 20.345 seconds
"PT15M" parses as 15 minutes (where a minute is 60 seconds)
"PT10H" parses as 10 hours (where an hour is 3600 seconds)
"P2D" parses as 2 days (where a day is 24 hours or 86400 seconds)
"P2DT3H4M" parses as 2 days, 3 hours and 4 minutes
enabled
Whether publishing is enabled. The default is true.
numThreads
The number of threads to use with the scheduler. The default is 2 threads.
connectTimeout
The connection timeout for requests to the backend. The default is 1 second.
readTimeout
The read timeout for requests to the backend. The default is 10 seconds.
batchSize
The number of measurements per request to use for the backend. If more measurements are found, then multiple requests will be made. The default is 10,000.
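A fuller Datadog configuration might look like the following sketch. It assumes that each option listed above is set as a child element of datadog:config named after the option, in the same way as apiKey and applicationKey; this is inferred from the minimal example rather than confirmed, so check it against your installation. The values are illustrative only.

```xml
<config xmlns="urn:com.cohga.server.config#1.0"
        xmlns:datadog="urn:com.cohga.server.metric.datadog#1.0">
  <datadog:config>
    <apiKey>INCLUDE_YOUR_API_KEY_HERE</apiKey>
    <applicationKey>INCLUDE_YOUR_APPLICATION_KEY_HERE</applicationKey>
    <!-- Assumption: optional settings are child elements named after the options -->
    <!-- Publish every 30 seconds instead of the default 1 minute -->
    <step>PT30S</step>
    <!-- Leave out meter descriptions to reduce the size of each request -->
    <descriptions>false</descriptions>
    <!-- Send at most 5,000 measurements per request (the default is 10,000) -->
    <batchSize>5000</batchSize>
  </datadog:config>
</config>
```

Presumably any option you leave out keeps the default listed above, so only the settings you want to change need to appear.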
InfluxDB
At a minimum, InfluxDB must be configured with a userName and a password. InfluxDB also supports a number of other configuration options.
<config xmlns="urn:com.cohga.server.config#1.0"
        xmlns:influx="urn:com.cohga.server.metric.influxdb#1.0">
  <influx:config>
    <userName>INCLUDE_YOUR_USERNAME_HERE</userName>
    <password>INCLUDE_YOUR_PASSWORD_HERE</password>
  </influx:config>
</config>
Other configuration options:
db
The db to send metrics to. Defaults to "mydb".
consistency
Sets the write consistency for each point. The Influx default is 'one'. Must be one of 'any', 'one', 'quorum', or 'all'. Only available for InfluxEnterprise clusters.
retentionPolicy
The retention policy to write to. If one is not specified, Influx writes to the DEFAULT retention policy.
retentionDuration
Time period for which Influx should retain data in the current database (e.g. 2h, 52w).
retentionReplicationFactor
How many copies of the data are stored in the cluster. Must be 1 for a single node instance.
retentionShardDuration
The time range covered by a shard group (e.g. 2h, 52w).
uri
The URI for the Influx backend. The default is http://localhost:8086.
compressed
Whether metrics publish batches should be GZIP compressed. The default is true.
autoCreateDb
Whether Micrometer should check that the db database exists before attempting to publish metrics to it, creating it if it does not exist.
step
The step size (reporting frequency) to use. The default is 1 minute.
The accepted formats are based on the ISO-8601 duration format PnDTnHnMn.nS, with days considered to be exactly 24 hours. For example:
"PT20.345S" parses as 20.345 seconds
"PT15M" parses as 15 minutes (where a minute is 60 seconds)
"PT10H" parses as 10 hours (where an hour is 3600 seconds)
"P2D" parses as 2 days (where a day is 24 hours or 86400 seconds)
"P2DT3H4M" parses as 2 days, 3 hours and 4 minutes
enabled
Whether publishing is enabled. The default is true.
numThreads
The number of threads to use with the scheduler. The default is 2 threads.
connectTimeout
The connection timeout for requests to the backend. The default is 1 second.
readTimeout
The read timeout for requests to the backend. The default is 10 seconds.
batchSize
The number of measurements per request to use for the backend. If more measurements are found, then multiple requests will be made. The default is 10,000.
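A sketch of an InfluxDB configuration that sets some of these options. As with Datadog, it assumes each option is a child element of influx:config named after the option, alongside userName and password; this is an inference, not confirmed here. The host name influx.example.com and the database name weave are hypothetical placeholders.

```xml
<config xmlns="urn:com.cohga.server.config#1.0"
        xmlns:influx="urn:com.cohga.server.metric.influxdb#1.0">
  <influx:config>
    <userName>INCLUDE_YOUR_USERNAME_HERE</userName>
    <password>INCLUDE_YOUR_PASSWORD_HERE</password>
    <!-- Assumption: optional settings are child elements named after the options -->
    <!-- Hypothetical InfluxDB server; the default is http://localhost:8086 -->
    <uri>http://influx.example.com:8086</uri>
    <!-- Hypothetical database name; the default is "mydb" -->
    <db>weave</db>
    <!-- Create the database if it does not already exist -->
    <autoCreateDb>true</autoCreateDb>
    <!-- Retain data in this database for 52 weeks -->
    <retentionDuration>52w</retentionDuration>
    <!-- Publish once a minute (the default step) -->
    <step>PT1M</step>
  </influx:config>
</config>
```

On a single-node InfluxDB instance, remember that retentionReplicationFactor, if set, must be 1.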