Detailed documentation

The first part of this documentation explains how to install, run and update Panto. Then, every Probe is presented in depth. Finally, we'll see how to configure States and Alerts in the application.

⚠️ Note: commands prefixed with a # prompt may need to be run as root or with sudo. Commands prefixed with a $ prompt can be run as any unprivileged user, unless stated otherwise.

Installing Panto

Panto is distributed in different flavors that you can pick based on your needs or your taste. If you want to run the full Panto suite (panto-server and panto-agent), or if you are in a hurry to see it live, we recommend the Docker Compose setup. If you use the SaaS configuration and only need to run the Agent on your Debian server, we recommend the packaged setup, with apt.

Docker

Panto hosts 4 Docker images on the GitLab registry.

  • registry.gitlab.com/pantomath-io/panto/panto-server: The Panto server
  • registry.gitlab.com/pantomath-io/panto/panto-agent: The Panto agent
  • registry.gitlab.com/pantomath-io/panto/panto-web: The Panto web client
  • registry.gitlab.com/pantomath-io/panto/panto: All-in-one image

The images named registry.gitlab.com/pantomath-io/panto/panto-server or registry.gitlab.com/pantomath-io/panto/panto-web are the stable images built from the master branch. To get the latest development build, use registry.gitlab.com/pantomath-io/panto/panto-server:edge or registry.gitlab.com/pantomath-io/panto/panto-web:edge. Other tags include :latest (an alias of the master build), and :x.y and :x.y.z, where x.y.z is a specific version number (e.g. :1.1.2).

⚠️ Running Panto in Docker stores all your data in the containers. Destroying the containers will delete your data. Make sure to back up your data before destroying the containers, or use Docker volumes.

Server

The Panto server image runs a server in a Docker container. Set the INFLUXDB_ADDRESS environment variable inside the container to a URL where the container can reach an InfluxDB server.

panto-server exposes 2 TCP ports by default:

  • 7575 for its gRPC interface
  • 7576 for its REST interface

Run the container by typing:

$ docker run -p 7575:7575 -p 7576:7576 --env INFLUXDB_ADDRESS=http://influxdb:8086 registry.gitlab.com/pantomath-io/panto/panto-server

Note that you probably want to configure the Panto server with a persistent configuration file. Use Docker volumes to mount the configuration file at /etc/panto/panto.yaml.

Agent

The Panto agent image runs an Agent in a Docker container. Set the PANTO_SERVER_ADDRESS environment variable inside the container to a host:port where the container can reach a Panto server.

Run the container by typing:

$ docker run --env PANTO_SERVER_ADDRESS=localhost:7575 registry.gitlab.com/pantomath-io/panto/panto-agent

Note that you probably want to configure panto-agent with a persistent configuration file. Use Docker volumes to mount the configuration file at /etc/panto/panto-agent.yaml.

Web client

The Panto web client image runs a static web server in a Docker container. Set the PANTO_ADDRESS environment variable inside the container to a URL where the user's web browser can reach a Panto server's REST API.

panto-web exposes 1 TCP port by default:

  • 8080 for the web server

Run the container by typing:

$ docker run -p 8080:8080 --env PANTO_ADDRESS=http://panto.yourdomain.com registry.gitlab.com/pantomath-io/panto/panto-web

All-in-one

The "all-in-one" image contains:

  • The Panto server
  • The Panto agent
  • The Panto web client
  • An InfluxDB server

This "all-in-one" image exposes 3 TCP ports by default:

  • 7575 for panto-server, the gRPC interface
  • 7576 for panto-server, the REST interface
  • 8080 for panto-web, the web application

Run the container by typing:

$ docker run -p 7575:7575 -p 7576:7576 -p 8080:8080 --env PANTO_ADDRESS=http://panto.yourdomain.com registry.gitlab.com/pantomath-io/panto/panto

To get the Agent working, refer to the Agent initialization section below.

Docker Compose

A more modular, multi-container setup is also available, using Docker Compose. After making sure docker-compose is installed, download the docker-compose.yml file to your computer and type:

$ docker-compose up

from the same directory as the docker-compose.yml file.

This will download all the necessary images from the official repositories, run all the containers, and connect the pieces for you.

By default, this will start all the components (the panto-server, the TSDB, one panto-agent and the panto-web), and will expose the following ports:

  • TCP/7575: panto-server, gRPC interface
  • TCP/7576: panto-server, REST interface
  • TCP/8080: panto-web, the web application

When running for the first time, you need to initialize the persistent database. Use the following command once the Docker Compose setup is up:

$ docker exec -it panto-server /usr/bin/panto-ctl init data --db-path=/var/lib/panto/panto.sqlite --organization-name=Organization

To get the Agent working, refer to the Agent initialization section below.

Linux packages

To install Panto, first make sure you have installed all of the prerequisites.

Prerequisites

Panto uses InfluxDB to store results from Probes. Ensure you have a running instance of InfluxDB reachable by Panto.

Panto uses Caddy to serve the web client. Download and install Caddy from the official website to deploy the web client.

APT repository

Debian packages (stretch) are available in a dedicated apt repository:

$ wget -qO- https://packages.panto.app/pantomath.key | sudo apt-key add -
$ echo "deb https://packages.panto.app/debian stretch main" | sudo tee /etc/apt/sources.list.d/panto.list
$ sudo apt update

You can now install the packages:

$ sudo apt install panto

Binaries

Pre-built panto binaries (linux/x86_64 only) can be downloaded directly from the downloads page. To install Panto, first make sure you have installed all of the prerequisites.

Panto environment

The recommended setup is to run Panto as a dedicated non-privileged user. Create this user and set up the home directory for Panto:

[root]# groupadd panto
[root]# useradd -g panto -d /opt/panto -m -s /bin/false panto

Assuming default locations, it is also recommended to create the main directories Panto will need:

[root]# mkdir -p /var/lib/panto /var/log/panto /etc/panto
[root]# chown -R panto:panto /var/lib/panto /var/log/panto /etc/panto

Download binaries

Download the release from the releases page. Official binary distributions are only available for Linux 4.9 and later on x86_64. For other platforms, you will have to build from source.

After downloading the archive, extract it to a destination path. For example, using /opt/panto:

[panto]$ tar -C /opt/panto -xzf panto-$(VERSION)-$(OS)_$(ARCH).tar.gz
[panto]$ ln -sf /opt/panto/panto-$(VERSION)-$(OS)_$(ARCH)/* /opt/panto/

⚠️ These commands should be run as panto, the dedicated Panto user, who will run the server/agent.

Setup systemd

A convenient way to run Panto is to integrate it with systemd. Assuming default locations, this systemd unit file should work. Copy it to /etc/systemd/system/panto.service and adapt the values.

systemd needs to be reloaded:

[root]# chmod 664 /etc/systemd/system/panto.service
[root]# systemctl daemon-reload

Then enable the unit so that it starts automatically at boot:

[root]# systemctl enable panto

The Panto daemon can now be controlled like any other systemd service:

[root]# systemctl status panto
[root]# systemctl start panto
[root]# systemctl stop panto

This can be adapted to run the panto-agent daemon.

Setup init script

For init users, a convenient way to run panto-agent is to integrate it with initd. Assuming default locations, this initd script should work. Copy it to /etc/init.d/panto-agent and adapt the values.

The script needs to be made executable and registered with initd:

[root]# chmod 755 /etc/init.d/panto-agent
[root]# update-rc.d panto-agent defaults

The panto-agent daemon can now be controlled like any other initd script:

[root]# service panto-agent start
[root]# service panto-agent status
[root]# service panto-agent stop

This can be adapted to run the panto daemon.

Build from source

For a detailed walkthrough, see the developer documentation.

Initialization

Server initialization

The panto-ctl tool can be used to initialize your server environment and configuration file before running Panto for the first time. Type:

[panto]$ panto-ctl init

⚠️ This command should be run as panto, the dedicated Panto user, who will run the server/agent.

⚠️ If you are running Panto in Docker, you need to run a command like:

$ docker exec -it panto-server /usr/bin/panto-ctl init

A command-line dialog will guide you through the steps of initializing your installation, setting up the database, and configuring the server. See the panto-ctl documentation for more details.

Agent initialization

The Agent does not need to be initialized, but it must be configured with its name, which is assigned by the Server. An Agent must therefore be declared on the Server first (the API returns its name), and then run with this name in its configuration.

For the package, binary, and Docker container setups, set the name in the configuration file (/etc/panto/panto-agent.yaml).

In the all-in-one or the Docker Compose configuration, when running for the first time, the panto-agent is not declared on the panto-server and will not work properly. To set up the panto-agent, you need to:

  • launch the all-in-one image / docker-compose containers;
  • get your Organization API name:
curl http://localhost:7576/v1/organizations

The result should be something like {"organizations":[{"name":"organizations/pUNH8aZDRki_KQPigL7Eww","display_name":"Organization"}],"next_page_token":""}. The name field is the Organization name, from the API standpoint.

  • register the Agent:
curl -XPOST http://localhost:7576/v1/organizations/pUNH8aZDRki_KQPigL7Eww/agents --data '{"agent": {"display_name": "agent.local"}}'

The result should be something like {"name":"organizations/pUNH8aZDRki_KQPigL7Eww/agents/cUITkjXIQbisdhbuj31cdw","display_name":"agent.local","last_activity":null,"last_version":""}. The name field is the Agent name, from the API standpoint.

  • add the Agent name in the configuration file;
  • restart the container(s).
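
The registration steps above can also be scripted. The following Python sketch is one possible way to do it with the standard library only. The helper names (first_organization, register_agent_request) are hypothetical; the endpoints and payload are copied from the curl examples above. The sketch only builds the request, so nothing is sent until the caller passes it to urllib.request.urlopen.

```python
import json
import urllib.request

# REST address used in the curl examples above; adapt it to your setup.
API = "http://localhost:7576/v1"


def first_organization(body):
    """Return the API name of the first Organization in a /v1/organizations response."""
    orgs = json.loads(body)["organizations"]
    if not orgs:
        raise ValueError("no organization found; has the server been initialized?")
    return orgs[0]["name"]


def register_agent_request(org_name, display_name):
    """Build (but do not send) the POST request that registers an Agent."""
    payload = json.dumps({"agent": {"display_name": display_name}}).encode()
    return urllib.request.Request(f"{API}/{org_name}/agents", data=payload, method="POST")


# Sample response taken from the documentation above.
sample = (
    '{"organizations":[{"name":"organizations/pUNH8aZDRki_KQPigL7Eww",'
    '"display_name":"Organization"}],"next_page_token":""}'
)
org = first_organization(sample)
req = register_agent_request(org, "agent.local")
print(req.get_method(), req.full_url)
```

The name field of the response to that request is the value to put in the Agent configuration file.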

Running Panto

For the panto and panto-agent executables, the configuration file format is YAML. Configuration files are hierarchical in nature and paths are separated using a dot ('.'). E.g. for log.file:

log:
  file: /var/log/panto/panto.log
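
To illustrate how a dotted path maps onto the hierarchy, here is a minimal Python sketch. The lookup helper is hypothetical and is not part of Panto; the nested dict stands in for the parsed YAML shown above.

```python
from functools import reduce


def lookup(config, dotted_path):
    """Follow a dot-separated path such as 'log.file' through nested mappings."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), config)


# The YAML document above, once parsed, is equivalent to this nested dict.
config = {"log": {"file": "/var/log/panto/panto.log"}}
print(lookup(config, "log.file"))
```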

Executables

As a general rule, run any executable with --help for more information.

panto

The main server executable. Run this to deploy the Panto service on a machine.

| Configuration file | Command-line | Environment variable | Usage | Default | Required |
| --- | --- | --- | --- | --- | --- |
|  | --quiet, -q |  | suppress all output | false | optional |
|  | --version, -V |  | display version and exit | false | optional |
|  | --conf |  | path to configuration file |  | optional |
| verbose | --verbose, -v | PANTO_VERBOSE | set verbosity level (0: silent, 1: warnings and errors, 2: verbose, 3: debug) | 1 | optional |
| log.file | --log-file | PANTO_LOG_FILE | path of a file to log Panto output |  | optional |
| log.syslog |  | PANTO_LOG_SYSLOG | log Panto output to syslog | false | optional |
| server.grpc-address | --grpc-address |  | address to bind gRPC API to | :7575 | required |
| server.rest-address | --rest-address |  | address to bind REST API to | :7576 | required |
| server.allow-origin | --allow-origin |  | comma-separated list of origin addresses to allow via CORS |  | required |
| server.certfile | --certfile | PANTO_CERTFILE | path to a TLS certificate file |  | optional |
| server.certkey | --certkey | PANTO_CERTKEY | path to a TLS private key file |  | optional |
| server.no-tls | --no-tls |  | disable SSL/TLS | false | required |
| server-info.public-grpc-address |  | PANTO_PUBLIC_GRPC_ADDRESS | the public address where a client can reach the gRPC API |  | optional |
| server-info.public-rest-address |  | PANTO_PUBLIC_REST_ADDRESS | the public address where a client can reach the REST API |  | optional |
| influxdb.address | --influxdb-address |  | address of an InfluxDB server | http://localhost:8086 | required |
| influxdb.database | --influxdb-database |  | name of the InfluxDB database | panto | required |
| db.path | --db-path |  | path to a SQLite configuration database file | /var/lib/panto/panto.sqlite | required |
| smtp.server |  | PANTO_SMTP_SERVER | address of an SMTP server |  | optional |
| smtp.port |  | PANTO_SMTP_PORT | port to use for SMTP server | 587 | optional |
| smtp.username |  | PANTO_SMTP_USERNAME | username for SMTP server |  | optional |
| smtp.password |  | PANTO_SMTP_PASSWORD | password to use for SMTP server |  | optional |
| smtp.from |  | PANTO_SMTP_FROM | email address to send the mails from | hello@panto.app | optional |

panto-agent

The executable for the agents. Typically runs on a target machine and gathers metrics that it reports to the server. It can also be run on a dedicated machine, to monitor a remote target.

| Configuration file | Command-line | Environment variable | Usage | Default | Required |
| --- | --- | --- | --- | --- | --- |
|  | --quiet, -q |  | suppress all output | false | optional |
|  | --version, -V |  | display version and exit | false | optional |
|  | --conf |  | path to configuration file |  | optional |
| verbose | --verbose, -v | PANTO_AGENT_VERBOSE | set verbosity level (0: silent, 1: warnings and errors, 2: verbose, 3: debug) | 1 | optional |
| log.file | --log-file | PANTO_AGENT_LOG_FILE | path of a file to log output |  | optional |
| log.syslog |  | PANTO_AGENT_LOG_SYSLOG | log output to syslog | false | optional |
| agent.name | --name, -n | PANTO_AGENT_NAME | name of the agent in the Panto API |  | required |
| agent.timeout | --timeout | PANTO_AGENT_TIMEOUT | timeout duration | 45s | optional |
| agent.max-spooled-results | --max-spooled-results | PANTO_MAX_SPOOLED_RESULTS | maximum number of results spooled while server can't be reached | 100 | optional |
| agent.spooler-dump-path | --spooler-dump-path | PANTO_SPOOLER_DUMP_PATH | path of the dump file for the spooled results |  | optional |
| server.address |  |  | address of Panto server (host:port) |  | required |
| server.certfile | --certfile |  | path to a TLS certificate file |  | optional |
| server.no-tls | --no-tls |  | disable SSL/TLS | false | optional |

panto-ctl

The administration tool for the panto server. Manipulates configuration files and the database. panto-ctl has 2 subcommands.

init subcommand

Run panto-ctl init to initialize configuration and database before running Panto for the first time.

| Configuration file | Command-line | Environment variable | Usage | Default | Required |
| --- | --- | --- | --- | --- | --- |
| dry-run | --dry-run, -n |  | do not perform any operations, just output what would be done | false | optional |
| install-prefix | --install-prefix |  | the default prefix for the Panto installation | / | optional |

dbconf subcommand

Run panto-ctl dbconf to perform operations on the database. Subcommands:

  • check: makes sure the current version of the configuration database is the configured one
  • list-migrations: lists the pending migrations to be applied on the configuration database
  • upgrade: applies all the migrations to the configuration database
  • downgrade: reverts the last migration on the configuration database
  • version: displays the current version of the configuration database

| Configuration file | Command-line | Environment variable | Usage | Default | Required |
| --- | --- | --- | --- | --- | --- |
| db.path | --db-path |  | path to the SQLite database file |  | required |
|  | --conf |  | path to the configuration file |  | optional |

Web client

The main frontend for Panto, a static website served with Caddy. Set the PANTO_ADDRESS environment variable inside the container to a URL where the user's web browser can reach a Panto server's REST API. Set the (optional) PANTO_GA_ID environment variable inside the container to a valid Google Analytics tracking ID to enable it.

$ PANTO_GA_ID=UA-01234567-8 PANTO_ADDRESS=http://panto.yourdomain.com caddy -conf=/opt/panto/www/Caddyfile

Updating Panto

Depending on your setup, updating Panto is more or less like installing it.

Update

Docker

Pull the new image, then recreate the container:

$ docker pull registry.gitlab.com/pantomath-io/panto/panto-server
$ docker stop panto-server
$ docker rm panto-server
$ docker run -p 7575:7575 -p 7576:7576 --env INFLUXDB_ADDRESS=http://influxdb:8086 registry.gitlab.com/pantomath-io/panto/panto-server

Docker Compose

Since this is only a combination of images, the procedure is almost the same as for Docker:

$ docker-compose pull
$ docker-compose up --force-recreate --build -d

Linux packages

You can rely on apt to upgrade your packages:

$ sudo apt update
$ sudo apt install panto panto-agent

Binaries

Download the release from the releases page. Official binary distributions are only available for Linux 4.9 and later on x86_64. For other platforms, you will have to build from source.

After downloading the archive, extract it to a destination path. For example, using /opt/panto:

[panto]$ tar -C /opt/panto -xzf panto-$(VERSION)-$(OS)_$(ARCH).tar.gz

⚠️ This command should be run as panto, the dedicated Panto user, who will run the server/agent.

Migrations

Once the application itself is up to date, migrations should be applied to update the database schema.

panto-ctl is here to help you:

[panto]$ /opt/panto/panto-ctl dbconf upgrade --conf /etc/panto/panto.yaml

⚠️ If you are running Panto in Docker, you need to run a command like:

$ docker exec -it panto-server /usr/bin/panto-ctl dbconf upgrade --conf /etc/panto/panto.yaml

List of probes

Probe configuration parameters and results are Go types. Their conversion to JSON is specific to each Go type's Marshal/Unmarshal functions. See the Go documentation for details.

Probe tags are always strings.

Ping

Ping sends an ICMP Echo message to a host and reports statistics on the round trip.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| address | Target address. An IP or a hostname. | string | required |
| interval | The wait time between each packet send. | time.Duration | optional, default 1s |
| count | The number of ICMP packets to send. | int | optional, default 4 |
| timeout | The time to run the ping, until it exits. If the timeout occurs before the packet count has been reached, the probe exits anyway. | time.Duration | optional, default 10s |

Results

| name | description | type |
| --- | --- | --- |
| sent | the number of packets sent | int |
| received | the number of packets received | int |
| min | the minimum round trip time, in nanoseconds | int64 |
| max | the maximum round trip time, in nanoseconds | int64 |
| avg | the average round trip time, in nanoseconds | int64 |
| stddev | the standard deviation of round trip times, in nanoseconds | int64 |

Tags

| name | description |
| --- | --- |
| address | the address of the target that was pinged |
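
Because round trip times are reported in nanoseconds and packet loss is not a result of its own, a consumer of Ping results may want to derive friendlier values. The Python sketch below shows one way to do so. The summarize_ping helper and the sample values are made up for illustration, and representing a result as a dict keyed by the names above is an assumption.

```python
def summarize_ping(result):
    """Derive packet loss (%) and millisecond timings from a Ping result."""
    sent, received = result["sent"], result["received"]
    # Guard against a probe run that sent nothing.
    loss = 0.0 if sent == 0 else 100.0 * (sent - received) / sent
    return {
        "loss_percent": loss,
        "avg_ms": result["avg"] / 1_000_000,  # nanoseconds to milliseconds
        "max_ms": result["max"] / 1_000_000,
    }


# Hypothetical result: 4 packets sent, 3 received.
sample = {"sent": 4, "received": 3, "min": 900_000, "max": 2_500_000,
          "avg": 1_500_000, "stddev": 400_000}
summary = summarize_ping(sample)
print(summary)
```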

File checksum

Checksum computes the checksum of a file.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| path | the path of the file to check | string | required |
| hash | the hash algorithm to use | string (crc32,md5,sha1,sha256) | required |

Results

| name | description | type |
| --- | --- | --- |
| checksum | the checksum of the file as a hexadecimal string | string |

Tags

| name | description |
| --- | --- |
| path | the path of the file that was checked |
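
To compare a reported checksum against a value computed independently, something like the following Python sketch could be used. It is not the probe's implementation. The file_checksum helper is hypothetical, and formatting crc32 as 8 zero-padded lowercase hexadecimal digits is an assumption about the probe's output.

```python
import hashlib
import tempfile
import zlib


def file_checksum(path, algorithm):
    """Return the checksum of a file as a hexadecimal string."""
    with open(path, "rb") as f:
        data = f.read()
    if algorithm == "crc32":
        # zlib.crc32 returns an unsigned 32-bit integer; pad to 8 hex digits.
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if algorithm in ("md5", "sha1", "sha256"):
        return hashlib.new(algorithm, data).hexdigest()
    raise ValueError(f"unsupported hash algorithm: {algorithm}")


# Usage: checksum a small temporary file with each supported algorithm.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
for name in ("crc32", "md5", "sha1", "sha256"):
    print(name, file_checksum(tmp.name, name))
```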

CPU Usage

CPU collects CPU usage statistics.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| per-cpu | true collects utilization for each CPU/core, false collects utilization averaged over all CPUs/cores | bool | required |
| interval | interval over which CPU utilization is calculated | time.Duration | optional, default 1s |

Results

| name | description | type |
| --- | --- | --- |
| busy | % of time spent running (sum of all "non-idle" times) | float32 |
| idle | % of time spent idle | float32 |
| user | % of time spent in user mode | float32 |
| system | % of time spent in system mode, which is the time spent executing kernel code | float32 |
| nice | % of time spent in user mode with low priority (nice) | float32 |
| iowait | % of time waiting for I/O to complete | float32 |
| irq | % of time servicing interrupts | float32 |
| softirq | % of time servicing softirqs | float32 |

Tags

| name | description |
| --- | --- |
| cpu | the number of the CPU/core for this result, "all" represents usage averaged over all CPUs/cores |

Disk Usage

Disk Usage collects disk usage statistics. Note that the list of available partitions is filtered based on the filesystem. The following filesystems will be removed from the results:

  • autofs
  • binfmt_misc
  • cgroup
  • debugfs
  • devfs
  • devpts
  • devtmpfs
  • dfsfuse_DFS
  • hugetlbfs
  • mqueue
  • nullfs
  • proc
  • pstore
  • securityfs
  • sysfs
  • tmpfs
  • udev

Configuration

Results

| name | description | type |
| --- | --- | --- |
| free | free space on the disk, in bytes | uint64 |
| used | used space on the disk, in bytes | uint64 |
| used-percent | % of disk space left, free / (free + used) | float32 |
| inodes-free | number of free inodes on the disk | uint64 |
| inodes-used | number of used inodes on the disk | uint64 |
| inodes-used-percent | % of inodes left, free / (free + used) | float32 |

Tags

| name | description |
| --- | --- |
| path | the mount point of the disk, e.g. / |

Disk I/O

Disk I/O collects disk throughput statistics.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| interval | interval over which Disk utilization is calculated | time.Duration | optional, default 1s |

Results

| name | description | type |
| --- | --- | --- |
| iops | Average I/O operations per second | float32 |
| bytes-read | Average bytes read per second | float32 |
| bytes-written | Average bytes written per second | float32 |
| latency | Average time for I/O requests issued to the device to be served, in nanoseconds | int64 |

Tags

| name | description |
| --- | --- |
| device | the name of the device, e.g. sda |

HTTP request

HTTP sends an HTTP request and collects statistics on the response.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| method | HTTP request method (GET, POST, etc.) | string | optional, default GET |
| url | HTTP request URL | string | required |
| body | HTTP body of the request | string | optional |
| timeout | duration before the HTTP request times out | time.Duration | optional |
| response-content | content from the response to include in the results | string (none,headers-only,body-only,full) | optional, default none |

Results

| name | description | type |
| --- | --- | --- |
| rtt | time between sending the HTTP request and receiving the response, in nanoseconds | int64 |
| status-code | the HTTP response's status code | int |
| response-content | the HTTP response's content (depending on the probe's configuration) | string |

System load

Load collects average system load. Load is defined as the number of processes waiting for I/O or in the run queue, averaged over a period of time.

Configuration

None.

Results

| name | description | type |
| --- | --- | --- |
| load1 | System load over the last minute | float32 |
| load5 | System load over the last 5 minutes | float32 |
| load15 | System load over the last 15 minutes | float32 |
| proc-running | Number of processes currently running | int |
| proc-blocked | Number of processes currently blocked | int |

Memory usage

Memory collects memory usage statistics.

Configuration

None.

Results

| name | description | type |
| --- | --- | --- |
| total | Total amount of RAM on this system, in bytes | uint64 |
| available | Estimated amount of RAM available to programs, in bytes | uint64 |
| used | Total amount of RAM used by programs, in bytes | uint64 |
| used-percent | Percentage of RAM used by programs | float32 |
| free | Total amount of RAM not used by programs, in bytes | uint64 |
| swap-total | Total amount of swap memory on this system, in bytes | uint64 |
| swap-free | Total amount of swap memory not used by programs, in bytes | uint64 |
| swap-used | Total amount of swap memory used by programs, in bytes | uint64 |
| swap-used-percent | Percentage of swap memory used by programs | float32 |

Network statistics

Network collects statistics about the network interfaces.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| interfaces | A comma-separated list of interfaces to collect information about. If the list is empty, return info about all interfaces. If the parameter is missing, return global information for all interfaces. | string | optional (see description) |

Results

| name | description | type |
| --- | --- | --- |
| bytes-sent | Number of bytes sent through the interface | uint64 |
| bytes-received | Number of bytes received through the interface | uint64 |
| packets-sent | Number of network packets sent through the interface | uint64 |
| packets-received | Number of network packets received through the interface | uint64 |
| error-in | Number of errors while receiving through the interface | uint64 |
| error-out | Number of errors while sending through the interface | uint64 |
| tcp-connections | Number of open TCP connections on this interface | uint64 |
| udp-connections | Number of open UDP connections on this interface | uint64 |

Tags

| name | description |
| --- | --- |
| interface | The network interface, e.g. eth0 |
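
The interfaces parameter has three distinct behaviours depending on whether it is missing, empty, or a list. The Python sketch below spells them out. The interface_selection helper and its mode labels ("global", "all", "selected") are hypothetical and only mirror the description of the parameter above.

```python
def interface_selection(config):
    """Interpret the 'interfaces' parameter as described in the Configuration table."""
    if "interfaces" not in config:
        # Parameter missing: one set of global figures for all interfaces.
        return ("global", [])
    names = [n.strip() for n in config["interfaces"].split(",") if n.strip()]
    if not names:
        # Parameter present but empty: every interface, reported individually.
        return ("all", [])
    return ("selected", names)


print(interface_selection({}))
print(interface_selection({"interfaces": ""}))
print(interface_selection({"interfaces": "eth0, lo"}))
```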

Process statistics

Process collects statistics and details about the running processes.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| names | a comma-separated list of names of the processes to collect info about. A process name is usually the name of the executable that was launched. | string | required |

Results

| name | description | type |
| --- | --- | --- |
| command-line | The full command-line used to launch this process | string |
| pid | The PID of this process | int32 |
| create-time | The exact time this process started | int64 |
| status | A character representing the current status of the process. R: Running, S: Sleep, T: Stop, I: Idle, Z: Zombie, W: Wait, L: Lock. | string |
| cpu | An approximate percentage of CPU power used by this process | float64 |
| memory | An approximate percentage of RAM used by this process | float64 |
| rss | The "Resident Set Size" is the amount of RAM used by this process, in bytes | uint64 |
| vms | The "Virtual Memory Size" is the total size of memory addressable by this process, in bytes | uint64 |
| threads | The number of threads this process is currently running | uint64 |
| files | The number of files currently opened by this process | uint64 |
| connections | The number of network connections open by this process | uint64 |

Tags

| name | description |
| --- | --- |
| name | The name of the process |
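
As an illustration of how the names parameter and the status and rss results might be handled by a consumer, here is a short Python sketch. The helpers and the sample result are hypothetical; the status letters are the ones listed in the Results table above.

```python
STATUS_LABELS = {
    "R": "Running", "S": "Sleep", "T": "Stop", "I": "Idle",
    "Z": "Zombie", "W": "Wait", "L": "Lock",
}


def parse_names(names):
    """Split the comma-separated 'names' parameter into a clean list."""
    return [n.strip() for n in names.split(",") if n.strip()]


def describe(result):
    """Render one Process result as a short human-readable line."""
    label = STATUS_LABELS.get(result["status"], "Unknown")
    rss_mib = result["rss"] / (1024 * 1024)  # bytes to MiB
    return f"pid {result['pid']}: {label}, {rss_mib:.1f} MiB resident"


# Hypothetical result for a sleeping process using 50 MiB of RAM.
sample = {"pid": 4242, "status": "S", "rss": 52_428_800}
line = describe(sample)
print(parse_names("panto, panto-agent"))
print(line)
```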

Docker

Docker queries the Docker API and gathers metrics.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| address | Docker host | string | required |

Results

| name | description | type |
| --- | --- | --- |
| container-count-created | the number of containers with state created in the Docker daemon (state is a tag) | int64 |
| container-count-running | the number of containers with state running in the Docker daemon (state is a tag) | int64 |
| container-count-paused | the number of containers with state paused in the Docker daemon (state is a tag) | int64 |
| container-count-restarting | the number of containers with state restarting in the Docker daemon (state is a tag) | int64 |
| container-count-removing | the number of containers with state removing in the Docker daemon (state is a tag) | int64 |
| container-count-exited | the number of containers with state exited in the Docker daemon (state is a tag) | int64 |
| container-count-dead | the number of containers with state dead in the Docker daemon (state is a tag) | int64 |
| cpu-totalusage | Total CPU time consumed (container name is a tag) | int64 |
| mem-usage | current res_counter usage for memory (container name is a tag) | float64 |
| network-bytes-received | Bytes received (container name is a tag) | int64 |
| network-errors-received | Received errors (container name is a tag) | int64 |
| network-dropped-received | Incoming packets dropped (container name is a tag) | int64 |
| network-bytes-sent | Bytes sent (container name is a tag) | int64 |
| network-errors-sent | Sent errors (container name is a tag) | int64 |
| network-dropped-sent | Outgoing packets dropped (container name is a tag) | int64 |
| network-count | the number of networks in the Docker daemon | int64 |
| volume-count | the number of volumes in the Docker daemon | int64 |
| volume-size | the number of bytes used by all the volumes in the Docker daemon | int64 |
| image-count | the number of images in the Docker daemon | int64 |
| image-size | the number of bytes used by all the images in the Docker daemon | int64 |

Tags

| name | description |
| --- | --- |
| name | The name of a container, e.g. "goofy_hodgkin" |
| interface | The name of a container interface, e.g. "eth0" |
interface The name of a container interface, e.g. "eth0"

Redis statistics

Redis collects statistics from a Redis store.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| address | the address of the target Redis server to gather metrics from | string | required |

Results

| name | description | type |
| --- | --- | --- |
| connected-clients | The number of clients currently connected to this Redis server | uint64 |
| blocked-clients | The number of clients pending on a blocking call | uint64 |
| used-memory | The amount of memory allocated by this Redis server, in bytes | uint64 |
| mem-frag-ratio | The ratio between the memory allocated by Redis and the memory as seen by the operating system. See Redis INFO documentation for details. | float32 |
| cache-hit-ratio | The cache hit ratio is the ratio between the number of cache hits and the number of key requests. | float32 |
| uptime | The time since the Redis server was launched, in seconds | uint64 |
| changes-since-last-save | The number of changes since the last time the database was saved to disk. The number of changes that would be lost upon restart. | uint64 |
| last-save-time | The UNIX timestamp of the last time the database was saved to disk. | uint64 |
| ops-per-sec | The number of commands processed by the Redis server per second. | uint64 |
| rejected-connections | The number of connections rejected because of the maximum connections limit. | uint64 |
| input-kbps | The incoming bandwidth usage of the Redis server, in kilobytes per second. | uint64 |
| output-kbps | The outgoing bandwidth usage of the Redis server, in kilobytes per second. | uint64 |
| expired-keys | Number of keys that have been removed when reaching their expiration date | uint64 |
| evicted-keys | Number of keys removed (evicted) due to reaching maximum memory. | uint64 |
| master-last-io | Time in seconds since the last interaction with the master Redis server | uint64 |
| master-link-status | The current status of the link to the master Redis server | string |
| master-link-down-since | The time in seconds since the link between master and slave is down | uint64 |
| connected-slaves | The number of slave instances connected to the master Redis server | uint64 |
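
The cache-hit-ratio result is described as cache hits divided by key requests. A minimal Python sketch of that arithmetic follows. It uses the keyspace_hits and keyspace_misses counters reported by Redis INFO; that the probe derives the ratio from exactly these two counters is an assumption, and the cache_hit_ratio helper is hypothetical.

```python
def cache_hit_ratio(keyspace_hits, keyspace_misses):
    """Ratio of cache hits to key requests (hits + misses), between 0 and 1."""
    requests = keyspace_hits + keyspace_misses
    # A freshly started server has served no key request yet.
    return 0.0 if requests == 0 else keyspace_hits / requests


ratio = cache_hit_ratio(keyspace_hits=90, keyspace_misses=10)
print(f"{ratio:.2f}")
```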

Uptime

Uptime collects the uptime and the boot time of a server.

Configuration

None.

Results

| name | description | type |
| --- | --- | --- |
| uptime | System uptime (number of seconds since last boot) | uint64 |
| boottime | System last boot time (expressed in seconds since epoch) | uint64 |
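
The two results are related: the boot time is the current time minus the uptime. The Python sketch below illustrates that relation and renders an uptime in a readable form. Both helpers are hypothetical and are not part of Panto.

```python
import time


def boot_time(uptime_seconds, now=None):
    """Boot time in seconds since epoch, derived from the uptime."""
    now = time.time() if now is None else now
    return int(now) - uptime_seconds


def format_uptime(seconds):
    """Render a number of seconds as days, hours, minutes and seconds."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


print(boot_time(3600, now=1_700_000_000))
print(format_uptime(93784))
```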

InfluxDB

InfluxDB collects statistics from an InfluxDB server.

Configuration

| name | description | type | required |
| --- | --- | --- | --- |
| address | the address of the target InfluxDB server to collect metrics about | string | required |

Results

Runtime Statistics

Tracks a subset of the statistics exposed by the Golang memory allocator stats.

| name | description | type |
| --- | --- | --- |
| runtime-alloc | Alloc is bytes of allocated heap objects | uint64 |
| runtime-frees | Frees is the cumulative count of heap objects freed | uint64 |
| runtime-heap-alloc | HeapAlloc is bytes of allocated heap objects | uint64 |
| runtime-heap-idle | HeapIdle is bytes in idle (unused) spans | uint64 |
| runtime-heap-in-use | HeapInuse is bytes in in-use spans | uint64 |
| runtime-heap-objects | HeapObjects is the number of allocated heap objects | uint64 |
| runtime-heap-released | HeapReleased is bytes of physical memory returned to the OS | uint64 |
| runtime-heap-sys | HeapSys is bytes of heap memory obtained from the OS | uint64 |
| runtime-lookups | Lookups is the number of pointer lookups performed by the runtime | uint64 |
| runtime-mallocs | Mallocs is the cumulative count of heap objects allocated | uint64 |
| runtime-num-gc | NumGC is the number of completed GC cycles | uint32 |
| runtime-num-goroutine | NumGoroutine returns the number of goroutines that currently exist | int |
| runtime-pause-total-ns | PauseTotalNs is the cumulative nanoseconds in GC stop-the-world pauses since the program started | uint64 |
| runtime-sys | Sys is the total bytes of memory obtained from the OS | uint64 |
| runtime-total-alloc | TotalAlloc is cumulative bytes allocated for heap objects | uint64 |

QueryExecutor

Tracks statistics about the query executor portion of the InfluxDB engine.

| name | description | type |
| --- | --- | --- |
| qe-queriesActive | queriesActive tracks the number of queries being handled at this instant in time | int |
| qe-queriesExecuted | Number of queries that have been executed (started) | int |
| qe-queriesFinished | Number of queries that have finished | int |
| qe-queryDurationNs | queryDurationNs tracks the cumulative wall time, in nanoseconds, of every query executed | int |
| qe-recoveredPanics | Number of panics recovered by Query Executor | int |

Write

Tracks statistics about writes at a system level.

| name | description | type |
| --- | --- | --- |
| write-pointreq | pointReq is incremented for every point that is attempted to be written, regardless of success | int |
| write-pointreqlocal | pointReqLocal is incremented for every point that is attempted to be written into a shard, regardless of success | int |
| write-req | req is incremented every time a batch of points is attempted to be written, regardless of success | int |
| write-subwritedrop | subWriteDrop is incremented every time a batch write to a subscriber is dropped due to contention or write saturation | int |
| write-subwriteok | subWriteOk is incremented every time a batch write to a subscriber succeeds | int |
| write-writedrop | writeDrop is incremented for every point dropped due to having a timestamp that does not match any existing retention policy | int |
| write-writeerror | writeError is incremented for every batch that was attempted to be written to a shard but failed | int |
| write-writeok | writeOk is incremented for every batch that was successfully written to a shard | int |
| write-writetimeout | writeTimeout is incremented every time a write failed due to timing out | int |

Subscriber

Tracks subscriber statistics.

name description type
subscriber-createfailures createFailures tracks the number of subscriptions that failed to be created int
subscriber-pointswritten pointsWritten tracks the number of points successfully written to subscribers int
subscriber-writefailures writeFailures tracks the number of batches that failed to send to subscribers int

Continuous Query

Tracks statistics about the Continuous Query executor.

name description type
cq-queryfail queryFail is incremented whenever a continuous query is executed but fails int
cq-queryok queryOk is incremented whenever a continuous query is executed without a failure int

HTTPD

Tracks statistics about the InfluxDB HTTP server.

name description type
httpd-authfail authFail indicates how many HTTP requests were aborted due to authentication being required but unsupplied or incorrect int
httpd-clienterror clientError is incremented every time InfluxDB sends an HTTP response with a 4XX status code int
httpd-pingreq pingReq is incremented every time InfluxDB serves the /ping HTTP endpoint int
httpd-pointswrittendropped Number of points dropped by the storage engine int
httpd-pointswrittenfail pointsWrittenFail is incremented for every point (not every batch) that was accepted by the /write HTTP endpoint but was unable to be persisted int
httpd-pointswrittenok pointsWrittenOK is incremented for every point (not every batch) that was accepted by the /write HTTP endpoint and persisted successfully int
httpd-queryreq queryReq is incremented every time InfluxDB serves the /query HTTP endpoint int
httpd-queryreqdurationns queryReqDurationNs tracks the cumulative wall time, in nanoseconds, of every query served int
httpd-queryrespbytes queryRespBytes is increased for every byte InfluxDB sends in a successful query response int
httpd-recoveredpanics Number of panics recovered by HTTP handler int
httpd-req req is incremented for every HTTP request InfluxDB receives int
httpd-reqactive reqActive is incremented when InfluxDB begins accepting an HTTP request and is decremented whenever InfluxDB finishes serving that request int
httpd-reqdurationns reqDurationNs tracks the cumulative wall time, in nanoseconds, of every request served int
httpd-servererror serverError is incremented every time InfluxDB sends an HTTP response with a 5XX status code int
httpd-statusreq statusReq is incremented every time InfluxDB serves the /status HTTP endpoint int
httpd-writereq writeReq is incremented every time InfluxDB serves the /write HTTP endpoint int
httpd-writereqactive writeReqActive tracks the number of write requests over HTTP being handled at this instant in time int
httpd-writereqbytes writeReqBytes tracks the total number of bytes of line protocol received by the /write endpoint int
httpd-writereqdurationns writeReqDurationNs tracks the cumulative wall time, in nanoseconds, of every write request served int
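
Several of the counters above are cumulative (for example httpd-req and httpd-reqdurationns), so a per-request figure comes from the difference between two successive results. The sketch below assumes each result is available as a dictionary keyed by metric name; the function name and the dictionary shape are illustrative and not part of Panto.

```python
from typing import Dict, Optional


def average_request_ms(previous: Dict[str, int], current: Dict[str, int]) -> Optional[float]:
    """Average HTTP request duration, in milliseconds, between two results.

    Returns None when no request was served in between, or when the
    counters went backwards (for example after an InfluxDB restart).
    """
    requests = current["httpd-req"] - previous["httpd-req"]
    if requests <= 0:
        return None
    duration_ns = current["httpd-reqdurationns"] - previous["httpd-reqdurationns"]
    return duration_ns / requests / 1_000_000  # nanoseconds -> milliseconds
```

The same delta approach applies to the other cumulative pairs in these tables, such as httpd-queryreq with httpd-queryreqdurationns.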

TLS Certificate

TLS Certificate collects information about a TLS Certificate.

Configuration

name description type required
host URL of the TLS host string required

Results

name description type
expires-in time before the TLS certificate expires, in nanoseconds int64
chain-expires-in time before any TLS certificate in the chain expires, in nanoseconds int64
hash the TLS certificate's hash string
signature-list list of the signature algorithms of the TLS certificates in the full chain string
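
Because expires-in and chain-expires-in are reported in nanoseconds, a reference value for a check is easier to reason about once converted from days. This is a minimal sketch; the helper names are hypothetical.

```python
NS_PER_DAY = 24 * 60 * 60 * 1_000_000_000  # nanoseconds in one day


def days_to_ns(days: float) -> int:
    """Convert a number of days to nanoseconds, e.g. to use as a reference value."""
    return int(days * NS_PER_DAY)


def ns_to_days(duration_ns: int) -> float:
    """Convert a duration in nanoseconds (as reported by expires-in) to days."""
    return duration_ns / NS_PER_DAY
```

For instance, days_to_ns(30) gives the reference value for "the certificate expires in less than 30 days".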

NTP

NTP returns metadata from an NTP time server.

Configuration

name description type required
address address of an NTP server string required

Results

name description type
clock-offset the estimated offset of the local system clock relative to the server's clock, in nanoseconds int64
rtt an estimate of the round-trip-time delay between the client and the server, in nanoseconds int64

Memcached

Memcached collects statistics from a memcached server.

Configuration

name description type required
address address of a memcached server string required

Results

name description type
uptime Number of seconds the Memcached server has been running since last restart. uint64
curr_connections Number of open connections to this Memcached server; it should be the same value on all servers during normal operation. This is comparable to the number of rows returned by MySQL's "SHOW PROCESSLIST". uint64
reserved_fds Number of misc fds used internally uint64
cmd_get Number of "get" commands received since server startup not counting if they were successful or not. uint64
cmd_set Number of "set" commands serviced since startup. uint64
cmd_flush Number of "flush_all" commands received since startup. This command clears the whole cache and shouldn't be used during normal operation. uint64
cmd_touch Cumulative number of "touch" requests uint64
get_hits Number of successful "get" commands (cache hits) since startup, divide them by the "cmd_get" value to get the cache hitrate. uint64
get_misses Number of failed "get" requests because nothing was cached for this key or the cached value was too old. uint64
get_expired Number of items that have been requested but had already expired uint64
get_flushed Number of items that have been requested but have been flushed via flush_all uint64
delete_misses Number of "delete" commands for keys not existing within the cache. uint64
delete_hits Number of "delete" commands that removed a key stored in the cache. uint64
bytes_read Total number of bytes received from the network by this server. uint64
bytes_written Total number of bytes sent to the network by this server. uint64
bytes Number of bytes currently used for caching items, out of the maximum allowed cache size (limit_maxbytes). uint64
curr_items Number of items currently in this server's cache. uint64
evictions Number of objects removed from the cache to free up memory for new items because Memcached reached its maximum memory setting (limit_maxbytes). uint64
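
The cache hit rate mentioned in the get_hits description can be derived from two of the results above. A minimal sketch, with a hypothetical function name:

```python
from typing import Optional


def cache_hit_rate(get_hits: int, cmd_get: int) -> Optional[float]:
    """Return get_hits / cmd_get, or None when no "get" command was received."""
    if cmd_get <= 0:
        return None
    return get_hits / cmd_get
```

Both counters are cumulative since server startup, so passing the differences between two successive results gives the hit rate over that interval instead of over the whole uptime.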

PHP-FPM

PHP-FPM collects statistics from a PHP-FPM server.

Configuration

name description type required
address address of a PHP-FPM server string required
url URL of the status endpoint (default: /status) string optional

Results

name description type
listen_queue The number of requests in the queue of pending connections. uint64
idle_processes The number of idle processes. uint64
active_processes The number of active processes. uint64
slow_requests The number of requests that exceeded request_slowlog_timeout. Enable the php-fpm slow log before relying on this value. If it is non-zero, you may have slow PHP processes. uint64

MySQL

MySQL collects statistics from a MySQL server.

Configuration

name description type required
address MySQL server address (host:port) string required
login MySQL user account string required
password MySQL user password string required

Results

name description type
aborted-clients The number of connections that were aborted because the client died without closing the connection properly. uint64
aborted-connects The number of failed attempts to connect to the MySQL server. uint64
binlog-cache-use The number of transactions that used the binary log cache. uint64
binlog-stmt-cache-use The number of nontransactional statements that used the binary log statement cache. uint64
bytes-received The number of bytes received from all clients. uint64
bytes-sent The number of bytes sent to all clients. uint64
com-begin The number of times begin statement has been executed. uint64
com-change-db The number of times change-db statement has been executed. uint64
com-change-master The number of times change-master statement has been executed. uint64
com-commit The number of times commit statement has been executed. uint64
com-create-db The number of times create-db statement has been executed. uint64
com-delete The number of times delete statement has been executed. uint64
com-delete-multi The number of times delete-multi statement has been executed. uint64
com-insert The number of times insert statement has been executed. uint64
com-rollback The number of times rollback statement has been executed. uint64
com-select The number of times select statement has been executed. uint64
com-stmt-execute The number of times stmt-execute statement has been executed. uint64
com-stmt-fetch The number of times stmt-fetch statement has been executed. uint64
com-truncate The number of times truncate statement has been executed. uint64
com-update The number of times update statement has been executed. uint64
connection-errors-accept The number of errors that occurred during calls to accept() on the listening port. uint64
connection-errors-internal The number of connections refused due to internal errors in the server, such as failure to start a new thread or an out-of-memory condition. uint64
connection-errors-max-connections The number of connections refused because the server max-connections limit was reached. uint64
connection-errors-peer-address The number of errors that occurred while searching for connecting client IP addresses. uint64
connection-errors-select The number of errors that occurred during calls to select() or poll() on the listening port. uint64
connection-errors-tcpwrap The number of connections refused by the libwrap library. uint64
connections The number of connection attempts (successful or not) to the MySQL server. uint64
created-tmp-disk-tables The number of internal on-disk temporary tables created by the server while executing statements. uint64
created-tmp-tables The number of internal temporary tables created by the server while executing statements. uint64
flush-commands The number of times the server flushes tables, whether because a user executed a FLUSH TABLES statement or due to internal server operation. uint64
handler-read-first The number of times the first entry in an index was read. If this value is high, it suggests that the server is doing a lot of full index scans. uint64
handler-read-key The number of requests to read a row based on a key. If this value is high, it is a good indication that your tables are properly indexed for your queries. uint64
handler-read-last The number of requests to read the last key in an index. uint64
handler-read-next The number of requests to read the next row in key order. uint64
handler-read-prev The number of requests to read the previous row in key order. uint64
handler-read-rnd The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. uint64
handler-read-rnd-next The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. uint64
innodb-buffer-pool-pages-data The number of pages in the InnoDB buffer pool containing data. uint64
innodb-buffer-pool-pages-dirty The current number of dirty pages in the InnoDB buffer pool. uint64
innodb-buffer-pool-pages-flushed The number of requests to flush pages from the InnoDB buffer pool. uint64
innodb-buffer-pool-pages-free The number of free pages in the InnoDB buffer pool. uint64
innodb-buffer-pool-pages-misc The number of pages in the InnoDB buffer pool that are busy because they have been allocated for administrative overhead, such as row locks or the adaptive hash index. uint64
innodb-data-fsyncs The number of fsync() operations so far. uint64
innodb-data-reads The total number of data reads (OS file reads). uint64
innodb-data-writes The total number of data writes. uint64
innodb-log-waits The number of times that the log buffer was too small and a wait was required for it to be flushed before continuing. uint64
innodb-log-writes The number of physical writes to the InnoDB redo log file. uint64
innodb-page-size InnoDB page size (default 16KB). uint64
innodb-pages-read The number of pages read from the InnoDB buffer pool by operations on InnoDB tables. uint64
innodb-pages-written The number of pages written by operations on InnoDB tables. uint64
innodb-row-lock-time-max The maximum time to acquire a row lock for InnoDB tables, in milliseconds. uint64
innodb-row-lock-waits The number of times operations on InnoDB tables had to wait for a row lock. uint64
key-blocks-unused The number of unused blocks in the MyISAM key cache. uint64
key-blocks-used The number of used blocks in the MyISAM key cache. uint64
key-reads The number of physical reads of a key block from disk into the MyISAM key cache. uint64
key-writes The number of physical writes of a key block from the MyISAM key cache to disk. uint64
locked-connects The number of attempts to connect to locked user accounts. uint64
open-files The number of files that are open. This count includes regular files opened by the server. uint64
open-streams The number of streams that are open (used mainly for logging). uint64
open-tables The number of tables that are open. uint64
prepared-stmt-count The current number of prepared statements. uint64
queries The number of statements executed by the server. uint64
select-full-join The number of joins that perform table scans because they do not use indexes. uint64
select-full-range-join The number of joins that used a range search on a reference table. uint64
slow-queries The number of queries that have taken more than long-query-time seconds. uint64
uptime The number of seconds that the server has been up. uint64

Elasticsearch

Elasticsearch collects statistics from an Elasticsearch store.

Configuration

name description type required
address address of an Elasticsearch server string required
timeout duration before the HTTP request times out time.Duration optional

Results

name description type
cluster-name ES cluster name string
status Health level of the cluster string
timed-out Whether the cluster health request timed out before completing boolean
number-of-nodes Number of nodes in the cluster float64
number-of-data-nodes Number of data nodes in the cluster float64
active-primary-shards Number of primary shards in the cluster float64
active-shards Number of active shards in the cluster float64
relocating-shards Number of shards being relocated in the cluster float64
initializing-shards Number of shards being initialized in the cluster float64
unassigned-shards Number of shards currently unassigned in the cluster float64
delayed-unassigned-shards Number of unassigned shards whose allocation is delayed in the cluster float64
number-of-pending-tasks Number of pending tasks in the cluster float64
number-of-in-flight-fetch Number of in-flight fetch operations float64
task-max-waiting-in-queue-millis Maximum number of milliseconds a task is waiting in queue float64
indices-count Count of indices in the cluster float64
docs-count Count of documents in the cluster float64
docs-deleted Count of deleted documents in the cluster float64
store-size Size of the cluster storage float64
querycache-memory Memory allocated to the cluster query cache float64
querycache-count-hit Number of hits on the cluster query cache float64
querycache-count-miss Number of misses on the cluster query cache float64
querycache-count-cache Number of cache operations on the cluster query cache float64
querycache-evictions Number of evictions from the cluster query cache float64
jvm-heap-max JVM heap max size on the cluster float64
jvm-heap-used JVM heap used size on the cluster float64
jvm-threads Number of JVM threads on the cluster float64

Nginx

Nginx collects values from the Nginx stub status module.

Configuration

name description type required
url full URL of the Nginx server status string required
timeout duration before the HTTP request times out time.Duration optional

Results

name description type
active-connections The number of active client connections including waiting connections. int
accept-connections The total number of accepted connections. int
handled-connections The total number of handled connections. int
requests The total number of client requests. int
reading The current number of connections where nginx is reading the request header. int
writing The current number of connections where nginx is writing the response back to the client. int
waiting The current number of idle client connections waiting for a request. int

MongoDB

mongodb collects metrics from the serverStatus command.

Configuration

name description type required
address address of the target mongodb server to gather metrics from string required
databases the comma separated list of databases to gather metrics from string optional
username a username to authenticate to the MongoDB server string optional
password a password to authenticate to the MongoDB server string optional

Results

name description type
uptime The number of seconds that the current MongoDB process has been active. int64
connections_current The number of incoming connections from clients to the database server int32
connections_available The number of unused incoming connections available int32
gl_clients_readers The number of the active client connections performing read operations int32
gl_clients_writers The number of the active client connections performing write operations int32
network_in The number of bytes that reflects the amount of network traffic received by this database int64
network_out The number of bytes that reflects the amount of network traffic sent from this database int64
ops_insert The total number of insert operations received since the mongod instance last started int32
ops_query The total number of query operations received since the mongod instance last started int32
ops_update The total number of update operations received since the mongod instance last started int32
ops_delete The total number of delete operations received since the mongod instance last started int32
ops_getmore The total number of getmore operations received since the mongod instance last started int32
ops_command The total number of command operations received since the mongod instance last started int32
mem_resident The value of mem.resident is roughly equivalent to the amount of RAM, in megabytes (MB), currently used by the database process int32
mem_virtual mem.virtual displays the quantity, in megabytes (MB), of virtual memory used by the mongod process int32

PostgreSQL

postgresql collects metrics from the statistics collector command of a PostgreSQL server.

Configuration

name description type required
address address of the target PostgreSQL server to gather metrics from string required
username a username to authenticate to the PostgreSQL server string optional
password a password to authenticate to the PostgreSQL server string optional
sslmode SSL Mode of the PostgreSQL server (see https://www.postgresql.org/docs/10/static/libpq-ssl.html) string optional

Results

name description type
current_connections Number of backends currently connected to this database uint
max_connections The maximum number of client connections allowed. uint
shared_buffer_hits Number of times disk blocks were found already in the buffer cache, so that a read was not necessary uint
shared_buffer_reads Number of disk blocks read uint
temp_files_count Number of temporary files created by queries uint
temp_file_bytes Total amount of data written to temporary files by queries uint
rows_returned Number of rows returned by queries uint
rows_fetched Number of rows fetched by queries uint
rows_inserted Number of rows inserted by queries uint
rows_updated Number of rows updated by queries uint
rows_deleted Number of rows deleted by queries uint
deadlocks Number of deadlocks detected uint
index_size Total size of index on disk uint
table_size Total size of table on disk uint
toast_size Total size of toast on disk uint
n_dead_tup Estimated number of dead rows uint
n_live_tup Estimated number of live rows uint
checkpoints_requested Number of requested checkpoints that have been performed uint
checkpoints_scheduled Number of scheduled checkpoints that have been performed uint
buffers_backend Number of buffers written directly by a backend uint
buffers_background Number of buffers written by the background writer uint
buffers_checkpoint Number of buffers written during checkpoints uint

Cassandra

cassandra returns the metrics from a Cassandra node, via Jolokia gateway to JMX.

Configuration

name description type required
address address of the Jolokia bridge to the Cassandra server to gather metrics from string required
timeout duration before the HTTP request times out time.Duration optional, default 0s

Results

name description type
heap-memory-init Amount of heap memory in bytes that the JVM initially requests from the OS float64
heap-memory-committed Amount of heap memory in bytes that is committed for the JVM to use float64
heap-memory-max Maximum amount of heap memory in bytes that can be used for memory management float64
heap-memory-used Amount of used heap memory in bytes float64
nonheap-memory-init Amount of non-heap memory in bytes that the JVM initially requests from the OS float64
nonheap-memory-committed Amount of non-heap memory in bytes that is committed for the JVM to use float64
nonheap-memory-max Maximum amount of non-heap memory in bytes that can be used for memory management float64
nonheap-memory-used Amount of used non-heap memory in bytes float64
connected-clients Number of clients connected to this node's native protocol server float64
key-cache-hits Total number of cache hits for partition to sstable offsets float64
key-cache-requests Total number of cache requests for partition to sstable offsets float64
key-cache-entries Total number of cache entries for partition to sstable offsets float64
key-cache-size Total size of occupied cache, in bytes, for partition to sstable offsets float64
key-cache-capacity Cache capacity, in bytes, for partition to sstable offsets float64
row-cache-hits Total number of cache hits for rows kept in memory float64
row-cache-requests Total number of cache requests for rows kept in memory float64
row-cache-entries Total number of cache entries for rows kept in memory float64
row-cache-size Total size of occupied cache, in bytes, for rows kept in memory float64
row-cache-capacity Cache capacity, in bytes, for rows kept in memory float64
read-totallatency Total read latency since starting float64
write-totallatency Total write latency since starting float64
read-timeouts Number of timeouts encountered during read float64
write-timeouts Number of timeouts encountered during write float64
read-unavailables Number of unavailable exceptions encountered during read float64
write-unavailables Number of unavailable exceptions encountered during write float64
read-failures Number of failures encountered during read float64
write-failures Number of failures encountered during write float64
commitlog-pendingtasks Number of commit log messages written but yet to be fsync’d float64
commitlog-totalsize Current size, in bytes, used by all the commit log segments float64
compaction-completedtasks Number of completed compactions since server [re]start float64
compaction-pendingtasks Estimated number of compactions remaining to perform float64
compaction-bytescompacted Total number of bytes compacted since server [re]start float64
storage-load Size, in bytes, of the on disk data size this node manages float64
storage-exceptions Number of internal exceptions caught float64
tp-compactionexecutor-activetasks Number of tasks being actively worked on by the compaction executor (compactions) float64
tp-antientropystage-activetasks Number of tasks being actively worked on by the anti-entropy stage (builds Merkle trees for repairs) float64
tp-countermutationstage-pendingtasks Number of tasks queued up for counter writes float64
tp-countermutationstage-currentlyblockedtasks Number of counter write tasks that are currently blocked due to queue saturation, but will become unblocked on retry float64
tp-mutationstage-pendingtasks Number of tasks queued up for all other writes float64
tp-mutationstage-currentlyblockedtasks Number of tasks for all other writes that are currently blocked due to queue saturation, but will become unblocked on retry float64
tp-readrepairstage-pendingtasks Number of tasks queued up for ReadRepair float64
tp-readrepairstage-currentlyblockedtasks Number of ReadRepair tasks that are currently blocked due to queue saturation, but will become unblocked on retry float64
tp-readstage-pendingtasks Number of tasks queued up for local reads float64
tp-readstage-currentlyblockedtasks Number of local read tasks that are currently blocked due to queue saturation, but will become unblocked on retry float64
tp-requestresponsestage-pendingtasks Number of tasks queued up for coordinator requests to the cluster float64
tp-requestresponsestage-currentlyblockedtasks Number of coordinator request tasks that are currently blocked due to queue saturation, but will become unblocked on retry float64
table-livediskspaceused Disk space used by SSTables belonging to this table (in bytes) float64
table-totaldiskspaceused Total disk space used by SSTables belonging to this table, including obsolete ones waiting to be GC’d float64
table-readlatency Local read latency for this keyspace float64
table-coordinatorreadlatency Coordinator read latency for this keyspace float64
table-writelatency Local write latency for this keyspace float64
table-readtotallatency Local read latency for this keyspace since starting float64
table-writetotallatency Local write latency for this keyspace since starting float64

Tags

name description
keyspace The keyspace of the metric, e.g. "system"

SNMP

SNMP probe returns the values of OIDs from an SNMP server.

Configuration

name description type required
address The address of the target SNMP server to get metrics from string required
port The port of the target SNMP server to get metrics from int optional, default 161
community The community of the target SNMP server to get metrics from string optional, default public
timeout duration before the SNMP request times out time.Duration optional, default 10s
oid A comma-separated list of OIDs to get string required

Results

name description type
value The value returned by the SNMP server string or math/big.Int

Tags

name description
keyspace The keyspace of the metric, e.g. "system"

Configuring States

A state is evaluated by applying the conditions defined in a check to the result of a probe. It can have 6 different values:

  • Unknown: when the state can't be determined, either because there's no value, or because the conditions could not be evaluated;
  • OK: when the result is considered as normal;
  • Warning: when the result is considered as worth attention;
  • Critical: when the result shows an issue that needs a reaction;
  • Missing Data: when the server has not received any results from the agent for too long;
  • Error: when the probe encountered an error while gathering metrics.

Algorithms

To determine which state the result of a probe is in, you can choose between 3 different algorithms.

Threshold

The value of one metric is compared to a fixed reference (e.g. free memory is above 1GB).

Trend

The trend of the last X values of the metric is compared to a fixed reference (e.g. the rate of change of HTTP/500 errors over the last 25 occurrences is below 2).
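
This section does not specify how the trend is computed. As an illustration only, the sketch below assumes the trend is the least-squares slope of the last X values against their position, that is, the average change per occurrence. The function names are hypothetical.

```python
from typing import Sequence


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their index (change per occurrence)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def trend_below(values: Sequence[float], window: int, reference: float) -> bool:
    """True if the trend of the last `window` values is below `reference`."""
    return trend_slope(values[-window:]) < reference
```

With this definition, a metric that grows by 1 at every occurrence has a slope of 1, so trend_below(values, 25, 2) would hold.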

History

The value of one metric is compared to the value of the same metric taken Y seconds ago (e.g. the count of successful checkouts compared to the same count 24 hours ago).
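
A sketch of the lookup behind such a comparison, assuming results are available as (timestamp, value) pairs sorted by timestamp, and assuming the comparison is expressed as a ratio. Both assumptions and all names are illustrative; how Panto stores and compares the two values is not described here.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def value_at_or_before(samples: Sequence[Sample], timestamp: float) -> Optional[float]:
    """Value of the latest sample taken at or before `timestamp`, if any."""
    timestamps = [t for t, _ in samples]
    index = bisect_right(timestamps, timestamp)
    return samples[index - 1][1] if index > 0 else None


def history_ratio(samples: Sequence[Sample], now: float, seconds_ago: float) -> Optional[float]:
    """Current value divided by the value `seconds_ago` seconds earlier.

    Returns None when either value is unavailable or the past value is zero.
    """
    current = value_at_or_before(samples, now)
    past = value_at_or_before(samples, now - seconds_ago)
    if current is None or past is None or past == 0:
        return None
    return current / past
```

A ratio of 0.8 with seconds_ago set to 86400 would mean the metric is at 80% of its value 24 hours ago.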

Conditions

Each algorithm can use any number of conditions, bound together with a global logical operator AND or OR (meaning respectively all conditions should be true or any condition should be true).

Here is the list of supported comparison operators.

Numeric comparison

  • Less than (<)
  • Less than or equal (≤)
  • Equal (=)
  • Not equal (≠)
  • Greater than or equal (≥)
  • Greater than (>)

String comparison

  • Equal (=)
  • Not equal (≠)
  • Match (regular expression match)
  • Find (regular expression find)
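
To illustrate how several conditions combine under a global AND or OR, here is a minimal sketch. The tuple layout, the ASCII operator spellings and the function name are assumptions made for the example; they are not Panto's internal representation. The regular expression operators are left out.

```python
import operator
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str]
Condition = Tuple[str, str, Value]  # (metric name, operator, reference)

# Hypothetical ASCII spellings of the comparison operators.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def evaluate(conditions: List[Condition], result: Dict[str, Value], logical: str = "AND") -> bool:
    """Evaluate every condition against a probe result and combine the outcomes."""
    outcomes = [OPERATORS[op](result[name], ref) for name, op, ref in conditions]
    if logical == "AND":
        return all(outcomes)  # all conditions should be true
    if logical == "OR":
        return any(outcomes)  # any condition should be true
    raise ValueError(f"unknown logical operator: {logical}")
```

For example, with an Nginx result of {"active-connections": 120, "waiting": 10}, the conditions ("active-connections", ">", 100) and ("waiting", ">", 50) are true under OR but false under AND.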

Configuring Alerts

An alert may happen when the state of a check has been calculated from a result.

Algorithms

There are different algorithms to evaluate the alert conditions.

State change

If the current state differs from the previous one.

State recurrence

If the current state is not OK and lasts for more than X occurrences.
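
The two algorithms above can be sketched over a list of states ordered from oldest to newest. The state names follow the uppercase spelling used by the HTTP Channel, and "lasts" is interpreted here as the same state repeating on consecutive results. That interpretation and the function names are assumptions.

```python
from typing import Sequence


def state_changed(states: Sequence[str]) -> bool:
    """True if the latest state differs from the previous one."""
    return len(states) >= 2 and states[-1] != states[-2]


def state_recurs(states: Sequence[str], occurrences: int) -> bool:
    """True if the latest state is not OK and has repeated for more than `occurrences` results."""
    if not states or states[-1] == "OK":
        return False
    run = 0
    for state in reversed(states):
        if state != states[-1]:
            break
        run += 1
    return run > occurrences
```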

State flap

If the state of the probe has changed more than X times in the last Y seconds.
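
A flap check can be sketched by counting state transitions inside a sliding time window. The (timestamp, state) history and the function name are assumptions made for this example.

```python
from typing import Sequence, Tuple


def is_flapping(history: Sequence[Tuple[float, str]], now: float,
                window_seconds: float, max_changes: int) -> bool:
    """True if the state changed more than `max_changes` times in the window.

    `history` holds (timestamp in seconds, state) pairs sorted by timestamp.
    """
    recent = [state for ts, state in history if now - window_seconds <= ts <= now]
    changes = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    return changes > max_changes
```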

Channels

When an alert occurs, it sends a notification through a "Channel". Channels can be created and configured in the Create Alert dialog. The configuration of the Channel depends on the type:

Email

Alerts can be notified via email. The outgoing email server (SMTP) settings are configured on the server.

Configuration options

  • dest: the To: field of the sent email, a comma-separated list of email addresses
  • cc: the Cc: field of the sent email, a comma-separated list of email addresses (optional)

Slack

Alerts can be notified on Slack, using an incoming webhook. Create a Slack incoming webhook by following the official instructions.

Configuration options

  • webhook_url: the URL of the incoming webhook for your Slack integration

PagerDuty

Alerts can be forwarded to PagerDuty to integrate with your operations management platform.

Configuration options

  • from: the email address of the person raising this incident
  • token: a PagerDuty API token
  • service: the ID of the PagerDuty service this incident is raised for

SMS

SMS alerting is provided via Twilio. You need a Twilio account to create an SMS alert Channel.

Configuration options

  • account-sid: a Twilio account SID
  • token: a Twilio Auth token
  • from: a Twilio phone number or short code that sends this message
  • messaging-service-sid: a Twilio Messaging service SID
  • to: a phone number to send the message to

Note: Only one of from or messaging-service-sid is required.

HTTP

Alerts can be sent to an arbitrary HTTP endpoint. This Channel is used to build custom integrations with other 3rd-party services or with your own in-house tools.

Configuration options

  • url: the URL of the HTTP endpoint to send the alert to

The HTTP Channel POSTs a JSON representation of the Alert to the URL. The following properties can be found on the JSON:

  • title: a title for the Alert
  • message: why the Alert was triggered
  • target: the display name of the Target that exhibits abnormal conditions
  • agent: the display name of the Agent that ran the Probe on the Target
  • probe: the name of the Probe that got abnormal results
  • state: the current state, WARNING or CRITICAL
  • reason: the user-supplied reason in the check(s) that weren't validated
  • critical: true if state is CRITICAL
  • warning: true if state is WARNING
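
A receiving endpoint could decode the payload as sketched below, using only the Python standard library. The sketch assumes all of the properties listed above are always present and that critical and warning are JSON booleans; the function and class names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler

EXPECTED_KEYS = {"title", "message", "target", "agent", "probe",
                 "state", "reason", "critical", "warning"}


def parse_alert(body: bytes) -> dict:
    """Decode the JSON body of an alert and check the documented properties."""
    alert = json.loads(body)
    if not isinstance(alert, dict):
        raise ValueError("alert payload must be a JSON object")
    missing = EXPECTED_KEYS - alert.keys()
    if missing:
        raise ValueError(f"missing properties: {sorted(missing)}")
    return alert


class AlertHandler(BaseHTTPRequestHandler):
    """Minimal handler that logs each alert POSTed by the HTTP Channel."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError as err:  # also covers json.JSONDecodeError
            self.send_error(400, str(err))
            return
        print(f"[{alert['state']}] {alert['title']}: {alert['message']}")
        self.send_response(204)
        self.end_headers()
```

To try it, pass AlertHandler to http.server.HTTPServer bound to an address of your choice, then use that address as the url of the Channel.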