The first part of this documentation explains how to install, run and update Panto. Then, every Probe is presented in depth. Finally, we’ll see how to configure States and Alerts in the application.
⚠️ Note: all commands prefixed with a # prompt might need to be run as root or using sudo. Commands prefixed with a $ prompt can be run as any unprivileged user, unless stated otherwise.
Panto is distributed in different flavors, so you can pick one based on your needs or your taste. If you want to run the full Panto suite (panto-server and panto-agent), or if you are in a hurry to see it live, we recommend the Docker Compose setup. If you are in the SaaS configuration and only need to run the Agent on your Debian server, we recommend the packaged setup, with apt.
Panto hosts 4 Docker images on the GitLab registry:
* registry.gitlab.com/pantomath-io/panto/panto-server: the Panto server
* registry.gitlab.com/pantomath-io/panto/panto-agent: the Panto agent
* registry.gitlab.com/pantomath-io/panto/panto-web: the Panto web client
* registry.gitlab.com/pantomath-io/panto/panto: the all-in-one image

The images named registry.gitlab.com/pantomath-io/panto/panto-server or registry.gitlab.com/pantomath-io/panto/panto-web are the stable images built on the master branch. To get the latest development build, use registry.gitlab.com/pantomath-io/panto/panto-server:edge or registry.gitlab.com/pantomath-io/panto/panto-web:edge. Other tags include :latest (an alias of the master build), :x.y and :x.y.z, where x.y.z is a specific version number (e.g. :1.1.2).
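For example, to pull the latest development build of the server:
$ docker pull registry.gitlab.com/pantomath-io/panto/panto-server:edge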
⚠️ Running Panto in Docker stores all your data inside the containers. Destroying the containers will delete your data. Make sure to save all your data before destroying the containers, or use Docker volumes.
The Panto server image runs a server in a Docker container. It exposes port 7575 for the gRPC API and port 7576 for the REST API. Set the INFLUXDB_ADDRESS environment variable inside the container to a URL where the container can reach an InfluxDB server.
panto-server exposes 2 TCP ports by default:
* 7575 for its gRPC interface
* 7576 for its REST interface
Run the container by typing:
$ docker run -p 7575:7575 -p 7576:7576 --env INFLUXDB_ADDRESS=http://influxdb:8086 registry.gitlab.com/pantomath-io/panto/panto-server
Note that you probably want to configure the panto server with a persistent configuration file. Use Docker volumes to mount the configuration file at /etc/panto/panto.yaml.
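For example, assuming a panto.yaml file in the current directory (adapt the path to your setup):
$ docker run -p 7575:7575 -p 7576:7576 --env INFLUXDB_ADDRESS=http://influxdb:8086 -v $(pwd)/panto.yaml:/etc/panto/panto.yaml registry.gitlab.com/pantomath-io/panto/panto-server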
The Panto agent image runs an Agent in a docker container. Set the PANTO_SERVER_ADDRESS
environment variable inside the container to a host:port
where the container can reach a Panto server.
Run the container by typing:
$ docker run --env PANTO_SERVER_ADDRESS=localhost:7575 registry.gitlab.com/pantomath-io/panto/panto-agent
Note that you probably want to configure the panto-agent
with a persistent configuration file. Use Docker volumes to mount the configuration file in /etc/panto/panto-agent.yaml
.
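For example, assuming a panto-agent.yaml file in the current directory (adapt the path to your setup):
$ docker run --env PANTO_SERVER_ADDRESS=localhost:7575 -v $(pwd)/panto-agent.yaml:/etc/panto/panto-agent.yaml registry.gitlab.com/pantomath-io/panto/panto-agent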
The Panto web client image runs a static web server in a docker container. It exposes port 8080. Set the PANTO_ADDRESS
environment variable inside the container to a URL where the user’s web browser can reach a Panto server’s REST API.
panto-web exposes 1 TCP port by default:
* 8080 for the web server
Run the container by typing:
$ docker run -p 8080:8080 --env PANTO_ADDRESS=http://panto.yourdomain.com registry.gitlab.com/pantomath-io/panto/panto-web
The “all-in-one” image bundles the Panto components (server, agent, and web client) in a single container.
It exposes 3 TCP ports by default:
* 7575 for panto-server, the gRPC interface
* 7576 for panto-server, the REST interface
* 8080 for panto-web, the web application
Run the container by typing:
$ docker run -p 7575:7575 -p 7576:7576 -p 8080:8080 --env PANTO_ADDRESS=http://panto.yourdomain.com registry.gitlab.com/pantomath-io/panto/panto
To get the Agent working, please refer to the section below.
A more modular, multi-container setup is also available, using Docker Compose. After making sure docker-compose
is installed, download the docker-compose.yml
file to your computer and type:
$ docker-compose up
from the same directory as the docker-compose.yml
file.
This will download all the necessary images from the official repositories, run all the containers, and connect the pieces for you.
By default, this will start all the components (the panto-server, the TSDB, one panto-agent and the panto-web), and will expose the following ports:
* TCP/7575: panto-server, gRPC interface
* TCP/7576: panto-server, REST interface
* TCP/8080: panto-web, the web application

When running for the first time, you need to initialize the persistent database. Use the following command once the Docker Compose stack is up:
$ docker exec -it panto-server /usr/bin/panto-ctl init data
To get the Agent working, please refer to the section below.
To install Panto, first make sure you have installed all of the prerequisites.
Panto uses InfluxDB to store results from Probes. Ensure you have a running instance of InfluxDB reachable by Panto.
Panto uses Caddy to serve the web client. Download and install Caddy from the official website to deploy the web client.
Debian packages are available in a dedicated apt repository:
$ wget -qO- https://packages.panto.app/pantomath.key | sudo apt-key add -
$ echo "deb https://packages.panto.app/debian stable main" | sudo tee /etc/apt/sources.list.d/panto.list
$ sudo apt update
You can now install the packages:
$ sudo apt install panto
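If you only need the Agent (for instance in the SaaS configuration), you can install just the agent package:
$ sudo apt install panto-agent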
Pre-built Panto binaries (linux/x86_64 only) can be downloaded directly from the downloads page. To install Panto, first make sure you have installed all of the prerequisites.
The recommended setup is to run Panto as a dedicated, unprivileged user. Create this user and set up the home directory for Panto:
[root]# groupadd panto
[root]# useradd -g panto -d /opt/panto -m -s /bin/false panto
Assuming default locations, it is also recommended to create the main directories Panto will need:
[root]# mkdir -p /var/lib/panto /var/log/panto /etc/panto
[root]# chown -R panto:panto /var/lib/panto /var/log/panto /etc/panto
Download the release from the releases page. Official binary distributions are only available for Linux 4.9 and later on x86_64. For other platforms, you will have to build from source.
After downloading the archive, extract it to a destination path. For example, using /opt/panto
:
[panto]$ tar -C /opt/panto -xzf panto-$(VERSION)-$(OS)_$(ARCH).tar.gz
[panto]$ ln -sf /opt/panto/panto-$(VERSION)-$(OS)_$(ARCH)/* /opt/panto/
⚠️ These commands should be run as panto
, the dedicated Panto user, who will run the server/agent.
The most convenient way to run Panto is to integrate it with systemd. Assuming default locations, a unit file like the sketch below should do. Copy it to /etc/systemd/system/panto.service and adapt the values.
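A minimal sketch of such a unit file (the paths, user and options are illustrative and should be adapted to your installation):

```ini
[Unit]
Description=Panto server
After=network.target

[Service]
Type=simple
User=panto
Group=panto
ExecStart=/opt/panto/panto --conf /etc/panto/panto.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```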
systemd
needs to be reloaded:
[root]# chmod 664 /etc/systemd/system/panto.service
[root]# systemctl daemon-reload
You also need to enable the unit so it starts automatically on reboot:
[root]# systemctl enable panto
The Panto daemon can now be controlled as any other systemd
service:
[root]# systemctl status panto
[root]# systemctl start panto
[root]# systemctl stop panto
This can be adapted to run the panto-agent
daemon.
For init users, the most convenient way to run panto-agent is to integrate it with init.d. Assuming default locations, a script like the sketch below should do. Copy it to /etc/init.d/panto-agent and adapt the values.
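A minimal sketch of such a script, based on start-stop-daemon (the paths, user and arguments are illustrative and should be adapted to your installation):

```sh
#!/bin/sh
### BEGIN INIT INFO
# Provides:          panto-agent
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Panto agent
### END INIT INFO

. /lib/lsb/init-functions

DAEMON=/opt/panto/panto-agent
DAEMON_ARGS="--conf /etc/panto/panto-agent.yaml"
PIDFILE=/var/run/panto-agent.pid
RUNAS=panto

case "$1" in
  start)
    start-stop-daemon --start --background --make-pidfile --pidfile "$PIDFILE" \
      --chuid "$RUNAS" --exec "$DAEMON" -- $DAEMON_ARGS
    ;;
  stop)
    start-stop-daemon --stop --pidfile "$PIDFILE" --retry 10
    ;;
  status)
    status_of_proc -p "$PIDFILE" "$DAEMON" panto-agent
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 1
    ;;
esac
```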
The init system then needs to be updated:
[root]# chmod 755 /etc/init.d/panto-agent
[root]# update-rc.d panto-agent defaults
The panto-agent
daemon can now be controlled as any other init.d script:
[root]# service panto-agent start
[root]# service panto-agent status
[root]# service panto-agent stop
This can be adapted to run the panto
daemon.
For a detailed walkthrough, see the developer documentation.
The panto-ctl
tool can be used to initialize your server environment and configuration file before running Panto for the first time. Type:
[panto]$ panto-ctl init
⚠️ This command should be run as panto
, the dedicated Panto user, who will run the server/agent.
⚠️ If you are using a Docker version, you need to run a command like:
$ docker exec -it panto-server /usr/bin/panto-ctl init data
A command-line dialog will guide you through the steps of initializing your installation, setting up the database, and configuring the server. See the panto-ctl
documentation for more details.
The Agent does not need to be initialized, but it must be configured with its name, given by the Server. So an Agent must be declared on the Server (the API returns its name
), and the Agent should run with this name
configured.
In the packaged, binary, or Docker setups, this name should be set in the configuration file (/etc/panto/panto-agent.yaml).
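A minimal sketch of such a configuration file (the organization and agent identifiers, and the server address, are placeholders to adapt):

```yaml
agent:
  # Name returned by the API when the Agent was declared on the Server
  name: organizations/<organization-id>/agents/<agent-id>
server:
  # host:port where the Agent can reach the Panto server
  address: panto.yourdomain.com:7575
```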
In the all-in-one or the Docker Compose configuration, when running for the first time, the panto-agent is not declared on the panto-server and won’t be working properly. To set up the panto-agent, you need to:
* Get your Organization from the API:
curl http://localhost:7576/v1/organizations
The result should be something like {"organizations":[{"name":"organizations/pUNH8aZDRki_KQPigL7Eww","display_name":"Organization"}],"next_page_token":""}. The name field is the Organization name, from the API standpoint.
* Declare the Agent on the Server, under this Organization:
curl -XPOST http://localhost:7576/v1/organizations/pUNH8aZDRki_KQPigL7Eww/agents --data '{"agent": {"display_name": "agent.local"}}'
The result should be something like {"name":"organizations/pUNH8aZDRki_KQPigL7Eww/agents/cUITkjXIQbisdhbuj31cdw","display_name":"agent.local","last_activity":null,"last_version":""}. The name field is the Agent name, from the API standpoint.
For the panto and panto-agent executables, the configuration file format is YAML. Configuration files are hierarchical in nature, and paths are separated using a dot ('.'), e.g. log.file.
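For example, the log.file path corresponds to the following structure (the file path shown is illustrative):

```yaml
log:
  file: /var/log/panto/panto.log
```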
As a general rule, run any executable with --help
for more information.
panto
The main server executable. Run this to deploy the Panto service on a machine.
| Configuration file | Command-line | Environment variable | Usage | Default | Required |
|---|---|---|---|---|---|
| | --quiet, -q | | suppress all output | false | optional |
| | --version, -V | | display version and exit | false | optional |
| | --conf | | path to configuration file | | optional |
| verbose | --verbose, -v | PANTO_VERBOSE | set verbosity level (0: silent, 1: warnings and errors, 2: verbose, 3: debug) | 1 | optional |
| log.file | --log-file | PANTO_LOG_FILE | path of a file to log Panto output | | optional |
| log.syslog | | PANTO_LOG_SYSLOG | log Panto output to syslog | false | optional |
| server.grpc-address | --grpc-address | | address to bind gRPC API to | :7575 | required |
| server.rest-address | --rest-address | | address to bind REST API to | :7576 | required |
| server.allow-origin | --allow-origin | | comma-separated list of origin addresses to allow via CORS | | required |
| server.certfile | --certfile | PANTO_CERTFILE | path to a TLS certificate file | | optional |
| server.certkey | --certkey | PANTO_CERTKEY | path to a TLS private key file | | optional |
| server.no-tls | --no-tls | | disable SSL/TLS | false | required |
| server-info.public-grpc-address | | PANTO_PUBLIC_GRPC_ADDRESS | the public address where a client can reach the gRPC API | | optional |
| server-info.public-rest-address | | PANTO_PUBLIC_REST_ADDRESS | the public address where a client can reach the REST API | | optional |
| influxdb.address | --influxdb-address | | address of an InfluxDB server | http://localhost:8086 | required |
| influxdb.database | --influxdb-database | | name of the InfluxDB database | panto | required |
| db.path | --db-path | | path to a SQLite configuration database file | /var/lib/panto/panto.sqlite | required |
| smtp.server | | PANTO_SMTP_SERVER | address of an SMTP server | | optional |
| smtp.port | | PANTO_SMTP_PORT | port to use for the SMTP server | 587 | optional |
| smtp.username | | PANTO_SMTP_USERNAME | username for the SMTP server | | optional |
| smtp.password | | PANTO_SMTP_PASSWORD | password to use for the SMTP server | | optional |
| smtp.from | | PANTO_SMTP_FROM | email address to send the mails from | hello@panto.app | optional |
panto-agent
The executable for the agents. Typically runs on a target machine and gathers metrics that it reports to the server. It can also be run on a dedicated machine, to monitor a remote target.
| Configuration file | Command-line | Environment variable | Usage | Default | Required |
|---|---|---|---|---|---|
| | --quiet, -q | | suppress all output | false | optional |
| | --version, -V | | display version and exit | false | optional |
| | --conf | | path to configuration file | | optional |
| verbose | --verbose, -v | PANTO_AGENT_VERBOSE | set verbosity level (0: silent, 1: warnings and errors, 2: verbose, 3: debug) | 1 | optional |
| log.file | --log-file | PANTO_AGENT_LOG_FILE | path of a file to log output | | optional |
| log.syslog | | PANTO_AGENT_LOG_SYSLOG | log output to syslog | false | optional |
| agent.name | --name, -n | PANTO_AGENT_NAME | name of the agent in the Panto API | | required |
| agent.timeout | --timeout | PANTO_AGENT_TIMEOUT | timeout duration | 45s | optional |
| agent.max-spooled-results | --max-spooled-results | PANTO_MAX_SPOOLED_RESULTS | maximum number of results spooled while the server can’t be reached | 100 | optional |
| agent.spooler-dump-path | --spooler-dump-path | PANTO_SPOOLER_DUMP_PATH | path of the dump file for the spooled results | | optional |
| server.address | | | address of the Panto server (host:port) | | required |
| server.certfile | --certfile | | path to a TLS certificate file | | optional |
| server.no-tls | --no-tls | | disable SSL/TLS | false | optional |
panto-ctl
The administration tool for the Panto server. It manipulates configuration files and the database. panto-ctl has three subcommands: init, dbconf, and user.
init subcommand
Run panto-ctl init to initialize configuration and database before running Panto for the first time.
| Configuration file | Command-line | Environment variable | Usage | Default | Required |
|---|---|---|---|---|---|
| dry-run | --dry-run, -n | | do not perform any operations, just output what would be done | false | optional |
| install-prefix | --install-prefix | | the default prefix for the Panto installation | / | optional |
dbconf subcommand
Run panto-ctl dbconf to perform operations on the database. Subcommands:
* check: makes sure the current version of the configuration database is the configured one
* list-migrations: lists the pending migrations to be applied on the configuration database
* upgrade: applies all the migrations to the configuration database
* downgrade: reverts the last migration on the configuration database
* version: displays the current version of the configuration database

| Configuration file | Command-line | Environment variable | Usage | Default | Required |
|---|---|---|---|---|---|
| db.path | --db-path | | path to the SQLite database file | | required |
| | --conf | | path to the configuration file | | optional |
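For example, to display the current version of the configuration database (using the default configuration file path):
$ panto-ctl dbconf version --conf /etc/panto/panto.yaml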
user subcommand
Run panto-ctl user to manage users for panto-web. Subcommands:
* create: creates a new user in the database
* update: updates an existing user in the database
* delete: deletes an existing user in the database

| Configuration file | Command-line | Environment variable | Usage | Default | Required |
|---|---|---|---|---|---|
| db.path | --db-path | | path to the SQLite database file | | required |
The main frontend for Panto, a static website served with Caddy. Set the PANTO_ADDRESS
environment variable inside the container to a URL where the user’s web browser can reach a Panto server’s REST API. Set the (optional) PANTO_GA_ID
environment variable inside the container to a valid Google Analytics tracking ID to enable it.
$ PANTO_GA_ID=UA-01234567-8 PANTO_ADDRESS=http://panto.yourdomain.com caddy -conf=/opt/panto/www/Caddyfile
Depending on your setup, updating Panto is more or less like installing it.
You simply need to refresh the image:
$ docker pull registry.gitlab.com/pantomath-io/panto/panto-server
$ docker stop panto-server
$ docker rm panto-server
$ docker run -p 7575:7575 -p 7576:7576 --env INFLUXDB_ADDRESS=http://influxdb:8086 registry.gitlab.com/pantomath-io/panto/panto-server
Since this is only a combination of images, the procedure is almost the same as for Docker:
$ docker-compose pull
$ docker-compose up --force-recreate --build -d
You can rely on the apt system to upgrade your packages:
$ sudo apt update
$ sudo apt install panto panto-agent
Download the release from the releases page. Official binary distributions are only available for Linux 4.9 and later on x86_64. For other platforms, you will have to build from source.
After downloading the archive, extract it to a destination path. For example, using /opt/panto
:
[panto]$ tar -C /opt/panto -xzf panto-$(VERSION)-$(OS)_$(ARCH).tar.gz
⚠️ This command should be run as panto
, the dedicated Panto user, who will run the server/agent.
Once the application itself is up to date, migrations should be applied to update the database schema.
panto-ctl
is here to help you:
[panto]$ /opt/panto/panto-ctl dbconf upgrade --conf /etc/panto/panto.yaml
⚠️ If you are using a Docker version, you need to run a command like:
$ docker exec -it panto-server /usr/bin/panto-ctl dbconf upgrade --conf /etc/panto/panto.yaml
Probe configuration parameters and results are Go types. Their conversion to JSON is specific to each Go type’s Marshal/Unmarshal functions. Note that time.Duration is a special case that also follows time.ParseDuration, to allow conversion from both int and string types. See the Go documentation for details.
Probe tags are always strings.
Ping
sends an ICMP Echo message to a host and reports statistics on the round trip.
name | description | type | required |
---|---|---|---|
address |
Target address. An IP or a hostname. | string |
required |
interval |
The wait time between each packet send. | time.Duration |
optional, default 1s |
count |
The number of ICMP packets to send. | int |
optional, default 4 |
timeout |
The time to run the ping, until it exits. If the timeout occurs before the packet count has been reached, the probe exits anyway. | time.Duration |
optional, default 10s |
name | description | type |
---|---|---|
sent |
the number of packets sent | int |
received |
the number of packets received | int |
min |
the minimum round trip time, in nanoseconds | int64 |
max |
the maximum round trip time, in nanoseconds | int64 |
avg |
the average round trip time, in nanoseconds | int64 |
stddev |
the standard deviation of round trip times, in nanoseconds | int64 |
name | description |
---|---|
address |
the address of the target that was pinged |
Checksum
computes the checksum of a file.
name | description | type | required |
---|---|---|---|
path |
the path of the file to check | string |
required |
hash |
the hash algorithm to use | string (crc32, md5, sha1, sha256) |
required |
name | description | type |
---|---|---|
checksum |
the checksum of the file as a hexadecimal string | string |
name | description |
---|---|
path |
the path of the file that was checked |
CPU
collects CPU usage statistics.
name | description | type | required |
---|---|---|---|
per-cpu |
true collects utilization for each CPU/core, false collects utilization averaged over all CPUs/cores |
bool |
required |
interval |
interval over which CPU utilization is calculated | time.Duration |
optional, default 1s |
name | description | type |
---|---|---|
busy |
% of time spent running (sum of all “non-idle” times) | float32 |
idle |
% of time spent idle | float32 |
user |
% of time spent in user mode | float32 |
system |
% of time spent in system mode, which is the time spent executing kernel code | float32 |
nice |
% of time spent in user mode with low priority (nice) | float32 |
iowait |
% of time waiting for I/O to complete | float32 |
irq |
% of time servicing interrupts | float32 |
softirq |
% of time servicing softirqs | float32 |
name | description |
---|---|
cpu |
the number of the CPU/core for this result, "all" represents usage averaged over all CPUs/cores |
Disk
collects disk usage statistics. Note that the list of available partitions is filtered based on the filesystem. The following filesystems will be removed from the results:
None.
name | description | type |
---|---|---|
free |
free space on the disk, in bytes | uint64 |
used |
used space on the disk, in bytes | uint64 |
used-percent |
% of disk space left, free / (free + used) |
float32 |
inodes-free |
number of free inodes on the disk | uint64 |
inodes-used |
number of used inodes on the disk | uint64 |
inodes-used-percent |
% of inodes left, free / (free + used) |
float32 |
name | description |
---|---|
path | the mount point of the disk, e.g. / |
HTTP
sends an HTTP request and collects statistics on the response.
name | description | type | required |
---|---|---|---|
method |
HTTP request method (GET , POST , etc.) |
string |
optional, default GET |
url |
HTTP request URL | string |
required |
body |
HTTP body of the request | string |
optional |
timeout |
duration before the HTTP request times out | time.Duration |
optional |
response-content |
content from the response to include in the results | string (none, headers-only, body-only, full) |
optional, default none |
name | description | type |
---|---|---|
rtt |
time between sending the HTTP request and receiving the response, in nanoseconds | int64 |
status-code |
the HTTP response’s status code | int |
response-content |
the HTTP response’s content (depending on the probe’s configuration) | string |
Load
collects average system load. Load is defined as the number of processes waiting for I/O or in the run queue, averaged over a period of time.
None.
name | description | type |
---|---|---|
load1 |
System load over the last minute | float32 |
load5 |
System load over the last 5 minutes | float32 |
load15 |
System load over the last 15 minutes | float32 |
proc-running |
Number of processes currently running | int |
proc-blocked |
Number of processes currently blocked | int |
Memory
collects memory usage statistics.
None.
name | description | type |
---|---|---|
total |
Total amount of RAM on this system, in bytes | uint64 |
available |
Estimated amount of RAM available to programs, in bytes | uint64 |
used |
Total amount of RAM used by programs, in bytes | uint64 |
used-percent |
Percentage of RAM used by programs | float32 |
free |
Total amount of RAM not used by programs, in bytes | uint64 |
swap-total |
Total amount of swap memory on this system, in bytes | uint64 |
swap-free |
Total amount of swap memory not used by programs, in bytes | uint64 |
swap-used |
Total amount of swap memory used by programs, in bytes | uint64 |
swap-used-percent |
Percentage of swap memory used by programs | float32 |
Network
collects statistics about the network interfaces.
name | description | type | required |
---|---|---|---|
interfaces |
A comma-separated list of interfaces to collect information about. If the list is empty, return info about all interfaces. If the parameter is missing, return global information for all interfaces. | string |
optional (see description) |
name | description | type |
---|---|---|
bytes-sent |
Number of bytes sent through the interface | uint64 |
bytes-received |
Number of bytes received through the interface | uint64 |
packets-sent |
Number of network packets sent through the interface | uint64 |
packets-received |
Number of network packets received through the interface | uint64 |
error-in |
Number of errors while receiving through the interface | uint64 |
error-out |
Number of errors while sending through the interface | uint64 |
tcp-connections |
Number of open TCP connections on this interface | uint64 |
udp-connections |
Number of open UDP connections on this interface | uint64 |
name | description |
---|---|
interface |
The network interface, e.g. eth0 |
Process
collects statistics and details about the running processes.
name | description | type | required |
---|---|---|---|
names |
a comma-separated list of names of the processes to collect info about. A process name is usually the name of the executable that was launched. | string |
required |
name | description | type |
---|---|---|
command-line |
The full command-line used to launch this process | string |
pid |
The PID of this process | int32 |
create-time |
The exact time this process started | int64 |
status |
A character representing the current status of the process. R: Running, S: Sleep, T: Stop, I: Idle, Z: Zombie, W: Wait, L: Lock. | string |
cpu |
An approximate percentage of CPU power used by this process | float64 |
memory |
An approximate percentage of RAM used by this process | float64 |
rss |
The “Resident Set Size” is the amount of RAM used by this process, in bytes | uint64 |
vms |
The “Virtual Memory Size” is the total size of memory addressable by this process, in bytes | uint64 |
threads |
The number of threads this process is currently running | uint64 |
files |
The number of files currently opened by this process | uint64 |
connections |
The number of network connections open by this process | uint64 |
name | description |
---|---|
name |
The name of the process |
Docker
requests the Docker API and gathers metrics.
name | description | type | required |
---|---|---|---|
address |
Docker host | string |
required |
name | description | type |
---|---|---|
container-count-created |
the number of containers with state created in the Docker daemon (state is a tag) | int64 |
container-count-running |
the number of containers with state running in the Docker daemon (state is a tag) | int64 |
container-count-paused |
the number of containers with state paused in the Docker daemon (state is a tag) | int64 |
container-count-restarting |
the number of containers with state restarting in the Docker daemon (state is a tag) | int64 |
container-count-removing |
the number of containers with state removing in the Docker daemon (state is a tag) | int64 |
container-count-exited |
the number of containers with state exited in the Docker daemon (state is a tag) | int64 |
container-count-dead |
the number of containers with state dead in the Docker daemon (state is a tag) | int64 |
cpu-totalusage |
Total CPU time consumed (container name is a tag) | int64 |
mem-usage |
current res_counter usage for memory (container name is a tag) | float64 |
network-bytes-received |
Bytes received (container name is a tag) | int64 |
network-errors-received |
Received errors (container name is a tag) | int64 |
network-dropped-received |
Incoming packets dropped (container name is a tag) | int64 |
network-bytes-sent |
Bytes sent (container name is a tag) | int64 |
network-errors-sent |
Sent errors (container name is a tag) | int64 |
network-dropped-sent |
Outgoing packets dropped (container name is a tag) | int64 |
network-count |
the number of networks in the Docker daemon | int64 |
volume-count |
the number of volumes in the Docker daemon | int64 |
volume-size |
the number of bytes used by all the volumes in the Docker daemon | int64 |
image-count |
the number of images in the Docker daemon | int64 |
image-size |
the number of bytes used by all the images in the Docker daemon | int64 |
name | description |
---|---|
name |
The name of a container, e.g. "goofy_hodgkin" |
interface |
The name of a container interface, e.g. "eth0" |
Redis
collects statistics from a Redis store.
name | description | type | required |
---|---|---|---|
address |
the address of the target Redis server to gather metrics from | string |
required |
name | description | type |
---|---|---|
connected-clients |
The number of clients currently connected to this Redis server | uint64 |
blocked-clients |
The number of clients pending on a blocking call | uint64 |
used-memory |
The amount of memory allocated by this Redis server, in bytes | uint64 |
mem-frag-ratio |
The ratio between the memory allocated by Redis and the memory as seen by the operating system. See Redis INFO documentation for details. |
float32 |
cache-hit-ratio |
The ratio between the number of cache hits and the number of key requests. | float32 |
uptime |
The time since the Redis server was launched, in seconds | uint64 |
changes-since-last-save |
The number of changes since the last time the database was saved to disk. The number of changes that would be lost upon restart. | uint64 |
last-save-time |
The UNIX timestamp of the last time the database was saved to disk. | uint64 |
ops-per-sec |
The number of commands processed by the Redis server per second. | uint64 |
rejected-connections |
The number of connections rejected because of the maximum connections limit. | uint64 |
input-kbps |
The incoming bandwidth usage of the Redis server, in kilobytes per second. | uint64 |
output-kbps |
The outgoing bandwidth usage of the Redis server, in kilobytes per second. | uint64 |
expired-keys |
Number of keys that have been removed when reaching their expiration date | uint64 |
evicted-keys |
Number of keys removed (evicted) due to reaching maximum memory. | uint64 |
master-last-io |
Time in seconds since the last interaction with the master Redis server | uint64 |
master-link-status |
The current status of the link to the master Redis server | string |
master-link-down-since |
The time in seconds since the link between master and slave is down | uint64 |
connected-slaves |
The number of slave instances connected to the master Redis server | uint64 |
Uptime
collects the uptime and the boot time of a server.
None.
name | description | type |
---|---|---|
uptime |
System uptime (number of seconds since last boot) | uint64 |
boottime |
System last boot time (expressed in seconds since epoch) | uint64 |
InfluxDB
collects statistics from an InfluxDB server.
name | description | type | required |
---|---|---|---|
address |
the address of the target InfluxDB server to collect metrics from | string |
required |
Tracks a subset of the statistics exposed by the Golang memory allocator stats.
name | description | type |
---|---|---|
runtime-alloc |
Alloc is bytes of allocated heap objects | uint64 |
runtime-frees |
Frees is the cumulative count of heap objects freed | uint64 |
runtime-heap-alloc |
HeapAlloc is bytes of allocated heap objects | uint64 |
runtime-heap-idle |
HeapIdle is bytes in idle (unused) spans | uint64 |
runtime-heap-in-use |
HeapInuse is bytes in in-use spans | uint64 |
runtime-heap-objects |
HeapObjects is the number of allocated heap objects | uint64 |
runtime-heap-released |
HeapReleased is bytes of physical memory returned to the OS | uint64 |
runtime-heap-sys |
HeapSys is bytes of heap memory obtained from the OS | uint64 |
runtime-lookups |
Lookups is the number of pointer lookups performed by the runtime | uint64 |
runtime-mallocs |
Mallocs is the cumulative count of heap objects allocated | uint64 |
runtime-num-gc |
NumGC is the number of completed GC cycles | uint32 |
runtime-num-goroutine |
NumGoroutine returns the number of goroutines that currently exist | int |
runtime-pause-total-ns |
PauseTotalNs is the cumulative nanoseconds in GC stop-the-world pauses since the program started | uint64 |
runtime-sys |
Sys is the total bytes of memory obtained from the OS | uint64 |
runtime-total-alloc |
TotalAlloc is cumulative bytes allocated for heap objects | uint64 |
Tracks statistics about the query executor portion of the InfluxDB engine.
name | description | type |
---|---|---|
qe-queriesActive |
queriesActive tracks the number of queries being handled at this instant in time | int |
qe-queriesExecuted |
Number of queries that have been executed (started) | int |
qe-queriesFinished |
Number of queries that have finished | int |
qe-queryDurationNs |
queryDurationNs tracks the cumulative wall time, in nanoseconds, of every query executed | int |
qe-recoveredPanics |
Number of panics recovered by Query Executor | int |
Tracks statistics about writes at a system level.
name | description | type |
---|---|---|
write-pointreq |
pointReq is incremented for every point that is attempted to be written, regardless of success | int |
write-pointreqlocal |
pointReqLocal is incremented for every point that is attempted to be written into a shard, regardless of success | int |
write-req |
req is incremented every time a batch of points is attempted to be written, regardless of success | int |
write-subwritedrop |
subWriteDrop is incremented every time a batch write to a subscriber is dropped due to contention or write saturation | int |
write-subwriteok |
subWriteOk is incremented every time a batch write to a subscriber succeeds | int |
write-writedrop |
writeDrop is incremented for every point dropped due to having a timestamp that does not match any existing retention policy | int |
write-writeerror |
writeError is incremented for every batch that was attempted to be written to a shard but failed | int |
write-writeok |
writeOk is incremented for every batch that was successfully written to a shard | int |
write-writetimeout |
writeTimeout is incremented every time a write failed due to timing out | int |
Tracks subscriber statistics.
name | description | type |
---|---|---|
subscriber-createfailures |
int |
|
subscriber-pointswritten |
pointsWritten tracks the number of points successfully written to subscribers | int |
subscriber-writefailures |
writeFailures tracks the number of batches that failed to send to subscribers | int |
Tracks statistics about the Continuous Query executor.
name | description | type |
---|---|---|
cq-queryfail |
queryFail is incremented whenever a continuous query is executed but fails | int |
cq-queryok |
queryOk is incremented whenever a continuous query is executed without a failure | int |
Tracks statistics about the InfluxDB HTTP server.
name | description | type |
---|---|---|
httpd-authfail |
authFail indicates how many HTTP requests were aborted due to authentication being required but unsupplied or incorrect | int |
httpd-clienterror |
clientError is incremented every time InfluxDB sends an HTTP response with a 4XX status code | int |
httpd-pingreq |
pingReq is incremented every time InfluxDB serves the /ping HTTP endpoint | int |
httpd-pointswrittendropped |
Number of points dropped by the storage engine | int |
httpd-pointswrittenfail |
pointsWrittenFail is incremented for every point (not every batch) that was accepted by the /write HTTP endpoint but was unable to be persisted | int |
httpd-pointswrittenok |
pointsWrittenOK is incremented for every point (not every batch) that was accepted by the /write HTTP endpoint and persisted successfully | int |
httpd-queryreq |
queryReq is incremented every time InfluxDB serves the /query HTTP endpoint | int |
httpd-queryreqdurationns |
queryReqDurationNs tracks the cumulative wall time, in nanoseconds, of every query served | int |
httpd-queryrespbytes |
queryRespBytes is increased for every byte InfluxDB sends in a successful query response | int |
httpd-recoveredpanics |
Number of panics recovered by HTTP handler | int |
httpd-req |
req is incremented for every HTTP request InfluxDB receives | int |
httpd-reqactive |
reqActive is incremented when InfluxDB begins accepting an HTTP request and is decremented whenever InfluxDB finishes serving that request | int |
httpd-reqdurationns |
reqDurationNs tracks the cumulative wall time, in nanoseconds, of every request served | int |
httpd-servererror |
serverError is incremented every time InfluxDB sends an HTTP response with a 5XX status code | int |
httpd-statusreq |
statusReq is incremented every time InfluxDB serves the /status HTTP endpoint | int |
httpd-writereq |
writeReq is incremented every time InfluxDB serves the /write HTTP endpoint | int |
httpd-writereqactive |
writeReqActive tracks the number of write requests over HTTP being handled at this instant in time | int |
httpd-writereqbytes |
writeReqBytes tracks the total number of bytes of line protocol received by the /write endpoint | int |
httpd-writereqdurationns |
writeReqDurationNs tracks the cumulative wall time, in nanoseconds, of every write request served | int |
TLS Certificate
collects information about a TLS Certificate.
name | description | type | required |
---|---|---|---|
host |
URL of the TLS host | string |
required |
name | description | type |
---|---|---|
expires-in |
time before the TLS certificate expires, in nanoseconds | int64 |
chain-expires-in |
time before any TLS certificate in the chain expires, in nanoseconds | int64 |
hash |
the TLS certificate’s hash | string |
signature-list |
list of the TLS certificate’s signature algorithms in the full chain | string |
NTP
returns metadata from an NTP time server.
name | description | type | required |
---|---|---|---|
address |
address of an NTP server | string |
required |
name | description | type |
---|---|---|
clock-offset |
the estimated offset of the local system clock relative to the server’s clock, in nanoseconds | int64 |
rtt |
an estimate of the round-trip-time delay between the client and the server, in nanoseconds | int64 |
Memcached
collects statistics from a memcached server.
name | description | type | required |
---|---|---|---|
address |
address of a memcached server | string |
required |
name | description | type |
---|---|---|
uptime |
Number of seconds the Memcached server has been running since last restart. | uint64 |
curr_connections |
Number of open connections to this Memcached server; should be the same value on all servers during normal operation. This is something like the count of MySQL’s “SHOW PROCESSLIST” result rows. | uint64 |
reserved_fds |
Number of misc fds used internally | uint64 |
cmd_get |
Number of “get” commands received since server startup, regardless of whether they were successful. | uint64 |
cmd_set |
Number of “set” commands serviced since startup. | uint64 |
cmd_flush |
Number of “flush_all” commands received. This command clears the whole cache and shouldn’t be used during normal operation. | uint64 |
cmd_touch |
Cumulative number of touch reqs | uint64 |
get_hits |
Number of successful “get” commands (cache hits) since startup, divide them by the “cmd_get” value to get the cache hitrate. | uint64 |
get_misses |
Number of failed “get” requests because nothing was cached for this key or the cached value was too old. | uint64 |
get_expired |
Number of items that have been requested but had already expired | uint64 |
get_flushed |
Number of items that have been requested but have been flushed via flush_all | uint64 |
delete_misses |
Number of “delete” commands for keys not existing within the cache. | uint64 |
delete_hits |
Number of “delete” commands that matched a stored key and removed it. | uint64 |
bytes_read |
Total number of bytes received from the network by this server. | uint64 |
bytes_written |
Total number of bytes sent to the network by this server. | uint64 |
bytes |
Number of bytes currently used for caching items (bounded by the limit_maxbytes setting). | uint64 |
curr_items |
Number of items currently stored in this server’s cache. | uint64 |
evictions |
Number of objects removed from the cache to free up memory for new items because Memcached reached its maximum memory setting (limit_maxbytes). | uint64 |
PHP-FPM
collects statistics from a PHP-FPM server.
name | description | type | required |
---|---|---|---|
address |
address of a PHP-FPM server | string |
required |
url |
URL of the status endpoint (default: /status) | string |
optional |
name | description | type |
---|---|---|
listen_queue |
The number of requests in the queue of pending connections. | uint64 |
idle_processes |
The number of idle processes. | uint64 |
active_processes |
The number of active processes | uint64 |
slow_requests |
The number of slow requests. Enable the PHP-FPM slow log before relying on this value; if it is non-zero you may have slow PHP processes. | uint64 |
MySQL
collects statistics from a MySQL server.
name | description | type | required |
---|---|---|---|
address |
MySQL server address (host:port) | string |
required |
login |
MySQL user account | string |
required |
password |
MySQL user password | string |
required |
name | description | type |
---|---|---|
aborted-clients |
The number of connections that were aborted because the client died without closing the connection properly. | uint64 |
aborted-connects |
The number of failed attempts to connect to the MySQL server. | uint64 |
binlog-cache-use |
The number of transactions that used the binary log cache. | uint64 |
binlog-stmt-cache-use |
The number of nontransactional statements that used the binary log statement cache. | uint64 |
bytes-received |
The number of bytes received from all clients. | uint64 |
bytes-sent |
The number of bytes sent to all clients. | uint64 |
com-begin |
The number of times begin statement has been executed. | uint64 |
com-change-db |
The number of times change-db statement has been executed. | uint64 |
com-change-master |
The number of times change-master statement has been executed. | uint64 |
com-commit |
The number of times commit statement has been executed. | uint64 |
com-create-db |
The number of times create-db statement has been executed. | uint64 |
com-delete |
The number of times delete statement has been executed. | uint64 |
com-delete-multi |
The number of times delete-multi statement has been executed. | uint64 |
com-insert |
The number of times insert statement has been executed. | uint64 |
com-rollback |
The number of times rollback statement has been executed. | uint64 |
com-select |
The number of times select statement has been executed. | uint64 |
com-stmt-execute |
The number of times stmt-execute statement has been executed. | uint64 |
com-stmt-fetch |
The number of times stmt-fetch statement has been executed. | uint64 |
com-truncate |
The number of times truncate statement has been executed. | uint64 |
com-update |
The number of times update statement has been executed. | uint64 |
connection-errors-accept |
The number of errors that occurred during calls to accept() on the listening port. | uint64 |
connection-errors-internal |
The number of connections refused due to internal errors in the server, such as failure to start a new thread or an out-of-memory condition. | uint64 |
connection-errors-max-connections |
The number of connections refused because the server max-connections limit was reached. | uint64 |
connection-errors-peer-address |
The number of errors that occurred while searching for connecting client IP addresses. | uint64 |
connection-errors-select |
The number of errors that occurred during calls to select() or poll() on the listening port. | uint64 |
connection-errors-tcpwrap |
The number of connections refused by the libwrap library. | uint64 |
connections |
The number of connection attempts (successful or not) to the MySQL server. | uint64 |
created-tmp-disk-tables |
The number of internal on-disk temporary tables created by the server while executing statements. | uint64 |
created-tmp-tables |
The number of internal temporary tables created by the server while executing statements. | uint64 |
flush-commands |
The number of times the server flushes tables, whether because a user executed a FLUSH TABLES statement or due to internal server operation. | uint64 |
handler-read-first |
The number of times the first entry in an index was read. If this value is high, it suggests that the server is doing a lot of full index scans. | uint64 |
handler-read-key |
The number of requests to read a row based on a key. If this value is high, it is a good indication that your tables are properly indexed for your queries. | uint64 |
handler-read-last |
The number of requests to read the last key in an index. | uint64 |
handler-read-next |
The number of requests to read the next row in key order. | uint64 |
handler-read-prev |
The number of requests to read the previous row in key order. | uint64 |
handler-read-rnd |
The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. | uint64 |
handler-read-rnd-next |
The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. | uint64 |
innodb-buffer-pool-pages-data |
The number of pages in the InnoDB buffer pool containing data. | uint64 |
innodb-buffer-pool-pages-dirty |
The current number of dirty pages in the InnoDB buffer pool. | uint64 |
innodb-buffer-pool-pages-flushed |
The number of requests to flush pages from the InnoDB buffer pool. | uint64 |
innodb-buffer-pool-pages-free |
The number of free pages in the InnoDB buffer pool. | uint64 |
innodb-buffer-pool-pages-misc |
The number of pages in the InnoDB buffer pool that are busy because they have been allocated for administrative overhead, such as row locks or the adaptive hash index. | uint64 |
innodb-data-fsyncs |
The number of fsync() operations so far. | uint64 |
innodb-data-reads |
The total number of data reads (OS file reads). | uint64 |
innodb-data-writes |
The total number of data writes. | uint64 |
innodb-log-waits |
The number of times that the log buffer was too small and a wait was required for it to be flushed before continuing. | uint64 |
innodb-log-writes |
The number of physical writes to the InnoDB redo log file. | uint64 |
innodb-page-size |
InnoDB page size (default 16KB). | uint64 |
innodb-pages-read |
The number of pages read from the InnoDB buffer pool by operations on InnoDB tables. | uint64 |
innodb-pages-written |
The number of pages written by operations on InnoDB tables. | uint64 |
innodb-row-lock-time-max |
The maximum time to acquire a row lock for InnoDB tables, in milliseconds. | uint64 |
innodb-row-lock-waits |
The number of times operations on InnoDB tables had to wait for a row lock. | uint64 |
key-blocks-unused |
The number of unused blocks in the MyISAM key cache. | uint64 |
key-blocks-used |
The number of used blocks in the MyISAM key cache. | uint64 |
key-reads |
The number of physical reads of a key block from disk into the MyISAM key cache. | uint64 |
key-writes |
The number of physical writes of a key block from the MyISAM key cache to disk. | uint64 |
open-files |
The number of files that are open. This count includes regular files opened by the server. | uint64 |
open-streams |
The number of streams that are open (used mainly for logging). | uint64 |
open-tables |
The number of tables that are open. | uint64 |
prepared-stmt-count |
The current number of prepared statements. | uint64 |
queries |
The number of statements executed by the server. | uint64 |
select-full-join |
The number of joins that perform table scans because they do not use indexes. | uint64 |
select-full-range-join |
The number of joins that used a range search on a reference table. | uint64 |
slow-queries |
The number of queries that have taken more than long-query-time seconds. | uint64 |
uptime |
The number of seconds that the server has been up. | uint64 |
Elasticsearch
collects statistics from an Elasticsearch store.
name | description | type | required |
---|---|---|---|
address |
address of an Elasticsearch server | string |
required |
timeout |
duration before the HTTP request times out | time.Duration |
optional |
name | description | type |
---|---|---|
cluster-name |
ES cluster name | string |
status |
Health level of the cluster | string |
timed-out |
? | boolean |
number-of-nodes |
Number of nodes in the cluster | float64 |
number-of-data-nodes |
Number of data nodes in the cluster | float64 |
active-primary-shards |
Number of primary shards in the cluster | float64 |
active-shards |
Number of active shards in the cluster | float64 |
relocating-shards |
Number of shards being relocated in the cluster | float64 |
initializing-shards |
Number of shards being initialized in the cluster | float64 |
unassigned-shards |
Number of shards currently unassigned in the cluster | float64 |
delayed-unassigned-shards |
Number of unassigned shards which allocation is delayed in the cluster | float64 |
number-of-pending-tasks |
Number of pending tasks in the cluster | float64 |
number-of-in-flight-fetch |
Number of in-flight fetch operations | float64 |
task-max-waiting-in-queue-millis |
Maximum number of milliseconds a task is waiting in queue | float64 |
indices-count |
Count of indices in the cluster | float64 |
docs-count |
Count of documents in the cluster | float64 |
docs-deleted |
Count of deleted documents in the cluster | float64 |
store-size |
Size of the cluster storage | float64 |
querycache-memory |
Memory allocated to the cluster query cache | float64 |
querycache-count-hit |
Number of hits on the cluster query cache | float64 |
querycache-count-miss |
Number of misses on the cluster query cache | float64 |
querycache-count-cache |
Number of cache operations on the cluster query cache | float64 |
querycache-evictions |
Number of evictions from the cluster query cache | float64 |
jvm-heap-max |
JVM heap max size on the cluster | float64 |
jvm-heap-used |
JVM heap used size on the cluster | float64 |
jvm-threads |
Number of JVM threads on the cluster | float64 |
Nginx
collects values from the Nginx stub_status module.
name | description | type | required |
---|---|---|---|
url |
full URL of the Nginx server status | string |
required |
timeout |
duration before the HTTP request times out | time.Duration |
optional |
name | description | type |
---|---|---|
active-connections |
The number of active client connections including waiting connections. | int |
accept-connections |
The total number of accepted connections. | int |
handled-connections |
The total number of handled connections. | int |
requests |
The total number of client requests. | int |
reading |
The current number of connections where nginx is reading the request header. | int |
writing |
The current number of connections where nginx is writing the response back to the client. | int |
waiting |
The current number of idle client connections waiting for a request. | int |
mongodb
collects metrics from the serverStatus command.
name | description | type | required |
---|---|---|---|
address |
address of the target mongodb server to gather metrics from | string |
required |
databases |
a comma-separated list of databases to gather metrics from | string |
optional |
username |
a username to authenticate to the MongoDB server | string |
optional |
password |
a password to authenticate to the MongoDB server | string |
optional |
name | description | type |
---|---|---|
uptime |
The number of seconds that the current MongoDB process has been active. | int64 |
connections_current |
The number of incoming connections from clients to the database server | int32 |
connections_available |
The number of unused incoming connections available | int32 |
gl_clients_readers |
The number of the active client connections performing read operations | int32 |
gl_clients_writers |
The number of the active client connections performing write operations | int32 |
network_in |
The number of bytes that reflects the amount of network traffic received by this database | int64 |
network_out |
The number of bytes that reflects the amount of network traffic sent from this database | int64 |
ops_insert |
The total number of insert operations received since the mongod instance last started | int64 |
ops_query |
The total number of query operations received since the mongod instance last started | int64 |
ops_update |
The total number of update operations received since the mongod instance last started | int64 |
ops_delete |
The total number of delete operations received since the mongod instance last started | int64 |
ops_getmore |
The total number of getmore operations received since the mongod instance last started | int64 |
ops_command |
The total number of command operations received since the mongod instance last started | int64 |
mem_resident |
The value of mem.resident is roughly equivalent to the amount of RAM, in megabytes (MB), currently used by the database process | int32 |
mem_virtual |
mem.virtual displays the quantity, in megabytes (MB), of virtual memory used by the mongod process | int32 |
postgresql
collects metrics from the statistics collector of a PostgreSQL server.
name | description | type | required |
---|---|---|---|
address |
address of the target PostgreSQL server to gather metrics from | string |
required |
username |
a username to authenticate to the PostgreSQL server | string |
optional |
password |
a password to authenticate to the PostgreSQL server | string |
optional |
sslmode |
SSL Mode of the PostgreSQL server (see https://www.postgresql.org/docs/10/static/libpq-ssl.html) | string |
optional |
name | description | type |
---|---|---|
current_connections |
Number of backends currently connected to this database | uint |
max_connections |
The maximum number of client connections allowed. | uint |
shared_buffer_hits |
Number of times disk blocks were found already in the buffer cache, so that a read was not necessary | uint |
shared_buffer_reads |
Number of disk blocks read | uint |
temp_files_count |
Number of temporary files created by queries | uint |
temp_file_bytes |
Total amount of data written to temporary files by queries | uint |
rows_returned |
Number of rows returned by queries | uint |
rows_fetched |
Number of rows fetched by queries | uint |
rows_inserted |
Number of rows inserted by queries | uint |
rows_updated |
Number of rows updated by queries | uint |
rows_deleted |
Number of rows deleted by queries | uint |
deadlocks |
Number of deadlocks detected | uint |
index_size |
Total size of index on disk | uint |
table_size |
Total size of table on disk | uint |
toast_size |
Total size of toast on disk | uint |
n_dead_tup |
Estimated number of dead rows | uint |
n_live_tup |
Estimated number of live rows | uint |
checkpoints_requested |
Number of requested checkpoints that have been performed | uint |
checkpoints_scheduled |
Number of scheduled checkpoints that have been performed | uint |
buffers_backend |
Number of buffers written directly by a backend | uint |
buffers_background |
Number of buffers written by the background writer | uint |
buffers_checkpoint |
Number of buffers written during checkpoints | uint |
cassandra
returns the metrics from a Cassandra node, via a Jolokia gateway to JMX.
name | description | type | required |
---|---|---|---|
address |
address of the Jolokia bridge to the Cassandra server to gather metrics from | string |
required |
timeout |
duration before the HTTP request times out | time.Duration |
optional, default 0s |
name | description | type |
---|---|---|
heap-memory-init |
Amount of heap memory in bytes that the JVM initially requests from the OS | float64 |
heap-memory-committed |
Amount of heap memory in bytes that is committed for the JVM to use | float64 |
heap-memory-max |
Maximum amount of heap memory in bytes that can be used for memory management | float64 |
heap-memory-used |
Amount of used heap memory in bytes | float64 |
nonheap-memory-init |
Amount of non-heap memory in bytes that the JVM initially requests from the OS | float64 |
nonheap-memory-committed |
Amount of non-heap memory in bytes that is committed for the JVM to use | float64 |
nonheap-memory-max |
Maximum amount of non-heap memory in bytes that can be used for memory management | float64 |
nonheap-memory-used |
Amount of used non-heap memory in bytes | float64 |
connected-clients |
Number of clients connected to this node’s native protocol server | float64 |
key-cache-hits |
Total number of cache hits for partition to sstable offsets | float64 |
key-cache-requests |
Total number of cache requests for partition to sstable offsets | float64 |
key-cache-entries |
Total number of cache entries for partition to sstable offsets | float64 |
key-cache-size |
Total size of occupied cache, in bytes, for partition to sstable offsets | float64 |
key-cache-capacity |
Cache capacity, in bytes, for partition to sstable offsets | float64 |
row-cache-hits |
Total number of cache hits for rows kept in memory | float64 |
row-cache-requests |
Total number of cache requests for rows kept in memory | float64 |
row-cache-entries |
Total number of cache entries for rows kept in memory | float64 |
row-cache-size |
Total size of occupied cache, in bytes, for rows kept in memory | float64 |
row-cache-capacity |
Cache capacity, in bytes, for rows kept in memory | float64 |
read-totallatency |
Total read latency since starting | float64 |
write-totallatency |
Total write latency since starting | float64 |
read-timeouts |
Number of timeouts encountered during read | float64 |
write-timeouts |
Number of timeouts encountered during write | float64 |
read-unavailables |
Number of unavailable exceptions encountered during read | float64 |
write-unavailables |
Number of unavailable exceptions encountered during write | float64 |
read-failures |
Number of failures encountered during read | float64 |
write-failures |
Number of failures encountered during write | float64 |
commitlog-pendingtasks |
Number of commit log messages written but yet to be fsync’d | float64 |
commitlog-totalsize |
Current size, in bytes, used by all the commit log segments | float64 |
compaction-completedtasks |
Number of completed compactions since server [re]start | float64 |
compaction-pendingtasks |
Estimated number of compactions remaining to perform | float64 |
compaction-bytescompacted |
Total number of bytes compacted since server [re]start | float64 |
storage-load |
Size, in bytes, of the on disk data size this node manages | float64 |
storage-exceptions |
Number of internal exceptions caught | float64 |
tp-compactionexecutor-activetasks |
Number of tasks being actively worked on Compactions | float64 |
tp-antientropystage-activetasks |
Number of tasks being actively worked on Builds merkle tree for repairs | float64 |
tp-countermutationstage-pendingtasks |
Number of queued tasks queued up on counter writes | float64 |
tp-countermutationstage-currentlyblockedtasks |
Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked on counter writes | float64 |
tp-mutationstage-pendingtasks |
Number of queued tasks queued up on all other writes | float64 |
tp-mutationstage-currentlyblockedtasks |
Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked on all other writes | float64 |
tp-readrepairstage-pendingtasks |
Number of queued tasks queued up on ReadRepair | float64 |
tp-readrepairstage-currentlyblockedtasks |
Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked on ReadRepair | float64 |
tp-readstage-pendingtasks |
Number of queued tasks queued up on Local reads | float64 |
tp-readstage-currentlyblockedtasks |
Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked on Local read | float64 |
tp-requestresponsestage-pendingtasks |
Number of queued tasks queued up on Coordinator requests to the cluster | float64 |
tp-requestresponsestage-currentlyblockedtasks |
Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked on Coordinator requests to the cluster | float64 |
table-livediskspaceused |
Disk space used by SSTables belonging to this table (in bytes) | float64 |
table-totaldiskspaceused |
Total disk space used by SSTables belonging to this table, including obsolete ones waiting to be GC’d | float64 |
table-readlatency |
Local read latency for this keyspace | float64 |
table-coordinatorreadlatency |
Coordinator read latency for this keyspace | float64 |
table-writelatency |
Local write latency for this keyspace | float64 |
table-readtotallatency |
Local read latency for this keyspace since starting | float64 |
table-writetotallatency |
Local write latency for this keyspace since starting | float64 |
name | description |
---|---|
keyspace | The keyspace of the metric, e.g. “system” |
The SNMP probe returns the values of OIDs from an SNMP server.
name | description | type | required |
---|---|---|---|
address | The address of the target SNMP server to get metrics from | string | required |
port | The port of the target SNMP server | int | optional, default 161 |
community | The SNMP community string of the target server | string | optional, default public |
timeout | duration before the SNMP request times out | time.Duration | optional, default 10s |
oid | A comma-separated list of OIDs to get | string | required |
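To sanity-check the address, community and oid values from the agent host, you can query the server manually with the net-snmp tools. The host below is a placeholder and 1.3.6.1.2.1.1.3.0 is the standard sysUpTime OID:

$ snmpget -v 2c -c public 192.0.2.10 1.3.6.1.2.1.1.3.0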
name | description | type |
---|---|---|
value | The value returned by the SNMP server | string or math/big.Int |
name | description |
---|---|
keyspace | The keyspace of the metric, e.g. “system” |
The Sybase probe returns metrics from a Sybase ASE server.
name | description | type | required |
---|---|---|---|
address | Sybase server address (host:port) | string | required |
login | Sybase user account | string | required |
password | Sybase user password | string | required |
blocked-process | Number of seconds before a process is considered blocked | int | optional, default 30 |
long-transaction | Number of seconds before a transaction is considered long | int | optional, default 30 |
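Before configuring the probe, you may want to verify the address, login and password by connecting from the agent host with a TDS client; this sketch assumes FreeTDS's tsql is installed and uses placeholder values:

$ tsql -H sybase-host -p 5000 -U panto_user -P 'secret'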
name | description | type |
---|---|---|
version | The version of the server | string |
db-count | The number of databases on the server | int |
proc-count | The number of processes on the server | int |
blocked-proc-count | The number of processes blocked for more than blocked-process seconds | int |
long-transaction | The number of transactions lasting more than long-transaction seconds | int |
segmap | The segmap value for this database (see tag) | int |
allocated | The number of allocated MB for this database (see tag) | int |
free | The number of free MB for this database (see tag) | int |
name | description |
---|---|
dbname | The name of the database of the metric, e.g. “master” |
A state is evaluated from the conditions defined in a check, applied to the result of a probe. It can have one of the following values:
* Unknown: when the state can’t be determined, either because there is no value, or because the conditions could not be evaluated
* OK: when the result is considered normal
* Warning: when the result is considered worth attention
* Critical: when the result shows an issue that needs a reaction
* Missing Data: when the server has not received any results from the agent for too long
* Error: when the probe encountered an error while gathering metrics

To determine which state the result of a probe is in, you can choose between 3 different algorithms.
The value of one metric is compared to a fixed reference (e.g. free memory is above 1GB).
The trend of the last X values of the metric is compared to a fixed reference (e.g. the rate of change of HTTP/500 errors over the last 25 occurrences is below 2).
The value of one metric is compared to the value of the same metric taken Y seconds ago (e.g. the count of successful checkouts compared to the same count 24 hours ago).
Each algorithm can use any number of conditions, bound together with a global logical operator, AND or OR (meaning, respectively, that all conditions must be true or that any condition may be true).
Here is the list of supported comparison operators.

For numeric values: Less than (<), Less than or equal (≤), Equal (=), Not equal (≠), Greater than or equal (≥), Greater than (>).

For string values: Equal (=), Not equal (≠), Match (regular expression match), Find (regular expression find).
An alert may be raised when the state of a check has been calculated from a result. There are different algorithms to evaluate the alert conditions:
* the current state differs from the previous one;
* the current state is not OK and lasts for more than X occurrences;
* the state changes more than X times in the last Y seconds.
When an alert occurs, a notification is sent through a “Channel”. Channels can be created and configured in the Create Alert dialog. The configuration of a Channel depends on its type:
Alerts can be sent by email. The outgoing email server (SMTP) settings are configured on the server.
* dest: the To: field of the sent email, a comma-separated list of email addresses
* cc: the Cc: field of the sent email, a comma-separated list of email addresses (optional)

Alerts can be sent to Slack, using an incoming webhook. Create a Slack incoming webhook by following the official instructions.
* webhook_url: the URL of the incoming webhook for your Slack integration
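You can check that the webhook_url is valid before wiring it into a Channel by posting a test message to it directly (the URL below is a placeholder for your own incoming webhook):

$ curl -X POST -H 'Content-type: application/json' --data '{"text": "Panto test alert"}' https://hooks.slack.com/services/T0000/B0000/XXXXXXXX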
Alerts can be forwarded to PagerDuty to integrate with your operations management platform.
* from: the email address of the person raising this incident
* token: a PagerDuty API token
* service: the ID of the PagerDuty service this incident is raised for

SMS alerting is provided via Twilio. You need a Twilio account to create an SMS alert Channel.
* account-sid: a Twilio account SID
* token: a Twilio Auth token
* from: a Twilio phone number or short code that sends this message
* messaging-service-sid: a Twilio Messaging Service SID
* to: a phone number to send the message to

Note: only one of from or messaging-service-sid is required.
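If you want to validate the account-sid, token and from values before creating the Channel, you can send a test SMS through Twilio's Messages API; every identifier and phone number below is a placeholder:

$ curl -X POST https://api.twilio.com/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXX/Messages.json \
    --data-urlencode 'To=+15005550006' \
    --data-urlencode 'From=+15005550001' \
    --data-urlencode 'Body=Panto test alert' \
    -u ACXXXXXXXXXXXXXXXX:your_auth_token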
Alerts can be sent to an arbitrary HTTP endpoint. This Channel is used to build custom integrations with other 3rd-party services or with your own in-house tools.
* url: the URL of the HTTP endpoint to send the alert to

The HTTP Channel POSTs a JSON representation of the Alert to the configured URL. The following properties can be found in the JSON:
* title: a title for the Alert
* message: why the Alert was triggered
* target: the display name of the Target that exhibits abnormal conditions
* agent: the display name of the Agent that ran the Probe on the Target
* probe: the name of the Probe that got abnormal results
* state: the current state, WARNING or CRITICAL
* reason: the user-supplied reason in the check(s) that weren’t validated
* critical: true if state is CRITICAL
* warning: true if state is WARNING
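To illustrate what your endpoint should expect, here is a request similar to the one Panto would make, built only from the property names listed above; the URL and all values are made up:

$ curl -X POST -H 'Content-Type: application/json' https://hooks.example.com/panto \
    --data '{"title": "Disk usage on web-1", "message": "disk usage is above the warning threshold", "target": "web-1", "agent": "agent-01", "probe": "system.disk", "state": "WARNING", "reason": "disk almost full", "critical": false, "warning": true}'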