Usage

Docker Flow Monitor can be controlled by sending HTTP requests or through Docker Service labels when combined with Docker Flow Swarm Listener.

Reconfigure

The Reconfigure endpoint accepts requests that add new scrape targets and alerts or modify existing ones. Parameters are divided into scrape and alert groups.

Query parameters that follow should be added to the base address [MONITOR_IP]:[MONITOR_PORT]/v1/docker-flow-monitor/reconfigure.

Scrape Parameters

Tip

Defines Prometheus scrape targets

Query Description Required
metricsPath The path of the metrics endpoint. Defaults to /metrics. No
scrapeInterval How frequently to scrape targets from this job. No
scrapeTimeout Per-scrape timeout when scraping this job. No
scrapePort The port through which metrics are exposed. Yes
serviceName The name of the service that exports metrics. Yes
scrapeType A set of targets and parameters describing how to scrape metrics. No

You can find more about the available scrapeType values in Scrape Config.
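As a minimal sketch of how the scrape parameters map onto a request URL, the snippet below encodes them as a query string. The address http://monitor:8080 (including the http scheme) and the helper name reconfigure_url are assumptions made for illustration; substitute your own [MONITOR_IP]:[MONITOR_PORT].

```python
from urllib.parse import urlencode

# Hypothetical address; replace with your [MONITOR_IP]:[MONITOR_PORT].
MONITOR_ADDRESS = "http://monitor:8080"
RECONFIGURE_PATH = "/v1/docker-flow-monitor/reconfigure"


def reconfigure_url(params, address=MONITOR_ADDRESS):
    """Return the reconfigure URL with the given query parameters encoded."""
    return f"{address}{RECONFIGURE_PATH}?{urlencode(params)}"


scrape_url = reconfigure_url({
    "serviceName": "go-demo",   # required
    "scrapePort": 8080,         # required
    "metricsPath": "/metrics",  # optional; /metrics is the documented default
})
print(scrape_url)
```

The resulting URL can be sent with any HTTP client. Note that urlencode percent-encodes the slash in the metricsPath value.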

Alert Parameters

Tip

Defines Prometheus alerts

Query Description Required
alertAnnotations This parameter is translated to Prometheus alert ANNOTATIONS statement. Annotations are used to store longer additional information.
Example: summary=Service memory is high,description=Do something or start panicking
No
alertFor This parameter is translated to Prometheus alert FOR statement. It causes Prometheus to wait for a certain duration between first encountering a new expression output vector element (like an instance with a high HTTP error rate) and counting an alert as firing for this element. Elements that are active, but not firing yet, are in pending state. This parameter expects a number with time suffix (e.g. s for seconds, m for minutes).
Example: 30s
No
alertIf This parameter is translated to Prometheus alert IF statement. It is an expression that will be evaluated and, if it returns true, an alert will be fired.
Example: container_memory_usage_bytes{container_label_com_docker_swarm_service_name="go-demo"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="go-demo"} > 0.8
Yes
alertLabels This parameter is translated to Prometheus alert LABELS statement. It allows specifying a set of additional labels to be attached to the alert. Multiple labels can be separated with comma (,).
Example: severity=high,receiver=system
No
alertName The name of the alert. It is combined with the serviceName to produce a unique identifier.
Example: memoryAlert
Yes
serviceName The name of the service. It is combined with the alertName to produce a unique identifier.
Example: go-demo
Yes
alertPersistent When set to true, the alert will persist when the service is scaled to zero replicas.
Example: true
No

These parameters can be indexed so that multiple alerts can be defined for a service. Indexing is sequential and starts from 1. For example, indexed alertName parameters could be alertName.1=memload and alertName.2=diskload.

Please visit Alerting Overview for more information about the rules for defining Prometheus alerts.
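The indexing rule can be illustrated with a short sketch that flattens a list of alerts into indexed query parameters. The helper name indexed_alert_params is hypothetical, and sending serviceName once, without an index, is an assumption made here rather than something the text above states.

```python
from urllib.parse import urlencode


def indexed_alert_params(service_name, alerts):
    """Flatten a list of alert dicts into indexed query parameters.

    Indexing is sequential and starts from 1 (alertName.1, alertName.2, ...).
    """
    # Assumption: serviceName is passed once and is not indexed.
    params = {"serviceName": service_name}
    for index, alert in enumerate(alerts, start=1):
        for key, value in alert.items():
            params[f"{key}.{index}"] = value
    return params


params = indexed_alert_params("go-demo", [
    {"alertName": "memload", "alertIf": "@service_mem_limit:0.8", "alertFor": "30s"},
    {"alertName": "diskload", "alertIf": "@node_fs_limit:0.8"},
])
query = urlencode(params)  # ready to append to the reconfigure address
print(query)
```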

AlertIf Parameter Shortcuts

Tip

Allows short specification of commonly used alertIf parameters

Shortcut Description
@node_fs_limit:[PERCENTAGE] Whether node file system usage is over the specified percentage of the total available file system size.
Requirements: node-exporter metrics
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @node_fs_limit:0.8 would be expanded to (node_filesystem_size{fstype="aufs", job="my-service"} - node_filesystem_free{fstype="aufs", job="my-service"}) / node_filesystem_size{fstype="aufs", job="my-service"} > 0.8.
@node_mem_limit:[PERCENTAGE] Whether node memory usage is over the specified percentage of the total node memory.
Requirements: node-exporter metrics
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @node_mem_limit:0.8 would be expanded to (sum by (instance) (node_memory_MemTotal{job="my-service"}) - sum by (instance) (node_memory_MemFree{job="my-service"} + node_memory_Buffers{job="my-service"} + node_memory_Cached{job="my-service"})) / sum by (instance) (node_memory_MemTotal{job="my-service"}) > 0.8.
@node_mem_limit_total_above:[PERCENTAGE] Whether memory usage of all the nodes is over the specified percentage of the total memory.
Requirements: node-exporter metrics
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @node_mem_limit_total_above:0.8 would be expanded to (sum(node_memory_MemTotal{job="my-service"}) - sum(node_memory_MemFree{job="my-service"} + node_memory_Buffers{job="my-service"} + node_memory_Cached{job="my-service"})) / sum(node_memory_MemTotal{job="my-service"}) > 0.8.
@node_mem_limit_total_below:[PERCENTAGE] Whether memory usage of all the nodes is below the specified percentage of the total memory.
Requirements: node-exporter metrics
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @node_mem_limit_total_below:0.4 would be expanded to (sum(node_memory_MemTotal{job="my-service"}) - sum(node_memory_MemFree{job="my-service"} + node_memory_Buffers{job="my-service"} + node_memory_Cached{job="my-service"})) / sum(node_memory_MemTotal{job="my-service"}) < 0.4.
@replicas_running Whether the number of running replicas is as desired.
Requirements: cAdvisor metrics and a service running in the replicated mode. The alert uses container_memory_usage_bytes metric only as a way to count the number of running containers.
Example: @replicas_running for a service with the number of desired replicas set to 3 would be expanded to count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}) != 3.
@replicas_more_than Whether the number of running replicas is more than desired.
Requirements: cAdvisor metrics and a service running in the replicated mode. The alert uses container_memory_usage_bytes metric only as a way to count the number of running containers.
Example: @replicas_more_than for a service with the number of desired replicas set to 3 would be expanded to count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}) > 3.
@replicas_less_than Whether the number of running replicas is less than desired.
Requirements: cAdvisor metrics and a service running in the replicated mode. The alert uses container_memory_usage_bytes metric only as a way to count the number of running containers.
Example: @replicas_less_than for a service with the number of desired replicas set to 3 would be expanded to count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}) < 3.
@resp_time_above:[QUANTILE],[RATE_DURATION],[PERCENTAGE] Whether response time of a given quantile over the specified rate duration is above the set percentage.
Requirements: histogram with the name http_server_resp_time and with response times expressed in seconds.
[QUANTILE] must be one of the quantiles defined in the metric.
[RATE_DURATION] can be in any format supported by Prometheus (e.g. 5m).
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @resp_time_above:0.1,5m,0.9999 would be expanded to sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.9999.
@resp_time_below:[QUANTILE],[RATE_DURATION],[PERCENTAGE] Whether response time of a given quantile over the specified rate duration is below the set percentage.
Requirements: histogram with the name http_server_resp_time and with response times expressed in seconds.
[QUANTILE] must be one of the quantiles defined in the metric.
[RATE_DURATION] can be in any format supported by Prometheus (e.g. 5m).
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @resp_time_below:0.025,5m,0.75 would be expanded to sum(rate(http_server_resp_time_bucket{job="my-service", le="0.025"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.75.
@resp_time_server_error:[RATE_DURATION],[PERCENTAGE] Whether the server error rate over the specified rate duration is above the set percentage.
Requirements: histogram with the name http_server_resp_time and with label code set to value of the HTTP response code.
[RATE_DURATION] can be in any format supported by Prometheus (e.g. 5m).
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: @resp_time_server_error:5m,0.001 would be expanded to sum(rate(http_server_resp_time_count{job="my-service", code=~"^5..$$"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.001.
@service_mem_limit:[PERCENTAGE] Whether service memory usage is over the specified percentage of the service memory limit.
Requirements: cAdvisor metrics and service memory limit specified as service resource.
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: If serviceName is set to my-service, @service_mem_limit:0.8 would be expanded to container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8.
@service_mem_limit_nobuff:[PERCENTAGE] Whether service memory usage, excluding the Linux buffer cache, is over the specified percentage of the service memory limit.
Requirements: cAdvisor metrics and service memory limit specified as service resource.
[PERCENTAGE] must be specified as a decimal value (e.g. 0.8 equals 80%).
Example: If serviceName is set to my-service, @service_mem_limit_nobuff:0.8 would be expanded to (container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}-container_memory_cache{container_label_com_docker_swarm_service_name="my-service"})/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8.
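To make the expansion concrete, here is a rough sketch that reproduces the documented result for @service_mem_limit. It only illustrates the mapping from shortcut to expression; it is not the project's implementation, and the function name expand_service_mem_limit is hypothetical.

```python
def expand_service_mem_limit(alert_if, service_name):
    """Expand '@service_mem_limit:[PERCENTAGE]' into the documented expression."""
    shortcut, _, value = alert_if.partition(":")
    if shortcut != "@service_mem_limit" or not value:
        raise ValueError(f"unsupported alertIf shortcut: {alert_if!r}")
    # The same label selector is applied to both metrics.
    label = f'container_label_com_docker_swarm_service_name="{service_name}"'
    return (
        f"container_memory_usage_bytes{{{label}}}"
        f"/container_spec_memory_limit_bytes{{{label}}} > {value}"
    )


expanded = expand_service_mem_limit("@service_mem_limit:0.8", "my-service")
print(expanded)
```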

Note

I hope that the number of shortcuts will grow with time thanks to community contributions. Please create an issue with the alertIf statement and the suggested shortcut and I'll add it to the code as soon as possible.

AlertIf Secrets Configuration

Docker Flow Monitor supports Docker Secrets for adding custom alertIf shortcuts. Only secrets with names that start with alertif- or alertif_ are considered. alertIf shortcuts are configured as a YAML file containing a series of dictionaries. The key of each dictionary is your custom alertIf shortcut, which must begin with the @ character. The value of each dictionary consists of three keys: expanded, annotations and labels. expanded contains the expanded alert, written as a Go template. annotations and labels each contain a dictionary with the alert's annotations and labels. For example, @service_mem_limit is defined by the following YAML:

"@service_mem_limit":
  expanded: container_memory_usage_bytes{container_label_com_docker_swarm_service_name="{{ .Alert.ServiceName }}"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="{{ .Alert.ServiceName }}"} > {{ index .Values 0 }}
  annotations:
    summary: Memory of the service {{ .Alert.ServiceName }} is over {{ index .Values 0 }}
  labels:
    receiver: system
    service: "{{ .Alert.ServiceName }}"
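Before creating a secret, the rules above can be checked with a small sketch like the following. It assumes the YAML has already been parsed into a dictionary (for example with a YAML library of your choice), and it treats a missing expanded, annotations, or labels key as a problem, which is an assumption since the text does not say whether all three are mandatory. The function names are hypothetical.

```python
REQUIRED_KEYS = {"expanded", "annotations", "labels"}


def is_alertif_secret(secret_name):
    """Only secrets named alertif-* or alertif_* are considered."""
    return secret_name.startswith(("alertif-", "alertif_"))


def validate_shortcuts(shortcuts):
    """Return a list of problems found in a parsed shortcut mapping."""
    problems = []
    for name, definition in shortcuts.items():
        if not name.startswith("@"):
            problems.append(f"{name}: shortcut must begin with @")
        # Assumption: all three keys are expected to be present.
        missing = REQUIRED_KEYS - set(definition)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
    return problems


example = {
    "@service_mem_limit": {
        "expanded": "... > {{ index .Values 0 }}",  # Go template, abbreviated here
        "annotations": {"summary": "Memory of the service is over the limit"},
        "labels": {"receiver": "system"},
    }
}
print(is_alertif_secret("alertif-custom"), validate_shortcuts(example))
```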

Tip

AlertIf shortcuts defined in secrets will take priority over default shortcuts.

AlertIf Logical Operators

The logical operators and, unless, and or can be used in combination with AlertIf Parameter Shortcuts. For example, to create an alert that triggers when response time is low unless response time is high, set alertIf=@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99. This alert prevents @resp_time_below from triggering while @resp_time_above is triggering. The summary annotation for this alert is built by joining the summaries of both shortcuts with the operator: "Response time of the service my-service is below 0.025 unless Response time of the service my-service is above 0.1". When using logical operators, there are no default alert labels. The alert labels have to be set manually through the alertLabels query parameter.

More information on the logical operators can be found on Prometheus's querying documentation.
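A combined alertIf value can be assembled as in the sketch below. The _unless_ separator comes from the example above; the _and_ and _or_ forms are assumed by analogy, and the helper name combine_alert_if is hypothetical. Because combined alerts carry no default labels, the sketch also sets alertLabels explicitly.

```python
OPERATORS = ("and", "unless", "or")


def combine_alert_if(left, operator, right):
    """Join two alertIf shortcuts with a logical operator, e.g. left_unless_right."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    return f"{left}_{operator}_{right}"


alert_params = {
    "serviceName": "my-service",
    "alertName": "resptime",  # hypothetical alert name
    "alertIf": combine_alert_if(
        "@resp_time_below:0.025,5m,0.75",
        "unless",
        "@resp_time_above:0.1,5m,0.99",
    ),
    # No default labels apply to combined alerts, so set them by hand.
    "alertLabels": "severity=high,receiver=system",
}
print(alert_params["alertIf"])
```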

Remove

Tip

Removes Prometheus scrapes and alerts

The Remove endpoint can be used to send a request to Docker Flow Monitor with the goal of removing the scrapes and alerts related to a service.

Query parameters that follow should be added to the base address [MONITOR_IP]:[MONITOR_PORT]/v1/docker-flow-monitor/remove.

Query Description Required
serviceName The name of the service that should be removed. Yes
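A removal request URL could be built along these lines. As before, the address http://monitor:8080 and the helper name remove_url are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical address; replace with your [MONITOR_IP]:[MONITOR_PORT].
MONITOR_ADDRESS = "http://monitor:8080"


def remove_url(service_name, address=MONITOR_ADDRESS):
    """Return the URL that removes scrapes and alerts for the given service."""
    query = urlencode({"serviceName": service_name})
    return f"{address}/v1/docker-flow-monitor/remove?{query}"


removal = remove_url("go-demo")
print(removal)
```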