This document describes common command-line tasks that you can accomplish with wavectl on your Wavefront alerts and dashboards.
Many of these tasks can also be done in the Wavefront GUI. wavectl lets you do them from the command line, in conjunction with powerful tools like grep, awk, sed, jq, or similar.

wavectl can list all alerts in a one-line summary form.
For example:
$ wavectl show alert
ID NAME STATUS SEVERITY
1523082347619 Kubernetes - Node Network Utilization - HIGH (Prod) CHECKING WARN
1523082347824 Kubernetes - Node Cpu Utilization - HIGH (Prod) CHECKING WARN
1523082348005 Kubernetes - Node Memory Swap Utilization - HIGH (Prod) SNOOZED WARN
1523082348172 Wavefront Freshness CHECKING WARN
...
In addition to the name and unique ID of each alert, the short summary shows its state and severity. Once your alerts are printed in structured columns, you can process them in all sorts of ways.
For example:
$ wavectl show alert | grep FIRING
1523082348708 Orion Response time more than 2 seconds FIRING INFO
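Because alert names can contain spaces, fields are easiest to pick from the end of each line. The following is a minimal sketch that tallies alerts per status. It assumes the column layout shown above (STATUS second to last, SEVERITY last), and alert_status_counts is a hypothetical helper name, not a wavectl command.

```shell
# Hypothetical helper (not part of wavectl): tally alerts per status.
# Reads the output of `wavectl show --no-header alert` on stdin.
# Assumes STATUS is the second-to-last column and SEVERITY the last,
# because alert names may themselves contain spaces.
alert_status_counts() {
  awk 'NF >= 3 { count[$(NF-1)]++ }
       END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Usage:
#   wavectl show --no-header alert | alert_status_counts
```

Counting from the right-hand side keeps the sketch independent of how many words an alert name has.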
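The same output can be used from scripts. For example, an operator may want to make sure that no "kubernetes" alerts are firing before running a script that takes one of the Kubernetes control-plane hosts down for maintenance. The --no-header option omits the header row, so the output can be piped straight into tools like wc: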
$ wavectl show --no-header alert | wc -l
11
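The pre-maintenance check described above could be sketched as a small guard function. This assumes that Kubernetes alerts can be recognized by the word "kubernetes" in their names and that STATUS is the second-to-last column; the function name and the maintenance script in the usage comment are hypothetical placeholders.

```shell
# Hypothetical pre-maintenance guard: fail if any alert whose summary line
# mentions "kubernetes" (case-insensitive) is currently FIRING.
# Reads the output of `wavectl show --no-header alert` on stdin.
# Assumption: Kubernetes alerts can be recognized by their name.
assert_no_firing_kubernetes_alerts() {
  local firing
  firing=$(grep -i 'kubernetes' | awk '$(NF-1) == "FIRING"')
  if [ -n "$firing" ]; then
    echo "Refusing to continue; firing Kubernetes alerts:" >&2
    echo "$firing" >&2
    return 1
  fi
  return 0
}

# Usage (take-host-down.sh is a placeholder for your maintenance script):
#   wavectl show --no-header alert | assert_no_firing_kubernetes_alerts \
#     && ./take-host-down.sh
```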
In addition to printing one-line summaries, the show command can also print the detailed state of your alerts in JSON form:
$ wavectl show -o json alert
{
"additionalInformation": "This alert tracks the used network bandwidth percentage for all the compute-* (compute-master and compute-node) machines. If the cpu utilization exceeds 80%, this alert fires.",
"condition": "ts(proc.net.percent,server_type=\"compute-*\" and env=\"live\") > 80",
"displayExpression": "ts(proc.net.percent,server_type=\"compute-*\" and env=\"live\")",
"id": "1523082347619",
"minutes": 2,
"name": "Kubernetes - Node Network Utilization - HIGH (Prod)",
"resolveAfterMinutes": 2,
"severity": "WARN",
"tags": {
"customerTags": [
"kubernetes",
"skynet"
]
},
"target": "pd: 05fe8ebacf8c44e881ea2f6e44dbf2d2"
}
{
"additionalInformation": "This alert tracks the used cpu percentage for all the compute-* (compute-master and compute-node) machines. If the cpu utilization exceeds 80%, this alert fires.",
...
Once you can easily retrieve the JSON representation of alerts and dashboards, text-processing tools like jq or grep open up many powerful use cases. For example:
Print the name and the condition for each alert.
$ wavectl show -o json alert | jq '{name,condition}'
{
"name": "Kubernetes - Node Network Utilization - HIGH (Prod)",
"condition": "ts(proc.net.percent,server_type=\"compute-*\" and env=\"live\") > 80"
}
{
"name": "Kubernetes - Node Cpu Utilization - HIGH (Prod)",
"condition": "ts(proc.stat.cpu.percentage_used,server_type=\"compute-*\" and env=\"live\") > 80"
}
{
"name": "Kubernetes - Node Memory Swap Utilization - HIGH (Prod)",
"condition": "ts(proc.meminfo.percentage_swapused,server_type=\"compute-*\" and env=\"live\") > 10"
...
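jq can also filter the alerts rather than just reshape them. As a sketch, the hypothetical helper below prints the names of alerts that carry a given customer tag, assuming tags are stored under tags.customerTags as in the JSON above.

```shell
# Hypothetical helper: print the names of alerts that carry a given customer tag.
# Reads the JSON stream from `wavectl show -o json alert` on stdin; requires jq.
# Alerts without tags.customerTags are skipped rather than causing an error.
alerts_with_tag() {
  jq -r --arg tag "$1" \
    'select(any(.tags.customerTags[]?; . == $tag)) | .name'
}

# Usage:
#   wavectl show -o json alert | alerts_with_tag kubernetes
```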
You may want to see how a metric is used across all dashboard queries. If you are unsure about a metric's semantics, seeing how it is already used correctly helps a lot.
Dashboards' JSON state can be inspected the same way as alerts'. For example, to see all dashboard queries involving HAProxy backend metrics:
$ wavectl show -o json dashboard | grep haproxy_backend
"query": "ts(octoproxy.haproxy_backend_connections_total, ${dev})",
"query": "rate(ts(octoproxy.haproxy_backend_connections_total, ${dev}))",
"query": "ts(octoproxy.haproxy_backend_response_errors_total, ${dev})",
"query": "ts(octoproxy.haproxy_backend_retry_warnings_total, ${dev})",
"query": "ts(octoproxy.haproxy_backend_up, ${dev})",
"query": "ts(octoproxy.haproxy_backend_connections_total, env=live)",
"query": "rate(ts(octoproxy.haproxy_backend_connections_total, env=live))",
"query": "ts(octoproxy.haproxy_backend_retry_warnings_total, env=live)",
"query": "ts(octoproxy.haproxy_backend_response_errors_total, env=live)",
"query": "ts(octoproxy.haproxy_backend_up, env=live)",
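To get a quick overview of which backend metrics are in use and how often, the matches can be counted instead of read one by one. This is a sketch that assumes metric names consist of letters, digits, underscores, and dots; metric_usage_counts is a hypothetical helper.

```shell
# Hypothetical helper: count how often each metric starting with a given
# prefix appears in the dashboard JSON read from stdin, most used first.
# The prefix is a basic regular expression, so escape literal dots.
metric_usage_counts() {
  grep -o "$1[A-Za-z0-9_.]*" |
    LC_ALL=C sort | uniq -c |
    awk '{ print $1, $2 }' |
    LC_ALL=C sort -k1,1nr -k2,2
}

# Usage:
#   wavectl show -o json dashboard | metric_usage_counts 'octoproxy\.haproxy_backend'
```

The awk step normalizes the padding that uniq -c adds, so the output is a plain "count metric" list.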
Some advanced functions in the Wavefront query language are not easy to learn. It helps to see how your colleagues already use a Wavefront function before writing your own queries. Take taggify as an example:
$ wavectl show -o json dashboard | grep taggify
"query": "rawsum(aliasMetric(taggify(${RdyCalicUncrdndNdsWithPod},metric,pod,0),tagk,node,\"(.*)\",\"$1\"),pod,dc,metrics)",
"query": "rawsum(taggify(${NotReadyCalicoPodsWithNodeInMetrics},metric,node,0),node,dc)",
"query": "taggify(${RdyCalicUncrdndNds},tagk,node,node,0)",
"query": "rawsum(taggify(${BirdUp},source,node,0),node,dc)",
"query": "taggify(${RdyCalicUncrdndNds},tagk,node,node,0)",
"query": "rawsum(taggify(${BgpState},source,node,0),node,dc)",
"query": "taggify(${RdyCalicUncrdndNds},tagk,node,node,0)",
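Identical queries often appear on several dashboards, as in the output above. A hypothetical helper that strips the JSON key and prints each distinct query once could look like the following. It assumes every query sits on its own line in the "query": "..." form shown above, and it leaves JSON escapes such as \" untouched.

```shell
# Hypothetical helper: print each distinct dashboard query containing a
# fixed string, with the "query" key and surrounding quotes removed.
# Reads the output of `wavectl show -o json dashboard` on stdin.
# Assumption: each query is on its own line as "query": "...", as above.
distinct_queries() {
  grep -F "$1" |
    sed -n 's/^[[:space:]]*"query": "\(.*\)",\{0,1\}$/\1/p' |
    LC_ALL=C sort -u
}

# Usage:
#   wavectl show -o json dashboard | distinct_queries taggify
```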
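After inspecting the alert and dashboard state as text, you may want to jump to the Wavefront GUI to see the time series there. For that, you can use the wavectl browser integration.
The JSON output can also be condensed with jq. For example, the following prints each dashboard's name together with the names of its sections: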
$ wavectl show -o json dashboard | jq '{name: .name, sections: [.sections[].name]}'
{
"name": "Skynet Octoproxy Dev",
"sections": [
"Vitals",
"getEndpoints",
"lbRestart",
"getServices",
"HA Proxy Backend Metrics",
"HAProxy Frontend Metrics"
]
}
{
"name": "Data Retention",
"sections": [
"Retention",
"Disposition",
"Disposition Notifications Runmode - Sundays"
]
}
...
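A similar jq one-liner can summarize how large each dashboard is. This sketch, with the hypothetical name dashboard_section_counts, prints the number of sections next to each dashboard name and treats a missing sections field as zero.

```shell
# Hypothetical helper: print "<section count><TAB><dashboard name>" for each
# dashboard in the JSON stream from `wavectl show -o json dashboard`.
# Requires jq; a dashboard without a "sections" field counts as 0.
dashboard_section_counts() {
  jq -r '"\((.sections // []) | length)\t\(.name)"'
}

# Usage:
#   wavectl show -o json dashboard | dashboard_section_counts | sort -rn
```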