Kubernetes Monitoring

Goals for Monitoring
1. The first and foremost goal of monitoring is reliability.
2. It is important to have proper alerting in place.
3. In addition to reliability, another significant feature of a monitoring system is providing observability into your Kubernetes cluster.
4. Another important use case for cluster monitoring is providing users with insight into the operation of the cluster.
Resource metrics (CPU and memory usage) are collected by the lightweight, short-term, in-memory metrics-server and are exposed via the metrics.k8s.io API. metrics-server discovers all nodes on the cluster and queries each node's kubelet for CPU and memory usage.
metrics-server is the aggregator of the cluster's core monitoring data.
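To make the CPU and memory figures concrete, here is a minimal sketch that normalizes usage values from a response shaped like a metrics.k8s.io NodeMetricsList. The sample payload, node names, and helper names are hypothetical, and only a few integer quantity suffixes are handled.

```python
# Hypothetical, trimmed payload shaped like a metrics.k8s.io NodeMetricsList.
sample = {
    "kind": "NodeMetricsList",
    "items": [
        {"metadata": {"name": "node-1"}, "usage": {"cpu": "250m", "memory": "2048Ki"}},
        {"metadata": {"name": "node-2"}, "usage": {"cpu": "1500000000n", "memory": "3Gi"}},
    ],
}

# Assumption: only these suffixes appear, always with integer amounts.
CPU_TO_NANOCORES = {"n": 1, "u": 10**3, "m": 10**6}
MEM_TO_BYTES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_nanocores(quantity: str) -> int:
    """Convert a CPU quantity such as '250m' to nanocores."""
    if quantity[-1] in CPU_TO_NANOCORES:
        return int(quantity[:-1]) * CPU_TO_NANOCORES[quantity[-1]]
    return int(quantity) * 10**9  # a bare integer means whole cores


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '2048Ki' to bytes."""
    if quantity[-2:] in MEM_TO_BYTES:
        return int(quantity[:-2]) * MEM_TO_BYTES[quantity[-2:]]
    return int(quantity)  # a bare integer is already bytes


usage = {
    item["metadata"]["name"]: (
        cpu_nanocores(item["usage"]["cpu"]),
        memory_bytes(item["usage"]["memory"]),
    )
    for item in sample["items"]
}
print(usage)
```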
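Before Kubernetes 1.7.3, cAdvisor's metrics were integrated into the kubelet's metrics and could be fetched through port 4194, which was open on each node. After Kubernetes 1.7.3, cAdvisor's metrics were separated from the kubelet's metrics, so Prometheus collects them with two separate scrape jobs. Many documents online say that each node opens port 4194 and that cAdvisor's metrics can be fetched through it. In newer kubelet versions, cAdvisor no longer exposes port 4194, and the metrics can only be obtained by proxying through the API provided by the apiserver.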
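As a rough illustration of the proxy route, the helper below builds the API server paths that proxy to a node's kubelet and cAdvisor metrics endpoints. The path layout follows the one commonly used in Prometheus scrape configurations; the function name, API server address, and node name are made up for the example.

```python
from urllib.parse import quote


def kubelet_proxy_urls(apiserver: str, node: str) -> dict:
    """Return API-server proxy URLs for a node's kubelet and cAdvisor metrics."""
    base = f"{apiserver.rstrip('/')}/api/v1/nodes/{quote(node, safe='')}/proxy"
    return {
        "kubelet": f"{base}/metrics",            # kubelet's own metrics
        "cadvisor": f"{base}/metrics/cadvisor",  # container metrics from cAdvisor
    }


# Hypothetical API server address and node name.
urls = kubelet_proxy_urls("https://kubernetes.default.svc:443", "node-1")
print(urls["kubelet"])
print(urls["cadvisor"])
```

The two URLs correspond to the two scrape jobs mentioned above.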
Node service discovery is useful for monitoring the infrastructure of and under Kubernetes, but is not much use for monitoring your applications running on Kubernetes.
It is common to **have separate Prometheus servers for network, infrastructure, and application monitoring**. This is known as vertical sharding, and it is the best way to scale Prometheus.

PromQL

PromQL can join different metrics together for arithmetic operations against them.
In a range vector, the values for each timestamp are the samples recorded in the time series going back from that timestamp for the length of time given in the range duration.
A range-vector is typically generated so that a function can be applied to it, producing an instant-vector that can be graphed (only instant vectors can be graphed).
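The relationship between a range selection and the function applied to it can be sketched in plain Python. This is a toy model, not Prometheus code: samples are `(timestamp, value)` pairs, the helper names are invented, and the left-open window boundary is an assumption (the sample data avoids the boundary so the choice does not affect the result).

```python
def range_select(samples, eval_ts, range_s):
    """Return the samples that fall in the window ending at eval_ts."""
    return [(t, v) for t, v in samples if eval_ts - range_s < t <= eval_ts]


def avg_over_time(window):
    """Collapse a window of samples into one value, like avg_over_time()."""
    return sum(v for _, v in window) / len(window)


# One series scraped every 15 seconds (made-up values).
samples = [(0, 4.0), (15, 10.0), (30, 20.0), (45, 30.0), (60, 50.0)]

window = range_select(samples, eval_ts=50, range_s=40)  # many samples
value = avg_over_time(window)                           # one value at t=50
print(window, value)
```

The window holds several samples per series, which is why it cannot be plotted directly. The function reduces it to a single value per evaluation timestamp.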
The group_left or group_right keywords convert the match into a many-to-one or one-to-many matching respectively. The left and right indicate the side that has the higher cardinality, so group_left means that multiple series on the left side can match a single series on the right. As a result, the returned instant-vector contains all of the labels from the side with the higher cardinality, even if they don't match any label on the right.
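A small Python simulation may help picture many-to-one matching. It imitates an expression of the form `left * on(instance) group_left right` over hand-written series. The function name, labels, and values are all hypothetical, and metric names are left out for brevity.

```python
def group_left_mul(left, right, on):
    """Multiply each left series by the single right series sharing its `on` labels."""
    one_side = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in one_side:
            raise ValueError("the 'one' side must be unique per match key")
        one_side[key] = value

    result = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key in one_side:
            # The output keeps every label of the higher-cardinality (left) side.
            result.append((dict(labels), value * one_side[key]))
    return result


left = [
    ({"instance": "a", "method": "get"}, 10.0),
    ({"instance": "a", "method": "post"}, 4.0),
    ({"instance": "b", "method": "get"}, 6.0),
    ({"instance": "c", "method": "get"}, 1.0),  # no partner on the right
]
right = [({"instance": "a"}, 2.0), ({"instance": "b"}, 3.0)]

joined = group_left_mul(left, right, on=["instance"])
for labels, value in joined:
    print(labels, value)
```

Both `instance="a"` series on the left match the same right-hand series, and the `method` label survives in the output even though the right side has no such label.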
Sometimes there is also a time series with no suffix that carries a quantile label.
Aggregation operators work only on instant vectors, and they also output instant vectors.
When a PromQL operator or function could change the value or meaning of a time series, the metric name is removed.
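The two points above (aggregation maps an instant vector to an instant vector, and the metric name is dropped when the meaning changes) can be mimicked with a toy `sum by (...)` in Python. The series, labels, and function name are invented for the illustration.

```python
from collections import defaultdict


def sum_by(vector, by):
    """Toy version of `sum by (<labels>) (vector)` over (labels, value) pairs."""
    totals = defaultdict(float)
    for labels, value in vector:
        # Only the grouping labels survive; __name__ is never among them.
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


vector = [
    ({"__name__": "http_requests_total", "job": "api", "instance": "a"}, 3.0),
    ({"__name__": "http_requests_total", "job": "api", "instance": "b"}, 5.0),
    ({"__name__": "http_requests_total", "job": "db", "instance": "c"}, 2.0),
]

aggregated = sum_by(vector, by=["job"])
print(aggregated)
```

The output is still one value per label set, so it is an instant vector, but it no longer carries `http_requests_total` as its name because the summed value is not the original metric.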
The main use of the standard deviation in monitoring is to detect outliers.
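As a sketch of the idea, the snippet below flags instances whose value lies more than two standard deviations from the mean. It uses the population standard deviation, which is what PromQL's `stddev` aggregator computes. The threshold of two, the instance names, and the latency figures are arbitrary assumptions.

```python
from statistics import mean, pstdev


def find_outliers(values, k=2.0):
    """Return the keys whose value is more than k standard deviations from the mean."""
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [name for name, v in values.items() if abs(v - mu) > k * sigma]


# Made-up per-instance latencies in milliseconds.
latency_ms = {"a": 10, "b": 11, "c": 9, "d": 10, "e": 10, "f": 40}

outliers = find_outliers(latency_ms)
print(outliers)
```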
PromQL allows classes of analysis that few other metrics systems offer.
All the logical operators (and, or, unless) work in a many-to-many fashion, and they are the only operators that work many-to-many.
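The behaviour of the three operators can be approximated with set logic over label sets. The following is a simplified model with invented series and function names: series are matched on all labels except the metric name, and values from the right-hand side are never combined arithmetically with the left.

```python
def match_key(labels):
    """Label set used for matching: every label except the metric name."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")


def vector_and(left, right):
    """Left series that have a matching label set on the right."""
    right_keys = {match_key(labels) for labels, _ in right}
    return [(labels, v) for labels, v in left if match_key(labels) in right_keys]


def vector_unless(left, right):
    """Left series that have no matching label set on the right."""
    right_keys = {match_key(labels) for labels, _ in right}
    return [(labels, v) for labels, v in left if match_key(labels) not in right_keys]


def vector_or(left, right):
    """All left series, plus right series whose label set is absent on the left."""
    left_keys = {match_key(labels) for labels, _ in left}
    return list(left) + [(labels, v) for labels, v in right if match_key(labels) not in left_keys]


up = [({"__name__": "up", "instance": "a"}, 1.0), ({"__name__": "up", "instance": "b"}, 1.0)]
errors = [({"__name__": "errors", "instance": "b"}, 7.0), ({"__name__": "errors", "instance": "c"}, 2.0)]

print(vector_and(up, errors))
print(vector_unless(up, errors))
print(vector_or(up, errors))
```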
Prometheus works entirely in UTC, and has no notion of time zones.
A time series is assumed to continue beyond the bound of the range if the first/last sample is within 110% of the average interval of the data. If this is not the case, it is presumed the time series exists for 50% of an interval beyond the samples you have, but not with the value going below zero.
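The extrapolation rule can be sketched as a small Python function that estimates a counter's increase over a range. This is a simplified model written from the description above, not Prometheus's actual implementation. It assumes a counter with no resets, and the function name and sample values are hypothetical.

```python
def extrapolated_increase(samples, range_start, range_end):
    """Estimate a counter's increase over [range_start, range_end] from (t, v) samples."""
    if len(samples) < 2:
        return None  # at least two samples are needed

    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    raw_increase = last_v - first_v
    sampled_interval = last_t - first_t
    avg_interval = sampled_interval / (len(samples) - 1)
    threshold = avg_interval * 1.1  # the "110% of the average interval" rule

    def extension(gap):
        # Close to the bound: assume the series reaches the bound.
        # Otherwise: assume it exists for half an interval beyond the samples.
        return gap if gap < threshold else avg_interval / 2

    start_ext = extension(first_t - range_start)
    end_ext = extension(range_end - last_t)

    # Do not extrapolate backwards past the point where the counter would hit zero.
    if raw_increase > 0 and first_v >= 0:
        duration_to_zero = sampled_interval * first_v / raw_increase
        start_ext = min(start_ext, duration_to_zero)

    extrapolated_interval = sampled_interval + start_ext + end_ext
    return raw_increase * extrapolated_interval / sampled_interval


# Samples close to both bounds: the series is extended to the bounds.
near = extrapolated_increase([(5, 10), (15, 20), (25, 30), (35, 40), (45, 50)], 0, 50)
# First sample far from the start: only half an interval is added there.
far = extrapolated_increase([(20, 100), (30, 110), (40, 120)], 0, 50)
# Low first value: the backwards extension is capped where the counter reaches zero.
capped = extrapolated_increase([(20, 2), (30, 12), (40, 22)], 0, 50)
print(near, far, capped)
```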
