Prometheus Monitoring (Borgmon-inspired): an open-source monitoring and alerting solution

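  1. Prometheus is written in Go, collects monitoring data with a pull model, and provides a multi-dimensional data model and a flexible query interface. Monitoring targets can be configured through static files or found through service discovery, which supports Kubernetes, Consul, DNS, and other mechanisms. On the collection side, Go's concurrency lets a single Prometheus server scrape monitoring data from hundreds of nodes. On the storage side, as the local time series database has been steadily optimized, a single Prometheus server can reportedly ingest ten million metric samples per second; if a large amount of historical monitoring data must be kept, remote storage is also supported.
    It provides data collection, storage, visualization, and alerting, and it supports Kubernetes natively.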

  2. Metrics provide a way of monitoring and understanding behavior in aggregate.

  3. This means that labels _represent multiple dimensions of a metric_. Each unique combination of a metric name and label values yields a single time series. In other words, each time you create a new key-value pair on a metric you will get a new time series in the database. Hence, be very careful that your key-value pairs are constrained. Do not store things such as IDs or email addresses, which are unbounded.
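
To make the cardinality warning concrete, here is a minimal sketch that estimates the upper bound on the number of time series for one metric as the product of the distinct values of each label. The label names and value counts are made-up assumptions for illustration, not figures from any real system.

```python
from math import prod


def series_count(distinct_values_per_label: dict[str, int]) -> int:
    """Upper bound on time series for one metric name.

    Every combination of label values is a separate series, so the bound
    is the product of the number of distinct values of each label.
    """
    return prod(distinct_values_per_label.values())


# Hypothetical label sets: bounded labels stay manageable...
bounded = {"method": 4, "status": 5, "handler": 20}
# ...while one unbounded label (e.g. a user ID) multiplies everything.
unbounded = {**bounded, "user_id": 100_000}

print(series_count(bounded))    # 400
print(series_count(unbounded))  # 40000000
```

A single unbounded label turns a few hundred series into tens of millions, which is why IDs and email addresses do not belong in labels.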

  4. Prometheus caters for different types of measurements by having four different types of metrics:

    1. Counter: A cumulative metric that only ever increases. (E.g. requests served, tasks completed, errors occurred)
    2. Gauge: A metric that can arbitrarily go up or down. (E.g. temperature, memory usage)
      It is an instantaneous value that does not depend on earlier samples and can change arbitrarily.
    3. Histogram: Binned measurement of a continuous variable. (E.g. latency, request duration, age)
      It samples observations and counts them into buckets, e.g. request durations or response sizes. It represents observations made over a period of time and tracks the count per configured bucket as well as the total count; quantiles are calculated from the buckets.
    4. Summary: Similar to a histogram, except the bins are converted into an aggregate (e.g. the 99th percentile) immediately. It also represents observations sampled over a period of time, but it stores the quantile values directly instead of calculating them from buckets.
      No calculation is needed at query time; the result is stored directly.
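
The difference between a histogram and a summary is easier to see in code. The sketch below is a toy histogram, not the Prometheus client library: the class name, bucket bounds, and observations are assumptions for illustration. It keeps per-bucket counts and estimates a quantile from them by linear interpolation, which is the kind of calculation a summary avoids by storing the quantile directly.

```python
from bisect import bisect_left


class Histogram:
    """Toy histogram: observations are counted into buckets by upper bound."""

    def __init__(self, upper_bounds):
        # A final +Inf bucket catches everything above the largest bound.
        self.bounds = sorted(upper_bounds) + [float("inf")]
        self.counts = [0] * len(self.bounds)  # per-bucket, not cumulative
        self.sum = 0.0                        # running sum of observations

    def observe(self, value):
        # bisect_left returns the first bucket whose upper bound is >= value.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Return [(upper_bound, number of observations <= upper_bound)]."""
        result, running = [], 0
        for bound, count in zip(self.bounds, self.counts):
            running += count
            result.append((bound, running))
        return result

    def quantile(self, q):
        """Estimate the q-quantile by interpolating inside one bucket."""
        buckets = self.cumulative()
        total = buckets[-1][1]
        if total == 0:
            return float("nan")
        rank = q * total
        lower, below = 0.0, 0
        for bound, running in buckets:
            if running >= rank:
                in_bucket = running - below
                if bound == float("inf") or in_bucket == 0:
                    return lower  # no finite upper bound to interpolate to
                return lower + (bound - lower) * (rank - below) / in_bucket
            lower, below = bound, running
        return lower


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.2, 0.3, 0.4, 0.7, 2.0):
    latency.observe(seconds)

print(latency.cumulative())   # [(0.1, 1), (0.5, 4), (1.0, 5), (inf, 6)]
print(latency.quantile(0.5))  # estimated median, derived from the buckets
```

The estimate is only as precise as the bucket layout allows, which is the trade-off for being able to aggregate buckets across instances.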
  5. Instrumentation in Prometheus terms means adding client libraries to your application so that it can expose metrics to Prometheus. While instrumenting, you essentially create in-memory objects (like gauges or counters) that you increment or decrement on the fly.
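
As a rough illustration of those in-memory objects, the sketch below defines a hypothetical `Counter` and `Gauge` and renders them in the text exposition format. These classes are stand-ins written for this note, not the API of any official client library, and the metric names are invented.

```python
class Counter:
    """In-memory counter: may only go up."""
    kind = "counter"

    def __init__(self, name, help_text):
        self.name, self.help, self.value = name, help_text, 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """In-memory gauge: may go up or down, or be set directly."""
    kind = "gauge"

    def __init__(self, name, help_text):
        self.name, self.help, self.value = name, help_text, 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount

    def set(self, value):
        self.value = float(value)


def render(metrics):
    """Render metrics as HELP/TYPE comment lines plus one sample line each."""
    lines = []
    for metric in metrics:
        lines.append(f"# HELP {metric.name} {metric.help}")
        lines.append(f"# TYPE {metric.name} {metric.kind}")
        lines.append(f"{metric.name} {metric.value}")
    return "\n".join(lines) + "\n"


requests_total = Counter("demo_requests_total", "Requests handled.")
in_flight = Gauge("demo_in_flight_requests", "Requests being handled now.")

# What application code would do while serving a request:
in_flight.inc()
requests_total.inc()
in_flight.dec()

print(render([requests_total, in_flight]))
```

A real client library adds labels, thread safety, and an HTTP endpoint on top of this idea.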

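  6. Prometheus stores time series data: a continuous sequence of values, ordered by time, that belong to the same series (same name and same labels). A time series is defined by its metric name and a set of key/value labels; samples with the same name and the same labels belong to the same series.

  7. Sample: a data point collected along the time dimension for a given series. Each sample in an actual time series consists of a float64 value and a millisecond-precision timestamp.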

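  8. Prometheus works by periodically scraping the state of monitored components over HTTP. Any component can be integrated into Prometheus monitoring as long as it provides a suitable HTTP endpoint and follows the data format that Prometheus defines. Prometheus can monitor at several layers:

    • Infrastructure layer: monitors the resources of each host (both Kubernetes nodes and non-Kubernetes nodes), such as CPU, memory, network throughput and bandwidth usage, disk I/O, and disk usage.

    • Middleware layer: monitors middleware deployed independently outside the Kubernetes cluster, for example MySQL, Redis, RabbitMQ, Elasticsearch, and Nginx.

    • Kubernetes cluster: monitors the key metrics of the Kubernetes cluster itself.

    • Applications deployed on the Kubernetes cluster: monitors the applications running on the Kubernetes cluster.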
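
The pull principle described in this note can be sketched with the standard library alone: a component serves its state as text on `/metrics`, and a scraper fetches and parses it. The metric names, values, and the `scrape` helper are assumptions for illustration; the parser is deliberately simplified and ignores optional sample timestamps and escaping rules.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up metrics that the monitored component exposes.
METRICS_TEXT = (
    "# TYPE demo_requests_total counter\n"
    'demo_requests_total{method="get"} 42\n'
    "# TYPE demo_temperature_celsius gauge\n"
    "demo_temperature_celsius 21.5\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """The monitored side: answers GET /metrics with plain text."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def scrape(url):
    """The monitoring side: fetch one target, return {series: value}."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        text = response.read().decode()
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):  # skip HELP/TYPE comments
            series, value = line.rsplit(" ", 1)
            samples[series] = float(value)
    return samples


def demo():
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In a real deployment the scraper side is Prometheus itself, calling each target on a fixed interval.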

  9. Prometheus actively scrapes targets in order to retrieve metrics from them.

  10. Prometheus is not designed to catch individual, point-in-time events (such as a service outage); it is designed to gather pre-aggregated metrics about your services.

  11. There is a little bit of a gotcha with those points.

  13. By itself, the Prometheus Operator doesn't really deploy a complete monitoring solution; kube-prometheus takes it one step further by being prescriptive in what you monitor and how. In addition to pre-configuring Alertmanager, it automatically deploys kube-state-metrics, which is pretty rad.
  14. kube-state-metrics vs. metrics-server

  15. Kubernetes monitoring: the data sources

  16. The Kubelet ships with built-in support for cAdvisor, which collects, aggregates, processes, and exports metrics (such as CPU, memory, file, and network usage) about running containers on a given node.

  17. Etcd is a core component of Kubernetes and should be secured appropriately.

  18. Kube-state-metrics interrogates the Kubernetes API server and exposes a bunch of states about all the Kubernetes objects. Because it is an exporter, the kube-state-metrics package does this in the Prometheus metrics exposition format.

  19. The index file indexes metric names and labels to time series in the chunk files.

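  20. Prometheus stores data in blocks that each cover 2 hours. Each block is a directory that contains one or more chunk files (which hold the time series data), a metadata file, and an index file (which uses the metric name and labels to locate the time series data in the chunk files).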
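
The role of the index file can be illustrated with a toy inverted index that maps each label pair to the set of series carrying it. This is only a conceptual sketch: the real index is a binary on-disk format, and the `SeriesIndex` class, series IDs, and labels below are invented for illustration.

```python
from collections import defaultdict


class SeriesIndex:
    """Toy inverted index: (label name, label value) -> set of series IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.labels = {}  # series ID -> full label set

    def add(self, series_id, metric_name, labels):
        # The metric name is treated as just another label, "__name__".
        full = {"__name__": metric_name, **labels}
        self.labels[series_id] = full
        for pair in full.items():
            self.postings[pair].add(series_id)

    def select(self, matchers):
        """Return IDs of series that match every equality matcher."""
        sets = [self.postings.get(pair, set()) for pair in matchers.items()]
        if not sets:
            return set(self.labels)
        return set.intersection(*sets)


index = SeriesIndex()
index.add(1, "http_requests_total", {"method": "get", "status": "200"})
index.add(2, "http_requests_total", {"method": "post", "status": "200"})
index.add(3, "up", {"job": "node"})

print(index.select({"__name__": "http_requests_total", "status": "200"}))  # {1, 2}
print(index.select({"method": "get"}))                                     # {1}
```

Once the matching series IDs are known, the stored offsets would point into the chunk files that hold the actual samples.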

  21. rule file: the alerting rule file.

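  22. A time series is stored as a sequence of values ordered by timestamp, which we call a vector. Each time series is identified by its metric name and a set of labels (labelset). Each point in a time series is called a sample, and a sample consists of three parts:

    1. Metric: the metric name and the labelset that describes the characteristics of the sample;

    2. Timestamp: a timestamp with millisecond precision;

    3. Value: a float64 value that represents the value of the sample.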
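
The three parts of a sample map naturally onto a small data structure. The sketch below is an assumption-laden illustration (the `Sample`, `series_key`, and `Store` names are hypothetical): the metric name plus the sorted label pairs identify the series, and each series holds an ordered list of timestamped float values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    timestamp_ms: int  # millisecond-precision timestamp
    value: float       # Python floats are 64-bit, like float64


def series_key(name, labels):
    """Series identity: metric name plus label pairs, order-independent."""
    return (name, tuple(sorted(labels.items())))


@dataclass
class Store:
    series: dict = field(default_factory=dict)

    def append(self, name, labels, timestamp_ms, value):
        key = series_key(name, labels)
        self.series.setdefault(key, []).append(Sample(timestamp_ms, float(value)))


store = Store()
store.append("http_requests_total", {"method": "get", "code": "200"}, 1_000, 10)
# Same name and labels (in a different order): same series.
store.append("http_requests_total", {"code": "200", "method": "get"}, 16_000, 12)
# A different label value: a new series.
store.append("http_requests_total", {"method": "post", "code": "200"}, 16_000, 3)

print(len(store.series))  # 2
```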

  23. Histograms count observations falling into particular buckets of observation values.

  24. These labels designate different latency percentiles and target group intervals.

  25. At a technical level, Kubernetes exports all of its internal metrics in the native Prometheus format.

  26. Role of Prometheus Operator in Cluster Monitoring

    1. Enables seamless installation of the Prometheus Operator with Kubernetes-native configuration options.
    2. Makes it easy to create and destroy a Prometheus instance for a Kubernetes namespace, a specific application, or a team using the Operator.

    3. Allows configuration, including versions, persistence, retention policies, and replicas, to be preconfigured from a native Kubernetes resource.

    4. Discovers target services using labels, automatically generating monitoring target configurations based on familiar Kubernetes label queries.

  27. For alert evaluation this situation does not change anything, as alerts are typically only fired when a certain query triggers for a period of time.
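
The idea that an alert only fires once its query has triggered for a period of time can be sketched as a small state machine. The function name, the evaluation timestamps, and the 120-second hold period are assumptions chosen for the example.

```python
def alert_states(evaluations, for_seconds):
    """Track one alert across rule evaluations.

    evaluations: list of (timestamp_seconds, condition_is_true), in time order.
    Returns one state per evaluation: "inactive", "pending", or "firing".
    """
    states = []
    active_since = None  # when the condition first became true
    for timestamp, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 60 seconds; the alert must hold for 120 seconds.
history = [(0, True), (60, True), (120, True), (180, False), (240, True)]
print(alert_states(history, for_seconds=120))
# ['pending', 'pending', 'firing', 'inactive', 'pending']
```

A short blip in the query result therefore moves the alert to pending at most, which is why brief disturbances usually do not change what fires.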
  28. Prometheus black-box monitoring: https://github.com/prometheus/blackbox_exporter
    https://zhangguanzhang.github.io/2018/12/04/black-box-exporter/

    Thanos: https://github.com/thanos-io/thanos

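  29. We build a monitoring system following the flow instrumentation, exposition, collection, query. Instrumentation is about how to measure an application and which metrics need to be measured; exposition is about how to expose the metrics over HTTP; collection is about how to gather the metrics; query is about how to build PromQL expressions that query the time series data.

  30. The largest payoffs you will get from Prometheus are through instrumenting your own applications using direct instrumentation and a client library.

  31. Labels fall into two broad categories: instrumentation labels and target labels. Instrumentation labels come from the resource being monitored; for example, for an HTTP-related time series, a label might show the specific HTTP verb that was used. These labels are added to the time series before it is scraped, for example by a client or an exporter. Target labels relate more to the architecture; they might identify the data center where the time series originates. Target labels are added by Prometheus during and after the scrape. Direct instrumentation is added inline as part of the program's source code.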
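
How the two label categories come together at scrape time can be sketched as a merge. The function below is a simplified illustration, with made-up label values, of the documented `honor_labels` behavior: by default a scraped label that collides with a target label is renamed with an `exported_` prefix, and with `honor_labels` enabled the scraped value wins.

```python
def merge_labels(scraped, target, honor_labels=False):
    """Combine instrumentation labels (scraped) with target labels."""
    merged = dict(target)
    for name, value in scraped.items():
        if name in target and not honor_labels:
            # Conflict: keep the target label, rename the scraped one.
            merged["exported_" + name] = value
        else:
            # No conflict, or honor_labels lets the scraped value win.
            merged[name] = value
    return merged


# Hypothetical labels: one set from the application, one from the scrape config.
instrumentation = {"method": "get", "job": "custom"}
target = {"job": "api", "instance": "10.0.0.1:8080", "datacenter": "eu-west"}

print(merge_labels(instrumentation, target))
print(merge_labels(instrumentation, target, honor_labels=True))
```

Relabeling rules, which can rewrite or drop labels during the scrape, are left out of this sketch.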

