OpenTelemetry Collector¶
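View preview¶
OpenTelemetry Collector performance metrics, including collector uptime, memory usage, exporter metrics, and receiver metrics.
Supported versions¶
Operating system: Linux / Windows
OpenTelemetry Collector version: >=0.46.0
Prerequisites¶
- DataKit installed on the OpenTelemetry Collector server <Install DataKit>
Installation and configuration¶
Note: the examples use OpenTelemetry Collector installed with Docker on Linux. The exposed metrics may differ between versions.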
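Deployment¶
(The steps are the same on Linux and Windows.)
Metric collection (required)¶
DataKit supports two ways to collect otel-collector metrics, and both produce the same results.
- Option 1: collect OpenTelemetry Collector metrics with the prom collector
- Option 2: collect OpenTelemetry Collector metrics with the DataKit opentelemetry collector
Option 1: collect OpenTelemetry Collector metrics with prom¶
1. Expose the OpenTelemetry Collector metrics port. The default port is 8888.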
version: '3.3'
services:
  # Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.46.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "1888:1888"   # pprof extension
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
      - "4350:4317"   # OTLP gRPC receiver
      - "55670:55679" # zpages extension
      - "4318:4318"   # OTLP HTTP receiver
2. Check that the OpenTelemetry Collector metrics endpoint is reachable: curl http://otel-collector-host:8888/metrics
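The endpoint returns plain text in the Prometheus exposition format. As a rough illustration of what to look for in that output, the sketch below extracts the metric names from such text. The function name `parse_metric_names` and the `SAMPLE` text are made up for this example (the sample is not captured collector output), and the parser deliberately ignores everything except the metric name.

```python
def parse_metric_names(exposition_text: str) -> set[str]:
    """Return the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="v"} value
        # The name ends at the first "{" or the first space.
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names


# Hypothetical sample text, written by hand for illustration only.
SAMPLE = """\
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="abc"} 12.5
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{service_instance_id="abc"} 4.2e+07
"""

if __name__ == "__main__":
    for name in sorted(parse_metric_names(SAMPLE)):
        print(name)
```

To run it against a live collector, the text could be fetched with `urllib.request.urlopen("http://otel-collector-host:8888/metrics").read().decode()` and passed to the same function.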
3. Enable the DataKit prom plugin by copying the sample file.
4. Edit the prom-otelcol.conf configuration file.
Main parameters
- urls: the otel-collector metrics address
- interval: collection interval
- source: collector alias
- response_timeout: response timeout (default: 5 seconds)
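[[inputs.prom]]
  ## Exporter URLs
  urls = ["http://127.0.0.1:8888/metrics"]
  ## Whether to ignore request errors for the URLs
  ignore_req_err = false
  ## Collector alias
  source = "otel-prom"
  ## Output target for collected data
  # If set, collected data is written to a local file instead of being sent to the center
  # The saved metrics can then be debugged with: datakit --prom-conf /path/to/this/conf
  # If the url is already a local file path, --prom-conf debugs the data at the output path first
  # output = "/abs/path/to/file"
  ## Upper limit on collected data size, in bytes
  # Applies when data is written to a local file
  # Data larger than this limit is discarded
  # The default limit is 32MB
  # max_file_size = 0
  ## Metric type filter. Valid values: counter, gauge, histogram, summary, untyped
  # By default only counter and gauge metrics are collected
  # If empty, no filtering is applied
  metric_types = []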
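  ## Metric name filter: only matching metrics are kept
  # Regular expressions are supported; with several entries, matching any one is enough
  # If empty, no filtering is applied and all metrics are kept
  # metric_name_filter = ["cpu"]
  ## Measurement name prefix
  # If set, this prefix is added to the measurement name
  measurement_prefix = ""
  ## Measurement name
  # By default the metric name is split on the underscore "_": the first field becomes the measurement name and the remaining fields form the metric name
  # If measurement_name is set, the metric name is not split
  # The final measurement name is prefixed with measurement_prefix
  # measurement_name = "prom"
  ## Collection interval: "ns", "us" (or "µs"), "ms", "s", "m", "h"
  interval = "10s"
  ## Tag filter; several tags can be listed
  # Matching tags are ignored
  # tags_ignore = ["xxxx"]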
  ## TLS configuration
  tls_open = false
  # tls_ca = "/tmp/ca.crt"
  # tls_cert = "/tmp/peer.crt"
  # tls_key = "/tmp/peer.key"
  ## Custom authentication; only Bearer Token is currently supported
  # token and token_file: set only one of the two
  # [inputs.prom.auth]
  # type = "bearer_token"
  # token = "xxxxxxxx"
  # token_file = "/tmp/token"
  ## Custom measurement names
  # Metrics whose names start with prefix are grouped into one measurement
  # Custom measurement names take precedence over the measurement_name option
  # [[inputs.prom.measurements]]
  #   prefix = "cpu_"
  #   name = "cpu"
  # [[inputs.prom.measurements]]
  #   prefix = "mem_"
  #   name = "mem"
  ## Rename tag keys in the prom data
  [inputs.prom.tags_rename]
    overwrite_exist_tags = false
    [inputs.prom.tags_rename.mapping]
    # tag1 = "new-name-1"
    # tag2 = "new-name-2"
    # tag3 = "new-name-3"
  ## Custom tags
  [inputs.prom.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
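The comments on measurement_prefix and measurement_name in the configuration above describe how a Prometheus metric name is turned into a measurement name and a metric name. The sketch below restates that rule in Python so the outcome for collector metrics is easier to predict. It is an illustration of the documented rule only, not DataKit's implementation: the function name `split_metric_name` is hypothetical, the split is assumed to happen at the first underscore, and the handling of names without an underscore is an assumption.

```python
def split_metric_name(metric_name: str,
                      measurement_name: str = "",
                      measurement_prefix: str = "") -> tuple[str, str]:
    """Return (measurement, metric) following the rule in the config comments."""
    if measurement_name:
        # measurement_name is set: the metric name is not split.
        measurement, metric = measurement_name, metric_name
    else:
        # Default: split at the first underscore (assumed).
        measurement, _, metric = metric_name.partition("_")
        if not metric:
            # Assumption: a name without "_" is kept whole as the metric name.
            metric = metric_name
    # The prefix is always added to the final measurement name.
    return measurement_prefix + measurement, metric


if __name__ == "__main__":
    print(split_metric_name("otelcol_process_uptime"))
    print(split_metric_name("otelcol_process_uptime", measurement_name="prom"))
```

Under this rule, `otelcol_process_uptime` lands in the measurement `otelcol` with the metric name `process_uptime`.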
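5. Restart DataKit (to collect logs as well, configure log collection before restarting).
6. Verify that OpenTelemetry Collector metrics are being collected: /usr/local/datakit/datakit -M |egrep "最近采集|otel" (the pattern "最近采集" means "last collected").
Metrics preview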
Option 2: collect OpenTelemetry Collector metrics with the DataKit opentelemetry collector¶
1. Add an otlp exporter to the collector configuration.
receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - http://*
            - https://*
exporters:
  otlp:
    endpoint: "http://192.168.91.11:4319"
    tls:
      insecure: true
    compression: none # gzip disabled
processors:
  batch:
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
Parameter description
otlp.endpoint: the gRPC address of the DataKit opentelemetry collector
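Before restarting the collector, it can help to confirm that the DataKit gRPC address used in the exporter endpoint accepts TCP connections from the collector host. The following is a minimal sketch of such a check. The function name `port_is_open` is hypothetical, and the check only proves that something is listening on the port; it does not speak gRPC or OTLP.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the address from the exporter config above:
#   port_is_open("192.168.91.11", 4319)
```

A result of False usually points at a firewall rule, a wrong address, or a DataKit instance whose opentelemetry collector is not yet enabled.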
2. Enable the opentelemetry plugin by copying the sample file.
3. Edit opentelemetry.conf
[[inputs.opentelemetry]]
  ## When creating 'trace', 'span' and 'resource', many labels are added, and these labels eventually appear in all 'spans'.
  ## If you do not want too many labels to cause unnecessary network traffic, you can choose to ignore them.
  ## Regular expressions are supported. Note: replace every '.' with '_'.
  # ignore_attribute_keys = ["os_*","process_*"]
  ## Keep rare tracing resources list switch.
  ## If some resources are rare enough (not present in 1 hour), those resources are always sent
  ## to the data center regardless of samplers and filters.
  # keep_rare_resource = false
  ## Ignore tracing resources map like service:[resources...].
  ## The service name is the full service name in the current application.
  ## The resource list holds regular expressions used to block resource names.
  # [inputs.opentelemetry.close_resource]
  # service1 = ["resource1", "resource2", ...]
  # service2 = ["resource1", "resource2", ...]
  # ...
  ## Sampler config sets the global sampling strategy.
  ## priority sets the tracing data propagation level; the valid values are -1, 0, 1
  ## -1: always reject any tracing data sent to datakit
  ## 0: accept tracing data and apply sampling_rate
  ## 1: always send to the data center and ignore sampling_rate
  ## sampling_rate sets the global sampling rate
  # [inputs.opentelemetry.sampler]
  # priority = 0
  # sampling_rate = 1.0
  # [inputs.opentelemetry.tags]
  # key1 = "value1"
  # key2 = "value2"
  # ...
  [inputs.opentelemetry.expectedHeaders]
  ## If headers are configured here, every request must carry them; otherwise status code 500 is returned
  ## Can be used as a security check; keys must be all lowercase
  # ex_version = xxx
  # ex_name = xxx
  # ...
  ## grpc
  [inputs.opentelemetry.grpc]
  ## trace for grpc
  trace_enable = true
  ## metric for grpc
  metric_enable = true
  ## grpc listen addr
  # addr = "127.0.0.1:4317"
  addr = "0.0.0.0:4319"
  ## http
  [inputs.opentelemetry.http]
  ## if enable=true
  ## http path (do not edit):
  ## trace : /otel/v1/trace
  ## metric: /otel/v1/metric
  ## use as : http://127.0.0.1:9529/otel/v1/trace . Method = POST
  enable = true
  ## return to client status_ok_code :200/202
  http_status_ok = 200
- trace_enable: true # enable gRPC trace collection
- metric_enable: true # enable gRPC metric collection
- addr: 0.0.0.0:4319 # gRPC listen address and port
4. Restart DataKit
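The sampler comments in opentelemetry.conf describe three priority levels and a global sampling_rate. The sketch below spells out that decision logic as a small Python function, purely to make the documented behaviour concrete. It is not DataKit's implementation: the function name `should_keep` and the use of a random draw are assumptions, and a real sampler may well decide per trace ID instead.

```python
import random


def should_keep(priority: int, sampling_rate: float, rng=random.random) -> bool:
    """Decide whether tracing data is forwarded, per the documented priorities."""
    if priority == -1:
        # -1: always reject tracing data.
        return False
    if priority == 1:
        # 1: always send, sampling_rate is not considered.
        return True
    if priority == 0:
        # 0: accept and apply sampling_rate (assumed: uniform random draw in [0, 1)).
        return rng() < sampling_rate
    raise ValueError(f"unsupported priority: {priority}")


if __name__ == "__main__":
    kept = sum(should_keep(0, 0.25) for _ in range(10_000))
    print(f"kept {kept} of 10000 at sampling_rate=0.25")
```

With priority = 0 and sampling_rate = 1.0, as in the commented sample values, every trace passes the sampler.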
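Plugin tags (optional)¶
Parameter description
- This setting defines custom tags; any key-value pairs can be used
- For example, after setting env = "dev" in the plugin's tags table (such as [inputs.prom.tags]), all OpenTelemetry Collector metrics carry the tag env=dev, which allows quick filtering in queries
- Related document <DataFlux Tag Application Best Practices>
Restart DataKit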
Scene view¶
<Scenes - New Dashboard - Template Library - System Views - OpenTelemetry Collector Monitoring View>
Metric details¶
Metric | Description |
---|---|
process_uptime | Uptime |
process_memory_rss | Memory usage |
exporter_sent_log_records | Number of log records sent by the exporter |
exporter_sent_metric_points | Number of metric points sent by the exporter |
exporter_sent_spans | Number of spans sent by the exporter |
receiver_accepted_log_records | Number of log records accepted by the receiver |
receiver_accepted_metric_points | Number of metric points accepted by the receiver |
receiver_accepted_spans | Number of spans accepted by the receiver |
Troubleshooting¶
<Troubleshooting: no data reported>