
ElasticSearch




The ElasticSearch collector gathers node runtime status, cluster health, JVM performance, indexing performance, and search performance.

Prerequisites

  • ElasticSearch version >= 6.0.0
  • By default, only Node Stats metrics are collected. To collect Cluster-Health metrics as well, set cluster_health = true.
  • Setting cluster_health = true produces the following metric set:
      • elasticsearch_cluster_health
  • Setting cluster_stats = true produces the following metric set:
      • elasticsearch_cluster_stats
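
The version prerequisite can be checked before enabling the collector. The sketch below assumes the JSON body returned by the Elasticsearch root endpoint (GET /), which reports the version under version.number. The helper names parse_version and is_supported are illustrative and are not part of DataKit.

```python
def parse_version(number: str) -> tuple:
    """Turn "7.10.2" or "8.0.0-SNAPSHOT" into a comparable tuple of ints."""
    core = number.split("-", 1)[0]  # drop qualifiers such as -SNAPSHOT
    return tuple(int(part) for part in core.split("."))


def is_supported(root_response: dict, minimum: str = "6.0.0") -> bool:
    """Check the version.number field of a GET / response against the minimum."""
    number = root_response["version"]["number"]
    return parse_version(number) >= parse_version(minimum)


# Example with a hand-written response body (not real cluster output).
print(is_supported({"version": {"number": "7.10.2"}}))
```

OpenSearch may report its own version numbering in this field, so the check is probably only meaningful when distribution = "elasticsearch".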

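User permission configuration

If username/password access is enabled, the monitoring account needs the appropriate permissions; otherwise the collector fails to fetch monitoring data. Elasticsearch, Open Distro for Elasticsearch, and OpenSearch are currently supported.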

Elasticsearch

  • Create a role named monitor with the following privileges:
  {
    "applications": [],
    "cluster": [
        "monitor"
    ],
    "global": [],
    "indices": [
        {
            "allow_restricted_indices": false,
            "names": [
                "all"
            ],
            "privileges": [
                "manage_ilm",
                "monitor"
            ]
        }
    ],
    "run_as": []
  }
  • Create a custom user and assign it the newly created monitor role.
  • For other settings, see the configuration file description.
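
The role can also be created through the Elasticsearch security API instead of a UI. The following is a minimal sketch that only builds the HTTP request and does not send it. It assumes the create-or-update role endpoint PUT /_security/role/monitor (older 6.x releases expose it under /_xpack/security/role/ instead), and the function name build_role_request, the server URL, and the admin credentials are placeholders. The body keeps only the cluster and indices sections of the role above, with the index name pattern copied as-is.

```python
import base64
import json
import urllib.request


def build_role_request(base_url: str, admin_user: str, admin_password: str):
    """Build (but do not send) a PUT request that creates the monitor role."""
    body = {
        "cluster": ["monitor"],
        "indices": [
            {
                "names": ["all"],  # copied from the role definition above
                "privileges": ["manage_ilm", "monitor"],
            }
        ],
    }
    request = urllib.request.Request(
        url=f"{base_url}/_security/role/monitor",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    request.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL and credentials; pass the result to urllib.request.urlopen to send it.
req = build_role_request("http://localhost:9200", "admin_user", "admin_password")
print(req.get_method(), req.full_url)
```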

Open Distro for Elasticsearch

  • Create a user.
  • Create a role named monitor with the following permissions:
PUT _opendistro/_security/api/roles/monitor
{
  "description": "monitor es cluster",
  "cluster_permissions": [
    "cluster:admin/opendistro/ism/managedindex/explain",
    "cluster_monitor",
    "cluster_composite_ops_ro"
  ],
  "index_permissions": [
    {
      "index_patterns": [
        "*"
      ],
      "fls": [],
      "masked_fields": [],
      "allowed_actions": [
        "read",
        "indices_monitor"
      ]
    }
  ],
  "tenant_permissions": []
}
  • Map the role to the user.

OpenSearch

  • Create a user.
  • Create a role named monitor with the following permissions:
PUT _plugins/_security/api/roles/monitor
{
  "description": "monitor es cluster",
  "cluster_permissions": [
    "cluster:admin/opendistro/ism/managedindex/explain",
    "cluster_monitor",
    "cluster_composite_ops_ro"
  ],
  "index_permissions": [
    {
      "index_patterns": [
        "*"
      ],
      "fls": [],
      "masked_fields": [],
      "allowed_actions": [
        "read",
        "indices_monitor"
      ]
    }
  ],
  "tenant_permissions": []
}
  • Map the role to the user.
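
The role-to-user mapping step can be expressed as one more request against the security plugin. The sketch below assumes the rolesmapping endpoint that sits next to the roles endpoint used above, under the same API prefix for each distribution. The function name rolesmapping_request and the user name datakit_monitor are hypothetical.

```python
import json

# API prefixes taken from the role-creation requests shown above.
SECURITY_API_PREFIX = {
    "opendistro": "_opendistro/_security/api",
    "opensearch": "_plugins/_security/api",
}


def rolesmapping_request(distribution: str, role: str, users: list):
    """Return (method, path, body) for mapping the given users to a role."""
    prefix = SECURITY_API_PREFIX[distribution]
    path = f"{prefix}/rolesmapping/{role}"
    body = json.dumps({"users": list(users)})
    return "PUT", path, body


# "datakit_monitor" is a made-up user name for illustration.
method, path, body = rolesmapping_request("opensearch", "monitor", ["datakit_monitor"])
print(method, path)
print(body)
```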

Go to the conf.d/db directory under the DataKit installation directory, copy elasticsearch.conf.sample, and rename the copy to elasticsearch.conf. For example:

[[inputs.elasticsearch]]
  ## Elasticsearch server configuration
  # Basic authentication is supported:
  # servers = ["http://user:pass@localhost:9200"]
  servers = ["http://localhost:9200"]

  ## Collection interval
  # Units: "ns", "us" (or "µs"), "ms", "s", "m", "h"
  interval = "10s"

  ## HTTP timeout
  http_timeout = "5s"

  ## Distribution: elasticsearch, opendistro, opensearch
  distribution = "elasticsearch"

  ## local is enabled by default, so only the current node's own metrics are collected.
  ## Set local = false to collect metrics from all nodes in the cluster.
  local = true

  ## Set to true to collect cluster health
  cluster_health = false

  ## Cluster health level: indices (default) or cluster
  # cluster_health_level = "indices"

  ## Set to true to collect cluster stats
  cluster_stats = false

  ## Fetch cluster_stats only from the master node; this requires local = true
  cluster_stats_only_from_master = true

  ## Indices to collect, _all by default
  indices_include = ["_all"]

  ## Indices level, one of: "shards", "cluster", "indices"
  indices_level = "shards"

  ## node_stats sub-stats that can be selected: "indices", "os", "process", "jvm", "thread_pool", "fs", "transport", "http", "breaker"
  # All of them are collected by default
  # node_stats = ["jvm", "http"]

  ## HTTP Basic Authentication username and password
  # username = ""
  # password = ""

  ## TLS Config
  tls_open = false
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## Set true to enable election
  election = true

  # [inputs.elasticsearch.log]
  # files = []
  # #grok pipeline script path
  # pipeline = "elasticsearch.p"

  [inputs.elasticsearch.tags]
    # some_tag = "some_value"
    # more_tag = "some_other_value"

After configuring, restart DataKit.
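
To see how the local, cluster_health, and cluster_stats switches change what is collected, the sketch below maps them to the standard Elasticsearch REST endpoints that expose this data. Whether DataKit issues exactly these requests is an assumption, and planned_requests is a hypothetical helper, not a DataKit function.

```python
def planned_requests(server, local=True, cluster_health=False,
                     cluster_health_level="indices", cluster_stats=False):
    """List the endpoints a collector would plausibly query for these settings."""
    # Node stats are always collected; local limits them to the queried node.
    node_scope = "_nodes/_local/stats" if local else "_nodes/stats"
    urls = [f"{server}/{node_scope}"]
    if cluster_health:
        urls.append(f"{server}/_cluster/health?level={cluster_health_level}")
    if cluster_stats:
        urls.append(f"{server}/_cluster/stats")
    return urls


# Defaults from the sample configuration, then with both cluster switches enabled.
print(planned_requests("http://localhost:9200"))
print(planned_requests("http://localhost:9200", cluster_health=True, cluster_stats=True))
```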

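In Kubernetes, the collector can currently be enabled by injecting its configuration through a ConfigMap.

Metric sets

All metrics collected below carry a global tag named host by default (its value is the hostname of the machine running DataKit). Additional tags can be specified in the configuration through [inputs.elasticsearch.tags]: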

[inputs.elasticsearch.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...

elasticsearch_node_stats

  • Tags
| Tag | Description |
| --- | --- |
| cluster_name | Name of the cluster, based on the cluster.name setting. |
| node_attribute_ml.enabled | Set to true (default) to enable machine learning APIs on the node. |
| node_attribute_ml.machine_memory | The machine’s memory that machine learning may use for running analytics processes. |
| node_attribute_ml.max_open_jobs | The maximum number of jobs that can run simultaneously on a node. |
| node_attribute_xpack.installed | Whether X-Pack is installed. |
| node_host | Network host for the node, based on the network.host setting. |
| node_id | The ID of the node. |
| node_name | Human-readable identifier for the node. |
  • Metrics
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| fs_data_0_available_in_gigabytes | Total number of gigabytes available to this Java virtual machine on this file store. | float | GB |
| fs_data_0_free_in_gigabytes | Total number of unallocated gigabytes in the file store. | float | GB |
| fs_data_0_total_in_gigabytes | Total size (in gigabytes) of the file store. | float | GB |
| fs_io_stats_devices_0_operations | The total number of read and write operations for the device completed since starting Elasticsearch. | float | count |
| fs_io_stats_devices_0_read_kilobytes | The total number of kilobytes read for the device since starting Elasticsearch. | float | KB |
| fs_io_stats_devices_0_read_operations | The total number of read operations for the device completed since starting Elasticsearch. | float | count |
| fs_io_stats_devices_0_write_kilobytes | The total number of kilobytes written for the device since starting Elasticsearch. | float | KB |
| fs_io_stats_devices_0_write_operations | The total number of write operations for the device completed since starting Elasticsearch. | float | count |
| fs_io_stats_total_operations | The total number of read and write operations across all devices used by Elasticsearch completed since starting Elasticsearch. | float | count |
| fs_io_stats_total_read_kilobytes | The total number of kilobytes read across all devices used by Elasticsearch since starting Elasticsearch. | float | KB |
| fs_io_stats_total_read_operations | The total number of read operations across all devices used by Elasticsearch completed since starting Elasticsearch. | float | count |
| fs_io_stats_total_write_kilobytes | The total number of kilobytes written across all devices used by Elasticsearch since starting Elasticsearch. | float | KB |
| fs_io_stats_total_write_operations | The total number of write operations across all devices used by Elasticsearch completed since starting Elasticsearch. | float | count |
| fs_timestamp | Last time the file stores statistics were refreshed. Recorded in milliseconds since the Unix Epoch. | float | ms |
| fs_total_available_in_gigabytes | Total number of gigabytes available to this Java virtual machine on all file stores. | float | GB |
| fs_total_free_in_gigabytes | Total number of unallocated gigabytes in all file stores. | float | GB |
| fs_total_total_in_gigabytes | Total size (in gigabytes) of all file stores. | float | GB |
| http_current_open | Current number of open HTTP connections for the node. | float | count |
| indices_fielddata_evictions | Total number of evictions from the field data cache across all shards assigned to selected nodes. | float | count |
| indices_fielddata_memory_size_in_bytes | Total amount, in bytes, of memory used for the field data cache across all shards assigned to selected nodes. | float | B |
| indices_get_missing_time_in_millis | Time in milliseconds spent performing failed get operations. | float | ms |
| indices_get_missing_total | Total number of failed get operations. | float | count |
| jvm_gc_collectors_old_collection_count | Number of old-generation garbage collections performed by the JVM. | float | count |
| jvm_gc_collectors_old_collection_time_in_millis | Total time in milliseconds spent by JVM collecting old generation objects. | float | ms |
| jvm_gc_collectors_young_collection_count | Number of young-generation garbage collections performed by the JVM. | float | count |
| jvm_gc_collectors_young_collection_time_in_millis | Total time in milliseconds spent by JVM collecting young generation objects. | float | ms |
| jvm_mem_heap_committed_in_bytes | Amount of memory, in bytes, available for use by the heap. | float | B |
| jvm_mem_heap_used_percent | Percentage of memory currently in use by the heap. | float | percent |
| os_cpu_load_average_15m | Fifteen-minute load average on the system (field is not present if fifteen-minute load average is not available). | float | count |
| os_cpu_load_average_1m | One-minute load average on the system (field is not present if one-minute load average is not available). | float | count |
| os_cpu_load_average_5m | Five-minute load average on the system (field is not present if five-minute load average is not available). | float | count |
| os_cpu_percent | Recent CPU usage for the whole system, or -1 if not supported. | float | percent |
| os_mem_total_in_bytes | Total amount of physical memory in bytes. | float | B |
| os_mem_used_in_bytes | Amount of used physical memory in bytes. | float | B |
| os_mem_used_percent | Percentage of used memory. | float | percent |
| process_open_file_descriptors | Number of opened file descriptors associated with the current process, or -1 if not supported. | float | count |
| thread_pool_force_merge_queue | Number of tasks in queue for the thread pool. | float | count |
| thread_pool_force_merge_rejected | Number of tasks rejected by the thread pool executor. | float | count |
| thread_pool_rollup_indexing_queue | Number of tasks in queue for the thread pool. | float | count |
| thread_pool_rollup_indexing_rejected | Number of tasks rejected by the thread pool executor. | float | count |
| thread_pool_search_queue | Number of tasks in queue for the thread pool. | float | count |
| thread_pool_search_rejected | Number of tasks rejected by the thread pool executor. | float | count |
| thread_pool_transform_indexing_queue | Number of tasks in queue for the thread pool. | float | count |
| thread_pool_transform_indexing_rejected | Number of tasks rejected by the thread pool executor. | float | count |
| transport_rx_size_in_bytes | Size of RX packets received by the node during internal cluster communication. | float | B |
| transport_tx_size_in_bytes | Size of TX packets sent by the node during internal cluster communication. | float | B |

elasticsearch_indices_stats

  • Tags
| Tag | Description |
| --- | --- |
| cluster_name | Name of the cluster, based on the cluster.name setting. |
| index_name | Name of the index. The name '_all' targets all data streams and indices in a cluster. |
  • Metrics
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| total_flush_total | Number of flush operations. | float | count |
| total_flush_total_time_in_millis | Total time in milliseconds spent performing flush operations. | float | ms |
| total_get_missing_total | Total number of failed get operations. | float | count |
| total_indexing_index_current | Number of indexing operations currently running. | float | count |
| total_indexing_index_time_in_millis | Total time in milliseconds spent performing indexing operations. | float | ms |
| total_indexing_index_total | Total number of indexing operations. | float | count |
| total_merges_current_docs | Number of document merges currently running. | float | count |
| total_merges_total | Total number of merge operations. | float | count |
| total_merges_total_docs | Total number of merged documents. | float | count |
| total_merges_total_time_in_millis | Total time in milliseconds spent performing merge operations. | float | ms |
| total_refresh_total | Total number of refresh operations. | float | count |
| total_refresh_total_time_in_millis | Total time in milliseconds spent performing refresh operations. | float | ms |
| total_search_fetch_current | Number of fetch operations currently running. | float | count |
| total_search_fetch_time_in_millis | Time in milliseconds spent performing fetch operations. | float | ms |
| total_search_fetch_total | Total number of fetch operations. | float | count |
| total_search_query_current | Number of query operations currently running. | float | count |
| total_search_query_time_in_millis | Time in milliseconds spent performing query operations. | float | ms |
| total_search_query_total | Total number of query operations. | float | count |
| total_store_size_in_bytes | Total size, in bytes, of all shards assigned to selected nodes. | float | B |
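
Most of these metrics are cumulative totals, so per-interval figures have to be derived from two consecutive samples. As a minimal sketch, the average query latency over an interval is the growth in total_search_query_time_in_millis divided by the growth in total_search_query_total. The helper average_latency_ms and the sample values are illustrative only.

```python
def average_latency_ms(prev: dict, curr: dict, time_field: str, count_field: str):
    """Average time per operation between two samples of cumulative counters."""
    operations = curr[count_field] - prev[count_field]
    if operations <= 0:
        # No new operations, or the counters were reset (for example by a restart).
        return None
    return (curr[time_field] - prev[time_field]) / operations


# Two made-up samples taken one collection interval apart.
previous = {"total_search_query_total": 100, "total_search_query_time_in_millis": 2000}
current = {"total_search_query_total": 150, "total_search_query_time_in_millis": 3500}
print(average_latency_ms(previous, current,
                         "total_search_query_time_in_millis",
                         "total_search_query_total"))
```

The same calculation applies to the fetch, indexing, merge, refresh, and flush pairs of time and total fields.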

elasticsearch_cluster_stats

  • Tags
| Tag | Description |
| --- | --- |
| cluster_name | Name of the cluster, based on the cluster.name setting. |
| node_name | Name of the node. |
| status | Health status of the cluster, based on the state of its primary and replica shards. |
  • Metrics
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| nodes_process_open_file_descriptors_avg | Average number of concurrently open file descriptors. Returns -1 if not supported. | float | count |

elasticsearch_cluster_health

  • Tags
| Tag | Description |
| --- | --- |
| cluster_name | Name of the cluster. |
| cluster_status | The cluster status: red, yellow, green. |
  • Metrics
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| active_primary_shards | The number of active primary shards in the cluster. | int | count |
| active_shards | The number of active shards in the cluster. | int | count |
| indices_lifecycle_error_count | The number of indices that are managed by ILM and are in an error state. | int | count |
| initializing_shards | The number of shards that are currently initializing. | int | count |
| number_of_data_nodes | The number of data nodes in the cluster. | int | count |
| number_of_pending_tasks | The total number of pending tasks. | int | count |
| relocating_shards | The number of shards that are relocating from one node to another. | int | count |
| status_code | The health as a number: red = 3, yellow = 2, green = 1. | int | count |
| unassigned_shards | The number of shards that are unassigned to a node. | int | count |
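
Because status_code encodes the cluster status as a number, alert thresholds can be written as simple comparisons. The sketch below mirrors the mapping stated in the table (red = 3, yellow = 2, green = 1). The function names and the choice to return None for an unknown status are assumptions made for illustration.

```python
# Mapping taken from the status_code description above.
STATUS_CODE = {"green": 1, "yellow": 2, "red": 3}


def status_to_code(cluster_status: str):
    """Translate a cluster_status tag value into its numeric status_code."""
    return STATUS_CODE.get(cluster_status.lower())  # None for unknown values


def needs_attention(status_code: int) -> bool:
    """Treat anything worse than green as worth alerting on."""
    return status_code >= STATUS_CODE["yellow"]


print(status_to_code("yellow"), needs_attention(status_to_code("yellow")))
```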

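Log collection

Attention

Log collection only supports logs on hosts where DataKit is installed.

To collect ElasticSearch logs, uncomment files in elasticsearch.conf and set it to the absolute path of the ElasticSearch log file. For example: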

[[inputs.elasticsearch]]
  ...
[inputs.elasticsearch.log]
files = ["/path/to/your/file.log"]

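Once log collection is enabled, logs are produced with the source set to elasticsearch by default.

Fields extracted by the log pipeline

  • ElasticSearch general log parsing

Example of a general log line: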

[2021-06-01T11:45:15,927][WARN ][o.e.c.r.a.DiskThresholdMonitor] [master] high disk watermark [90%] exceeded on [A2kEFgMLQ1-vhMdZMJV3Iw][master][/tmp/elasticsearch-cluster/nodes/0] free: 17.1gb[7.3%], shards will be relocated away from this node; currently relocating away shards totalling [0] bytes; the node is expected to continue to exceed the high disk watermark when these relocations are complete

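The extracted fields are:

| Field | Value | Description |
| --- | --- | --- |
| time | 1622519115927000000 | Time the log was produced |
| name | o.e.c.r.a.DiskThresholdMonitor | Component name |
| status | WARN | Log level |
| nodeId | master | Node name |
  • ElasticSearch search slow log parsing

Example of a search slow log line: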

[2021-06-01T11:56:06,712][WARN ][i.s.s.query              ] [master] [shopping][0] took[36.3ms], took_millis[36], total_hits[5 hits], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[1], source[{"query":{"match":{"name":{"query":"Nariko","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"sort":[{"price":{"order":"desc"}}]}], id[], 

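The extracted fields are:

| Field | Value | Description |
| --- | --- | --- |
| time | 1622519766712000000 | Time the log was produced |
| name | i.s.s.query | Component name |
| status | WARN | Log level |
| nodeId | master | Node name |
| index | shopping | Index name |
| duration | 36000000 | Request duration, in ns |
  • ElasticSearch index slow log parsing

Example of an index slow log line: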

[2021-06-01T11:56:19,084][WARN ][i.i.s.index              ] [master] [shopping/X17jbNZ4SoS65zKTU9ZAJg] took[34.1ms], took_millis[34], type[_doc], id[LgC3xXkBLT9WrDT1Dovp], routing[], source[{"price":222,"name":"hello"}]

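The extracted fields are:

| Field | Value | Description |
| --- | --- | --- |
| time | 1622519779084000000 | Time the log was produced |
| name | i.i.s.index | Component name |
| status | WARN | Log level |
| nodeId | master | Node name |
| index | shopping | Index name |
| duration | 34000000 | Request duration, in ns |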
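
The field extraction above can be approximated with regular expressions. This is a sketch for understanding the output, not the actual elasticsearch.p pipeline script. It assumes log timestamps are local time at UTC+8, which is what reproduces the example time values, and it derives duration from took_millis converted to nanoseconds. All names in the code are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: timestamps in the log are UTC+8 local time.
LOG_TZ = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

HEADER_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]"       # [2021-06-01T11:45:15,927]
    r"\[(?P<status>\w+)\s*\]"      # [WARN ]
    r"\[(?P<name>[^\]]+?)\s*\]"    # [o.e.c.r.a.DiskThresholdMonitor]
    r"\s*\[(?P<nodeId>[^\]]+)\]"   # [master]
)
SLOW_RE = re.compile(
    r"^\s*\[(?P<index>[^\]/]+)[^\]]*\]"    # [shopping][0] or [shopping/<uuid>]
    r".*?took_millis\[(?P<millis>\d+)\]"   # took_millis[36]
)


def to_unix_nanos(text: str) -> int:
    """Convert a log timestamp to nanoseconds since the Unix epoch."""
    stamp = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S,%f").replace(tzinfo=LOG_TZ)
    delta = stamp - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def parse_line(line: str):
    """Return the extracted fields, or None if the line has no standard header."""
    header = HEADER_RE.match(line)
    if header is None:
        return None
    fields = header.groupdict()
    fields["time"] = to_unix_nanos(fields["time"])
    slow = SLOW_RE.match(line[header.end():])
    if slow is not None:  # only slow logs carry index and duration
        fields["index"] = slow.group("index")
        fields["duration"] = int(slow.group("millis")) * 1_000_000
    return fields


sample = "[2021-06-01T11:56:19,084][WARN ][i.i.s.index              ] [master] [shopping/X17jbNZ4SoS65zKTU9ZAJg] took[34.1ms], took_millis[34], type[_doc]"
print(parse_line(sample))
```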

Further reading