Redis¶

·

Redis 指标采集器，采集以下数据：

开启 AOF 数据持久化，会收集相关指标
RDB 数据持久化指标
Slowlog 监控指标
bigkey scan 监控
主从replication

前置条件¶

Redis 版本 v5.0+

在采集主从架构下数据时，请配置从节点的主机信息进行数据采集，可以得到主从相关的指标信息。

创建监控用户

redis6.0+ 进入redis-cli命令行,创建用户并且授权

ACL SETUSER username >password
ACL SETUSER username on +@dangerous
ACL SETUSER username on +ping

配置¶

主机安装Kubernetes

进入 DataKit 安装目录下的 conf.d/db 目录，复制 redis.conf.sample 并命名为 redis.conf。示例如下：

[[inputs.redis]]
  host = "localhost"
  port = 6379
  # unix_socket_path = "/var/run/redis/redis.sock"
  # 配置多个db，配置了dbs，db也会放入采集列表。dbs=[]或者不配置则会采集redis中所有非空的db
  # dbs=[]
  # username = "<USERNAME>"
  # password = "<PASSWORD>"

  ## @param connect_timeout - number - optional - default: 10s
  # connect_timeout = "10s"

  ## @param service - string - optional
  # service = "<SERVICE>"

  ## @param interval - number - optional - default: 15
  interval = "15s"

  ## @param keys - list of strings - optional
  ## The length is 1 for strings.
  ## The length is zero for keys that have a type other than list, set, hash, or sorted set.
  #
  # keys = ["KEY_1", "KEY_PATTERN"]

  ## @param warn_on_missing_keys - boolean - optional - default: true
  ## If you provide a list of 'keys', set this to true to have the Agent log a warning
  ## when keys are missing.
  #
  # warn_on_missing_keys = true

  ## @param slow_log - boolean - optional - default: false
  slow_log = true

  ## @param slowlog-max-len - integer - optional - default: 128
  slowlog-max-len = 128

  ## @param command_stats - boolean - optional - default: false
  ## Collect INFO COMMANDSTATS output as metrics.
  # command_stats = false

  ## Set true to enable election
  election = true

  # [inputs.redis.log]
  # #required, glob logfiles
  # files = ["/var/log/redis/*.log"]

  ## glob filteer
  #ignore = [""]

  ## grok pipeline script path
  #pipeline = "redis.p"

  ## optional encodings:
  ##    "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
  #character_encoding = ""

  ## The pattern should be a regexp. Note the use of '''this regexp'''
  ## regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
  #match = '''^\S.*'''

  [inputs.redis.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"

配置好后，重启 DataKit 即可。

目前可以通过 ConfigMap 方式注入采集器配置来开启采集器。

Attention

如果是阿里云 Redis，且设置了对应的用户名密码，conf 中的 <PASSWORD> 应该设置成 your-user:your-password，如 datakit:Pa55W0rd

指标集¶

以下所有数据采集，默认会追加名为 host 的全局 tag（tag 值为 DataKit 所在主机名），也可以在配置中通过 [inputs.redis.tags] 指定其它标签：

 [inputs.redis.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

指标¶

`redis_bigkey`¶

标签

标签名	描述
`db_name`	db
`key`	monitor key
`server`	Server addr

字段列表

指标	描述	数据类型	单位
`value_length`	Key length	int	-

`redis_client`¶

标签

标签名	描述
`name`	The name set by the client with CLIENT SETNAME, default unknown
`server`	Server addr

字段列表

指标	描述	数据类型	单位
`addr`	Address/port of the client	string	-
`age`	Total duration of the connection in seconds	int	count
`fd`	File descriptor corresponding to the socket	int	count
`id`	AN unique 64-bit client ID	string	-
`idle`	Idle time of the connection in seconds	int	count
`psub`	Number of pattern matching subscriptions	int	count
`sub`	Number of channel subscriptions	int	count

`redis_cluster`¶

标签

标签名	描述
`server`	Server addr

字段列表

指标	描述	数据类型	单位
`cluster_current_epoch`	The local Current Epoch variable. This is used in order to create unique increasing version numbers during fail overs.	int	-
`cluster_known_nodes`	The total number of known nodes in the cluster, including nodes in HANDSHAKE state that may not currently be proper members of the cluster.	int	count
`cluster_my_epoch`	The Config Epoch of the node we are talking with. This is the current configuration version assigned to this node.	int	-
`cluster_size`	The number of master nodes serving at least one hash slot in the cluster.	int	count
`cluster_slots_assigned`	Number of slots which are associated to some node (not unbound). This number should be 16384 for the node to work properly, which means that each hash slot should be mapped to a node.	int	count
`cluster_slots_fail`	Number of hash slots mapping to a node in FAIL state. If this number is not zero the node is not able to serve queries unless cluster-require-full-coverage is set to no in the configuration.	int	count
`cluster_slots_ok`	Number of hash slots mapping to a node not in FAIL or PFAIL state.	int	count
`cluster_slots_pfail`	Number of hash slots mapping to a node in PFAIL state. Note that those hash slots still work correctly, as long as the PFAIL state is not promoted to FAIL by the failure detection algorithm. PFAIL only means that we are currently not able to talk with the node, but may be just a transient error.	int	count
`cluster_state`	State is ok if the node is able to receive queries. fail if there is at least one hash slot which is unbound (no node associated), in error state (node serving it is flagged with FAIL flag), or if the majority of masters can't be reached by this node.	int	-
`cluster_stats_messages_received`	Number of messages received via the cluster node-to-node binary bus.	int	count
`cluster_stats_messages_sent`	Number of messages sent via the cluster node-to-node binary bus.	int	count

`redis_command_stat`¶

标签

标签名	描述
`method`	Command type
`server`	Server addr

字段列表

指标	描述	数据类型	单位
`calls`	The number of calls that reached command execution	int	count
`usec`	The total CPU time consumed by these commands	int	μs
`usec_per_call`	The average CPU consumed per command execution	float	μs

`redis_db`¶

标签

标签名	描述
`db`	db name

字段列表

指标	描述	数据类型	单位
`avg_ttl`	avg ttl	int	-
`expires`	过期时间	int	-
`keys`	key	int	-

`redis_info`¶

标签

标签名	描述
`server`	Server addr

字段列表

指标	描述	数据类型	单位
`active_defrag_hits`	Number of value reallocations performed by active the defragmentation process	int	count
`active_defrag_key_hits`	Number of keys that were actively defragmented	int	count
`active_defrag_key_misses`	Number of keys that were skipped by the active defragmentation process	int	count
`active_defrag_misses`	Number of aborted value reallocations started by the active defragmentation process	int	count
`active_defrag_running`	Flag indicating if active defragmentation is active	bool	count
`aof_buffer_length`	Size of the AOF buffer	float	B
`aof_current_size`	AOF current file size	float	B
`aof_last_rewrite_time_sec`	Duration of the last AOF rewrite operation in seconds	int	count
`aof_rewrite_in_progress`	Flag indicating a AOF rewrite operation is on-going	bool	count
`blocked_clients`	Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, BZPOPMAX)	int	count
`client_biggest_input_buf`	Biggest input buffer among current client connections	int	B
`client_longest_output_list`	Longest output list among current client connections	int	count
`connected_clients`	Number of client connections (excluding connections from replicas)	int	count
`connected_slaves`	Number of connected replicas	int	count
`evicted_keys`	Number of evicted keys due to maxmemory limit	int	count
`expired_keys`	Total number of key expiration events	int	count
`info_latency_ms`	The latency of the redis INFO command.	float	ms
`keyspace_hits`	Number of successful lookup of keys in the main dictionary	int	count
`keyspace_misses`	Number of failed lookup of keys in the main dictionary	int	count
`latest_fork_usec`	Duration of the latest fork operation in microseconds	int	ms
`loading_eta_seconds`	ETA in seconds for the load to be complete	int	s
`loading_loaded_bytes`	Number of bytes already loaded	float	B
`loading_loaded_perc`	Same value expressed as a percentage	float	percent
`loading_total_bytes`	Total file size	float	B
`master_last_io_seconds_ago`	Number of seconds since the last interaction with master	int	s
`master_repl_offset`	The server's current replication offset	int	count
`master_sync_in_progress`	Indicate the master is syncing to the replica	bool	-
`master_sync_left_bytes`	Number of bytes left before syncing is complete (may be negative when master_sync_total_bytes is 0)	float	B
`maxmemory`	The value of the maxmemory configuration directive	float	B
`mem_fragmentation_ratio`	Ratio between used_memory_rss and used_memory	float	percent
`pubsub_channels`	Global number of pub/sub channels with client subscriptions	int	count
`pubsub_patterns`	Global number of pub/sub pattern with client subscriptions	int	count
`rdb_bgsave_in_progress`	Flag indicating a RDB save is on-going	bool	-
`rdb_changes_since_last_save`	Refers to the number of operations that produced some kind of changes in the dataset since the last time either SAVE or BGSAVE was called.	int	count
`rdb_last_bgsave_time_sec`	Duration of the last RDB save operation in seconds	int	s
`redis_version`	Version of the Redis server	string	-
`rejected_connections`	Number of connections rejected because of maxclients limit	int	count
`repl_backlog_histlen`	Size in bytes of the data in the replication backlog buffer	float	B
`slave_repl_offset`	The replication offset of the replica instance	int	count
`total_net_input_bytes`	The total number of bytes read from the network	int	count
`total_net_output_bytes`	The total number of bytes written to the network	int	count
`used_cpu_sys`	System CPU consumed by the Redis server, which is the sum of system CPU consumed by all threads of the server process (main thread and background threads)	float	percent
`used_cpu_sys_children`	System CPU consumed by the background processes	float	percent
`used_cpu_user`	User CPU consumed by the Redis server, which is the sum of user CPU consumed by all threads of the server process (main thread and background threads)	float	percent
`used_cpu_user_children`	User CPU consumed by the background processes	float	percent
`used_memory`	Total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc)	float	B
`used_memory_lua`	Number of bytes used by the Lua engine	float	B
`used_memory_overhead`	The sum in bytes of all overheads that the server allocated for managing its internal data structures	float	B
`used_memory_peak`	Peak memory consumed by Redis (in bytes)	float	B
`used_memory_rss`	Number of bytes that Redis allocated as seen by the operating system (a.k.a resident set size)	float	B
`used_memory_startup`	Initial amount of memory consumed by Redis at startup in bytes	float	B

日志¶

Version-1.4.6

`redis_latency`¶

标签

标签名	描述
`server`	Server addr

字段列表

指标	描述	数据类型	单位
`cost_time`	Latest event latency in millisecond.	int	ms
`event_name`	Event name.	string	-
`max_cost_time`	All-time maximum latency for this event.	int	ms
`occur_time`	Unix timestamp of the latest latency spike for the event.	int	sec

`redis_slowlog`¶

Redis 慢查询命令历史，这里我们将其以日志的形式采集

标签

标签名	描述
`host`	host
`message`	log message
`server`	server

字段列表

指标	描述	数据类型	单位
`command`	slow command	int	μs
`slowlog_id`	slowlog unique id	int	-
`slowlog_micros`	cost time	int	μs

日志采集¶

需要采集 Redis 日志，需要开启 Redis redis.config中日志文件输出配置：

[inputs.redis.log]
    # 日志路径需要填入绝对路径
    files = ["/var/log/redis/*.log"]

Attention

在配置日志采集时，需要将 DataKit 安装在 Redis 服务同一台主机中，或使用其它方式将日志挂载到 DataKit 所在机器。

在 K8s 中，可以将 Redis 日志暴露到 stdout，DataKit 能自动找到其对应的日志。

Pipeline 日志切割¶

原始日志为

122:M 14 May 2019 19:11:40.164 * Background saving terminated with success

切割后的字段列表如下：

字段名	字段值	说明
`pid`	`122`	进程id
`role`	`M`	角色
`serverity`	`*`	服务
`statu`	`notice`	日志级别
`msg`	`Background saving terminated with success`	日志内容
`time`	`1557861100164000000`	纳秒时间戳（作为行协议时间）

Redis¶

前置条件¶

配置¶

指标集¶

指标¶

redis_bigkey¶

redis_client¶

redis_cluster¶

redis_command_stat¶

redis_db¶

redis_info¶

日志¶

redis_latency¶

redis_slowlog¶