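Redis
The Redis metrics collector gathers the following data:
- AOF persistence metrics (collected when AOF is enabled)
- RDB persistence metrics
- Slowlog monitoring metrics
- Big-key scan monitoring
- Master-replica replication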
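Prerequisites
- Redis version v5.0+
When collecting data in a master-replica architecture, configure the collector with the replica node's host information so that the replication-related metrics are available.
Create a monitoring user
For Redis 6.0+, enter the redis-cli command line, create a user, and grant it permissions.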
Configuration
Go to the conf.d/db directory under the DataKit installation directory, copy redis.conf.sample, and rename it to redis.conf. An example follows:
[[inputs.redis]]
host = "localhost"
port = 6379
# unix_socket_path = "/var/run/redis/redis.sock"
# Configure multiple DBs. When dbs is set, db is also added to the collection list. With dbs=[] or dbs unset, all non-empty DBs in Redis are collected.
# dbs=[]
# username = "<USERNAME>"
# password = "<PASSWORD>"
## @param connect_timeout - number - optional - default: 10s
# connect_timeout = "10s"
## @param service - string - optional
# service = "<SERVICE>"
## @param interval - number - optional - default: 15
interval = "15s"
## @param keys - list of strings - optional
## The length is 1 for strings.
## The length is zero for keys that have a type other than list, set, hash, or sorted set.
#
# keys = ["KEY_1", "KEY_PATTERN"]
## @param warn_on_missing_keys - boolean - optional - default: true
## If you provide a list of 'keys', set this to true to have the Agent log a warning
## when keys are missing.
#
# warn_on_missing_keys = true
## @param slow_log - boolean - optional - default: false
slow_log = true
## @param slowlog-max-len - integer - optional - default: 128
slowlog-max-len = 128
## @param command_stats - boolean - optional - default: false
## Collect INFO COMMANDSTATS output as metrics.
# command_stats = false
## Set true to enable election
election = true
# [inputs.redis.log]
# #required, glob logfiles
# files = ["/var/log/redis/*.log"]
## glob filter
#ignore = [""]
## grok pipeline script path
#pipeline = "redis.p"
## optional encodings:
## "utf-8", "utf-16le", "utf-16be", "gbk", "gb18030" or ""
#character_encoding = ""
## The pattern should be a regexp. Note the use of '''this regexp'''
## regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
#match = '''^\S.*'''
[inputs.redis.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
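After configuring, restart DataKit.
Currently, the collector can be enabled by injecting its configuration through a ConfigMap.
Attention
For Alibaba Cloud Redis with a username and password configured, set <PASSWORD> in the conf to your-user:your-password, for example datakit:Pa55W0rd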
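Measurements
All data collected below carries a global tag named host by default (its value is the hostname of the machine running DataKit). Other tags can be specified in the configuration through [inputs.redis.tags]:
Metrics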
redis_bigkey
- Tags
Tag | Description |
---|---|
db_name | DB name |
key | Monitored key |
server | Server addr |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
value_length | Key length | int | - |
redis_client
- Tags
Tag | Description |
---|---|
name | The name set by the client with CLIENT SETNAME, default unknown |
server | Server addr |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
addr | Address/port of the client | string | - |
age | Total duration of the connection in seconds | int | count |
fd | File descriptor corresponding to the socket | int | count |
id | A unique 64-bit client ID | string | - |
idle | Idle time of the connection in seconds | int | count |
psub | Number of pattern matching subscriptions | int | count |
sub | Number of channel subscriptions | int | count |
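As a rough illustration of where these fields come from, the sketch below parses one line of `CLIENT LIST` output (space-separated `key=value` pairs) into the tag and fields listed above. The helper name `parse_client_line` and the sample line are hypothetical, and substituting `unknown` for an empty name follows the tag description rather than any confirmed DataKit code.

```python
def parse_client_line(line):
    """Parse one CLIENT LIST line into (tags, fields). Illustrative only."""
    # Each item looks like "key=value"; the value may be empty (e.g. "name=").
    pairs = dict(item.split("=", 1) for item in line.split())
    # Assumption: an empty client name is reported as "unknown".
    tags = {"name": pairs.get("name") or "unknown"}
    fields = {"id": pairs["id"], "addr": pairs["addr"]}
    for key in ("fd", "age", "idle", "sub", "psub"):
        fields[key] = int(pairs[key])
    return tags, fields


# Hypothetical sample line, shortened to the keys used above.
sample = "id=3 addr=127.0.0.1:52555 fd=8 name= age=855 idle=0 sub=0 psub=0 cmd=client"
tags, fields = parse_client_line(sample)
```

Keeping `id` and `addr` as strings matches the data types in the field table.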
redis_cluster
- Tags
Tag | Description |
---|---|
server | Server addr |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
cluster_current_epoch | The local Current Epoch variable. This is used to create unique, increasing version numbers during failovers. | int | - |
cluster_known_nodes | The total number of known nodes in the cluster, including nodes in HANDSHAKE state that may not currently be proper members of the cluster. | int | count |
cluster_my_epoch | The Config Epoch of the node we are talking with. This is the current configuration version assigned to this node. | int | - |
cluster_size | The number of master nodes serving at least one hash slot in the cluster. | int | count |
cluster_slots_assigned | Number of slots that are associated with some node (not unbound). This number should be 16384 for the node to work properly, which means that each hash slot should be mapped to a node. | int | count |
cluster_slots_fail | Number of hash slots mapping to a node in FAIL state. If this number is not zero, the node is not able to serve queries unless cluster-require-full-coverage is set to no in the configuration. | int | count |
cluster_slots_ok | Number of hash slots mapping to a node not in FAIL or PFAIL state. | int | count |
cluster_slots_pfail | Number of hash slots mapping to a node in PFAIL state. Those hash slots still work correctly as long as the PFAIL state is not promoted to FAIL by the failure detection algorithm. PFAIL only means that we are currently not able to talk with the node, which may be just a transient error. | int | count |
cluster_state | State is ok if the node is able to receive queries, and fail if there is at least one hash slot that is unbound (no node associated) or in error state (the node serving it is flagged with FAIL), or if the majority of masters can't be reached by this node. | int | - |
cluster_stats_messages_received | Number of messages received via the cluster node-to-node binary bus. | int | count |
cluster_stats_messages_sent | Number of messages sent via the cluster node-to-node binary bus. | int | count |
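The slot counters above can be combined into a simple coverage check. The sketch below parses `CLUSTER INFO` style output (`key:value` lines) and tests whether all 16384 slots are assigned and none maps to a FAIL node. The function names and the sample text are hypothetical and are not taken from the collector.

```python
TOTAL_HASH_SLOTS = 16384  # number of hash slots that should be assigned


def parse_cluster_info(text):
    """Parse "key:value" lines into a dict, converting digit-only values to int."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = int(value) if value.isdigit() else value
    return info


def fully_covered(info):
    """True when every slot is assigned and no slot maps to a FAIL node."""
    return (info["cluster_slots_assigned"] == TOTAL_HASH_SLOTS
            and info["cluster_slots_fail"] == 0)


# Hypothetical sample of CLUSTER INFO output.
sample = (
    "cluster_state:ok\r\n"
    "cluster_slots_assigned:16384\r\n"
    "cluster_slots_ok:16384\r\n"
    "cluster_slots_pfail:0\r\n"
    "cluster_slots_fail:0\r\n"
    "cluster_known_nodes:6\r\n"
    "cluster_size:3\r\n"
)
info = parse_cluster_info(sample)
covered = fully_covered(info)
```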
redis_command_stat
- Tags
Tag | Description |
---|---|
method | Command type |
server | Server addr |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
calls | The number of calls that reached command execution | int | count |
usec | The total CPU time consumed by these commands | int | μs |
usec_per_call | The average CPU time consumed per command execution | float | μs |
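These three fields correspond to the `cmdstat_<command>` lines of `INFO COMMANDSTATS`. A minimal parsing sketch follows. The function name and sample values are hypothetical, and using the command name as the `method` tag is an assumption.

```python
def parse_commandstats(text):
    """Map command name -> {calls, usec, usec_per_call} from INFO COMMANDSTATS text."""
    stats = {}
    for line in text.splitlines():
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, body = line.partition(":")
        kv = dict(item.split("=", 1) for item in body.split(","))
        stats[name[len("cmdstat_"):]] = {
            "calls": int(kv["calls"]),
            "usec": int(kv["usec"]),
            "usec_per_call": float(kv["usec_per_call"]),
        }
    return stats


# Hypothetical sample output.
sample = (
    "# Commandstats\r\n"
    "cmdstat_get:calls=21,usec=175,usec_per_call=8.33\r\n"
    "cmdstat_set:calls=4,usec=60,usec_per_call=15.00\r\n"
)
stats = parse_commandstats(sample)
```

In this sample, `usec_per_call` equals `usec / calls` rounded to two decimals.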
redis_db
- Tags
Tag | Description |
---|---|
db | DB name |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
avg_ttl | Average TTL | int | - |
expires | Number of keys with an expiration set | int | - |
keys | Number of keys | int | - |
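The `keys`, `expires`, and `avg_ttl` values have the same names as the per-database lines in the Keyspace section of `INFO`. The sketch below parses those lines. The function name is hypothetical, and using `db0`, `db1`, and so on as the `db` tag value is an assumption.

```python
import re

# Matches lines such as "db0:keys=10,expires=2,avg_ttl=3600000".
# Any trailing extra fields after avg_ttl are ignored.
KEYSPACE_LINE = re.compile(r"^db(\d+):keys=(\d+),expires=(\d+),avg_ttl=(\d+)")


def parse_keyspace(text):
    """Return {"db<N>": {"keys": ..., "expires": ..., "avg_ttl": ...}}."""
    result = {}
    for line in text.splitlines():
        match = KEYSPACE_LINE.match(line)
        if match:
            index, keys, expires, avg_ttl = match.groups()
            result["db" + index] = {
                "keys": int(keys),
                "expires": int(expires),
                "avg_ttl": int(avg_ttl),
            }
    return result


# Hypothetical sample output.
sample = "# Keyspace\r\ndb0:keys=10,expires=2,avg_ttl=3600000\r\ndb3:keys=1,expires=0,avg_ttl=0\r\n"
keyspace = parse_keyspace(sample)
```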
redis_info
- Tags
Tag | Description |
---|---|
server | Server addr |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
active_defrag_hits | Number of value reallocations performed by the active defragmentation process | int | count |
active_defrag_key_hits | Number of keys that were actively defragmented | int | count |
active_defrag_key_misses | Number of keys that were skipped by the active defragmentation process | int | count |
active_defrag_misses | Number of aborted value reallocations started by the active defragmentation process | int | count |
active_defrag_running | Flag indicating if active defragmentation is active | bool | count |
aof_buffer_length | Size of the AOF buffer | float | B |
aof_current_size | AOF current file size | float | B |
aof_last_rewrite_time_sec | Duration of the last AOF rewrite operation in seconds | int | count |
aof_rewrite_in_progress | Flag indicating an AOF rewrite operation is on-going | bool | count |
blocked_clients | Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, BZPOPMAX) | int | count |
client_biggest_input_buf | Biggest input buffer among current client connections | int | B |
client_longest_output_list | Longest output list among current client connections | int | count |
connected_clients | Number of client connections (excluding connections from replicas) | int | count |
connected_slaves | Number of connected replicas | int | count |
evicted_keys | Number of evicted keys due to maxmemory limit | int | count |
expired_keys | Total number of key expiration events | int | count |
info_latency_ms | The latency of the Redis INFO command | float | ms |
keyspace_hits | Number of successful lookups of keys in the main dictionary | int | count |
keyspace_misses | Number of failed lookups of keys in the main dictionary | int | count |
latest_fork_usec | Duration of the latest fork operation in microseconds | int | μs |
loading_eta_seconds | ETA in seconds for the load to be complete | int | s |
loading_loaded_bytes | Number of bytes already loaded | float | B |
loading_loaded_perc | Bytes already loaded, expressed as a percentage | float | percent |
loading_total_bytes | Total file size | float | B |
master_last_io_seconds_ago | Number of seconds since the last interaction with master | int | s |
master_repl_offset | The server's current replication offset | int | count |
master_sync_in_progress | Indicates that the master is syncing to the replica | bool | - |
master_sync_left_bytes | Number of bytes left before syncing is complete (may be negative when master_sync_total_bytes is 0) | float | B |
maxmemory | The value of the maxmemory configuration directive | float | B |
mem_fragmentation_ratio | Ratio between used_memory_rss and used_memory | float | percent |
pubsub_channels | Global number of pub/sub channels with client subscriptions | int | count |
pubsub_patterns | Global number of pub/sub patterns with client subscriptions | int | count |
rdb_bgsave_in_progress | Flag indicating an RDB save is on-going | bool | - |
rdb_changes_since_last_save | Number of operations that produced some kind of change in the dataset since the last time either SAVE or BGSAVE was called | int | count |
rdb_last_bgsave_time_sec | Duration of the last RDB save operation in seconds | int | s |
redis_version | Version of the Redis server | string | - |
rejected_connections | Number of connections rejected because of maxclients limit | int | count |
repl_backlog_histlen | Size in bytes of the data in the replication backlog buffer | float | B |
slave_repl_offset | The replication offset of the replica instance | int | count |
total_net_input_bytes | The total number of bytes read from the network | int | count |
total_net_output_bytes | The total number of bytes written to the network | int | count |
used_cpu_sys | System CPU consumed by the Redis server, which is the sum of system CPU consumed by all threads of the server process (main thread and background threads) | float | percent |
used_cpu_sys_children | System CPU consumed by the background processes | float | percent |
used_cpu_user | User CPU consumed by the Redis server, which is the sum of user CPU consumed by all threads of the server process (main thread and background threads) | float | percent |
used_cpu_user_children | User CPU consumed by the background processes | float | percent |
used_memory | Total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc) | float | B |
used_memory_lua | Number of bytes used by the Lua engine | float | B |
used_memory_overhead | The sum in bytes of all overheads that the server allocated for managing its internal data structures | float | B |
used_memory_peak | Peak memory consumed by Redis (in bytes) | float | B |
used_memory_rss | Number of bytes that Redis allocated as seen by the operating system (a.k.a. resident set size) | float | B |
used_memory_startup | Initial amount of memory consumed by Redis at startup in bytes | float | B |
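Several of these fields are most useful in combination. The sketch below parses `INFO` style text (`key:value` lines with `# Section` headers) and derives two values: the keyspace hit ratio, which is a commonly derived figure and not a field listed above, and the fragmentation ratio as defined for mem_fragmentation_ratio. All function names and sample numbers are hypothetical.

```python
def parse_info(text):
    """Parse INFO output into a dict of strings, skipping section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def hit_ratio(info):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or 0.0 with no lookups."""
    hits = int(info["keyspace_hits"])
    misses = int(info["keyspace_misses"])
    total = hits + misses
    return hits / total if total else 0.0


def fragmentation_ratio(info):
    """used_memory_rss divided by used_memory."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


# Hypothetical sample values.
sample = (
    "# Memory\r\n"
    "used_memory:1000000\r\n"
    "used_memory_rss:1500000\r\n"
    "# Stats\r\n"
    "keyspace_hits:90\r\n"
    "keyspace_misses:10\r\n"
)
info = parse_info(sample)
```

With these sample numbers the hit ratio is 0.9 and the fragmentation ratio is 1.5.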
Logs
redis_latency
- Tags
Tag | Description |
---|---|
server | Server addr |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
cost_time | Latest event latency in milliseconds. | int | ms |
event_name | Event name. | string | - |
max_cost_time | All-time maximum latency for this event. | int | ms |
occur_time | Unix timestamp of the latest latency spike for the event. | int | sec |
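The four fields line up with the four elements of each `LATENCY LATEST` reply entry: event name, Unix timestamp of the latest spike, latest latency in milliseconds, and all-time maximum latency in milliseconds. The mapping below is a sketch under that assumption. The function name is hypothetical and the collector's actual code may differ.

```python
def latency_entry_to_fields(entry):
    """Map one LATENCY LATEST entry to the redis_latency field names."""
    event_name, occur_time, cost_time, max_cost_time = entry[:4]
    if isinstance(event_name, bytes):  # client libraries may return bytes
        event_name = event_name.decode()
    return {
        "event_name": event_name,
        "occur_time": int(occur_time),        # seconds since the Unix epoch
        "cost_time": int(cost_time),          # milliseconds
        "max_cost_time": int(max_cost_time),  # milliseconds
    }


# Sample entry in the shape [event, timestamp, latest_ms, max_ms].
fields = latency_entry_to_fields([b"command", 1405067976, 251, 1001])
```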
redis_slowlog
History of Redis slow query commands, collected here in the form of logs.
- Tags
Tag | Description |
---|---|
host | Host |
message | Log message |
server | Server |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
command | Slow command | string | - |
slowlog_id | Unique slowlog ID | int | - |
slowlog_micros | Execution time | int | μs |
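For orientation, each `SLOWLOG GET` entry starts with an ID, a Unix timestamp, the execution time in microseconds, and the command's argument array. The sketch below maps such an entry to the fields above. The function name is hypothetical, and joining the arguments with spaces to form `command` is an assumption about presentation, not confirmed collector behavior.

```python
def slowlog_entry_to_fields(entry):
    """Map one SLOWLOG GET entry to the redis_slowlog field names."""
    slowlog_id, _timestamp, micros, args = entry[:4]
    # Arguments may arrive as bytes depending on the client library.
    words = [a.decode() if isinstance(a, bytes) else str(a) for a in args]
    return {
        "slowlog_id": int(slowlog_id),
        "slowlog_micros": int(micros),
        "command": " ".join(words),
    }


# Sample entry in the shape [id, timestamp, micros, [args...]].
fields = slowlog_entry_to_fields([14, 1309448221, 15, [b"ping"]])
```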
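Log collection
To collect Redis logs, enable log file output in the Redis redis.config file.
Attention
When configuring log collection, DataKit must be installed on the same host as the Redis service, or the logs must be mounted onto the machine running DataKit by some other means.
In K8s, Redis logs can be exposed to stdout, and DataKit automatically finds the corresponding logs.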
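Pipeline log parsing
After the raw log is parsed, the resulting fields are as follows:
Field | Value | Description |
---|---|---|
pid | 122 | Process ID |
role | M | Role |
serverity | * | Severity marker |
statu | notice | Log level |
msg | Background saving terminated with success | Log content |
time | 1557861100164000000 | Nanosecond timestamp (used as the line protocol time) |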
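To make the parsing concrete, the sketch below applies a regular expression to a sample line and produces the fields in the table. The sample line is reconstructed from the table values and is assumed to follow the standard Redis log layout (`pid:role date time level message`). The regex, the level mapping, and the UTC interpretation of the timestamp are all assumptions and are not the contents of redis.p. Field names, including their spelling, are copied from the table.

```python
import calendar
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<time>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<serverity>[.\-*#]) (?P<msg>.*)$"
)
# Redis log level markers.
LEVELS = {".": "debug", "-": "verbose", "*": "notice", "#": "warning"}


def parse_redis_log(line):
    """Split one Redis log line into the fields from the table, or return None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    parsed = datetime.strptime(match.group("time"), "%d %b %Y %H:%M:%S.%f")
    # Assumption: the log timestamp is in UTC. Integer math avoids float rounding.
    nanos = calendar.timegm(parsed.timetuple()) * 10**9 + parsed.microsecond * 1000
    marker = match.group("serverity")
    return {
        "pid": int(match.group("pid")),
        "role": match.group("role"),
        "serverity": marker,
        "statu": LEVELS[marker],
        "msg": match.group("msg"),
        "time": nanos,
    }


# Reconstructed sample line (assumption, see above).
sample = "122:M 14 May 2019 19:11:40.164 * Background saving terminated with success"
fields = parse_redis_log(sample)
```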