The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.
Introduction
Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting that notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.
The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade-off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers a powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.
In this tutorial, you’ll import database metrics, generated by the Redis INFO command, into Elasticsearch via Logstash. This entails configuring Logstash to periodically run the command, parse its output and send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. By the end of the tutorial, you’ll have an automated system pulling in Redis statistics for later analysis.
Prerequisites
- An Ubuntu 18.04 server with at least 4 GB RAM, root privileges, and a secondary, non-root account. You can set this up by following this initial server setup guide. For this tutorial the non-root user is `sammy`.
- Java 8 installed on your server. For installation instructions, visit How To Install Java with `apt` on Ubuntu 18.04.
- Nginx installed on your server. For a guide on how to do that, see How To Install Nginx on Ubuntu 18.04.
- Elasticsearch and Kibana installed on your server. Complete the first two steps of the How To Install Elasticsearch, Logstash, and Kibana (Elastic Stack) on Ubuntu 18.04 tutorial.
- A Redis managed database provisioned from DigitalOcean with connection information available. Make sure that your server’s IP address is on the whitelist. To learn more about DigitalOcean Managed Databases, visit the product docs.
- Redli installed on your server according to the How To Connect to a Managed Database on Ubuntu 18.04 tutorial.
Step 1 — Installing and Configuring Logstash
In this section, you will install Logstash and configure it to pull statistics from your Redis database cluster, then parse them to send to Elasticsearch for indexing.
Start off by installing Logstash with the following command:
- sudo apt install logstash -y
Once Logstash is installed, enable the service to automatically start on boot:
- sudo systemctl enable logstash
Before configuring Logstash to pull the statistics, let’s see what the data itself looks like. To connect to your Redis database, head over to your Managed Database Control Panel, and under the Connection details panel, select Flags from the dropdown:
You’ll be shown a preconfigured command for the Redli client, which you’ll use to connect to your database. Click Copy and run the following command on your server, replacing `redli_flags_command` with the command you have just copied:
- redli_flags_command info
Since the output from this command is long, we’ll break it down and explain it section by section:
In the output of the Redis `info` command, sections are marked with `#`, which signifies a comment. The values are populated in the form of `key:value`, which makes them relatively easy to parse.
Output
# Server
redis_version:5.0.4
redis_git_sha1:ab60b2b1
redis_git_dirty:1
redis_build_id:7909f4de3561dc50
redis_mode:standalone
os:Linux 5.2.14-200.fc30.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.1.1
process_id:72
run_id:ddb7b96c93bbd0c369c6d06ce1c02c78902e13cc
tcp_port:25060
uptime_in_seconds:1733
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:8687593
executable:/usr/bin/redis-server
config_file:/etc/redis.conf
# Clients
connected_clients:3
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0
. . .
The `Server` section contains technical information about the Redis build, such as its version and the Git commit it’s based on, while the `Clients` section provides the number of currently open connections.
Output
. . .
# Memory
used_memory:941560
used_memory_human:919.49K
used_memory_rss:4931584
used_memory_rss_human:4.70M
used_memory_peak:941560
used_memory_peak_human:919.49K
used_memory_peak_perc:100.00%
used_memory_overhead:912190
used_memory_startup:795880
used_memory_dataset:29370
used_memory_dataset_perc:20.16%
allocator_allocated:949568
allocator_active:1269760
allocator_resident:3592192
total_system_memory:1030356992
total_system_memory_human:982.62M
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:463470592
maxmemory_human:442.00M
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.34
allocator_frag_bytes:320192
allocator_rss_ratio:2.83
allocator_rss_bytes:2322432
rss_overhead_ratio:1.37
rss_overhead_bytes:1339392
mem_fragmentation_ratio:5.89
mem_fragmentation_bytes:4093872
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:116310
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
. . .
Here, `Memory` confirms how much RAM Redis has allocated for itself, as well as the maximum amount of memory it can possibly use. If it starts running out of memory, it will free up keys using the strategy you specified in the Control Panel (shown in the `maxmemory_policy` field in this output).
Output
. . .
# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1568966978
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:217088
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
# Stats
total_connections_received:213
total_commands_processed:2340
instantaneous_ops_per_sec:1
total_net_input_bytes:39205
total_net_output_bytes:776988
instantaneous_input_kbps:0.02
instantaneous_output_kbps:2.01
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:353
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
. . .
In the `Persistence` section, you can see the last time Redis saved the keys it stores to disk, and whether it was successful. The `Stats` section provides numbers related to client and in-cluster connections, the number of times the requested key was (or wasn’t) found, and so on.
Output
. . .
# Replication
role:master
connected_slaves:0
master_replid:9c1d345a46d29d08537981c4fc44e312a21a160b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:46137344
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
. . .
Note: The Redis project uses the terms “master” and “slave” in its documentation and in various commands. DigitalOcean generally prefers the alternative terms “primary” and “replica.”
This guide will default to the terms “primary” and “replica” whenever possible, but note that there are a few instances where the terms “master” and “slave” unavoidably come up.
By looking at the `role` under `Replication`, you’ll know whether you’re connected to a primary or replica node. The rest of the section provides the number of currently connected replicas and the amount of data the replica is lacking with respect to the primary. There may be additional fields if the instance you are connected to is a replica.
Output
. . .
# CPU
used_cpu_sys:1.972003
used_cpu_user:1.765318
used_cpu_sys_children:0.000000
used_cpu_user_children:0.001707
# Cluster
cluster_enabled:0
# Keyspace
Under `CPU`, you’ll see the amount of system (`used_cpu_sys`) and user (`used_cpu_user`) CPU Redis is consuming at the moment. The `Cluster` section contains only one unique field, `cluster_enabled`, which indicates whether the Redis cluster is running.
Logstash will be tasked with periodically running the `info` command on your Redis database (similar to how you just did), parsing the results, and sending them to Elasticsearch. You’ll then be able to access them later from Kibana.
You’ll store the configuration for indexing Redis statistics in Elasticsearch in a file named `redis.conf` under the `/etc/logstash/conf.d` directory, where Logstash stores configuration files. When started as a service, it will automatically run them in the background.
Create `redis.conf` using your favorite editor (for example, nano):
- sudo nano /etc/logstash/conf.d/redis.conf
Add the following lines:
input {
  exec {
    command => "redli_flags_command info"
    interval => 10
    type => "redis_info"
  }
}

filter {
  kv {
    value_split => ":"
    field_split => "\r\n"
    remove_field => [ "command", "message" ]
  }

  ruby {
    code =>
    "
    event.to_hash.keys.each { |k|
      if event.get(k).to_i.to_s == event.get(k) # is integer?
        event.set(k, event.get(k).to_i) # convert to integer
      end
      if event.get(k).to_f.to_s == event.get(k) # is float?
        event.set(k, event.get(k).to_f) # convert to float
      end
    }
    puts 'Ruby filter finished'
    "
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "%{type}"
  }
}
Remember to replace `redli_flags_command` with the command shown in the control panel that you used earlier in the step.
You define an input, a set of filters that will run on the collected data, and an output that will send the filtered data to Elasticsearch. The input uses the `exec` plugin, which will run a `command` on the server periodically, after a set time `interval` (expressed in seconds). It also specifies a `type` parameter that defines the document type when indexed in Elasticsearch. The `exec` block passes down an object containing two string fields, `command` and `message`. The `command` field will contain the command that was run, and `message` will contain its output.
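To make this concrete, here is a rough sketch of what a single event could look like as it leaves the `exec` input, before any filters run. The timestamp and the truncated output are illustrative, not taken from a real run:

# Illustrative only: approximate shape of one event produced by the exec input.
event = {
  "@timestamp" => "2019-09-20T12:00:14.000Z",               # added by Logstash
  "type"       => "redis_info",                              # from the input configuration
  "command"    => "redli_flags_command info",                # the command that was run
  "message"    => "# Server\r\nredis_version:5.0.4\r\n..."   # its raw output
}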
There are two filters that will run sequentially on the data collected from the input. The `kv` filter stands for key-value filter, and is built into Logstash. It is used for parsing data in the general form of `key value_separator value`, and provides parameters for specifying the value and field separators. The field separator is the string that separates one key-value entry from the next. In the case of the output of the Redis INFO command, the field separator (`field_split`) is a new line, and the value separator (`value_split`) is `:`. Lines that do not follow the defined form will be discarded, including comments.
To configure the `kv` filter, you pass `:` to the `value_split` parameter, and `\r\n` (signifying a new line) to the `field_split` parameter. You also order it to remove the `command` and `message` fields from the current data object by passing them to `remove_field` as elements of an array, because they contain data that is now useless.
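As a rough illustration (not how the `kv` filter is implemented internally), here is what splitting a single line of the INFO output on the value separator looks like in plain Ruby:

# Minimal sketch of the key-value split applied to one line of INFO output.
line = "used_memory:941560"
key, value = line.split(":", 2)
puts key    # prints "used_memory"
puts value  # prints "941560" -- still a string at this point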
The `kv` filter represents the values it parses as string (text) types by design. This raises an issue because Kibana can’t easily process string types, even when the string actually contains a number. To solve this, you’ll use custom Ruby code to convert the number-only strings to numbers, where possible. The second filter is a `ruby` block that provides a `code` parameter accepting a string containing the code to be run.

`event` is a variable that Logstash provides to your code, and it contains the current data in the filter pipeline. As was noted before, filters run one after another, meaning that the Ruby filter will receive the parsed data from the `kv` filter. The Ruby code itself converts the `event` to a Hash and traverses through the keys, then checks whether the value associated with each key could be represented as an integer or as a float (a number with decimals). If it can, the string value is replaced with the parsed number. When the loop finishes, it prints out a message (`Ruby filter finished`) to report progress.
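If you’d like to see how that check behaves, you can run the same logic as a standalone Ruby snippet on a few sample values from the INFO output:

# Standalone sketch of the numeric-conversion check used in the ruby filter above.
["1733", "0.02", "allkeys-lru"].each do |value|
  if value.to_i.to_s == value       # the string is a whole number
    puts value.to_i
  elsif value.to_f.to_s == value    # the string is a decimal number
    puts value.to_f
  else
    puts value                      # leave non-numeric strings as they are
  end
end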
The output sends the processed data to Elasticsearch for indexing. The resulting document will be stored in the `redis_info` index, defined in the input and passed in as a parameter to the output block.
Save and close the file.
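Later, once Logstash is running and has shipped at least one document, you can optionally confirm that the `redis_info` index is being populated by querying Elasticsearch directly. For example:
- curl -X GET 'http://localhost:9200/redis_info/_search?pretty&size=1'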
You’ve installed Logstash using `apt` and configured it to periodically request statistics from Redis, process them, and send them to your Elasticsearch instance.
Step 2 — Testing the Logstash Configuration
Now you’ll test the configuration by running Logstash to verify it will properly pull the data.
Logstash supports running a specific configuration by passing its file path to the `-f` parameter. Run the following command to test your new configuration from the last step:
- sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/redis.conf
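If you only want to validate the configuration without starting the pipeline, Logstash also accepts the `--config.test_and_exit` flag, which checks the file and then exits:
- sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/redis.conf --config.test_and_exit
Otherwise, leave the first command running.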
It may take some time to show the output, but you’ll soon see something similar to the following:
Output
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-09-20 11:59:53.440 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-09-20 11:59:53.459 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.8.3"}
[INFO ] 2019-09-20 12:00:02.543 [Converge PipelineAction::Create<main>] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2019-09-20 12:00:03.331 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2019-09-20 12:00:03.727 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2019-09-20 12:00:04.015 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>6}
[WARN ] 2019-09-20 12:00:04.020 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[INFO ] 2019-09-20 12:00:04.071 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[INFO ] 2019-09-20 12:00:04.100 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template
[INFO ] 2019-09-20 12:00:04.146 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[INFO ] 2019-09-20 12:00:04.295 [[main]-pipeline-manager] exec - Registering Exec Input {:type=>"redis_info", :command=>"...", :interval=>10, :schedule=>nil}
[INFO ] 2019-09-20 12:00:04.315 [Converge PipelineAction::Create<main>] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:...>"}