How To Analyze Managed Redis Database Statistics Using the Elastic Stack on Ubuntu 18.04

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting that notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.

The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade-off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers a powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.

In this tutorial, you’ll import database metrics, generated by the Redis INFO command, into Elasticsearch via Logstash. This entails configuring Logstash to periodically run the command, parse its output and send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. By the end of the tutorial, you’ll have an automated system pulling in Redis statistics for later analysis.

Prerequisites

To follow this tutorial, you will need:

  • A server running Ubuntu 18.04 with the Elastic Stack (Elasticsearch and Kibana) installed, and Kibana exposed at a domain you can reach in your browser.
  • A managed Redis database with its connection details available, such as a DigitalOcean Managed Database.
  • The Redli client installed on your server, so you can run the connection command shown in the Managed Database Control Panel.

Step 1 — Installing and Configuring Logstash

In this section, you will install Logstash and configure it to pull statistics from your Redis database cluster, parse them, and send them to Elasticsearch for indexing.

Start off by installing Logstash with the following command:


  • sudo apt install logstash -y

Once Logstash is installed, enable the service to automatically start on boot:


  • sudo systemctl enable logstash

Before configuring Logstash to pull the statistics, let’s see what the data itself looks like. To connect to your Redis database, head over to your Managed Database Control Panel, and under the Connection details panel, select Flags from the dropdown:

Managed Database Control Panel

You’ll be shown a preconfigured command for the Redli client, which you’ll use to connect to your database. Click Copy and run the following command on your server, replacing redli_flags_command with the command you have just copied:


  • redli_flags_command info

Since the output from this command is long, we'll break it down and go through it section by section.

In the output of the Redis info command, sections are marked with #, which signifies a comment. Values take the form of key:value, which makes them relatively easy to parse.


Output
# Server
redis_version:5.0.4
redis_git_sha1:ab60b2b1
redis_git_dirty:1
redis_build_id:7909f4de3561dc50
redis_mode:standalone
os:Linux 5.2.14-200.fc30.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.1.1
process_id:72
run_id:ddb7b96c93bbd0c369c6d06ce1c02c78902e13cc
tcp_port:25060
uptime_in_seconds:1733
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:8687593
executable:/usr/bin/redis-server
config_file:/etc/redis.conf

# Clients
connected_clients:3
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0
. . .

The Server section contains technical information about the Redis build, such as its version and the Git commit it's based on, while the Clients section reports the number of currently open connections.
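
Note that info also accepts an optional section name, which is standard Redis behavior rather than anything specific to managed databases. If you only want to look at one part of this report, you can pass the section name after the command, for example:


  • redli_flags_command info memory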


Output
. . .
# Memory
used_memory:941560
used_memory_human:919.49K
used_memory_rss:4931584
used_memory_rss_human:4.70M
used_memory_peak:941560
used_memory_peak_human:919.49K
used_memory_peak_perc:100.00%
used_memory_overhead:912190
used_memory_startup:795880
used_memory_dataset:29370
used_memory_dataset_perc:20.16%
allocator_allocated:949568
allocator_active:1269760
allocator_resident:3592192
total_system_memory:1030356992
total_system_memory_human:982.62M
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:463470592
maxmemory_human:442.00M
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.34
allocator_frag_bytes:320192
allocator_rss_ratio:2.83
allocator_rss_bytes:2322432
rss_overhead_ratio:1.37
rss_overhead_bytes:1339392
mem_fragmentation_ratio:5.89
mem_fragmentation_bytes:4093872
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:116310
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
. . .

Here, the Memory section shows how much RAM Redis has allocated for itself, as well as the maximum amount of memory it can possibly use. Many fields come in raw and human-readable pairs; for example, used_memory:941560 is expressed in bytes, and 941560 / 1024 ≈ 919.49, which is why used_memory_human reads 919.49K. If Redis starts running out of memory, it will free up keys using the strategy you specified in the Control Panel (shown in the maxmemory_policy field in this output).


Output
. . .
# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1568966978
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:217088
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

# Stats
total_connections_received:213
total_commands_processed:2340
instantaneous_ops_per_sec:1
total_net_input_bytes:39205
total_net_output_bytes:776988
instantaneous_input_kbps:0.02
instantaneous_output_kbps:2.01
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:353
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
. . .

In the Persistence section, you can see the last time Redis saved the keys it stores to disk, and if it was successful. The Stats section provides numbers related to client and in-cluster connections, the number of times the requested key was (or wasn’t) found, and so on.
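
A useful number you can derive from the Stats section is the cache hit ratio, calculated as keyspace_hits / (keyspace_hits + keyspace_misses), which tells you what fraction of lookups found an existing key. In the sample output above both counters are still 0, so there is nothing to compute yet.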


Output
. . .
# Replication
role:master
connected_slaves:0
master_replid:9c1d345a46d29d08537981c4fc44e312a21a160b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:46137344
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
. . .

Note: The Redis project uses the terms “master” and “slave” in its documentation and in various commands. DigitalOcean generally prefers the alternative terms “primary” and “replica.”
This guide will default to the terms “primary” and “replica” whenever possible, but note that there are a few instances where the terms “master” and “slave” unavoidably come up.

By looking at the role under Replication, you'll know if you're connected to a primary or replica node. The rest of the section provides the number of currently connected replicas and the amount of data the replica is missing relative to the primary. There may be additional fields if the instance you are connected to is a replica.


Output
. . .
# CPU
used_cpu_sys:1.972003
used_cpu_user:1.765318
used_cpu_sys_children:0.000000
used_cpu_user_children:0.001707

# Cluster
cluster_enabled:0

# Keyspace

Under CPU, you’ll see the amount of system (used_cpu_sys) and user (used_cpu_user) CPU time Redis has consumed. The Cluster section contains only one field, cluster_enabled, which indicates whether Redis Cluster mode is in use.

Logstash will be tasked with periodically running the info command on your Redis database (similar to how you just did), parsing the results, and sending them to Elasticsearch. You'll then be able to access them from Kibana later.

You’ll store the configuration for indexing Redis statistics in Elasticsearch in a file named redis.conf under the /etc/logstash/conf.d directory, where Logstash stores configuration files. When started as a service, Logstash automatically runs all configuration files in that directory in the background.

Create redis.conf using your favorite editor (for example, nano):


  • sudo nano /etc/logstash/conf.d/redis.conf

Add the following lines:

/etc/logstash/conf.d/redis.conf
input {
    exec {
        command => "redli_flags_command info"
        interval => 10
        type => "redis_info"
    }
}

filter {
    kv {
        value_split => ":"
        field_split => "\r\n"
        remove_field => [ "command", "message" ]
    }

    ruby {
        code =>
        "
        event.to_hash.keys.each { |k|
            if event.get(k).to_i.to_s == event.get(k) # is integer?
                event.set(k, event.get(k).to_i) # convert to integer
            end
            if event.get(k).to_f.to_s == event.get(k) # is float?
                event.set(k, event.get(k).to_f) # convert to float
            end
        }
        puts 'Ruby filter finished'
        "
    }
}

output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "%{type}"
    }
}

Remember to replace redli_flags_command with the command shown in the control panel that you used earlier in this step.

You define an input, a set of filters that will run on the collected data, and an output that will send the filtered data to Elasticsearch. The input uses the exec plugin, which periodically runs a command on the server after a set time interval (expressed in seconds). It also specifies a type parameter that defines the document type when indexed in Elasticsearch. The exec block passes down an event containing two string fields: command, which holds the command that was run, and message, which holds its output.

There are two filters that will run sequentially on the data collected from the input. The kv filter stands for key-value filter and is built in to Logstash. It parses data in the general form of key value_separator value and provides parameters for specifying the value and field separators. The value separator splits a key from its value, while the field separator is what separates one key-value pair from the next. In the output of the Redis INFO command, the field separator (field_split) is a newline and the value separator (value_split) is :, so, for example, the line connected_clients:3 becomes a field named connected_clients holding the string "3". Lines that do not follow the defined form, including comments, will be discarded.

To configure the kv filter, you pass : to the value_split parameter, and \r\n (signifying a newline) to the field_split parameter. You also instruct it to remove the command and message fields from the current data object by passing them to remove_field as elements of an array, because they contain data that is no longer needed.

The kv filter represents the values it parses as the string (text) type by design. This raises an issue because Kibana can't treat string fields as numbers, even when a string actually contains a number, so you wouldn't be able to aggregate or plot those values. To solve this, you'll use custom Ruby code to convert the number-only strings to numbers, where possible. The second filter is a ruby block that provides a code parameter accepting a string containing the code to be run.

event is a variable that Logstash provides to your code, and contains the current data in the filter pipeline. As was noted before, filters run one after another, meaning that the Ruby filter will receive the parsed data from the kv filter. The Ruby code itself converts the event to a Hash and traverses through the keys, then checks if the value associated with the key could be represented as an integer or as a float (a number with decimals). If it can, the string value is replaced with the parsed number. When the loop finishes, it prints out a message (Ruby filter finished) to report progress.
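
If you'd like to see this conversion logic in isolation, here is a minimal standalone sketch of the same idea that you can run with the ruby interpreter; it uses a plain hash with made-up sample values in place of the Logstash event object, so it is only an illustration, not part of the pipeline:

# Standalone illustration of the numeric conversion performed by the ruby filter.
event = {
  "redis_version" => "5.0.4",           # stays a string: neither a pure integer nor a pure float
  "connected_clients" => "3",           # becomes the integer 3
  "mem_fragmentation_ratio" => "5.89"   # becomes the float 5.89
}

event.keys.each do |k|
  v = event[k]
  event[k] = v.to_i if v.to_i.to_s == v # integer-only strings
  event[k] = v.to_f if v.to_f.to_s == v # float-only strings
end

p event
# {"redis_version"=>"5.0.4", "connected_clients"=>3, "mem_fragmentation_ratio"=>5.89}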

The output sends the processed data to Elasticsearch for indexing. The resulting documents will be stored in the redis_info index, because the index parameter interpolates the type field (%{type}) you defined in the input.

Save and close the file.
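
Optionally, you can ask Logstash to only validate the configuration and exit, without starting the pipeline, by adding the --config.test_and_exit flag to the command you'll use in the next step:


  • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/redis.conf --config.test_and_exit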

You’ve installed Logstash using apt and configured it to periodically request statistics from Redis, process them, and send them to your Elasticsearch instance.

Step 2 — Testing the Logstash Configuration

Now you’ll test the configuration by running Logstash to verify it will properly pull the data.

Logstash supports running a specific configuration by passing its file path to the -f parameter. Run the following command to test your new configuration from the last step:


  • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/redis.conf

It may take some time to show the output, but you’ll soon see something similar to the following:


Output
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-09-20 11:59:53.440 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-09-20 11:59:53.459 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.8.3"}
[INFO ] 2019-09-20 12:00:02.543 [Converge PipelineAction::Create] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2019-09-20 12:00:03.331 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2019-09-20 12:00:03.727 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2019-09-20 12:00:04.015 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>6}
[WARN ] 2019-09-20 12:00:04.020 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[INFO ] 2019-09-20 12:00:04.071 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[INFO ] 2019-09-20 12:00:04.100 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template
[INFO ] 2019-09-20 12:00:04.146 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[INFO ] 2019-09-20 12:00:04.295 [[main]-pipeline-manager] exec - Registering Exec Input {:type=>"redis_info", :command=>"...", :interval=>10, :schedule=>nil}
[INFO ] 2019-09-20 12:00:04.315 [Converge PipelineAction::Create] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#"}
[INFO ] 2019-09-20 12:00:04.483 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2019-09-20 12:00:05.318 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
Ruby filter finished
Ruby filter finished
Ruby filter finished
...

You’ll see the Ruby filter finished message being printed at regular intervals (set to 10 seconds in the previous step), which means that the statistics are being shipped to Elasticsearch.
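
If you'd also like to confirm that documents are actually reaching Elasticsearch, you can query the index directly from another terminal using the standard Elasticsearch search API (this assumes Elasticsearch is listening on localhost:9200, as set in the output block):


  • curl -X GET 'http://localhost:9200/redis_info/_search?size=1&pretty'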

You can exit Logstash by pressing CTRL + C on your keyboard. As previously mentioned, Logstash automatically runs all configuration files found under /etc/logstash/conf.d in the background when started as a service. Run the following command to start it:


  • sudo systemctl start logstash
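
You can check that the service came up without errors using systemd's standard status command:


  • sudo systemctl status logstash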

You’ve run Logstash to check if it can connect to your Redis cluster and gather data. Next, you’ll explore some of the statistical data in Kibana.

Step 3 — Exploring Imported Data in Kibana

In this section, you’ll explore and visualize the statistical data describing your database’s performance in Kibana.

In your web browser, navigate to the domain where you exposed Kibana as part of the prerequisites. You’ll see the default welcome page:

Kibana - Welcome Page

Before exploring the data Logstash is sending to Elasticsearch, you’ll first need to add the redis_info index to Kibana. To do so, click on Management from the left-hand vertical sidebar, and then on Index Patterns under the Kibana section.

Kibana - Index Pattern Creation

You’ll see a form for creating a new Index Pattern. Index Patterns in Kibana provide a way to pull in data from multiple Elasticsearch indexes at once, but they can also be used to explore a single index, as you'll do here.

Beneath the Index pattern text field, you’ll see the redis_info index listed. Type it in the text field and then click on the Next step button.

You’ll then be asked to choose a timestamp field, so you’ll later be able to narrow your searches by a time range. Logstash automatically adds one, called @timestamp. Select it from the dropdown and click on Create index pattern to finish adding the index to Kibana.

Kibana - Index Pattern Timestamp Selection

To create and see existing visualizations, click on the Visualize item in the left-hand vertical menu. You’ll see the following page:

Kibana - Visualizations

To create a new visualization, click on the Create a visualization button, then select Line from the list of types that will pop up. Then, select the redis_info* index pattern you have just created as the data source. You’ll see an empty visualization:

Kibana - Empty Visualization

The left-side panel provides a form for editing parameters that Kibana will use to draw the visualization, which will be shown on the central part of the screen. On the upper-right hand side of the screen is the date range picker. If the @timestamp field is being used in the visualization, Kibana will only show the data belonging to the time interval specified in the range picker.

You’ll now visualize the average Redis memory usage during a specified time interval. Click on Y-Axis under Metrics in the panel on the left to unfold it, then select Average as the Aggregation and select used_memory as the Field. This will populate the Y axis of the plot with the average values.

Next, click on X-Axis under Buckets. For the Aggregation, choose Date Histogram. @timestamp should be automatically selected as the Field. Then, show the visualization by clicking on the blue play button at the top of the panel. If your database is brand new and hasn't been used much, you won't see a very long line, but in all cases you will see an accurate portrayal of average memory usage. Here is how the resulting visualization may look after little to no usage:

Kibana - Redis Memory Usage Visualization

In this step, you have visualized the memory usage of your managed Redis database using Kibana. You can also use other plot types Kibana offers, such as the Visual Builder, to create more complicated graphs that portray more than one field at the same time. This will give you a better understanding of how your database is being used, which will help you optimize client applications, as well as the database itself.

Conclusion

You now have the Elastic Stack installed on your server and configured to pull statistics from your managed Redis database on a regular basis. You can analyze and visualize the data using Kibana or other suitable software, which will help you gain valuable insights into how your database is performing and uncover real-world correlations.

For more information about what you can do with your Redis Managed Database, visit the product docs. If you’d like to present the database statistics using another visualization type, check out the Kibana docs for further instructions.

Originally posted on DigitalOcean Community Tutorials
Author: DigitalOcean
