How To Analyze Managed PostgreSQL Database Statistics Using the Elastic Stack on Ubuntu 18.04

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing the performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting, which notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.

The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.

In this tutorial, you’ll import database metrics, generated by the PostgreSQL statistics collector, into Elasticsearch via Logstash. This entails configuring Logstash to pull data from the database using the PostgreSQL JDBC connector to send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. Then, if your database is brand new, you’ll use pgbench, a PostgreSQL benchmarking tool, to create more interesting visualizations. In the end, you’ll have an automated system pulling in PostgreSQL statistics for later analysis.

Prerequisites

Step 1 — Setting up Logstash and the PostgreSQL JDBC Driver

In this section, you will install Logstash and download the PostgreSQL JDBC driver so that Logstash will be able to connect to your managed database.

Start off by installing Logstash with the following command:


  • sudo apt install logstash -y

Once Logstash is installed, enable the service to automatically start on boot:


  • sudo systemctl enable logstash

Logstash is written in Java, so in order to connect to PostgreSQL it requires the PostgreSQL JDBC (Java Database Connectivity) library to be available on the system it is running on. Because of an internal limitation, Logstash will properly load the library only if it is found under the /usr/share/logstash/logstash-core/lib/jars directory, where it stores third-party libraries it uses.

Head over to the download page of the JDBC library and copy the link to latest version. Then, download it using curl by running the following command:


  • sudo curl https://jdbc.postgresql.org/download/postgresql-42.2.6.jar -o /usr/share/logstash/logstash-core/lib/jars/postgresql-jdbc.jar

At the time of writing, the latest version of the library was 42.2.6, with Java 8 as the supported runtime version. Ensure you download the latest version; pairing it with the correct Java version that both JDBC and Logstash support.

Logstash stores its configuration files under /etc/logstash/conf.d, and is itself stored under /usr/share/logstash/bin. Before you create a configuration that will pull statistics from your database, you’ll need to enable the JDBC plugin in Logstash by running the following command:


  • sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-jdbc

You’ve installed Logstash using apt and downloaded the PostgreSQL JDBC library so that Logstash can use it to connect to your managed database. In the next step, you will configure Logstash to pull statistical data from it.

Step 2 — Configuring Logstash To Pull Statistics

In this section, you will configure Logstash to pull metrics from your managed PostgreSQL database.

You’ll configure Logstash to watch over three system databases in PostgreSQL, namely:

  • pg_stat_database: provides statistics about each database, including its name, number of connections, transactions, rollbacks, rows returned by querying the database, deadlocks, and so on. It has a stats_reset field, which specifies when the statistics were last reset.
  • pg_stat_user_tables: provides statistics about each table created by the user, such as the number of inserted, deleted, and updated rows.
  • pg_stat_user_indexes: collects data about all indexes in user-created tables, such as the number of times a particular index has been scanned.

You’ll store the configuration for indexing PostgreSQL statistics in Elasticsearch in a file named postgresql.conf under the /etc/logstash/conf.d directory, where Logstash stores configuration files. When started as a service, it will automatically run them in the background.

Create postgresql.conf using your favorite editor (for example, nano):


  • sudo nano /etc/logstash/conf.d/postgresql.conf

Add the following lines:

/etc/logstash/conf.d/postgresql.conf
input {
        # pg_stat_database
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_database"
                schedule => "* * * * *"
                type => "pg_stat_database"
        }

        # pg_stat_user_tables
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_user_tables"
                schedule => "* * * * *"
                type => "pg_stat_user_tables"
        }

        # pg_stat_user_indexes
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_user_indexes"
                schedule => "* * * * *"
                type => "pg_stat_user_indexes"
        }
}

output {
        elasticsearch {
                hosts => "http://localhost:9200"
                index => "%{type}"
        }
}

Remember to replace host with your host address, port with the port to which you can connect to your database, username with the database user username, and password with its password. All these values can be found in the Control Panel of your managed database.

In this configuration, you define three JDBC inputs and one Elasticsearch output. The three inputs pull data from the pg_stat_database, pg_stat_user_tables, and pg_stat_user_indexes databases, respectively. They all set the jdbc_driver_library parameter to an empty string, because the PostgreSQL JDBC library is in a folder that Logstash automatically loads.

Then, they set the jdbc_driver_class, whose value is specific to the JDBC library, and provide a jdbc_connection_string, which details how to connect to the database. The jdbc: part signifies that it is a JDBC connection, while postgres:// indicates that the target database is PostgreSQL. Next come the host and port of the database, and after the forward slash you also specify a database to connect to; this is because PostgreSQL requires you to be connected to a database to be able to issue any queries. Here, it is set to the default database that always exists and can not be deleted, aptly named defaultdb.

Next, they set a username and password of the user through which the database will be accessed. The statement parameter contains a SQL query that should return the data you wish to process—in this configuration, it selects all rows from the appropriate database.

The schedule parameter accepts a string in cron syntax that defines when Logstash should run this input; omitting it completely will make Logstash run it only once. Specifying * * * * *, as you have done so here, will tell Logstash to run it every minute. You can specify your own cron string if you want to collect data at different intervals.

There is only one output, which accepts data from three inputs. They all send data to Elasticsearch, which is running locally and is reachable at http://localhost:9200. The index parameter defines to which Elasticsearch index it will send the data, and its value is passed in from the type field of the input.

When you are done with editing, save and close the file.

You’ve configured Logstash to gather data from various PostgreSQL statistical tables and send them to Elasticsearch for storage and indexing. Next, you’ll run Logstash to test the configuration.

Step 3 — Testing the Logstash Configuration

In this section, you will test the configuration by running Logstash to verify it will properly pull the data. Then, you will make this configuration run in the background by configuring it as a Logstash pipeline.

Logstash supports running a specific configuration by passing its file path to the -f parameter. Run the following command to test your new configuration from the last step:


  • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/postgresql.conf

It may take some time before it shows any output, which will look similar to this:


Output
Thread.exclusive is deprecated, use Thread::Mutex WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console [WARN ] 2019-08-02 18:29:15.123 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified [INFO ] 2019-08-02 18:29:15.154 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"7.3.0"} [INFO ] 2019-08-02 18:29:18.209 [Converge PipelineAction::Create
] Reflections - Reflections took 77 ms to scan 1 urls, producing 19 keys and 39 values [INFO ] 2019-08-02 18:29:20.195 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:20.667 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.221 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.230 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.274 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.337 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:21.369 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.409 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.430 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:21.444 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.465 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.466 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.468 [Ruby-0-Thread-7: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.538 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.545 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.589 [Ruby-0-Thread-9: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.696 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [INFO ] 2019-08-02 18:29:21.769 [Ruby-0-Thread-7: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [INFO ] 2019-08-02 18:29:21.771 [Ruby-0-Thread-9: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [WARN ] 2019-08-02 18:29:21.871 [[main]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization. It is recommended to log an issue to the responsible developer/development team. [INFO ] 2019-08-02 18:29:21.878 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, :thread=>"#>>>>>

’’

’’

&


Entrada Relacionada