Machine Learning allows you to get better insights from your data. But where is most of the structured data stored? In databases! Today, in order to use machine learning with data in a relational database, you need to develop a custom application to read the data from the database and then apply the machine learning model. Developing this application requires a mix of skills to be able to interact with the database and use machine learning. This is a new application, and now you have to manage its performance, availability, and security.
Can we make it easier to apply machine learning to data in a relational database? Even for existing applications?
Starting today, Amazon Aurora is natively integrated with two AWS machine learning services:
- Amazon SageMaker, a service providing you with the ability to build, train, and deploy custom machine learning models quickly.
- Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to find insights in text.
Using this new functionality, you can use a SQL function in your queries to apply a machine learning model to the data in your relational database. For example, you can detect the sentiment of a user comment using Comprehend, or apply a custom machine learning model built with SageMaker to estimate the risk of “churn” for your customers. Churn is a word mixing “change” and “turn” and is used to describe customers that stop using your services.
You can store the output of a large query including the additional information from machine learning services in a new table, or use this feature interactively in your application by just changing the SQL code run by the clients, with no machine learning experience required.
Let’s see a couple of examples of what you can do from an Aurora database, first by using Comprehend, then SageMaker.
Configuring Database Permissions
The first step is to give the database permissions to access the services you want to use: Comprehend, SageMaker, or both. In the RDS console, I create a new Aurora MySQL 5.7 database. When it is available, in the Connectivity & security tab of the regional endpoint, I look for the Manage IAM roles section.
There I connect Comprehend and SageMaker to this database cluster. For SageMaker, I need to provide the Amazon Resource Name (ARN) of the endpoint of a deployed machine learning model. If you want to use multiple endpoints, you need to repeat this step. The console takes care of creating the service roles for the Aurora database to access those services in order for the new machine learning integration to work.
Using Comprehend from Amazon Aurora
I connect to the database using a MySQL client. To run my tests, I create a table storing comments for a blogging platform and insert a few sample records:
CREATE TABLE IF NOT EXISTS comments ( comment_id INT AUTO_INCREMENT PRIMARY KEY, comment_text VARCHAR(255) NOT NULL ); INSERT INTO comments (comment_text) VALUES ("This is very useful, thank you for writing it!"); INSERT INTO comments (comment_text) VALUES ("Awesome, I was waiting for this feature."); INSERT INTO comments (comment_text) VALUES ("An interesting write up, please add more details."); INSERT INTO comments (comment_text) VALUES ("I don’t like how this was implemented.");
To detect the sentiment of the comments in my table, I can use the
aws_comprehend_detect_sentiment_confidence SQL functions:
SELECT comment_text, aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment, aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence FROM comments;
aws_comprehend_detect_sentiment function returns the most probable sentiment for the input text:
aws_comprehend_detect_sentiment_confidence function returns the confidence of the sentiment detection, between 0 (not confident at all) and 1 (fully confident).
Using SageMaker Endpoints from Amazon Aurora
Similarly to what I did with Comprehend, I can access a SageMaker endpoint to enrich the information stored in my database. To see a practical use case, let’s implement the customer churn example mentioned at the beginning of this post.
Mobile phone operators have historical records on which customers ultimately ended up churning and which continued using the service. We can use this historical information to construct a machine learning model. As input for the model, we’re looking at the current subscription plan, how much the customer is speaking on the phone at different times of day, and how often has called customer service.
Here’s the structure of my customer table:
SHOW COLUMNS FROM customers;
To be able to identify customers at risk of churn, I train a model following this sample SageMaker notebook using the XGBoost algorithm. When the model has been created, it’s deployed to a hosted endpoint.
When the SageMaker endpoint is in service, I go back to the Manage IAM roles section of the console to give the Aurora database permissions to access the endpoint ARN.
Now, I create a new
will_churn SQL function giving input to the endpoint the parameters required by the model:
CREATE FUNCTION will_churn ( state varchar(2048), acc_length bigint(20), area_code bigint(20), int_plan varchar(2048), vmail_plan varchar(2048), vmail_msg bigint(20), day_mins double, day_calls bigint(20), eve_mins double, eve_calls bigint(20), night_mins double, night_calls bigint(20), int_mins double, int_calls bigint(20), cust_service_calls bigint(20)) RETURNS varchar(2048) CHARSET latin1 alias aws_sagemaker_invoke_endpoint endpoint name 'estimate_customer_churn_endpoint_version_123';
As you can see, the model looks at the customer’s phone subscription details and service usage patterns to identify the risk of churn. Using the
will_churn SQL function, I run a query over my
customers table to flag customers based on my machine learning model. To store the result of the query, I create a new
CREATE TABLE customers_churn AS SELECT *, will_churn(state, acc_length, area_code, int_plan, vmail_plan, vmail_msg, day_mins, day_calls, eve_mins, eve_calls, night_mins, night_calls, int_mins, int_calls, cust_service_calls) will_churn FROM customers;
Let’s see a few records from the
SELECT * FROM customers_churn LIMIT 7;
I am lucky the first 7 customers are apparently not going to churn. But what happens overall? Since I stored the results of the
will_churn function, I can run a
SELECT GROUP BY statement on the
SELECT will_churn, COUNT(*) FROM customers_churn GROUP BY will_churn;
Starting from there, I can dive deep to understand what brings my customers to churn.
If I create a new version of my machine learning model, with a new endpoint ARN, I can recreate the
will_churn function without changing my SQL statements.
The new machine learning integration is available today for Aurora MySQL 5.7, with the SageMaker integration generally available and the Comprehend integration in preview. You can learn more in the documentation. We are working on other engines and versions: Aurora MySQL 5.6 and Aurora PostgreSQL 10 and 11 are coming soon.
The Aurora machine learning integration is available in all regions in which the underlying services are available. For example, if both Aurora MySQL 5.7 and SageMaker are available in a region, then you can use the integration for SageMaker. For a complete list of services availability, please see the AWS Regional Table.
There’s no additional cost for using the integration, you just pay for the underlying services at your normal rates. Pay attention to the size of your queries when using Comprehend. For example, if you do sentiment analysis on user feedback in your customer service web page, to contact those who made particularly positive or negative comments, and people are making 10,000 comments a day, you’d pay $3/day. To optimize your costs, remember to store results.
It’s never been easier to apply machine learning models to data stored in your relational databases. Let me know what you are going to build with this!
Originally posted on AWS News Blog
Author: Danilo Poccia