AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running.
AWS Glue version 2.0 is now generally available and features Spark ETL jobs that start 10x faster. The lower startup latency shortens overall job completion times, supports customers with micro-batching and time-sensitive workloads, and increases business productivity by enabling interactive script development and data exploration.
With Glue version 2.0, job startup delay is more predictable and has less overhead. In addition, AWS Glue version 2.0 Spark jobs are billed in 1-second increments with a 10x lower minimum billing duration: a 1-minute minimum instead of a 10-minute minimum. As a result, customers can now run micro-batch, deadline-sensitive, and interactive workloads more cost-effectively. Customers can run micro-batch jobs to quickly load data lakes, data warehouses, and databases and enable real-time analytics. With faster job start times, customers can run SLA-driven data pipelines more reliably. Faster job start times also enable interactive data exploration and experimentation. Glue version 2.0 also provides a new capability to install Python modules from a wheel file or from a repository.
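To make the billing change concrete, here is a minimal sketch of the arithmetic. It assumes that a run is rounded up to whole seconds and then raised to the minimum billing duration, and that the minimum is the only difference between the two versions. The function and constant names are illustrative and not part of any AWS API, and no prices are modeled.

```python
import math

# Minimum billing durations taken from the announcement, in seconds.
GLUE_1_MINIMUM = 10 * 60  # 10-minute minimum
GLUE_2_MINIMUM = 1 * 60   # 1-minute minimum


def billed_seconds(run_seconds: float, minimum_seconds: int) -> int:
    """Return the billed duration: whole seconds, never below the minimum.

    Assumption: billing rounds up to the next second and then applies
    the minimum billing duration.
    """
    return max(math.ceil(run_seconds), minimum_seconds)


# Compare a few hypothetical job run times under both minimums.
for run in (45, 90, 900):
    old = billed_seconds(run, GLUE_1_MINIMUM)
    new = billed_seconds(run, GLUE_2_MINIMUM)
    print(f"run={run}s  10-min minimum: {old}s  1-min minimum: {new}s")
```

Under these assumptions, a 45-second micro-batch is billed as 600 seconds with the 10-minute minimum and as 60 seconds with the 1-minute minimum, while a 15-minute job is billed the same either way. The saving therefore comes entirely from short runs.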
How it works
Let’s see how it works on the AWS Management Console. Benefiting from this new feature is easy—you can create new Glue Spark ETL jobs or move your existing Glue Spark ETL jobs to Glue version 2.0 as shown below.
I created a simple Glue job to copy a .csv file across different Amazon S3 buckets.
[Screenshots: the job run on Glue version 1.0 and on Glue version 2.0]
You can see that the startup time for Glue version 2.0 is 10x faster.
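If you prefer the API to the console, the same choice of version can be expressed in a job definition. The sketch below only builds the keyword arguments that boto3's Glue `create_job` call accepts. It assumes a Spark (`glueetl`) job with a worker type and worker count, and it uses the `--additional-python-modules` job parameter for the Python module installation mentioned earlier. The helper name, job name, role ARN, bucket, and module list are all hypothetical placeholders.

```python
from typing import Dict, List, Optional


def glue2_job_definition(
    name: str,
    role_arn: str,
    script_s3_path: str,
    extra_modules: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build keyword arguments for a Glue version 2.0 Spark ETL job.

    Hypothetical helper for illustration; it makes no AWS calls itself.
    """
    default_arguments: Dict[str, str] = {}
    if extra_modules:
        # Comma-separated modules to install when the job starts.
        default_arguments["--additional-python-modules"] = ",".join(extra_modules)

    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "2.0",  # selects Glue version 2.0
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": default_arguments,
        # Assumed sizing; adjust to the workload.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


# Placeholder values; replace with your own account, role, and bucket.
job = glue2_job_definition(
    name="copy-csv-example",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_path="s3://example-bucket/scripts/copy_csv.py",
    extra_modules=["scikit-learn==0.21.3"],
)
print(job["GlueVersion"], job["DefaultArguments"])

# To create the job for real (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**job)
```

Keeping the definition as a plain dictionary makes it easy to review or version-control before anything is sent to AWS.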
This feature is now available in US East (N. Virginia and Ohio), US West (N. California and Oregon), Europe (Frankfurt, Ireland, London, Paris, and Stockholm), Asia Pacific (Hong Kong, Mumbai, Seoul, Singapore, Sydney, and Tokyo), Canada (Central), Middle East (Bahrain), and South America (São Paulo). Please check out our latest documentation and pricing pages for more details.
Originally posted on AWS News Blog
Author: Harunobu Kameda