Glue DataBrew - KodeKloud

How Glue DataBrew Works

The workflow of Glue DataBrew is straightforward:

Create a Project: Establish a workspace to interact, analyze, explore, and perform data preparation tasks.

Select Datasets and/or Data Sources: Import data from various sources such as S3, Redshift, or other services—similar to the process in Glue ETL.

Choose Recipes: Recipes are sets of visual data transformation steps, including operations like filtering rows and converting data types (e.g., string to number). All operations are applied from an intuitive menu without the need for coding.

Run the Recipe: When executed, DataBrew applies all specified transformations to the complete dataset. The processed data is then stored in Amazon S3 for consumption by other services.
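
The same steps can also be driven through the API rather than the console. The sketch below is a minimal illustration using the boto3 `databrew` client, not a description of the console flow. Every resource name, the bucket, and the role ARN are hypothetical placeholders. The recipe operation names and parameter keys (`REMOVE_MISSING`, `CHANGE_DATA_TYPE`, `sourceColumn`, `columnDataType`) are written from memory of the DataBrew recipe actions reference and should be checked before use. Through the API, the dataset and recipe must exist before the project that references them, so the order differs slightly from the list above.

```python
"""Sketch: the DataBrew workflow expressed as boto3 calls (placeholder names throughout)."""

# Hypothetical placeholders, not values from the article.
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleDataBrewRole"
BUCKET = "example-databrew-bucket"


def build_recipe_steps():
    """Return two recipe steps: drop rows with a missing price, then cast price to a number.

    Operation names and parameter keys are assumptions; verify them against the
    DataBrew recipe actions reference.
    """
    return [
        {"Action": {"Operation": "REMOVE_MISSING",
                    "Parameters": {"sourceColumn": "price"}}},
        {"Action": {"Operation": "CHANGE_DATA_TYPE",
                    "Parameters": {"sourceColumn": "price",
                                   "columnDataType": "double"}}},
    ]


def run_workflow(client, name="sales-demo"):
    """Create a dataset, recipe, project, and job, then start a run. Returns the run ID."""
    client.create_dataset(
        Name=f"{name}-dataset",
        Input={"S3InputDefinition": {"Bucket": BUCKET, "Key": "raw/sales.csv"}},
    )
    client.create_recipe(Name=f"{name}-recipe", Steps=build_recipe_steps())
    client.create_project(
        Name=f"{name}-project",
        DatasetName=f"{name}-dataset",
        RecipeName=f"{name}-recipe",
        RoleArn=ROLE_ARN,
    )
    # The job applies the project's recipe to the full dataset and writes to S3.
    client.create_recipe_job(
        Name=f"{name}-job",
        ProjectName=f"{name}-project",
        RoleArn=ROLE_ARN,
        Outputs=[{"Location": {"Bucket": BUCKET, "Key": "clean/"}, "Format": "CSV"}],
    )
    return client.start_job_run(Name=f"{name}-job")["RunId"]
```

With AWS credentials configured, calling `run_workflow(boto3.client("databrew"))` would issue these requests against a real account. Passing the client in as a parameter keeps the helper easy to exercise with a stub.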

One significant advantage of Glue DataBrew is that it is serverless: you do not provision, secure, or scale any servers yourself. Operational concerns such as job monitoring are handled through services like Amazon CloudWatch.

Data sources for Glue DataBrew include the AWS Glue Data Catalog, various database services, and S3, all of which can be directly integrated into your workflows.
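
As a rough illustration of how those three kinds of source differ when a dataset is registered through the boto3 `databrew` client, the sketch below builds one `Input` definition per source type. The bucket, database, table, and connection names are hypothetical, and the database case assumes a Glue connection to the database already exists.

```python
"""Sketch: DataBrew dataset inputs for S3, the Glue Data Catalog, and a database."""


def s3_input(bucket, key):
    """Input definition for a file or prefix in Amazon S3."""
    return {"S3InputDefinition": {"Bucket": bucket, "Key": key}}


def catalog_input(database, table):
    """Input definition for a table registered in the AWS Glue Data Catalog."""
    return {"DataCatalogInputDefinition": {"DatabaseName": database,
                                           "TableName": table}}


def database_input(glue_connection, table):
    """Input definition for a database table reached through a Glue connection."""
    return {"DatabaseInputDefinition": {"GlueConnectionName": glue_connection,
                                        "DatabaseTableName": table}}


def register_datasets(client):
    """Create one dataset per source type; all names are hypothetical placeholders."""
    inputs = {
        "orders-s3": s3_input("example-bucket", "raw/orders.csv"),
        "orders-catalog": catalog_input("example_db", "orders"),
        "orders-rds": database_input("example-rds-connection", "public.orders"),
    }
    for name, definition in inputs.items():
        client.create_dataset(Name=name, Input=definition)
    return sorted(inputs)
```

Only the `Input` argument changes between sources; once registered, each dataset can be attached to a project or job in the same way.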

The image is a diagram showing AWS Glue DataBrew connected to Amazon S3, Amazon Redshift, Amazon RDS, and AWS Glue.

Example Workflow

Consider a workflow where data is sourced from S3 and ingested into Glue DataBrew. DataBrew applies pre-built transformations and writes the output back to S3, where Athena can query it. The query results then become accessible to QuickSight for analysis by data and business analysts.
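
To make the Athena step concrete, here is a minimal sketch of the request that could be passed to the boto3 `athena` client's `start_query_execution` call. It assumes the DataBrew output in S3 has already been registered as a table (for example in the Glue Data Catalog); the database, table, and results location are hypothetical names used only for illustration.

```python
"""Sketch: querying DataBrew output with Athena (all names are placeholders)."""


def build_query_request(database, table, results_location, limit=10):
    """Return keyword arguments for athena.start_query_execution.

    Assumes `table` already points at the S3 prefix where the DataBrew job
    wrote its output.
    """
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": results_location},
    }


def preview_output(athena_client, database, table, results_location):
    """Start the preview query and return its execution ID."""
    request = build_query_request(database, table, results_location)
    return athena_client.start_query_execution(**request)["QueryExecutionId"]
```

QuickSight would then use Athena as its data source, so analysts work against the same cleaned table.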

The image is a flowchart illustrating the data processing workflow using AWS services, including AWS Glue DataBrew, AWS Glue, Athena, and QuickSight, with roles for Data Analyst and Business Analyst.

Under the hood, Glue DataBrew utilizes AWS Glue to perform data transformations, and it also supports machine learning workflows. For example, you can source data via DataBrew and export the processed output for use by a machine learning service such as SageMaker.

The image is a diagram showing the integration of AWS services, including Amazon S3, Amazon Redshift, AWS Glue, AWS Glue DataBrew, and Amazon SageMaker. It illustrates a data processing workflow from storage to machine learning.

Key Features of Glue DataBrew

Feature	Description
Visual Data Preparation	Clean and transform data through an intuitive graphical interface—no coding required.
Data Profiling	Automatically generate metadata statistics to identify outliers, anomalies, missing values, and inconsistencies.
Scalability	Automatically scales with your data preparation workload without manual intervention.
Integration with AWS Data Stores	Seamlessly integrates with services such as Aurora, Redshift, and RDS.
Job Scheduling and Reusability	Schedule jobs to run at a set time or on a recurring basis, and reuse saved recipes on other datasets.
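
The profiling and scheduling features in the table can be sketched with the boto3 `databrew` client as well. This is an illustration under assumptions, not a recommended setup: the dataset name, role ARN, and bucket are supplied by the caller and are hypothetical, and the cron string follows the six-field AWS cron syntax as an assumed format that should be confirmed in the DataBrew documentation.

```python
"""Sketch: a scheduled data profile job (names and cron string are assumptions)."""

# Assumed schedule: every day at 06:00 UTC, in six-field AWS cron syntax.
DAILY_6AM = "cron(0 6 * * ? *)"


def schedule_daily_profile(client, dataset_name, role_arn, bucket):
    """Create a profile job for the dataset and attach a daily schedule to it."""
    job_name = f"{dataset_name}-profile"
    # The profile job writes summary statistics about the dataset to S3.
    client.create_profile_job(
        Name=job_name,
        DatasetName=dataset_name,
        RoleArn=role_arn,
        OutputLocation={"Bucket": bucket, "Key": "profiles/"},
    )
    # A schedule can be associated with one or more jobs by name.
    client.create_schedule(
        Name=f"{job_name}-daily",
        JobNames=[job_name],
        CronExpression=DAILY_6AM,
    )
    return job_name
```

The same `create_schedule` call can list a recipe job in `JobNames`, so profiling and transformation can run on one timetable.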

The image lists five features: Visual Data Preparation, Data Profiling, Scalability and Performance, Integration with AWS Data Stores, and Job Scheduling and Reusability. Each feature is represented with an icon and a gradient background.

In Summary

Glue DataBrew offers a streamlined, visual, and code-free solution for data transformation that harnesses the power of AWS Glue behind the scenes. By simplifying the data preparation process, DataBrew makes it accessible for users who prefer not to write code and accelerates the journey from raw data to actionable insights.

Thank you for reading—see you in the next article.
