Welcome back, AWS Solutions Architects. In this article, presented by Michael Forrester, we explore the power of data transformation with Glue DataBrew—a service that markedly differs from Glue ETL. Glue DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing a single line of code. Unlike Glue ETL, which allows you to apply Python or PySpark code for data transformations, DataBrew relies purely on a graphical user interface.Documentation Index
Fetch the complete documentation index at: https://notes.kodekloud.com/llms.txt
Use this file to discover all available pages before exploring further.
How Glue DataBrew Works
The workflow of Glue DataBrew is straightforward:- Create a Project: Establish a workspace to interact, analyze, explore, and perform data preparation tasks.
- Select Datasets and/or Data Sources: Import data from various sources such as S3, Redshift, or other services—similar to the process in Glue ETL.
- Choose Recipes: Recipes are sets of visual data transformation steps, including operations like filtering rows and converting data types (e.g., string to number). All operations are applied from an intuitive menu without the need for coding.
- Run the Recipe: When executed, DataBrew applies all specified transformations to the complete dataset. The processed data is then stored in Amazon S3 for consumption by other services.
One significant advantage of Glue DataBrew is its serverless nature. This means you do not need to manage, secure, or scale servers manually. Instead, operational aspects like monitoring are seamlessly handled via services such as CloudWatch.

Example Workflow
Consider a workflow where data is sourced from S3 and ingested into Glue DataBrew. It leverages pre-built transformations, and the output is subsequently loaded into Athena. The processed data then becomes accessible to QuickSight for analysis by data and business analysts.

Key Features of Glue DataBrew
| Feature | Description |
|---|---|
| Visual Data Preparation | Clean and transform data through an intuitive graphical interface—no coding required. |
| Data Profiling | Automatically generate metadata statistics to identify outliers, anomalies, missing values, and inconsistencies. |
| Scalability | Automatically scales with your data preparation workload without manual intervention. |
| Integration with AWS Data Stores | Seamlessly integrates with services such as Aurora, Redshift, and RDS. |
| Job Scheduling and Reusability | Schedule data tasks based on triggers or time, and create reusable project templates. |
