AWS Solutions Architect Associate Certification
Services Data and ML
Demo of Athena in Action
In this guide, we demonstrate how to leverage Amazon Athena to query CSV data containing sample user information. The CSV file, which holds roughly 10,000 entries, includes dummy records for users (or contacts) with attributes such as ID, first name, last name, primary and secondary email addresses, profession, join date, and country.
Below is a snippet from the CSV file:
id,firstname,lastname,email,email2,profession,date joined,Country
100,Calla,Judy,[email protected],[email protected],developer,1943-10-17,UA
101,Letizia,Lanita,[email protected],[email protected],firefighter,1995-02-08,AD
102,Jorry,Zuzana,[email protected],[email protected],developer,2005-01-01,GY
103,Dorothy,Kannry,[email protected],[email protected],doctor,1921-02-15,LR
104,Annora,Burkle,[email protected],[email protected],firefighter,1912-08-26,PN
105,Alyssa,August,[email protected],[email protected],worker,2014-06-20,ET
106,Fanchon,Grobe,[email protected],[email protected],doctor,1902-09-17,UY
107,Atlanta,Daveta,[email protected],[email protected],firefighter,2015-04-04,KE
108,Kalina,Durware,[email protected],[email protected],firefighter,1944-12-13,IL
109,Meg,Henegry,[email protected],[email protected],developer,1909-01-13,GE
110,Raina,Earlie,[email protected],[email protected],police officer,1991-08-23,KN
111,Marsiella,Noman,[email protected],[email protected],police officer,1993-12-24,GU
112,Dode,Colbert,[email protected],[email protected],police officer,2008-09-30,SG
113,Leona,Aida,[email protected],[email protected],developer,1994-02-25,CL
114,Bertine,Stanwood,[email protected],[email protected],doctor,2008-01-10,US
115,Paulita,Gahl,[email protected],[email protected],doctor,1932-12-14,HN
Overview
This demonstration involves uploading the CSV file to an S3 bucket and then querying the data using Amazon Athena. All steps—from uploading data to executing SQL queries—are covered in this guide.
Uploading the CSV to S3 and Configuring Athena
Once you have uploaded your CSV file to an S3 bucket, the next step is to set up Amazon Athena to query the data. Follow these steps:
Navigate to the Amazon Athena Console.
Clean up any default databases or test queries by removing unnecessary test databases. For example, you can drop a pre-existing test database using:
DROP DATABASE test;
Click on "Create Table" and follow the on-screen prompts to define a new table linked to your CSV data. During the set-up, you need to provide:
- Table name and database name (either an existing one or a new one).
- S3 location of your CSV file.
- File format, which in this case is CSV.
Specify the details of the CSV such as the field delimiter (a comma) and configure each column's data type. For example, map the ID to an integer type, text fields such as first name, last name, and emails to string types, and the join date to a date type.
The following image shows the AWS Athena console, where you configure the table details and database settings:
Upon completing the form, Athena will generate a SQL statement similar to the one below to create an external table:
CREATE EXTERNAL TABLE IF NOT EXISTS `athena_demo`.`user` (
id int,
firstname string,
lastname string,
email string,
email2 string,
profession string,
date date,
country string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://athena-demo-kodekloud/'
TBLPROPERTIES ('classification' = 'csv');
This query creates an external table that maps directly to your CSV data on S3, making it possible to run SQL queries on the data.
Querying Data with Amazon Athena
To execute queries on your newly created table, follow these guidelines:
Open a new query tab in the Athena console.
To retrieve all the records from the table, use the following SQL statement:
SELECT * FROM user;
This query will return over 10,000 results corresponding to all the CSV entries.
To limit output for testing purposes, use the SQL LIMIT clause:
SELECT * FROM user LIMIT 10;
This query returns only 10 rows, providing a quick preview of your data.
To filter data (for example, to display only users from the United States), run:
SELECT * FROM user WHERE country = 'US';
The image below shows the AWS Athena query editor with the results displayed in a tabular format, featuring columns such as ID, first name, last name, email, profession, and country:
Cleaning Up Resources
After you've completed your queries, it's a good practice to clean up your Athena environment by removing any tables or databases you no longer need.
To delete the table, execute:
DROP TABLE user;
Then, to drop the database, use:
DROP DATABASE athena_demo;
Cleanup Reminder
Cleaning up your resources helps prevent unnecessary charges and keeps your AWS environment organized.
Conclusion
This guide has demonstrated how to upload a CSV file to an S3 bucket and query it using Amazon Athena with standard SQL commands. By mapping an external table to your S3-based CSV file, Athena enables you to easily analyze large datasets. Enjoy using Amazon Athena for your data querying needs, and happy querying!
For additional resources, consider exploring:
Watch Video
Watch video content