AWS Certified Developer - Associate
Data Analytics
Athena Demo
In this lesson, we demonstrate how to work with Amazon Athena using a sample CSV file containing dummy data. The CSV file simulates user or contact records with columns such as ID, first name, last name, email, secondary email, profession, date joined, and country code.
Below is an excerpt from the CSV file:
id,firstname,lastname,email,email2,profession,date joined,Country
100,Calla,Judyé,[email protected],[email protected],developer,1943-10-17,UA
101,Letizia,Lanita,[email protected],[email protected],firefighter,1995-02-08,AD
102,Jorry,Zuzana,[email protected],[email protected],developer,2005-01-01,GY
103,Dorothy,Kannry,[email protected],[email protected],doctor,1921-02-15,LR
104,Annora,Burkle,[email protected],[email protected],firefighter,1912-08-26,PN
105,Alyssa,August,[email protected],[email protected],worker,2014-06-20,ET
106,Fanchon,Grobe,[email protected],[email protected],doctor,1902-09-17,UY
107,Atlanta,Daveta,[email protected],[email protected],firefighter,2015-04-04,KE
108,Kalina,Durware,[email protected],[email protected],firefighter,1944-12-13,IL
109,Meg,Henenbry,[email protected],[email protected],developer,1909-01-13,GE
110,Raina,Earlie,[email protected],[email protected],police officer,1991-08-23,KN
111,Marsiella,Noman,[email protected],[email protected],police officer,1993-12-24,GU
112,Dode,Colbert,[email protected],[email protected],police officer,2008-09-30,SG
113,Leona,Aida,[email protected],[email protected],developer,1994-02-25,CL
114,Bertine,Stanwood,[email protected],[email protected],doctor,2008-01-10,US
115,Pauilta,Gahl,[email protected],[email protected],doctor,1932-12-14,HN
The CSV also contains additional test data:
1943-12-23,LU
And more sample rows:
2010-08-05,NA
9938,10036,Marnia,Guthrie,[email protected],[email protected],developer,
9939,10037,Leona,Blisse,[email protected],[email protected],firefighter,1901-01-14,TR
9940,10038,Jordan,Chabot,[email protected],[email protected],firefighter,1906-09-29,CX
9941,10039,Lorie,Riordan,[email protected],[email protected],developer,1901-06-23,TT
9942,10040,Calla,Catie,[email protected],[email protected],firefighter,2013-04-18,PK
9943,10041,Annaliese,Hebner,[email protected],[email protected],doctor,
2003-09-11,BN
9944,10042,Halette,Ciapas,[email protected],[email protected],doctor,1901-12-16,AF
9945,10043,Ronna,Lorenz,[email protected],[email protected],firefighter,1985-10-22,CK
9946,10044,Veda,Fancie,[email protected],[email protected],police officer,1996-08-01,BT
9947,10045,Sashenka,Loeb,[email protected],[email protected],doctor,2016-11-15,MG
9948,10046,Annaliese,Honoria,[email protected],[email protected],doctor,1950-07-20,GR
9949,10047,Camile,Oster,[email protected],[email protected],developer,1952-09-07,MV
9950,10048,Sharlene,Craggie,[email protected],[email protected],worker,1994-08-12,VC
9951,10049,Neila,Devlen,[email protected],[email protected],police officer,1991-06-29,BS
9952,10050,Melanie,Leifeste,[email protected],[email protected],worker,
Note
Ensure that your CSV file is correctly formatted and stored in an S3 bucket before proceeding.
Uploading the CSV File and Configuring Amazon Athena
Once the CSV file is uploaded to your S3 bucket, you can use Amazon Athena to directly query the data.
Setting Up the Athena Table
After placing the CSV file in S3, open the Amazon Athena console. By default, you will see the default database. For this demonstration, you will create a new database and an external table to work with your CSV data.
Clearing Existing Test Databases
If you have any pre-existing test databases, clear them using:
DROP DATABASE IF EXISTS test;
Creating or Replacing a View
Before setting up the external table, you might need to create or replace a view. Here is an example:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
Creating an External Table for CSV Data
To create an external table that points to your CSV file in S3, follow these steps in the Athena console:
- Click on "Create Table" to specify a table from your S3 data source.
- Provide a table name (for instance, "users") and choose to create a new database (e.g., "athena_demo").
- Specify the S3 bucket location where your CSV file is stored.
- Select the table type corresponding to your file format (ensure the CSV option is selected).
- Set the field delimiter to a comma.
- Define the column details that match your CSV file. For example:
- id: integer
- firstname: string
- lastname: string
- email: string
- email2: string
- profession: string
- date joined: date
- Country: string
During this process, the AWS console interface will prompt you to configure the table and database details.
Reviewing Column Details
Review the column details carefully using an excerpt similar to the following:
id,firstname,lastname,email,email2,profession,date joined,Country
100,Calla,Judye,[email protected],[email protected],developer,1943-10-17,UA
101,Letizia,Lanita,[email protected],[email protected],firefighter,1995-02-08
102,Jorry,Zuzana,[email protected],[email protected],developer,2005-01-01,GY
103,Dorothy,Kannry,[email protected],[email protected],doctor,1921-02-15,LR
104,Annora,Burkle,[email protected],[email protected],firefighter,1912-08-26,PN
105,Alyssa,August,[email protected],[email protected],worker,2014-06-20,ET
106,Fanchon,Grobe,[email protected],[email protected],doctor,1902-09-17,UY
107,Atlanta,Daveta,[email protected],[email protected],firefighter,2015-04-04,KE
108,Kalina,Durware,[email protected],[email protected],firefighter,1944-12-13,IL
109,Meg,Henebery,Meg.Hene [email protected],[email protected],developer,1909-01-13,GE
110,Raina,Earlie,[email protected],[email protected],police officer,1991-08-23,KN
111,Marsietta,Noman,[email protected],[email protected],police officer,1993-12-24,GU
112,Dode,Colbert,[email protected],[email protected],police officer,2008-09-30,SG
113,Leona,Aida,[email protected],[email protected],developer,1994-02-25,CL
114,Bertine,Stanwood,[email protected],[email protected],doctor,2008-01-10,US
115,Pauilta,Gahl,[email protected],[email protected],doctor,1932-12-14,HN
Note
Adjust the column types as necessary—for example, converting the ID to an integer and the date joined to a date type.
Sample SQL for Creating the External Table
Once you have configured the table details, Athena will generate an SQL statement. An example statement might look like:
CREATE EXTERNAL TABLE IF NOT EXISTS athena_demo.`user` (
id int,
firstname string,
lastname string,
email string,
email2 string,
profession string,
date date,
country string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://athena-demo-kodeloud/'
TBLPROPERTIES ('classification' = 'csv');
After creating the table, you can query your data using standard SQL syntax.
Querying the Data
To retrieve all records from your table, run:
SELECT * FROM user;
This query returns all rows from your CSV file. If you prefer to see a subset of entries, add the LIMIT clause:
SELECT * FROM user LIMIT 10;
You can also filter results. For example, to fetch records with the country code "US":
SELECT * FROM user WHERE country = 'US';
The Athena query editor displays the results with columns including ID, first name, last name, email, profession, and country.
Dropping the Table and Database
If needed, you can remove the table or database using standard SQL commands:
To drop the table:
DROP TABLE user;
To drop the database:
DROP DATABASE athena_demo;
Conclusion
This demonstration explained how to upload a CSV file to an S3 bucket, create an external table in Amazon Athena that references the CSV data, and execute SQL queries to retrieve and filter your data. For further details, refer to the Amazon Athena documentation.
Watch Video
Watch video content