AWS Certified Developer - Associate

Data Analytics

Athena Demo

In this lesson, we demonstrate how to work with Amazon Athena using a sample CSV file containing dummy data. The CSV file simulates user or contact records with columns such as ID, first name, last name, email, secondary email, profession, date joined, and country code.

Below is an excerpt from the CSV file:

id,firstname,lastname,email,email2,profession,date joined,Country
100,Calla,Judyé,[email protected],[email protected],developer,1943-10-17,UA
101,Letizia,Lanita,[email protected],[email protected],firefighter,1995-02-08,AD
102,Jorry,Zuzana,[email protected],[email protected],developer,2005-01-01,GY
103,Dorothy,Kannry,[email protected],[email protected],doctor,1921-02-15,LR
104,Annora,Burkle,[email protected],[email protected],firefighter,1912-08-26,PN
105,Alyssa,August,[email protected],[email protected],worker,2014-06-20,ET
106,Fanchon,Grobe,[email protected],[email protected],doctor,1902-09-17,UY
107,Atlanta,Daveta,[email protected],[email protected],firefighter,2015-04-04,KE
108,Kalina,Durware,[email protected],[email protected],firefighter,1944-12-13,IL
109,Meg,Henenbry,[email protected],[email protected],developer,1909-01-13,GE
110,Raina,Earlie,[email protected],[email protected],police officer,1991-08-23,KN
111,Marsiella,Noman,[email protected],[email protected],police officer,1993-12-24,GU
112,Dode,Colbert,[email protected],[email protected],police officer,2008-09-30,SG
113,Leona,Aida,[email protected],[email protected],developer,1994-02-25,CL
114,Bertine,Stanwood,[email protected],[email protected],doctor,2008-01-10,US
115,Pauilta,Gahl,[email protected],[email protected],doctor,1932-12-14,HN

The CSV also contains additional test data:

1943-12-23,LU

And more sample rows:

2010-08-05,NA
9938,10036,Marnia,Guthrie,[email protected],[email protected],developer,
9939,10037,Leona,Blisse,[email protected],[email protected],firefighter,1901-01-14,TR
9940,10038,Jordan,Chabot,[email protected],[email protected],firefighter,1906-09-29,CX
9941,10039,Lorie,Riordan,[email protected],[email protected],developer,1901-06-23,TT
9942,10040,Calla,Catie,[email protected],[email protected],firefighter,2013-04-18,PK
9943,10041,Annaliese,Hebner,[email protected],[email protected],doctor,
2003-09-11,BN
9944,10042,Halette,Ciapas,[email protected],[email protected],doctor,1901-12-16,AF
9945,10043,Ronna,Lorenz,[email protected],[email protected],firefighter,1985-10-22,CK
9946,10044,Veda,Fancie,[email protected],[email protected],police officer,1996-08-01,BT
9947,10045,Sashenka,Loeb,[email protected],[email protected],doctor,2016-11-15,MG
9948,10046,Annaliese,Honoria,[email protected],[email protected],doctor,1950-07-20,GR
9949,10047,Camile,Oster,[email protected],[email protected],developer,1952-09-07,MV
9950,10048,Sharlene,Craggie,[email protected],[email protected],worker,1994-08-12,VC
9951,10049,Neila,Devlen,[email protected],[email protected],police officer,1991-06-29,BS
9952,10050,Melanie,Leifeste,[email protected],[email protected],worker,

Note

Ensure that your CSV file is correctly formatted and stored in an S3 bucket before proceeding.

Uploading the CSV File and Configuring Amazon Athena

Once the CSV file is uploaded to your S3 bucket, you can use Amazon Athena to directly query the data.

Setting Up the Athena Table

After placing the CSV file in S3, open the Amazon Athena console. By default, you will see the default database. For this demonstration, you will create a new database and an external table to work with your CSV data.

Clearing Existing Test Databases

If you have any pre-existing test databases, clear them using:

DROP DATABASE IF EXISTS test;

Creating or Replacing a View

Before setting up the external table, you might need to create or replace a view. Here is an example:

CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;

Creating an External Table for CSV Data

To create an external table that points to your CSV file in S3, follow these steps in the Athena console:

  1. Click on "Create Table" to specify a table from your S3 data source.
  2. Provide a table name (for instance, "users") and choose to create a new database (e.g., "athena_demo").
  3. Specify the S3 bucket location where your CSV file is stored.
  4. Select the table type corresponding to your file format (ensure the CSV option is selected).
  5. Set the field delimiter to a comma.
  6. Define the column details that match your CSV file. For example:
    • id: integer
    • firstname: string
    • lastname: string
    • email: string
    • email2: string
    • profession: string
    • date joined: date
    • Country: string

During this process, the AWS console interface will prompt you to configure the table and database details.

Reviewing Column Details

Review the column details carefully using an excerpt similar to the following:

id,firstname,lastname,email,email2,profession,date joined,Country
100,Calla,Judye,[email protected],[email protected],developer,1943-10-17,UA
101,Letizia,Lanita,[email protected],[email protected],firefighter,1995-02-08
102,Jorry,Zuzana,[email protected],[email protected],developer,2005-01-01,GY
103,Dorothy,Kannry,[email protected],[email protected],doctor,1921-02-15,LR
104,Annora,Burkle,[email protected],[email protected],firefighter,1912-08-26,PN
105,Alyssa,August,[email protected],[email protected],worker,2014-06-20,ET
106,Fanchon,Grobe,[email protected],[email protected],doctor,1902-09-17,UY
107,Atlanta,Daveta,[email protected],[email protected],firefighter,2015-04-04,KE
108,Kalina,Durware,[email protected],[email protected],firefighter,1944-12-13,IL
109,Meg,Henebery,Meg.Hene [email protected],[email protected],developer,1909-01-13,GE
110,Raina,Earlie,[email protected],[email protected],police officer,1991-08-23,KN
111,Marsietta,Noman,[email protected],[email protected],police officer,1993-12-24,GU
112,Dode,Colbert,[email protected],[email protected],police officer,2008-09-30,SG
113,Leona,Aida,[email protected],[email protected],developer,1994-02-25,CL
114,Bertine,Stanwood,[email protected],[email protected],doctor,2008-01-10,US
115,Pauilta,Gahl,[email protected],[email protected],doctor,1932-12-14,HN

Note

Adjust the column types as necessary—for example, converting the ID to an integer and the date joined to a date type.

Sample SQL for Creating the External Table

Once you have configured the table details, Athena will generate an SQL statement. An example statement might look like:

CREATE EXTERNAL TABLE IF NOT EXISTS athena_demo.`user` (
  id int,
  firstname string,
  lastname string,
  email string,
  email2 string,
  profession string,
  date date,
  country string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://athena-demo-kodeloud/'
TBLPROPERTIES ('classification' = 'csv');

After creating the table, you can query your data using standard SQL syntax.

Querying the Data

To retrieve all records from your table, run:

SELECT * FROM user;

This query returns all rows from your CSV file. If you prefer to see a subset of entries, add the LIMIT clause:

SELECT * FROM user LIMIT 10;

You can also filter results. For example, to fetch records with the country code "US":

SELECT * FROM user WHERE country = 'US';

The Athena query editor displays the results with columns including ID, first name, last name, email, profession, and country.

Dropping the Table and Database

If needed, you can remove the table or database using standard SQL commands:

  • To drop the table:

    DROP TABLE user;
    
  • To drop the database:

    DROP DATABASE athena_demo;
    

Conclusion

This demonstration explained how to upload a CSV file to an S3 bucket, create an external table in Amazon Athena that references the CSV data, and execute SQL queries to retrieve and filter your data. For further details, refer to the Amazon Athena documentation.

The image shows an Amazon Web Services (AWS) interface for creating a table from S3 bucket data using Amazon Athena. It includes fields for table details and database configuration.

The image shows a section of the AWS console where a user is defining column details for a table, including column names, types, and optional descriptions.

The image shows a screenshot of an AWS Athena query editor displaying a table with user data, including columns for ID, first name, last name, email, profession, and country. The query results indicate 43 entries with various professions and email addresses.

Watch Video

Watch video content

Previous
Athena