AWS DataSync
In this article, we explore AWS DataSync, a service for fast, secure data transfers between on-premises storage and AWS storage services. DataSync simplifies, automates, and accelerates the movement of large datasets to and from AWS.
Core Components of AWS DataSync
AWS DataSync comprises four essential components that work together to ensure efficient data migration:
Agent
The DataSync agent is a virtual machine appliance that reads from and writes to your storage during transfers. It can be deployed on hypervisors such as VMware, KVM, or Microsoft Hyper-V.
Location
A location represents either the source or the destination of a transfer. Every DataSync transfer involves two locations: for example, a source location in an on-premises environment and a destination location such as an Amazon S3 bucket or another AWS storage service.
Task
A task defines a DataSync transfer by specifying the source and destination locations. It also sets out how data should be copied, including how metadata, deleted files, and permissions are handled.
Task Execution
A task execution is a single run of a task. A task can be executed many times, and each execution moves through its own phases and statuses.
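The four components map fairly directly onto DataSync API calls. The following is a minimal sketch, assuming that `client` is a boto3 DataSync client (for example `boto3.client("datasync")`), that an agent has already been deployed and activated, and that the source is an NFS export. The function name `build_transfer`, the task name, and every ARN, hostname, and path passed to it are hypothetical placeholders, not values from this article.

```python
def build_transfer(client, agent_arn, nfs_host, nfs_path, bucket_arn, role_arn):
    """Create two locations and a task, then start one task execution.

    `client` is assumed to be a boto3 DataSync client; all other
    arguments are caller-supplied placeholders.
    """
    # Location 1 (source): an on-premises NFS export, read through the agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Location 2 (destination): an Amazon S3 bucket, accessed via an IAM role.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    # Task: ties the source and destination locations together.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="onprem-to-s3",  # hypothetical task name
    )
    # Task execution: one run of the task defined above.
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return task["TaskArn"], execution["TaskExecutionArn"]
```

Passing the client in as an argument keeps the sketch independent of credentials and Region configuration, and makes it easy to exercise with a stub before pointing it at a real account.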
Below is a diagram that illustrates the four key steps of DataSync:
Task and Task Execution States
Both tasks and their executions progress through specific states during the lifecycle of a transfer.
Task States
Available:
The task is ready to initiate data transfer.
Running:
Data transfer is in progress.
Unavailable:
The DataSync agent is currently offline.
Queued:
Another task using the same agent is running. DataSync processes tasks sequentially, so an agent cannot execute more than one task at a time.
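To check a task's state programmatically, the states above can be read from the `Status` field of the DescribeTask response. This is a rough sketch, assuming `client` is a boto3 DataSync client; the helper name `describe_task_state` and the wording of the descriptions are illustrative. The API also reports a `CREATING` status, included here for completeness.

```python
# Status strings as returned in the "Status" field of DescribeTask.
TASK_STATUS_MEANING = {
    "AVAILABLE": "ready to start a transfer",
    "RUNNING": "a transfer is in progress",
    "UNAVAILABLE": "the agent used by the task is offline",
    "QUEUED": "another task is running on the same agent",
    "CREATING": "the task is still being created",
}


def describe_task_state(client, task_arn):
    """Return (status, meaning, can_start_now) for a DataSync task.

    `client` is assumed to be a boto3 DataSync client.
    """
    status = client.describe_task(TaskArn=task_arn)["Status"]
    meaning = TASK_STATUS_MEANING.get(status, "unrecognized status")
    # Only an AVAILABLE task starts transferring right away; starting a task
    # whose agent is busy leaves the new execution waiting in the queue.
    return status, meaning, status == "AVAILABLE"
```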
Task Execution States
Queued:
The execution is waiting in line because another task is using the same agent.
Launching:
DataSync is initializing the execution. An execution enters this state once the agent becomes available, or immediately when no queuing is required.
Preparing:
DataSync identifies which files need to be transferred. The duration of this phase depends on the file count and the performance of both the source and destination systems.
Transferring:
The selected data is being transferred. The number of bytes and files transferred is updated in real time.
Verifying:
If verification is configured, DataSync performs a data integrity check. A successful verification results in a "success" status; otherwise, an "error" status is displayed.
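The execution lifecycle described above can be followed from code by polling DescribeTaskExecution. The sketch below assumes `client` is a boto3 DataSync client and treats `SUCCESS` and `ERROR` as the only terminal statuses (it does not handle cancelled executions). The function name `wait_for_execution` and the 30-second default interval are arbitrary choices for illustration.

```python
import time

# Assumption: an execution that is not cancelled ends in one of these statuses.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(client, execution_arn, poll_seconds=30, sleep=time.sleep):
    """Poll a task execution until it finishes.

    Returns (final_status, statuses_seen), where statuses_seen lists each
    distinct status in the order it was first observed.
    """
    seen = []
    while True:
        response = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = response["Status"]
        if not seen or seen[-1] != status:
            # Record and report each state change once.
            seen.append(status)
            print(
                f"{status}: {response.get('FilesTransferred', 0)} files, "
                f"{response.get('BytesTransferred', 0)} bytes transferred"
            )
        if status in TERMINAL_STATUSES:
            return status, seen
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` is fine.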
The chart below illustrates the status icons and states for tasks and their executions:
The following flow diagram visually represents the stages of task execution:
AWS DataSync Discovery
Similar to many AWS migration services, AWS DataSync offers a discovery process that provides critical insights before initiating data transfers.
- The DataSync agent connects to your on-premises storage and initiates a discovery job to gather system information.
- The agent sends the collected data to DataSync Discovery via a public service endpoint.
- Based on the gathered information, DataSync Discovery recommends the optimal AWS storage services for your data migration.
The flowchart below outlines the DataSync Discovery process:
Key Features of AWS DataSync
DataSync is equipped with a range of features designed to ensure efficient and secure data transfers:
High-Speed Data Transfer:
Optimized for moving large volumes of data rapidly.
Simplified User Interface:
Tasks are managed through an intuitive console UI.
Data Integrity Checks:
Verifies that data is transferred accurately and completely.
Automation and Scheduling:
Enables recurring, automated data transfers without manual intervention.
Multi-Protocol Support:
Works with storage accessed over NFS, SMB, and the Amazon S3 API.
Incremental Transfers:
Transfers only new or changed files, avoiding redundant copies.
Security and Encryption:
Data is encrypted in transit and at rest.
Monitoring and Logging:
Provides real-time transfer progress tracking, transfer history logs, and detailed records for troubleshooting.
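Several of these features correspond to settings on a task. As a hedged illustration, the sketch below builds the `Options` and `Schedule` arguments accepted by the DataSync CreateTask API. The helper name `scheduled_task_settings`, the chosen option values, and the nightly cron expression are assumptions made for this example, not recommendations from the article.

```python
def scheduled_task_settings(cron="cron(0 2 * * ? *)"):
    """Build keyword arguments for a scheduled, incremental, verified task.

    The default cron expression (02:00 UTC daily) is a hypothetical example.
    """
    return {
        "Options": {
            "TransferMode": "CHANGED",               # incremental: copy only new or changed data
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # integrity check on the files that were copied
            "PreserveDeletedFiles": "PRESERVE",      # keep destination files deleted at the source
            "PosixPermissions": "PRESERVE",          # carry file permissions across
            "LogLevel": "TRANSFER",                  # log individual files and objects
        },
        "Schedule": {"ScheduleExpression": cron},    # automated, recurring runs
    }
```

Assuming `client` is a boto3 DataSync client, these settings would be unpacked into `client.create_task(...)` alongside `SourceLocationArn` and `DestinationLocationArn`. For log entries to be published, the task also needs a CloudWatch log group, supplied through the `CloudWatchLogGroupArn` argument.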
The diagram below highlights these robust features:
Common Use Cases for AWS DataSync
AWS DataSync supports a wide variety of use cases, including:
On-Premises to AWS Storage Transfers:
Migrate data from your on-premises storage systems to AWS services such as Amazon EFS, Amazon S3, or Amazon FSx.
AWS Region-to-Region Synchronization:
Synchronize data between AWS Regions to support data redundancy, disaster recovery, or regional migrations. Keeping a synchronized copy in a second Region helps maintain availability while data is relocated.
Third-Party Cloud Migrations:
Migrate data from other cloud providers such as Microsoft Azure or Google Cloud Platform (GCP) to AWS.
Note
AWS DataSync not only transfers files and objects securely but also supports data replication to AWS storage services. This ensures enhanced data protection, backup capabilities, and the opportunity to archive less frequently accessed data to cost-effective storage like Amazon S3 Glacier.
By leveraging AWS DataSync, organizations can reduce on-premises storage costs and streamline data migrations while maintaining high levels of security and efficiency. DataSync integrates seamlessly with various AWS services, making it an ideal solution for a diverse range of data migration and synchronization projects.