Tutorial / Cram Notes
Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. With S3, you can store and retrieve any amount of data at any time, from anywhere on the web.
Use Cases:
- Hosting static websites.
- Storing data for Big Data analytics.
- Archiving data with S3 Glacier for long-term retention.
Key Features:
- Eleven 9’s (99.999999999%) durability and four 9’s (99.99%) availability for the S3 Standard class.
- Scalability to exabytes of data.
- Varying storage classes for cost optimization.
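To make this concrete, here is a minimal boto3 sketch that creates a bucket and uploads an object into a cost-optimized storage class. It assumes AWS credentials are already configured; the bucket and key names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

bucket = "my-example-bucket"  # hypothetical; bucket names are globally unique
# In Regions other than us-east-1, also pass
# CreateBucketConfiguration={"LocationConstraint": "<region>"}.
s3.create_bucket(Bucket=bucket)

# Upload an object; StorageClass selects one of the cost-optimization tiers.
s3.put_object(
    Bucket=bucket,
    Key="reports/2023/summary.csv",
    Body=b"col1,col2\n1,2\n",
    StorageClass="STANDARD_IA",  # for infrequently accessed data
)
```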
Amazon Elastic Block Store (Amazon EBS)
Amazon EBS provides persistent block storage volumes for use with EC2 instances. EBS volumes are
network-attached and provide the durability and low latency required for mission-critical workloads.
Use Cases:
- Mission-critical applications with stringent performance requirements.
- Workloads that require fine-tuning for performance (IOPS and throughput).
- Relational and NoSQL databases that need persistent storage.
Key Features:
- Different volume types for different workloads: Provisioned IOPS SSD (io1/io2), General Purpose SSD (gp2/gp3), Throughput Optimized HDD (st1), and Cold HDD (sc1).
- Snapshot feature to back up volumes and create new volumes.
- Encrypted volumes for security.
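The following is a minimal boto3 sketch of the volume-and-snapshot workflow described above; the Availability Zone and size are assumptions to adjust for your environment.

```python
import boto3

ec2 = boto3.client("ec2")

# Create a 100 GiB gp3 volume in a specific Availability Zone
# (a volume can only attach to instances in the same AZ).
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # assumption: use your target AZ
    Size=100,
    VolumeType="gp3",
    Encrypted=True,  # encrypt at rest with the default KMS key
)

# Back up the volume with a point-in-time snapshot.
snapshot = ec2.create_snapshot(
    VolumeId=volume["VolumeId"],
    Description="Nightly backup",
)
print(snapshot["SnapshotId"])
```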
Amazon Elastic File System (Amazon EFS)
Amazon EFS provides a simple, scalable, and fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources.
Use Cases:
- Managed file storage for EC2 instances.
- Serverless and container storage for services like AWS Lambda and Amazon ECS.
- Content management, web serving, and home directories.
Key Features:
- Pay-as-you-go model scalable up to petabytes of data.
- Support for the NFSv4 protocol.
- Integrates with AWS Transfer Family to transfer files in and out over SFTP.
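The sketch below shows the EFS provisioning flow with boto3, assuming configured credentials; the creation token and subnet ID are hypothetical. Once a mount target exists in a subnet, instances there mount the file system over NFSv4.

```python
import boto3

efs = boto3.client("efs")

# CreationToken makes the call idempotent.
fs = efs.create_file_system(
    CreationToken="app-shared-storage",
    PerformanceMode="generalPurpose",
    Encrypted=True,
)
print(fs["FileSystemId"])

# Mount targets expose the file system inside your VPC; one per AZ is typical.
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",  # hypothetical subnet ID
)
```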
AWS Storage Gateway
AWS Storage Gateway connects an on-premises software appliance with cloud-based storage, providing seamless integration with data security features.
Use Cases:
- Hybrid cloud storage for backup and archival.
- Disaster recovery purposes.
- Connecting on-premises applications to S3 storage.
Key Features:
- Three gateway types: File Gateway (NFS/SMB interface to S3), Volume Gateway (iSCSI block volumes), and Tape Gateway (virtual tape library).
- Local caching for frequently accessed data.
- Integration with existing on-premises environments.
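As a quick illustration, the Storage Gateway API can be queried with boto3 to see what is registered in the current Region; this minimal sketch assumes credentials are configured and at least one gateway exists.

```python
import boto3

sgw = boto3.client("storagegateway")

# Enumerate registered gateways and their types
# (FILE_S3 = File Gateway, CACHED/STORED = Volume Gateway, VTL = Tape Gateway).
for gw in sgw.list_gateways()["Gateways"]:
    print(gw["GatewayId"], gw.get("GatewayType"), gw.get("GatewayName"))
```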
Comparison Table
| Metric/Service | S3 | EBS | EFS | Storage Gateway |
|---|---|---|---|---|
| Storage Type | Object | Block | File | Hybrid (file, volume, tape) |
| Durability | 99.999999999% (11 9’s) | 99.8%–99.999% (varies by volume type) | 99.999999999% (11 9’s) | Backed by S3 |
| Max Capacity | Unlimited | 16 TiB per volume (64 TiB for io2 Block Express) | Petabyte-scale, elastic | Unlimited (backed by S3) |
| Data Access | HTTPS/REST | Block device attached to EC2 | NFSv4 (POSIX-compliant) | NFS/SMB, iSCSI, VTL |
| Storage Classes/Tiers | Yes (Standard, IA, Glacier tiers, etc.) | Yes (SSD and HDD volume types) | Yes (Standard, Infrequent Access) | Follows S3 storage classes |
| Pricing Model | Pay per use | Provisioned capacity | Pay per use | Gateway usage + storage used |
Decision Factors
When selecting an AWS storage service, consider the following factors:
- Data Access Patterns: Are you dealing with object, block, or file storage? Your access patterns can determine the service you choose.
- Performance Requirements: Services like EBS are optimized for high IOPS/throughput, whereas S3 is optimized for object storage with a high durability requirement.
- Scalability Needs: Do you need to scale to petabytes and beyond? S3 and EFS provide virtually unlimited storage.
- Pricing: Costs can vary depending on access patterns, data retrieval rates, and the amount of data stored. EBS volumes are generally more expensive than S3 but offer better I/O performance.
- Regulatory Compliance and Security: Security features like encryption, access control lists (ACL), and bucket policies should be considered while choosing the right service.
The following example illustrates selecting the right service based on the use case:
Let’s say you are building a media-sharing application that requires low-latency data access for files and needs to scale significantly over time. Your best bet might be to start with Amazon S3 due to its high durability, scalability, and object storage capabilities. However, if your application requires file system features like a file-locking mechanism, you should consider Amazon EFS.
In preparation for the AWS Certified Solutions Architect – Professional exam, it’s crucial to familiarize yourself with these storage services, how they differ, and what scenarios they are best suited for. The exam will test your ability to evaluate these services and apply your knowledge to various case studies and scenarios, ensuring you can design solutions that efficiently utilize AWS storage services.
Practice Test with Explanation
True/False: Amazon S3 Standard is the best storage option for infrequently accessed data where retrieval time is not sensitive.
- True
- False
False
Amazon S3 Standard is designed for frequently accessed data. For infrequently accessed data, storage classes like S3 Standard-IA or S3 One Zone-IA are more cost-effective.
When storing sensitive data requiring at-rest encryption and frequent, low-latency access, which AWS service is most appropriate?
- A. Amazon Glacier
- B. Amazon EBS
- C. Amazon S3
- D. AWS Storage Gateway
B. Amazon EBS
Amazon EBS is suitable for sensitive data that requires encryption and provides low-latency access. Glacier is for archival, S3 is for object storage, and Storage Gateway is for hybrid storage scenarios.
True/False: Amazon EFS is a good choice for file storage that requires single-instance, high IOPS performance.
- True
- False
False
Amazon EFS is designed for file storage across multiple instances. For single-instance, high IOPS, Amazon EBS provisioned IOPS volumes are a better choice.
Which storage service integrates with on-premises environments to provide a hybrid storage solution?
- A. Amazon S3
- B. AWS Snowball
- C. AWS Storage Gateway
- D. Amazon FSx
C. AWS Storage Gateway
AWS Storage Gateway connects on-premises environments with AWS cloud storage for a hybrid storage solution.
True/False: AWS Snowball is a petabyte-scale data transport solution that can be used for transferring large amounts of data into and out of AWS.
- True
- False
True
AWS Snowball is a data transport solution used to move large amounts of data into and out of the AWS cloud with physical devices, ideal for situations where internet transfer is not feasible.
Which AWS storage service is optimized for large-scale analytics workloads?
- A. Amazon S3
- B. Amazon RDS
- C. Amazon EBS
- D. Amazon Redshift
A. Amazon S3
Amazon S3 integrates with many Big Data analytics and data processing services (such as Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum), making it the natural storage layer for large-scale analytics workloads.
True/False: Amazon RDS provides a highly durable and scalable object storage service.
- True
- False
False
Amazon RDS is a relational database service, not an object storage service. Amazon S3 offers scalable object storage.
For storing application-specific objects that are frequently updated, what AWS service is most appropriate?
- A. Amazon Glacier
- B. Amazon DynamoDB
- C. Amazon EBS
- D. Amazon S3
D. Amazon S3
Amazon S3 is well-suited for storing and retrieving any amount of data and is a good choice for application-specific objects with frequent updates.
True/False: Amazon FSx for Lustre is designed for workloads that require a Windows-native file system.
- True
- False
False
Amazon FSx for Lustre is optimized for high-performance computing workloads and is not a Windows-native file system. Amazon FSx for Windows File Server provides Windows-native file storage.
When should you consider using Amazon S3 Glacier Deep Archive?
- A. For data that requires millisecond access times
- B. When cost is a primary concern and data is accessed once or twice a year
- C. For database storage for an online transaction processing (OLTP) workload
- D. For frequently accessed files that require high IOPS
B. When cost is a primary concern and data is accessed once or twice a year
Amazon S3 Glacier Deep Archive is designed for long-term storage of data that is rarely accessed, and it is the lowest-cost storage class in AWS.
True/False: Amazon EFS provides a scalable file storage solution that can support thousands of concurrent NFS clients.
- True
- False
True
Amazon EFS is designed to provide a simple, scalable, elastic file system with support for thousands of concurrent clients over the NFS protocol.
What AWS service should be used for a managed NoSQL database with flexible data storage and fast retrieval times?
- A. Amazon S3
- B. Amazon RDS
- C. Amazon DynamoDB
- D. Amazon Redshift
C. Amazon DynamoDB
Amazon DynamoDB is a managed NoSQL database service that provides flexible data storage and fast retrieval times, suitable for applications that require consistent, single-digit millisecond latency at any scale.
Interview Questions
When considering storage solutions on AWS, how would you differentiate between the use cases for Amazon S3 and Amazon Elastic Block Store (EBS)?
Amazon S3 is object storage built to store and retrieve any amount of data from anywhere, well-suited for storing large amounts of static data, like web content, backups, and media files. It’s ideal for data that is accessed infrequently but requires high durability and availability. Amazon EBS, on the other hand, provides block-level storage volumes for use with EC2 instances. It’s designed for data requiring frequent and fast access, such as the data needed by a database or a file system.
Can you explain how Amazon S3 data consistency models affect the selection of storage for a distributed application?
Since December 2020, Amazon S3 delivers strong read-after-write consistency for all PUT and DELETE operations, including overwrites and deletes: after a successful write, any subsequent read returns the latest version of the object. For distributed applications, this removes the workarounds that S3’s earlier eventual-consistency model required. You should still design for concurrency, however, since S3 does not coordinate simultaneous writers to the same key by default.
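Here is a short illustration of what strong read-after-write consistency guarantees in practice; the bucket and key names are hypothetical and credentials are assumed to be configured.

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "state/config.json"  # hypothetical names

# Because S3 is strongly consistent, a GET issued after a successful PUT
# always returns the latest bytes -- no read-your-writes workaround needed.
s3.put_object(Bucket=bucket, Key=key, Body=b'{"version": 2}')
latest = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
assert latest == b'{"version": 2}'
```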
Describe a scenario where using Amazon S3 Intelligent-Tiering would be beneficial.
S3 Intelligent-Tiering is beneficial for data with unknown or changing access patterns. For instance, if you’re storing a mixture of archival data and data that might be accessed at unpredictable intervals, Intelligent-Tiering can help save costs by automatically moving data to the most cost-effective access tier without performance impact or operational overhead.
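Opting into Intelligent-Tiering is a one-line choice at write time. A minimal sketch, assuming credentials are configured and the bucket already exists:

```python
import boto3

s3 = boto3.client("s3")

# Store an object directly in Intelligent-Tiering; S3 then moves it between
# access tiers automatically based on observed access patterns.
s3.put_object(
    Bucket="my-example-bucket",  # hypothetical bucket
    Key="media/upload-1234.mp4",
    Body=b"...",  # placeholder payload for illustration
    StorageClass="INTELLIGENT_TIERING",
)
```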
How would you ensure high availability and durability for critical data stored on AWS?
To ensure high availability and durability, you would typically store data redundantly across multiple Availability Zones within an AWS Region using services like Amazon S3, which offers 11 nines of durability. Optionally, you can also replicate the data across multiple AWS regions using features like cross-region replication to protect against regional failures.
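As a sketch of setting up cross-region replication with boto3 (the bucket names and IAM role ARN are hypothetical, and versioning must be enabled on both the source and destination buckets):

```python
import boto3

s3 = boto3.client("s3")

# Versioning is a prerequisite for replication on both buckets.
s3.put_bucket_versioning(
    Bucket="source-bucket",  # hypothetical
    VersioningConfiguration={"Status": "Enabled"},
)

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # hypothetical
        "Rules": [
            {
                "ID": "replicate-all",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::dr-bucket-us-west-2"},
            }
        ],
    },
)
```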
When would you recommend using Amazon Glacier over Amazon S3 for storage?
Amazon S3 Glacier is recommended for long-term archival storage where data retrieval times of several minutes to hours are acceptable. It’s significantly cheaper for storing data that is rarely accessed, such as regulatory archives and digital preservation, compared to the standard S3 tiers that are designed for more frequent access.
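In practice, data usually reaches the Glacier tiers through lifecycle rules rather than direct writes. A minimal sketch, with a hypothetical bucket name and prefix:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under archive/ to Glacier Flexible Retrieval after
# 90 days, then to Deep Archive after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # hypothetical
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-records",
                "Status": "Enabled",
                "Filter": {"Prefix": "archive/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```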
What factors do you consider when selecting between file storage, block storage, and object storage for a given application?
Factors to consider include the access pattern (random or sequential), performance requirements, scalability needs, data structure (file-based, block, or object), the type of workload (database, file sharing, media storage), data retrieval times, and cost-effectiveness.
In which scenario would you consider using AWS Storage Gateway, and why?
AWS Storage Gateway is a hybrid storage service that enables on-premises applications to use AWS cloud storage. You would consider it when you need to integrate on-premises IT environments with cloud storage for backup and archiving, disaster recovery, or data processing, allowing you to maintain existing workflows while leveraging the scalability and cost benefits of the cloud.
How does AWS DataSync assist in migrating large datasets to the cloud, and how do you know when to use it?
AWS DataSync automates and accelerates moving large amounts of data into and out of AWS storage services over the internet or AWS Direct Connect. It’s particularly useful for transferring large datasets, like data lakes or media libraries, and when migrating active data sets to AWS. Use DataSync when you require faster transfer times than traditional methods can provide, need to schedule regular transfers, or must minimize operational overhead.
What are the best practices for ensuring the security of data at rest in AWS S3?
Best practices include enabling S3 default encryption to encrypt objects server-side, using AWS Key Management Service (AWS KMS) for managing encryption keys, implementing bucket policies to control access to the data, enabling versioning to protect against accidental deletions, and using access logs to monitor requests to the buckets.
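The first two of those practices look like the following in boto3; the bucket name and KMS key ARN are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Default encryption: every new object is encrypted server-side with KMS.
s3.put_bucket_encryption(
    Bucket="my-example-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/abcd-1234",
                },
                "BucketKeyEnabled": True,  # reduces KMS request costs
            }
        ]
    },
)

# Versioning protects against accidental deletes and overwrites.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```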
How would you approach the decision to use Provisioned IOPS with Amazon EBS, and what are the factors that influence this choice?
Provisioned IOPS are chosen when you have workloads with demanding and predictable performance needs, like high-performance database applications. Factors influencing this choice include the required IOPS for your application, the throughput required, and whether the workload experiences peak times with higher performance demands.
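In boto3, provisioning IOPS independently of volume size looks like the sketch below; the Availability Zone, size, and IOPS figures are assumptions for illustration.

```python
import boto3

ec2 = boto3.client("ec2")

# io2 lets you provision IOPS separately from capacity for predictable,
# demanding workloads such as high-performance databases.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # assumption
    Size=500,        # GiB
    VolumeType="io2",
    Iops=16000,      # provisioned IOPS
    Encrypted=True,
)
print(volume["VolumeId"])
```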
Discuss the ideal use cases for using Amazon FSx in comparison to other AWS storage services.
Amazon FSx provides fully managed third-party file systems. FSx for Windows is ideal for Windows-based applications that require Windows file system features, while FSx for Lustre is designed for compute-intensive workloads, like high-performance computing, machine learning, and media processing, providing high-speed and concurrent access to large datasets. Choose FSx when compatibility with specific file system features and native performance is required.
What measures can you take to control costs when using Amazon EBS?
To control costs with Amazon EBS, monitor and delete unused volumes, choose the correct volume type (General Purpose SSD, Provisioned IOPS SSD, Throughput Optimized HDD, Cold HDD) for your workload, leverage EBS snapshots for backups while managing snapshot lifecycle (for example, with Amazon Data Lifecycle Manager), and right-size volumes, since EBS bills on provisioned capacity rather than actual usage.
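A common first step is finding unattached volumes, which still accrue charges even when no instance uses them. A minimal sketch using the EC2 API:

```python
import boto3

ec2 = boto3.client("ec2")

# "available" volumes are provisioned but not attached to any instance --
# the first place to look for savings.
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
):
    for vol in page["Volumes"]:
        print(vol["VolumeId"], vol["Size"], vol["VolumeType"], vol["CreateTime"])
```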