Tutorial / Cram Notes
Amazon Web Services (AWS) offers an extensive range of storage services that cater to different use cases, performance needs, and access patterns. Two of its primary services are Amazon Simple Storage Service (Amazon S3) and the Amazon Elastic File System (Amazon EFS), each designed for specific storage requirements. Understanding these services’ characteristics and use cases is crucial for the AWS Certified Solutions Architect – Professional (SAP-C02) exam.
Amazon Simple Storage Service (Amazon S3)
Amazon S3 is an object storage service that offers scalability, data availability, security, and performance. It is designed to store and retrieve any amount of data from anywhere on the web. It’s widely used for backup and recovery, data archiving, and as the storage layer for applications deployed on AWS.
Key Features of Amazon S3:
- Durability and Availability: S3 stores data redundantly across multiple facilities and devices and is designed for 99.999999999% (11 9’s) durability. The S3 Standard storage class is designed for 99.99% availability over a given year.
- Security: Offers robust access controls, including ACLs and bucket policies, to manage permissions. S3 also supports encryption in transit and at rest.
- Performance: S3 supports very high request rates and throughput for uploads and downloads. S3 Transfer Acceleration can further speed transfers to and from a bucket over long distances.
- Scalability: Virtually infinite scalability, handling large amounts of data without a decrease in performance.
- Cost-effectiveness: S3 offers various storage classes that allow cost optimization based on access patterns, such as S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA (Infrequent Access), S3 One Zone-IA, and S3 Glacier for archiving.
- Event Notifications: Can configure S3 to send notifications in response to specific events, like object creation or deletion, which integrates with AWS Lambda, SQS, and SNS.
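The event-notification feature above can be sketched with boto3 (Python). The bucket name and queue ARN below are hypothetical placeholders; the payload shape is the one accepted by the `put_bucket_notification_configuration` API.

```python
# Hypothetical bucket and SQS queue -- replace with your own resources.
BUCKET = "example-assets-bucket"
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:new-object-queue"

# Fire a notification for every object created under the images/ prefix.
notification_config = {
    "QueueConfigurations": [
        {
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "prefix", "Value": "images/"}]}
            },
        }
    ]
}

def apply_notification_config():
    """Push the configuration to the bucket (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET, NotificationConfiguration=notification_config
    )
```

The same configuration shape accepts `TopicConfigurations` (SNS) and `LambdaFunctionConfigurations` for the other two integration targets mentioned above.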
Example Use Cases:
- Hosting static websites or content such as images or videos.
- Data lakes for analytics purposes.
- Disaster recovery storage for critical data.
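The first use case above, static website hosting, can be enabled on a bucket with a single API call. A minimal sketch, assuming a hypothetical bucket name and the standard `index.html`/`error.html` convention:

```python
# Website configuration for a hypothetical bucket serving static content.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},  # served for directory requests
    "ErrorDocument": {"Key": "error.html"},     # served on 4xx errors
}

def enable_website_hosting(bucket: str = "example-site-bucket") -> None:
    """Turn on static website hosting (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    boto3.client("s3").put_bucket_website(
        Bucket=bucket, WebsiteConfiguration=website_config
    )
```

In practice you would front such a bucket with Amazon CloudFront, as the architecture discussion later in these notes suggests.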
Amazon Elastic File System (Amazon EFS)
Amazon EFS is a fully managed file storage service that makes it easy to set up and scale file storage in the AWS Cloud. It provides a simple, scalable, elastic NFS file system for use with AWS Cloud services and on-premises resources.
Key Features of Amazon EFS:
- Elasticity: Automatically grows and shrinks as you add and remove files, which means you pay only for the storage you use.
- Performance: Offers two performance modes (General Purpose and Max I/O) that can support a broad spectrum of use cases.
- Scalability: Can scale up to petabytes in size while maintaining consistent performance.
- Availability and Durability: Designed to be highly available and durable, spreading file system objects across multiple Availability Zones.
- Shared File Access: Multiple Amazon EC2 instances can access an EFS file system simultaneously, providing a common data source for workloads and applications running on more than one instance.
- Security: Supports POSIX permissions, encryption at rest, and encryption in transit.
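The feature set above maps directly onto the parameters of the EFS `create_file_system` API. A minimal sketch, with a hypothetical tag value:

```python
# Parameters for a hypothetical EFS file system.
efs_params = {
    "PerformanceMode": "generalPurpose",  # or "maxIO" for high aggregate throughput
    "ThroughputMode": "bursting",         # or "provisioned" + ProvisionedThroughputInMibps
    "Encrypted": True,                    # encryption at rest
    "Tags": [{"Key": "Name", "Value": "shared-app-data"}],
}

def create_file_system():
    """Create the file system (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    efs = boto3.client("efs")
    return efs.create_file_system(**efs_params)
```

Elasticity is the point of the sketch: note that no size is specified anywhere, because the file system grows and shrinks with the data you store.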
Example Use Cases:
- Serving content management systems where data is accessed and updated frequently.
- Shared storage for container and microservices-based architectures.
- Data storage for enterprise applications that require a file system interface and file system semantics.
Comparison: Amazon S3 vs Amazon EFS
| Feature | Amazon S3 | Amazon EFS |
|---|---|---|
| Storage type | Object storage | File storage |
| Use cases | Static website hosting, backups, archiving | Shared file systems, content management |
| Performance | High request rates and throughput | Consistent, low-latency file access |
| Scalability | Virtually unlimited storage; objects up to 5 TB | Scales to petabytes; many instances can access concurrently |
| Durability | Designed for 99.999999999% (11 9’s) | Designed for 11 9’s; data stored across multiple AZs |
| Availability | Designed for 99.99% (S3 Standard) | Designed for high availability across multiple AZs |
| Pricing | Pay for what you use; storage classes for cost optimization | Pay for the amount of storage used in a month |
| Security | IAM, bucket policies, ACLs, encryption | POSIX permissions, encryption at rest and in transit |
In preparation for the AWS Certified Solutions Architect – Professional exam, candidates must understand how to properly architect solutions utilizing these storage options. When deciding between Amazon S3 and Amazon EFS, one must consider the type of data, the access patterns, the scalability needs, and the specific requirements of the application or workload.
A professional solutions architect would, for instance, select Amazon S3 for storing static assets for a global application due to its content distribution capabilities when integrated with Amazon CloudFront. On the other hand, they might opt for Amazon EFS for a shared file system needed by multiple EC2 instances running a media processing application that requires fast, shared file system access.
No matter the use case, AWS storage services are designed to provide secure, scalable, and performance-optimized options for your architecture needs. As such, being knowledgeable about these services is essential for a successful run at the AWS Certified Solutions Architect – Professional certification.
Practice Test with Explanation
True or False: Amazon S3 is designed to provide 99.999999999% (11 9’s) durability for objects over a given year.
- (A) True
- (B) False
Answer: A) True
Explanation: Amazon S3 is designed for 99.999999999% (11 9’s) durability of objects over a given year. This level of durability is achieved by automatically replicating data across multiple servers and facilities.
What storage class would you typically use for infrequently accessed data that requires rapid access when needed?
- (A) Amazon S3 Standard
- (B) Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
- (C) Amazon S3 Glacier
- (D) Amazon S3 Intelligent-Tiering
Answer: B) Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
Explanation: Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) is designed for data that is accessed less frequently, yet requires rapid access when needed.
True or False: Amazon EFS shares can be mounted on premises via AWS Direct Connect or AWS VPN.
- (A) True
- (B) False
Answer: A) True
Explanation: Amazon EFS file systems can be mounted on premises over AWS Direct Connect or AWS VPN, which allows you to seamlessly connect on-premises infrastructure to AWS storage services.
Multiple Select: Which of the following are key features of Amazon S3 Glacier?
- (A) Instant data access
- (B) Secure and durable storage
- (C) Low-cost storage for data archiving
- (D) Vault lock policies
Answer: B) Secure and durable storage, C) Low-cost storage for data archiving, D) Vault lock policies
Explanation: Amazon S3 Glacier provides secure, durable, and low-cost storage for data archiving. Instant data access is not a feature of S3 Glacier since it is designed for long-term storage with retrieval times ranging from minutes to hours.
True or False: You cannot enforce encryption of data at rest in Amazon S3 at the bucket level.
- (A) True
- (B) False
Answer: B) False
Explanation: You can enforce encryption of data at rest in Amazon S3 by setting a bucket policy to deny any PUT request that does not include the x-amz-server-side-encryption parameter in the request header.
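The bucket policy described in this explanation can be sketched with boto3. The bucket name is a hypothetical placeholder; the `Null` condition denies any `PutObject` request in which the `x-amz-server-side-encryption` header is absent:

```python
import json

BUCKET = "example-secure-bucket"  # hypothetical

# Deny any PutObject request that omits the x-amz-server-side-encryption header.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedPuts",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}

def apply_policy():
    """Attach the policy to the bucket (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```

Note that enabling default bucket encryption achieves the same end result with less machinery, since S3 then encrypts unencrypted uploads automatically.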
If you require a file storage service that is designed to be used with traditional file system interface semantics, which AWS service would you choose?
- (A) Amazon S3
- (B) Amazon EBS
- (C) Amazon EFS
- (D) Amazon S3 Glacier
Answer: C) Amazon EFS
Explanation: Amazon Elastic File System (Amazon EFS) is designed to provide a file system interface, file system access semantics (like read-after-write consistency), and concurrently-accessible storage for multiple EC2 instances.
True or False: You can store objects of up to 5 terabytes in size in Amazon S3.
- (A) True
- (B) False
Answer: A) True
Explanation: Amazon S3 supports objects up to 5 terabytes in size. This allows you to store large files as a single object, simplifying access and management.
Which AWS service provides a block storage service that is designed to be used with Amazon EC2 instances?
- (A) Amazon S3
- (B) Amazon EBS
- (C) Amazon EFS
- (D) Amazon S3 Glacier
Answer: B) Amazon EBS
Explanation: Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for use with Amazon EC2 instances. EBS volumes offer the consistent and low-latency performance needed to run your workloads.
True or False: Amazon S3 Intelligent-Tiering is a storage class designed for data with unknown or changing access patterns.
- (A) True
- (B) False
Answer: A) True
Explanation: The Amazon S3 Intelligent-Tiering storage class is designed for data with unknown or changing access patterns, as it automatically moves data to the most cost-effective tier based on how frequently it is accessed.
Multiple Select: Which of the following options are available for accelerating transfers to Amazon S3?
- (A) Amazon S3 Transfer Acceleration
- (B) Amazon S3 Multi-Part Upload
- (C) AWS Direct Connect
- (D) Amazon S3 Glacier
Answer: A) Amazon S3 Transfer Acceleration, B) Amazon S3 Multi-Part Upload, C) AWS Direct Connect
Explanation: Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances. Multi-Part Upload allows for faster upload by splitting files and uploading them in parts. AWS Direct Connect provides a dedicated network connection between on-premises and AWS for consistent transfer rates.
True or False: Amazon EFS is not compatible with Linux-based AMIs.
- (A) True
- (B) False
Answer: B) False
Explanation: Amazon EFS is designed to work with Linux-based AMIs, as it provides a standard file system interface and file system semantics that can be mounted on any Linux-based EC2 instance.
Which of the following options is NOT an Amazon S3 storage class?
- (A) S3 Standard-IA
- (B) S3 Reduced Redundancy Storage
- (C) S3 Standard
- (D) S3 Provisioned IOPS
Answer: D) S3 Provisioned IOPS
Explanation: S3 Provisioned IOPS is not a storage class in Amazon S3; it is actually an EBS volume type. Amazon S3 storage classes include options like S3 Standard, S3 Standard-IA, and S3 Reduced Redundancy Storage (although S3 RRS has been phased out in favor of other storage classes with higher durability).
Interview Questions
Can you describe what Amazon S3 is and when you would use it over Amazon EFS?
Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. It is typically used for storing and retrieving any amount of data, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. You would use Amazon S3 over Amazon EFS when you need a highly durable, scalable, and secure object storage solution that isn’t limited by file system structure or size and when you require features such as versioning and lifecycle policies.
How does Amazon EFS differ from Amazon EBS?
Amazon Elastic File System (EFS) is a scalable file storage service for use with AWS Cloud services and on-premises resources. It’s easy to use and offers a simple interface that allows you to create and configure file systems quickly. Amazon Elastic Block Store (EBS) provides block-level storage volumes for use with EC2 instances. EFS is used when you need a file storage system that can be shared across different EC2 instances, while EBS is used when you need persistent block-level storage for a single EC2 instance.
What is the significance of S3 storage classes and when would you recommend using S3 Intelligent-Tiering?
S3 storage classes are designed to cater to different use cases by offering various levels of accessibility, durability, and cost. S3 Intelligent-Tiering is recommended when you have data with unknown or changing access patterns, as it automatically moves data to the most cost-effective storage tier without performance impact or retrieval fees.
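Objects can be written directly into Intelligent-Tiering at upload time by setting the storage class on the request. A minimal sketch with hypothetical bucket and key names:

```python
# Hypothetical bucket/key; StorageClass is the real S3 request parameter value.
PUT_ARGS = {
    "Bucket": "example-analytics-bucket",
    "Key": "datasets/events.parquet",
    "StorageClass": "INTELLIGENT_TIERING",
}

def upload(body: bytes) -> None:
    """Upload directly into Intelligent-Tiering (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    boto3.client("s3").put_object(Body=body, **PUT_ARGS)
```

From that point on, S3 moves the object between access tiers automatically based on observed access frequency, with no retrieval fees.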
What mechanisms does Amazon S3 provide to secure data at rest?
Amazon S3 provides several mechanisms to secure data at rest, including server-side encryption with Amazon S3-managed keys (SSE-S3), AWS Key Management Service (AWS KMS) keys (SSE-KMS), or customer-provided keys (SSE-C). Additionally, S3 supports default encryption on buckets to ensure all new objects are encrypted.
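Two of these options can be selected per request via `put_object` parameters (SSE-C additionally requires customer-supplied key material on every request). A sketch, with a hypothetical KMS key alias:

```python
# Request parameters for two server-side encryption modes.
SSE_OPTIONS = {
    "SSE-S3": {"ServerSideEncryption": "AES256"},
    "SSE-KMS": {"ServerSideEncryption": "aws:kms",
                "SSEKMSKeyId": "alias/example-key"},  # hypothetical key alias
}

def put_encrypted(bucket: str, key: str, body: bytes, mode: str = "SSE-KMS") -> None:
    """Write an object with the chosen server-side encryption mode."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body,
                                  **SSE_OPTIONS[mode])
```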
How does Amazon EFS’s performance mode and throughput mode affect file system operations, and when would you choose one over the other?
Amazon EFS offers two performance modes: General Purpose, which is suitable for most file systems and provides a balance between latency and throughput, and Max I/O, which is optimized for high levels of aggregate throughput and operations per second at the cost of slightly higher latencies. You should choose Max I/O when your workload requires high levels of throughput. EFS also provides two throughput modes: Bursting Throughput and Provisioned Throughput. Bursting Throughput scales with the size of the file system, while Provisioned Throughput can be configured for applications that need a higher level of sustained throughput beyond what Bursting Throughput provides.
What is the primary difference between Amazon S3 and Amazon Glacier, and how would you determine which service to use for data storage?
The primary difference between Amazon S3 and Amazon Glacier is related to data access time and cost. Amazon S3 is designed for frequent data access with low latency, while Amazon Glacier is a low-cost storage option for data archiving, with retrieval times ranging from minutes to hours. You would determine which service to use based on the required access patterns—use Amazon S3 for active data and Amazon Glacier for long-term data archiving where access times can be flexible.
Can you explain the concept of S3 Lifecycle policies and how they help manage costs?
S3 Lifecycle policies help manage costs by automatically migrating objects to more cost-effective storage classes or by purging objects that are no longer needed. For example, you can create rules to move objects to S3 Standard-IA for infrequent access after a certain period or archive objects to Amazon Glacier after a set time. Lifecycle policies are also used to automatically delete expired objects or incomplete multipart uploads.
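The rules described in this answer translate directly into a lifecycle configuration document. A sketch with a hypothetical log bucket and prefix, implementing the move-then-archive-then-expire pattern:

```python
# Lifecycle rules: tier logs/ to Standard-IA at 30 days, Glacier at 90,
# delete at 365, and clean up abandoned multipart uploads after a week.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-then-archive",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

def apply_lifecycle(bucket: str = "example-log-bucket") -> None:
    """Attach the lifecycle configuration (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_config
    )
```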
What is Amazon S3 Versioning, and what are the benefits of using it?
Amazon S3 Versioning is a feature that keeps multiple versions of an object within the same bucket. This means that you can easily recover from both unintended user actions and application failures. Benefits include easy recovery of data, protection against accidental overwrites and deletions, and the ability to retain a full history of changes.
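Versioning is a bucket-level switch. A minimal sketch with a hypothetical bucket name:

```python
# "Enabled" makes S3 keep every version of every object; deletes become
# delete markers rather than destructive removals.
versioning_config = {"Status": "Enabled"}

def enable_versioning(bucket: str = "example-bucket") -> None:
    """Apply the versioning configuration (requires AWS credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    boto3.client("s3").put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration=versioning_config
    )
```

Once enabled, versioning can be suspended (`"Status": "Suspended"`) but never fully turned off for a bucket.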
How can you improve data transfer performance when uploading large amounts of data to Amazon S3?
To improve data transfer performance, you can use Amazon S3 Transfer Acceleration for faster uploads over long distances, enable multipart uploads to break down large files into smaller chunks and upload them in parallel, and leverage Amazon S3’s parallel requests capability to increase throughput.
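The multipart and parallel-upload techniques above are handled for you by boto3's high-level transfer manager. A sketch with hypothetical bucket/key names and an assumed 64 MiB part size:

```python
import math

CHUNK = 64 * 1024 * 1024  # assumed 64 MiB part size

def part_count(size_bytes: int, chunk: int = CHUNK) -> int:
    """Number of parts a multipart upload of this size will use."""
    return max(1, math.ceil(size_bytes / chunk))

def upload_large_file(path: str, bucket: str = "example-backup-bucket",
                      key: str = "backups/archive.bin") -> None:
    """Parallel multipart upload via the transfer manager (needs credentials)."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    from boto3.s3.transfer import TransferConfig
    config = TransferConfig(
        multipart_threshold=CHUNK,  # switch to multipart above 64 MiB
        multipart_chunksize=CHUNK,  # size of each part
        max_concurrency=8,          # parts uploaded in parallel threads
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)
```

A failed part is retried individually rather than restarting the whole upload, which is the other practical benefit of multipart uploads for large files.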
How would you ensure high availability and durability of data using Amazon EFS?
Amazon EFS is designed to be highly available and durable, automatically replicating data across multiple Availability Zones within an AWS Region to prevent data loss due to failures. To further ensure high availability, you should deploy AWS resources that interact with EFS across multiple Availability Zones. For instance, setting up instances that mount the EFS file system in different Availability Zones can provide continued access to data in case one zone becomes unavailable.
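The multi-AZ access pattern described above comes down to creating one mount target per Availability Zone. A sketch, assuming hypothetical subnet and file system IDs, one subnet per AZ:

```python
# Hypothetical subnets, one per Availability Zone, for the mount targets.
SUBNETS = ["subnet-0aaa1111", "subnet-0bbb2222", "subnet-0ccc3333"]

def create_mount_targets(file_system_id: str) -> None:
    """One mount target per AZ keeps the file system reachable if a zone fails."""
    import boto3  # lazy import so the sketch runs without boto3 installed
    efs = boto3.client("efs")
    for subnet in SUBNETS:
        efs.create_mount_target(FileSystemId=file_system_id, SubnetId=subnet)
```

Instances in each zone then mount via their local mount target, so the loss of one AZ does not cut off the others.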
In what scenarios would you consider using Amazon S3’s Cross-Region Replication (CRR)?
You would consider using Amazon S3’s Cross-Region Replication in scenarios where you need to comply with data residency requirements, want to replicate data for disaster recovery purposes, or want to maintain lower-latency access for users in different geographic locations.
What is the difference between Amazon S3’s “Standard” and “Reduced Redundancy Storage (RRS)” storage classes, and is RRS recommended for critical data?
The “Standard” storage class in Amazon S3 is designed to provide 99.999999999% (11 9’s) of durability and is suitable for critical data with high availability needs. Reduced Redundancy Storage (RRS) was an older offering that provided less durability at a lower cost, and was suitable for non-critical or reproducible data. However, RRS is no longer recommended for any data, and AWS has phased it out in favor of other more cost-effective and durable storage classes like S3 Standard-IA and S3 One Zone-IA.