Concepts
- Random Access:
  - Applications read and write small amounts of data from any location.
  - Suitable for databases such as Amazon RDS or Amazon DynamoDB.
- Sequential Access:
  - Data is read and written in order, from beginning to end.
  - Common in big data processing tasks, which can be handled by Amazon EMR or Amazon Kinesis.
- Large Binary Files:
  - Hosting and delivering large binary files such as videos, images, or software binaries.
  - Amazon S3 is a good fit due to its durability and scalability (see the upload sketch below).
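As a rough illustration of the large-binary-file pattern, the following boto3 sketch uploads a video to Amazon S3 using a multipart-capable transfer; the bucket, key, and file names are hypothetical placeholders.

```python
"""Minimal sketch: uploading a large binary file to Amazon S3 with boto3."""
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Large objects benefit from multipart uploads; parts above the threshold
# are uploaded as separate chunks.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # 64 MB
    multipart_chunksize=16 * 1024 * 1024,  # 16 MB parts
)

s3.upload_file(
    Filename="product-demo.mp4",       # hypothetical local file
    Bucket="example-media-bucket",     # hypothetical bucket name
    Key="videos/product-demo.mp4",
    Config=config,
)
```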
Understanding Access Patterns
- Frequency of Access:
  - Hot Data: Frequently accessed, requiring high-performance storage such as Amazon EFS for file storage or Provisioned IOPS SSD volumes for Amazon RDS.
  - Cold Data: Infrequently accessed, where Amazon S3 Glacier or S3 Standard-Infrequent Access (S3 Standard-IA) is more cost-effective.
- Size of Data Objects:
  - Small Objects: Amazon DynamoDB is optimized for small items with consistent, low-latency reads and writes.
  - Large Objects: Amazon S3 is optimized for storing large objects such as media files or backups.
- Read/Write Operations:
  - High Read/Write Throughput: Use Amazon DynamoDB Accelerator (DAX) or Amazon ElastiCache to cache data for high throughput.
  - Balanced Read/Write: Combine Amazon EBS for block storage with Amazon RDS for transactional data.
- Data Lifecycle:
  - Use Amazon S3 lifecycle policies to transition data between storage classes automatically (a minimal configuration sketch follows this list).
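As a rough illustration of the data-lifecycle point above, here is a minimal boto3 sketch of an S3 lifecycle rule that transitions objects to Standard-IA after 30 days and to Glacier after 90; the bucket name, prefix, and rule ID are hypothetical.

```python
"""Minimal sketch: an S3 lifecycle rule that ages data out of S3 Standard."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",           # hypothetical rule name
                "Filter": {"Prefix": "logs/"},      # apply only to this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},        # delete after one year
            }
        ]
    },
)
```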
Examples of Access Patterns and AWS Services
- E-Commerce Website:
  - Usage Pattern: High read and write throughput from user traffic, product catalog interactions, and transaction processing.
  - Suggested Services: Amazon RDS for transactional data (with Multi-AZ deployments for high availability and durability), Amazon DynamoDB for user session data, and Amazon S3 for product images and files.
- Data Archiving Solution:
  - Usage Pattern: Infrequent access to archived data with regulatory compliance needs.
  - Suggested Services: Amazon S3 Glacier for long-term archival, with S3 lifecycle policies to transition data from S3 Standard to S3 Glacier.
- Media Streaming Platform:
  - Usage Pattern: Large files that need to be delivered globally with low latency.
  - Suggested Services: Amazon S3 for storage, combined with Amazon CloudFront for content delivery to minimize latency.
- Log Analytics:
  - Usage Pattern: Sequential writing of log data with periodic analysis.
  - Suggested Services: Amazon Kinesis for real-time data streaming and Amazon S3 for storage, combined with Amazon Athena or Amazon EMR for analytics (a minimal sketch follows).
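For the log-analytics pattern above, a minimal boto3 sketch might ingest a log record through Kinesis and later query the data landed in S3 with Athena; the stream, database, table, and bucket names are all hypothetical.

```python
"""Minimal sketch of the log-analytics pattern: stream writes, periodic analysis."""
import json
import boto3

kinesis = boto3.client("kinesis")
athena = boto3.client("athena")

# Sequential write path: push a log record onto a Kinesis data stream.
kinesis.put_record(
    StreamName="app-log-stream",  # hypothetical stream
    Data=json.dumps({"level": "ERROR", "msg": "timeout"}).encode("utf-8"),
    PartitionKey="web-server-1",
)

# Periodic analysis path: run an Athena query over log data already delivered to S3.
athena.start_query_execution(
    QueryString="SELECT level, count(*) FROM app_logs GROUP BY level",  # hypothetical table
    QueryExecutionContext={"Database": "logs_db"},                      # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```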
Choosing the Right Storage Solution
| Storage Requirement | AWS Service | Use Case Scenarios |
|---|---|---|
| Block Storage | Amazon EBS | EC2 instances, databases |
| File Storage | Amazon EFS | Shared file storage across EC2, containerized applications |
| Object Storage | Amazon S3 | Web content, backups, data lakes |
| Relational Databases | Amazon RDS | Transactional workloads, SQL queries |
| NoSQL Databases | Amazon DynamoDB | Fast, flexible non-relational data |
| Data Archiving | Amazon S3 Glacier | Low-cost long-term archival |
| Content Delivery | Amazon CloudFront | Global distribution of content |
| Big Data Processing | Amazon EMR | Processing large-scale data sets |
| In-Memory Caching | Amazon ElastiCache | Speeding up read-heavy application workloads |
| Real-Time Data Streaming | Amazon Kinesis | Real-time data processing for streaming data |
Selecting the appropriate storage solution involves understanding the access pattern, matching it to the right AWS service, and then optimizing for cost, performance, and scalability.
When studying for the AWS Certified Solutions Architect – Associate exam, explore these storage services and their access patterns in more depth, including how to design and scale them for different applications and use cases. Understanding how to evaluate these solutions against the AWS Well-Architected Framework will also help with exam scenarios.
Answer the Questions in the Comment Section
T/F: Amazon S3 is a suitable storage option for frequently accessed, structured data that requires complex queries.
- True
- False
Answer: False
Explanation: Amazon S3 is best for unstructured data and is ill-suited for complex queries. Amazon RDS or Amazon Redshift would be better for structured data requiring complex queries.
T/F: Amazon EBS provides block-level storage volumes for use with Amazon EC2 instances.
- True
- False
Answer: True
Explanation: Amazon EBS does indeed provide block-level storage volumes that are used with EC2 instances for both throughput-intensive and transaction-intensive workloads at any scale.
Which AWS service is optimized for high-frequency read/write access with millisecond latency?
- Amazon S3
- Amazon EFS
- Amazon Glacier
- Amazon DynamoDB
Answer: Amazon DynamoDB
Explanation: Amazon DynamoDB is optimized for high-frequency read/write operations and delivers single-digit millisecond latency.
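To make the latency claim concrete, here is a minimal boto3 sketch of the kind of small key-value reads and writes DynamoDB is optimized for; the table name and key schema are hypothetical.

```python
"""Minimal sketch: low-latency key-value reads and writes with DynamoDB."""
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("UserSessions")  # hypothetical table keyed on 'session_id'

# Write a small item; this class of request is served at single-digit-millisecond latency.
table.put_item(Item={"session_id": "abc-123", "user_id": "u-42", "cart_items": 3})

# Read it back by primary key.
response = table.get_item(Key={"session_id": "abc-123"})
print(response.get("Item"))
```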
T/F: Amazon S3 Intelligent-Tiering is designed to optimize costs by automatically moving data between access tiers when access patterns change.
- True
- False
Answer: True
Explanation: S3 Intelligent-Tiering moves data across different tiers to optimize costs based on access patterns, without performance impact or operational overhead.
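As a small illustration, an object can be placed directly into the Intelligent-Tiering storage class at upload time; the bucket, key, and file names below are hypothetical.

```python
"""Minimal sketch: storing an object in S3 Intelligent-Tiering so S3 moves it
between access tiers as its access pattern changes."""
import boto3

s3 = boto3.client("s3")

with open("2024-q1.parquet", "rb") as body:  # hypothetical local file
    s3.put_object(
        Bucket="example-data-bucket",        # hypothetical bucket
        Key="reports/2024-q1.parquet",
        Body=body,
        StorageClass="INTELLIGENT_TIERING",
    )
```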
Which type of Amazon EBS volume is best for frequently accessed workloads?
- Provisioned IOPS SSD (io1/io2)
- Throughput Optimized HDD (st1)
- General Purpose SSD (gp2/gp3)
- Cold HDD (sc1)
Answer: General Purpose SSD (gp2/gp3)
Explanation: General Purpose SSD volumes provide a balance of performance and cost, making them suitable for a broad range of frequently accessed workloads.
T/F: Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) stores data redundantly across multiple AZs (Availability Zones) within a region.
- True
- False
Answer: False
Explanation: S3 One Zone-IA stores data in a single AZ and is cheaper than S3 Standard-IA, which stores data redundantly across multiple AZs.
Which AWS service is a scalable file storage solution for use with AWS Cloud services and on-premises resources?
- Amazon S3
- Amazon EFS
- Amazon EC2
- Amazon Glacier
Answer: Amazon EFS
Explanation: Amazon EFS is a scalable file storage solution that can be used with both AWS Cloud services and on-premises resources.
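A minimal boto3 sketch of provisioning EFS for shared access might look like the following; the subnet and security group IDs are hypothetical placeholders, and real code would wait for the file system to become available before adding the mount target.

```python
"""Minimal sketch: creating an EFS file system and a mount target so clients
can mount it over NFS."""
import boto3

efs = boto3.client("efs")

fs = efs.create_file_system(
    CreationToken="shared-app-data",  # idempotency token (hypothetical)
    PerformanceMode="generalPurpose",
    Encrypted=True,
)

# In practice, poll describe_file_systems until LifeCycleState is 'available'
# before creating mount targets. One mount target per Availability Zone.
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",       # hypothetical subnet
    SecurityGroups=["sg-0123456789abcdef0"],   # hypothetical security group
)
```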
A data access pattern that requires infrequent access, but rapid retrieval when needed, is best suited for which storage class?
- Amazon S3 Standard
- Amazon S3 Glacier
- Amazon S3 Standard-Infrequent Access (Standard-IA)
- Amazon S3 Intelligent-Tiering
Answer: Amazon S3 Standard-Infrequent Access (Standard-IA)
Explanation: S3 Standard-IA is designed for data that is less frequently accessed, but requires rapid access when needed, at a lower cost than S3 Standard.
T/F: Amazon RDS is ideal for use cases where you need low-latency and high-throughput performance for file-based workloads.
- True
- False
Answer: False
Explanation: Amazon RDS is a managed relational database service for structured data; for low-latency and high-throughput performance for file-based workloads, Amazon EFS or Amazon FSx would be more appropriate.
During which scenario would you recommend Amazon S3 Glacier over other S3 storage classes?
- For multimedia content delivery
- For data warehousing
- For archival of cold data
- For big data analytics
Answer: For archival of cold data
Explanation: Amazon S3 Glacier is a very low-cost storage service that provides secure, durable, and flexible storage for data archiving and online backup, making it ideal for cold data archival.
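For completeness, retrieving data archived in Glacier requires an explicit restore request before the object can be read; a minimal boto3 sketch (with a hypothetical bucket and key) is shown below.

```python
"""Minimal sketch: initiating a restore of an object stored in the S3 Glacier
storage class."""
import boto3

s3 = boto3.client("s3")

s3.restore_object(
    Bucket="example-archive-bucket",   # hypothetical bucket
    Key="compliance/2019/audit.zip",   # hypothetical archived object
    RestoreRequest={
        "Days": 7,                                     # keep the restored copy for 7 days
        "GlacierJobParameters": {"Tier": "Standard"},  # Expedited | Standard | Bulk
    },
)
```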
Which AWS service is best suited for batch processing workloads that require sequential read/write operations?
- Amazon DynamoDB
- Amazon EFS
- Amazon S3
- Amazon EBS Throughput Optimized HDD (st1)
Answer: Amazon EBS Throughput Optimized HDD (st1)
Explanation: Throughput Optimized HDD (st1) volumes are designed for large, sequential read/write workloads, such as log processing, making them ideal for batch processing workloads.
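A minimal boto3 sketch of provisioning an st1 volume for such a workload follows; the Availability Zone, size, and instance ID are hypothetical.

```python
"""Minimal sketch: creating a Throughput Optimized HDD (st1) volume for a
large sequential batch workload and attaching it to an instance."""
import boto3

ec2 = boto3.client("ec2")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # hypothetical AZ; must match the instance's AZ
    Size=500,                       # GiB; st1 favors large, sequential I/O
    VolumeType="st1",
)

# Wait until the volume is available, then attach it.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # hypothetical EC2 instance
    Device="/dev/sdf",
)
```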
T/F: You can use AWS Storage Gateway to integrate on-premises applications with cloud-based storage, automatically optimizing the storage layer for frequently accessed files.
- True
- False
Answer: True
Explanation: AWS Storage Gateway provides hybrid cloud storage services that connect on-premises environments to AWS storage, caching frequently accessed data locally for low-latency access while the full dataset resides in AWS.