Identify data sources (for example, content and location, primary sources such as user data).
Orchestrate data ingestion pipelines (batch-based and streaming-based ML workloads): Amazon Kinesis, Amazon Kinesis Data Firehose, Amazon EMR, AWS Glue, Amazon Managed Service for Apache Flink.
Know the difference between supervised and unsupervised learning.
Understand AWS service quotas.
Understand optimization techniques for ML training (for example, gradient descent, loss functions, convergence).
Create AMIs and golden images.
Understand neural network architecture (layers and nodes), learning rate, and activation functions.
Perform regularization (for example, dropout, L1/L2).
Understand tree-based models (number of trees, number of levels).
Identify data job styles and job types (for example, batch load, streaming).
Use Spot Instances to train deep learning models by using AWS Batch.
Follow AWS best practices.
Log and monitor AWS environments by using AWS CloudTrail and Amazon CloudWatch. Build error monitoring solutions.
Understand AWS Identity and Access Management (IAM).
Perform cross validation.
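
As a study aid for the cross validation, gradient descent, and L2 regularization items above, here is a minimal sketch of k-fold cross validation in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function names (k_fold_indices, fit_linear_gd, mse, cross_validate), the hyperparameter values, and the noise-free toy data are all hypothetical and chosen only to keep the example self-contained. In practice a library such as scikit-learn or a managed training service would usually handle these steps.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def fit_linear_gd(xs, ys, lr=0.1, l2=0.0, epochs=2000):
    """Fit y = w*x + b by batch gradient descent.

    The loss is mean squared error plus an L2 penalty l2 * w**2 on the
    weight only (the intercept b is left unpenalized).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n + 2 * l2 * w
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5, **fit_kwargs):
    """Return one held-out MSE per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w, b = fit_linear_gd(
            [xs[i] for i in train], [ys[i] for i in train], **fit_kwargs
        )
        scores.append(
            mse(w, b, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores


# Toy data (assumption for illustration): an exact line y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]

scores = cross_validate(xs, ys, k=5)
print("per-fold MSE:", scores)
print("mean MSE:", sum(scores) / len(scores))
```

Each fold is held out exactly once while the model is trained on the remaining folds, so the mean of the per-fold scores estimates error on unseen data. Raising the l2 argument shrinks the fitted weight toward zero, which is the effect L2 regularization is meant to have.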