Create graphs (for example, scatter plots, time series, histograms, box plots).
Determine whether there is sufficient labeled data. Identify mitigation strategies. Use data labeling tools (for example, Amazon Mechanical Turk).
Select from among classification, regression, forecasting, clustering, and recommendation models.
Express the intuition behind models.
Update and retrain models (batch or real-time/online).
Format, normalize, augment, and scale data.
Understand AWS infrastructure (for example, instance types) and cost considerations.
Choose appropriate compute resources (for example, GPU or CPU, distributed or non-distributed). Choose appropriate compute platforms (Spark or non-Spark).
Determine when to build custom models and when to use Amazon SageMaker built-in algorithms.
Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
Compare models by using metrics (for example, time to train a model, quality of model, engineering costs).
Create Docker containers.
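
As an illustration of the "normalize and scale data" objective listed above, the following is a minimal sketch of two common scaling techniques, min-max scaling and z-score standardization, using only the Python standard library. The function names and the sample `ages` column are hypothetical and chosen for illustration; in practice a library such as scikit-learn would typically be used instead.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Avoid division by zero for a constant column.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, used only for illustration.
ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)
zscores = standardize(ages)
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers, while standardization is often preferred for algorithms that assume roughly centered features.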