How to delete data to meet business and legal requirements
Data retention policies and archiving strategies
How to protect data with appropriate resiliency and availability
How to ensure accuracy and trustworthiness of data by using data lineage
Best practices for indexing, partitioning strategies, compression, and other data optimization techniques
How to model structured, semi-structured, and unstructured data
Schema evolution techniques
How to maintain and troubleshoot data processing for repeatable business outcomes
API calls for data processing
Which services accept scripting (for example, Amazon EMR, Amazon Redshift, AWS Glue)
Tradeoffs between provisioned services and serverless services
SQL queries (for example, SELECT statements with multiple qualifiers or JOIN clauses)
How to visualize data for analysis
When and how to apply cleansing techniques
Data aggregation, rolling average, grouping, and pivoting
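The aggregation, rolling-average, grouping, and pivoting item can be made concrete with a small sketch. It uses only the Python standard library and a hypothetical set of daily sales records. The function names `group_sum`, `rolling_average`, and `pivot` are illustrative only and are not part of any AWS service API. In practice the same operations would more often be written in SQL, using GROUP BY and window functions such as AVG(...) OVER (...) in Amazon Redshift, or in Spark on AWS Glue or Amazon EMR.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily sales records: (date, region, amount), sorted by date.
records = [
    ("2024-01-01", "east", 100.0),
    ("2024-01-01", "west", 80.0),
    ("2024-01-02", "east", 120.0),
    ("2024-01-02", "west", 90.0),
    ("2024-01-03", "east", 110.0),
    ("2024-01-03", "west", 70.0),
]


def group_sum(rows):
    """Grouping + aggregation: total amount per region (like GROUP BY region, SUM(amount))."""
    totals = defaultdict(float)
    for _, region, amount in rows:
        totals[region] += amount
    return dict(totals)


def rolling_average(values, window):
    """Trailing rolling mean over the last `window` values.

    The first window-1 positions average only the values available so far,
    like a SQL frame of ROWS BETWEEN (window-1) PRECEDING AND CURRENT ROW.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(mean(values[start:i + 1]))
    return out


def pivot(rows):
    """Pivoting: one entry per date, with one column (key) per region."""
    table = defaultdict(dict)
    for date, region, amount in rows:
        table[date][region] = amount
    return dict(table)


if __name__ == "__main__":
    print(group_sum(records))
    east = [amount for _, region, amount in records if region == "east"]
    print(rolling_average(east, window=2))
    print(pivot(records)["2024-01-02"])
```

Running it prints the per-region totals (`{'east': 330.0, 'west': 240.0}`), the two-day rolling average for the east region (`[100.0, 110.0, 115.0]`), and the pivoted row for 2024-01-02 (`{'east': 120.0, 'west': 90.0}`). Averaging a partial window at the start, rather than emitting nothing, is a design choice; other tools may instead return nulls until the window is full.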