Data Engineering
From pipeline design to production observability: 7 decks to master data engineering, from ingestion through delivery.
Frequently asked questions
What is Data Engineering?
Data Engineering is the discipline of designing, building, and operating the systems that collect, transform, and deliver data. A data engineer builds the pipelines that feed data analysts, data scientists, and applications.
What is the difference between ETL and ELT?
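In ETL (Extract-Transform-Load), data is transformed before it is loaded into the target system, usually in a separate processing step outside the warehouse. In ELT (Extract-Load-Transform), raw data is first loaded into the data warehouse as-is, then transformed inside the warehouse, typically with SQL. ELT has become the common default with cloud warehouses (BigQuery, Snowflake, Redshift), which can transform very large volumes efficiently and keep the raw data available for reprocessing.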
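To make the difference concrete, here is a minimal sketch that runs both patterns against the same input. It uses Python's built-in `sqlite3` as a stand-in for a warehouse; the sample records, table names, and function names are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("A-1", " 19.90 ", "eur"),
    ("A-2", "5.00", "EUR"),
    ("A-3", "n/a", "eur"),  # malformed amount
]


def is_amount(text):
    """Return True if text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def run_etl(conn):
    """ETL: clean the rows in application code, then load only the result."""
    cleaned = [
        (order_id, float(amount), currency.upper())
        for order_id, amount, currency in RAW_ORDERS
        if is_amount(amount)
    ]
    conn.execute(
        "CREATE TABLE orders_etl (order_id TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL in the database."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, amount TEXT, currency TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(currency) AS currency
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
```

Both paths end with the same cleaned rows. The practical difference is that the ELT path still has `raw_orders` in the database, so the transformation can be changed and rerun without going back to the source.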
What is a Data Lakehouse?
The Lakehouse combines the advantages of a Data Lake (low-cost raw storage, open formats) and a Data Warehouse (ACID guarantees, performance, governance). Table formats like Delta Lake, Apache Iceberg, or Apache Hudi add a metadata and transaction layer on top of data files (commonly Parquet) in object storage, so query engines can read and write those files directly with transactional guarantees.
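The core idea behind these table formats can be illustrated with a toy commit log: data files only become visible once a log entry references them. The sketch below is loosely inspired by that idea and is not the actual on-disk format of Delta Lake, Iceberg, or Hudi. The `ToyTable` class, the `_log` directory, and the JSON data files (standing in for Parquet) are assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyTable:
    """A directory of data files plus an append-only commit log (toy model)."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        names = os.listdir(self.log_dir)
        return sorted(int(n.split(".")[0]) for n in names if n.endswith(".json"))

    def write_data_file(self, name, rows):
        """Write a data file. It stays invisible to readers until committed."""
        with open(os.path.join(self.root, name), "w") as f:
            json.dump(rows, f)

    def commit(self, add=(), remove=()):
        """Publish a new table version by creating the next log entry."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        path = os.path.join(self.log_dir, f"{version:08d}.json")
        # Mode "x" raises FileExistsError if the entry already exists, so two
        # writers cannot both claim the same version number.
        with open(path, "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return version

    def live_files(self):
        """Replay the log to find the files that make up the current version."""
        files = set()
        for version in self._versions():
            with open(os.path.join(self.log_dir, f"{version:08d}.json")) as f:
                entry = json.load(f)
            files |= set(entry["add"])
            files -= set(entry["remove"])
        return sorted(files)

    def read(self):
        """Read only rows from committed, still-live data files."""
        rows = []
        for name in self.live_files():
            with open(os.path.join(self.root, name)) as f:
                rows.extend(json.load(f))
        return rows
```

A writer would call `write_data_file` first and `commit` second; a reader calling `read` in between sees the previous version, never a half-written one. Real table formats add much more on top (schema tracking, statistics, conflict resolution, snapshot expiry), which this sketch leaves out.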
Which tools should a Data Engineer know?
The fundamentals: SQL, Python, an orchestrator (Airflow, Prefect, Dagster), a transformation tool (dbt), and proficiency in at least one cloud platform (AWS, GCP, or Azure). Modern stacks often add Spark or Flink for large-scale batch and stream processing.
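For readers new to orchestrators, the central idea is running tasks in dependency order. The sketch below shows that idea with Python's standard-library `graphlib` (Python 3.9+). It does not use the API of Airflow, Prefect, or Dagster, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after all of its upstream tasks; return the execution order."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


log = []

# Hypothetical tasks: each one only records that it ran.
TASKS = {
    "extract": lambda: log.append("extract"),
    "load": lambda: log.append("load"),
    "transform": lambda: log.append("transform"),
    "publish": lambda: log.append("publish"),
}

# Each key maps a task to the set of tasks that must finish before it.
DEPENDENCIES = {
    "load": {"extract"},
    "transform": {"load"},
    "publish": {"transform"},
}

order = run_pipeline(TASKS, DEPENDENCIES)
```

Production orchestrators build on the same dependency graph and add scheduling, retries, parallel execution, and monitoring.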
How do I retain these concepts with memia?
The 7 decks in this domain cover architecture, pipelines, modeling, batch/streaming, storage, quality, and observability. Start with Data Architecture (Fundamentals), then continue one learning track at a time.
Access Data Engineering decks
7 decks, 295 cards. Retain the fundamentals with spaced repetition.