Question 1

What is Data Engineering?

Accepted Answer

Data Engineering is the discipline of designing, building, and operating the systems that collect, transform, and deliver data. A data engineer builds the pipelines that feed data analysts, data scientists, and applications.

Question 2

What is the difference between ETL and ELT?

Accepted Answer

In ETL (Extract-Transform-Load), data is transformed before it is loaded into the target system. In ELT (Extract-Load-Transform), raw data is first loaded into the data warehouse, then transformed there, typically with SQL. ELT has become the common default with cloud warehouses (BigQuery, Snowflake, Redshift), which can transform very large volumes efficiently.
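To make the difference in ordering concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse. The sample rows, the table names, and the cleaning rules are all invented for illustration; a real warehouse would use its own SQL dialect and loading tools.

```python
import sqlite3

# Hypothetical raw export: every field arrives as an untyped string.
RAW_ORDERS = [
    ("1", " 19.99 ", "paid"),
    ("2", "5.00", "PAID"),
    ("3", "n/a", "cancelled"),
]


def clean(row):
    """Return (id, amount, status) with proper types, or None if the amount is invalid."""
    order_id, amount, status = row
    try:
        return int(order_id), float(amount.strip()), status.lower()
    except ValueError:
        return None


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (id INTEGER, amount REAL, status TEXT)")
    cleaned = [r for r in map(clean, RAW_ORDERS) if r is not None]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)
    return conn.execute(
        "SELECT id, amount, status FROM orders_etl ORDER BY id"
    ).fetchall()


def run_elt(conn):
    """ELT: load the raw strings untouched, then transform with SQL in the database."""
    conn.execute("CREATE TABLE raw_orders (id TEXT, amount TEXT, status TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER)        AS id,
               CAST(TRIM(amount) AS REAL) AS amount,
               LOWER(status)              AS status
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
        """
    )
    return conn.execute(
        "SELECT id, amount, status FROM orders_elt ORDER BY id"
    ).fetchall()
```

Both functions end with the same cleaned table. The practical difference is that the ELT variant keeps raw_orders in the database, so the transformation can be rerun or changed later without extracting the data again.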

Question 3

What is a Data Lakehouse?

Accepted Answer

The Lakehouse combines the advantages of a Data Lake (low-cost raw storage, open formats) and a Data Warehouse (ACID guarantees, performance, governance). Table formats like Delta Lake, Apache Iceberg, or Apache Hudi add a transactional metadata layer on top of data files such as Parquet, so query engines can read those files directly with transactional guarantees.
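One way to picture how plain files can gain transactional behavior is a commit log kept next to the data. The sketch below is a toy, not the actual on-disk layout of Delta Lake, Iceberg, or Hudi: the ToyTable class, the _log directory, and the JSON data files (used here instead of Parquet to avoid dependencies) are all assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy table: data files plus an append-only log of numbered commit files."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def write_data_file(self, name, rows):
        """Write a data file. Readers ignore it until a commit references it."""
        with open(os.path.join(self.root, name), "w") as f:
            json.dump(rows, f)

    def commit(self, add=(), remove=()):
        """Publish a new table version by creating the next log entry."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        path = os.path.join(self.log_dir, f"{version:08d}.json")
        # Mode "x" fails if the file already exists, so two writers
        # cannot both claim the same version number.
        with open(path, "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return version

    def live_files(self, as_of=None):
        """Replay the log to find the data files visible at a given version."""
        files = set()
        for version in self._versions():
            if as_of is not None and version > as_of:
                break
            with open(os.path.join(self.log_dir, f"{version:08d}.json")) as f:
                entry = json.load(f)
            files |= set(entry["add"])
            files -= set(entry["remove"])
        return files

    def read(self, as_of=None):
        """Read all rows from the files visible at a version (latest by default)."""
        rows = []
        for name in sorted(self.live_files(as_of)):
            with open(os.path.join(self.root, name)) as f:
                rows.extend(json.load(f))
        return rows


def demo():
    """Replace one file with another and read both versions of the table."""
    with tempfile.TemporaryDirectory() as root:
        table = ToyTable(root)
        table.write_data_file("part-a.json", [{"id": 1}])
        table.commit(add=["part-a.json"])
        table.write_data_file("part-b.json", [{"id": 2}])
        table.commit(add=["part-b.json"], remove=["part-a.json"])
        return table.read(), table.read(as_of=0)
```

Because readers only trust what the log lists, a half-finished write never shows up in a query, and older versions stay readable by replaying fewer log entries. Real table formats build on the same idea and add schema tracking, file statistics, and stronger concurrency control.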

Question 4

Which tools should a Data Engineer know?

Accepted Answer

The fundamentals: SQL, Python, an orchestrator (Airflow, Prefect, Dagster), a transformation tool (dbt), and proficiency in at least one cloud platform (AWS, GCP, or Azure). Modern stacks often add Spark or Flink for large-scale processing.
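At its core, an orchestrator runs tasks in an order that respects their dependencies. The sketch below shows that single idea with Python's standard graphlib module (Python 3.9+). It is not how Airflow, Prefect, or Dagster are used in practice, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_raw": {"extract_orders", "extract_customers"},
    "transform_models": {"load_raw"},
    "data_quality_checks": {"transform_models"},
}


def execution_order(dependencies):
    """Return one valid run order; graphlib raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, tasks):
    """Call each task once, in dependency order, and return the order used."""
    order = execution_order(dependencies)
    for name in order:
        tasks[name]()
    return order
```

Real orchestrators add scheduling, retries, parallel execution, and monitoring on top of this ordering step.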

Question 5

How do I retain these concepts with memia?

Accepted Answer

The 7 decks in this domain cover architecture, pipelines, modeling, batch/streaming, storage, quality, and observability. Start with Data Architecture (Fundamentals), then progress by learning track.


FAQ — Data Engineering