40 cards · Premium

Data Architecture

Learn how modern data platforms are structured. This subtheme covers the architecture of data systems, data pipelines, and the patterns used to organize and process information at scale.

Language: English
Theme: Digital & Data Transformation
Category: Business & Decision

Why learn with flashcards?

Flashcards combined with spaced repetition improve active recall. You review at the right time, retain knowledge longer, and track progress card by card.

Sample flashcards from this deck

Card 1

What does treating data as a shared asset enable?

Consistent reuse of trusted data across teams and use cases.

Explanation

Seeing data as a shared asset drives common standards, shared infrastructure, and cross-domain reuse.

Common mistake

Thinking data belongs only to the application that first created it, instead of the whole organization.

Card 2

What is the key benefit of separating storage and compute?

Storage and processing can scale independently.

Explanation

Independent scaling helps control cost and performance by adjusting only the constrained resource.

Common mistake

Assuming separation of storage and compute automatically removes all performance bottlenecks.

Card 3

What defines schema-on-write?

Data must fit a predefined structure before storage.

Explanation

Schema-on-write enforces validation at ingestion, which simplifies downstream querying but reduces flexibility.

Common mistake

Confusing schema-on-write with schema-on-read, where data is stored without an enforced structure and interpreted only at query time.
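
A minimal Python sketch of the idea on this card. The schema `ORDER_SCHEMA`, the helper names, and the in-memory `table` list are all hypothetical stand-ins for a real storage layer. Each record is validated against the predefined structure before it is stored, and a nonconforming record is rejected at ingestion.

```python
# Hypothetical schema: field name -> required Python type (illustrative only).
ORDER_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def validate(record: dict, schema: dict) -> None:
    """Raise ValueError if the record does not match the schema exactly."""
    if set(record) != set(schema):
        raise ValueError(f"fields {sorted(record)} do not match {sorted(schema)}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise ValueError(f"{name!r} must be {expected.__name__}")


def write(table: list, record: dict) -> None:
    """Schema-on-write: validate first, store only if validation passes."""
    validate(record, ORDER_SCHEMA)
    table.append(record)


table = []  # stand-in for a real storage layer
write(table, {"order_id": 1, "customer": "acme", "amount": 19.5})

rejected = None
try:
    # order_id is a string here, so the write is refused before storage.
    write(table, {"order_id": "2", "customer": "acme", "amount": 5.0})
except ValueError as err:
    rejected = str(err)
```

Because validation happens on the write path, anything that reaches `table` can be queried without defensive checks, at the cost of refusing data that does not yet fit the schema.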

Card 4

What does choosing availability over consistency prioritize?

Serving responses even if data is temporarily stale.

Explanation

Favoring availability means accepting eventual rather than immediate consistency when network partitions or replication delays occur.

Common mistake

Believing high availability always comes with perfectly up-to-date, strongly consistent data.
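
The trade-off on this card can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a real database client: the class `AvailableStore` and its methods are hypothetical, and replication is triggered by hand to make the lag visible.

```python
class AvailableStore:
    """Writes go to a primary copy; reads are served by a follower that may lag."""

    def __init__(self):
        self.primary = {}
        self.follower = {}
        self.pending = []  # writes accepted but not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # Always answers, even if replication has not caught up (possibly stale).
        return self.follower.get(key)

    def replicate(self):
        # Apply pending writes in order; afterwards both copies agree.
        for key, value in self.pending:
            self.follower[key] = value
        self.pending.clear()


store = AvailableStore()
store.write("price", 10)
store.replicate()
store.write("price", 12)
stale = store.read("price")  # second write not replicated yet
store.replicate()
fresh = store.read("price")  # copies have converged
```

The read between the second write and the second `replicate()` call still succeeds, but it returns the older value. That is the stale-but-available behavior the card describes.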

Card 5

What makes data governance an architectural concern rather than only a policy document?

It requires embedding controls for access, quality, and compliance directly into data systems.

Explanation

Architecture must support governance with capabilities like authorization, validation, and auditing.

Common mistake

Treating governance as a purely bureaucratic activity disconnected from platform design.
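
As a rough illustration of governance built into the platform rather than written in a policy document, the sketch below puts an authorization check and an audit record on the read path itself. The policy table, role names, and function are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical policy: dataset name -> roles allowed to read it.
POLICY = {
    "salaries": {"hr_analyst"},
    "orders": {"hr_analyst", "sales_analyst"},
}
audit_log = []  # stand-in for a durable audit trail


def read_dataset(user: str, role: str, dataset: str) -> str:
    """Check authorization and record the attempt before granting access."""
    allowed = role in POLICY.get(dataset, set())
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not read {dataset}")
    return f"rows of {dataset}"  # placeholder for the real data access


read_dataset("ana", "sales_analyst", "orders")
denied = False
try:
    read_dataset("ana", "sales_analyst", "salaries")
except PermissionError:
    denied = True
```

Both the granted and the denied attempt end up in `audit_log`, so access control and auditing are properties of the system and not of a separate process.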

Card 6

Why is data lineage a design requirement?

It shows how data was produced, transformed, and used.

Explanation

Lineage makes it possible to trace data from source to consumption, which supports debugging, impact analysis, compliance, trust, and controlled change management.

Common mistake

Assuming lineage can be reconstructed later without designing systems and metadata to capture it properly.
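
One way to picture lineage as a design requirement is to record it at the moment each transformation runs. The sketch below uses hypothetical dataset names and an in-memory mapping; it is not the API of any particular lineage tool.

```python
from collections import defaultdict

# Hypothetical lineage store: dataset name -> set of direct upstream datasets.
upstream = defaultdict(set)


def record_step(output: str, inputs: list) -> None:
    """Capture lineage when a transformation produces `output`."""
    upstream[output].update(inputs)


def trace(dataset: str) -> set:
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = [dataset]
    while stack:
        for parent in upstream.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


record_step("orders_clean", ["orders_raw"])
record_step("revenue_report", ["orders_clean", "fx_rates"])
sources = trace("revenue_report")
```

Here `sources` contains `orders_clean`, `fx_rates`, and `orders_raw`. Impact analysis is the same walk in the opposite direction, and it only works because every step called `record_step` when it ran.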

Card 7

What is the main goal of a data catalog?

Helping users find and understand datasets through metadata.

Explanation

A data catalog improves discoverability by centralizing metadata such as definitions, owners, tags, and usage context, so users can find and understand available datasets.

Common mistake

Treating a catalog like a storage tool rather than a discovery and metadata layer.
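
A small sketch of a catalog as a metadata layer. The `CatalogEntry` fields, the sample entries, and the `location` strings are illustrative assumptions. The catalog holds descriptions of datasets and pointers to them, never the rows themselves.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner: str
    description: str
    tags: set = field(default_factory=set)
    location: str = ""  # pointer to where the data lives, not the data itself


def search(catalog: list, term: str) -> list:
    """Return names of entries whose name, description, or tags match `term`."""
    term = term.lower()
    return [
        entry.name
        for entry in catalog
        if term in entry.name.lower()
        or term in entry.description.lower()
        or term in {tag.lower() for tag in entry.tags}
    ]


catalog = [
    CatalogEntry("orders_curated", "sales-data", "Cleaned customer orders",
                 {"sales", "curated"}, "warehouse.sales.orders"),
    CatalogEntry("hr_headcount", "people-ops", "Monthly headcount by team",
                 {"hr"}, "warehouse.hr.headcount"),
]
matches = search(catalog, "sales")
```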

Card 8

What is the main difference between operational and analytical systems?

Operational systems run transactions; analytical systems support reporting and analysis.

Explanation

Transactional workloads optimize for many small updates; analytical workloads optimize for large read-heavy queries.

Common mistake

Expecting a single database to handle both heavy analytics and critical transactions efficiently.

Card 9

What is the purpose of a landing zone in a data platform?

To store newly ingested data with minimal processing.

Explanation

A landing zone is the initial area where incoming data is captured with minimal alteration, preserving source fidelity before validation, transformation, and downstream modeling.

Common mistake

Skipping the landing zone and loading unvalidated source data directly into curated consumption layers.

Card 10

What defines the curated layer in an analytics platform?

Cleaned, modeled, business-ready data for broad use.

Explanation

The curated layer contains cleaned, structured, and business-aligned data designed for reliable reuse across reporting, analytics, and self-service use cases.

Common mistake

Confusing curated data with raw ingestion outputs or assuming curation means copying source tables as-is.
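
The landing zone and curated layer from the last two cards can be sketched together. All field names and sample rows are made up for illustration. Raw records are kept exactly as received, and a separate step validates, cleans, and renames them into business-ready rows.

```python
# Landing zone: records stored as received, with no cleaning applied.
landing = [
    {"id": "1", "amount": " 19.50 ", "country": "fr"},
    {"id": "2", "amount": "n/a", "country": "DE"},
]


def curate(raw_rows: list) -> tuple:
    """Build cleaned, typed rows; set aside rows that fail validation."""
    curated, rejected = [], []
    for row in raw_rows:
        try:
            curated.append({
                "order_id": int(row["id"]),
                "amount_eur": float(row["amount"]),
                "country_code": row["country"].strip().upper(),
            })
        except (KeyError, ValueError):
            rejected.append(row)  # kept for inspection, never silently dropped
    return curated, rejected


curated, rejected = curate(landing)
```

The landing records are left untouched, so a failed or changed curation rule can be rerun against the original source data.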
