database internals

Databases aren't magic — let's open the hood.

start: Mar 31, 2026

25 classes

$350/mo

online


what's inside

How do you build a database management system? Why are there so many types of DBMS, with new ones appearing every year? How do primary keys, joins, and ORDER BY actually work under the hood? How is data stored on disk, and what optimizations do modern systems apply? If you want answers to these and other questions about how a DBMS is built, this course is for you.

Throughout the course, we'll trace the complete journey of building a database, from parsing queries to executing and scaling them. You'll discover the algorithms and concepts that power modern DBMSs and build your own from scratch.

This course is for anyone who wants to dive deeper into systems programming, and for DBMS users who want to understand how databases work internally and use them more effectively. We recommend Rust or C++, since they let you explore database architecture more deeply and get hands-on with memory management and performance, but this is only a recommendation.

curriculum
prepare yourself

From file systems, hierarchical and network models to relational databases
The relational model revolution and its impact on the industry
The emergence of SQL, the development of object-oriented and NoSQL databases
Current trends: cloud, serverless, lakehouse and edge databases
Basics of query parsing: tokenization, SQL grammar, constructing an AST

Introduction to the course. History of databases. Parsing SQL queries

Tokenization, SQL grammar, AST construction
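To give a flavour of the first module, here is a minimal sketch of SQL tokenization. It is an illustration, not course material: the token kinds (KEYWORD, IDENT, NUMBER, STRING, OP), the small KEYWORDS set, and the tokenize function are assumptions that cover only a tiny SQL subset. A parser would then consume this token stream according to the SQL grammar to build the AST.

```python
import re

# Token kinds for a tiny, assumed SQL subset. Real SQL lexers also handle
# comments, quoted identifiers, escaped quotes, and much more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"<=|>=|<>|!=|[=<>*,();.+-]"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "ORDER", "BY"}
# One master regex with a named group per token kind; alternatives are tried in order.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(sql: str) -> list[tuple[str, str]]:
    """Split a SQL string into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = MASTER.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {sql[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        # Identifiers that spell a keyword are reclassified (case-insensitively).
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind, text = "KEYWORD", text.upper()
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens
```

For example, tokenize("SELECT name FROM users WHERE age >= 18") yields keyword, identifier, operator, and number tokens in order, with the whitespace discarded.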

Relations, attributes, tuples, and keys
Primary and foreign keys. Ensuring data integrity
Basic operations of relational algebra: selection, projection, union, difference, Cartesian product
Joins: inner, outer, natural
Algebraic properties: commutativity and associativity. Relational algebra as the basis for query optimization

Relational model. Relational algebra.

Formal foundations of relational DBMSs and operations underlying SQL
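The relational algebra operators listed above can be sketched over a simple in-memory representation. This assumes relations are Python lists of dicts sharing one schema; select, project, and natural_join are hypothetical helper names, and the natural join is written literally as a Cartesian product followed by a selection on the shared attributes.

```python
from itertools import product

# Assumption: a relation is a list of dicts that all share the same keys.
Relation = list[dict]


def select(rel: Relation, predicate) -> Relation:
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in rel if predicate(t)]


def project(rel: Relation, attrs: list[str]) -> Relation:
    """Projection (pi): keep only the listed attributes, dropping duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def natural_join(left: Relation, right: Relation) -> Relation:
    """Natural join: Cartesian product filtered on equality of the shared attributes."""
    if not left or not right:
        return []
    shared = left[0].keys() & right[0].keys()
    return [
        {**lt, **rt}
        for lt, rt in product(left, right)
        if all(lt[a] == rt[a] for a in shared)
    ]
```

Writing the join as product-then-filter mirrors the algebraic definition; a query engine's job is precisely to produce the same result without materializing that product.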

Pages and blocks as basic storage units
Organizing records in pages: fixed and variable length
Row stores and their applications
Data fragmentation and defragmentation methods

Data storage basics.

How data is organized on disk and in memory, and why it affects performance
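One common way to organize variable-length records inside a page is the slotted page. The sketch below uses a hypothetical layout (a 4-byte header, then a slot directory of (offset, length) pairs growing forward, with record bytes written backward from the end of the page); the SlottedPage class is illustrative, and deletion, updates, and defragmentation are left out.

```python
import struct


class SlottedPage:
    """Simplified slotted page for variable-length records (hypothetical layout).

    Bytes 0..3 hold the header (slot count, start of the record area). The slot
    directory grows forward after the header, while record bytes are written
    backward from the end of the page.
    """

    HEADER = struct.Struct("<HH")  # slot count, offset where the record area starts
    SLOT = struct.Struct("<HH")    # record offset, record length

    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        # Empty page: zero slots, record area starts at the very end.
        self.HEADER.pack_into(self.data, 0, 0, size)

    def insert(self, record: bytes) -> int:
        """Store a record and return its slot id; raise if it does not fit."""
        count, free_end = self.HEADER.unpack_from(self.data, 0)
        slot_pos = self.HEADER.size + count * self.SLOT.size
        offset = free_end - len(record)
        # The new slot entry and the new record must not overlap.
        if slot_pos + self.SLOT.size > offset:
            raise ValueError("page full")
        self.data[offset:free_end] = record
        self.SLOT.pack_into(self.data, slot_pos, offset, len(record))
        self.HEADER.pack_into(self.data, 0, count + 1, offset)
        return count

    def get(self, slot_id: int) -> bytes:
        """Follow the slot directory entry to the record bytes."""
        count, _ = self.HEADER.unpack_from(self.data, 0)
        if not 0 <= slot_id < count:
            raise IndexError(slot_id)
        pos = self.HEADER.size + slot_id * self.SLOT.size
        offset, length = self.SLOT.unpack_from(self.data, pos)
        return bytes(self.data[offset:offset + length])
```

The indirection through slot ids is what lets a page move records around during defragmentation without invalidating references to them.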

B-Tree and B+Tree: structure, search, insertion and deletion algorithms
LSM trees: how the log-structured merge approach works
Write and read amplification; compaction mechanisms
Using Bloom filters to speed up searches

Indexes. B-tree and LSM.

Data structures to speed up searches, inserts, and range scans
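A Bloom filter answers either "definitely not present" or "possibly present", which is what lets an LSM-based engine skip files that cannot contain a key. The sketch below is a minimal version under stated assumptions: the BloomFilter class and its parameter names are illustrative, and the bit positions are derived by double hashing two 64-bit values taken from one SHA-256 digest.

```python
import hashlib


class BloomFilter:
    """Bit-array set sketch: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key: str):
        # Double hashing: derive num_hashes positions from two 64-bit values
        # taken from a single SHA-256 digest of the key.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force a nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # Any unset bit proves the key was never added; all set means "maybe".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

The false-positive rate depends on the number of bits per key and the number of hash functions, which is the trade-off an engine tunes when sizing its filters.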

Row-oriented vs column-oriented storage
Advantages of the columnar approach for analytical queries
Data compression methods: RLE, dictionary encoding, delta encoding
HTAP (hybrid transactional/analytical processing) systems
PAX format — combining the advantages of row and column approaches

Columnar and hybrid data stores. Compression

Efficient storage of analytical data
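The three encodings named in this module fit in a few lines each. This is a simplified illustration over plain Python lists with hypothetical function names; real columnar formats typically combine such encodings with bit-packing and operate on typed buffers rather than lists.

```python
def rle_encode(values: list) -> list[tuple]:
    """Run-length encoding: collapse consecutive repeats into (value, count)."""
    runs: list[list] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def dict_encode(values: list) -> tuple[list, list[int]]:
    """Dictionary encoding: store each distinct value once plus small integer codes."""
    dictionary: list = []
    index: dict = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def delta_encode(values: list[int]) -> list[int]:
    """Delta encoding: keep the first value, then differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the original values with a running sum."""
    out: list[int] = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out
```

RLE pays off on sorted or low-cardinality columns, dictionary encoding on repetitive strings, and delta encoding on slowly increasing values such as timestamps, where the deltas are small numbers that take fewer bits to store.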

Heuristic optimizations: selection and projection pushdown
Cost-based optimization: statistics and selectivity estimation
Join strategies: nested loop, hash join, sort-merge join

Query planning and optimization.

Logical plans (relational algebra trees)
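Of the join strategies listed above, hash join is the easiest to contrast with a nested loop. A minimal sketch, assuming rows are dicts and an equi-join on a single column named by key; nested_loop_join and hash_join are illustrative names. The nested loop compares every pair of rows, O(n * m), while the hash join builds a table on one input and probes it with the other, roughly O(n + m) on average.

```python
from collections import defaultdict


def nested_loop_join(outer: list[dict], inner: list[dict], key: str) -> list[dict]:
    """Compare every pair of rows: len(outer) * len(inner) comparisons."""
    return [{**o, **i} for o in outer for i in inner if o[key] == i[key]]


def hash_join(build: list[dict], probe: list[dict], key: str) -> list[dict]:
    """Equi-join on `key`: hash one input, then probe the table with the other."""
    # Build phase: bucket the (ideally smaller) input by its join-key value.
    table: dict = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    # Probe phase: each probe row finds its matches with one average O(1) lookup.
    result = []
    for row in probe:
        for match in table.get(row[key], []):
            result.append({**match, **row})
    return result
```

A cost-based optimizer would pick between these (and sort-merge join) using estimated input sizes and selectivities.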

Volcano (iterator) execution model
Vectorization: SIMD and block processing of values
JIT compilation of SQL queries

Query execution. Vectorized execution. SQL compilation

Row-at-a-time vs batch-at-a-time approach
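In the Volcano (iterator) model, every operator exposes the same pull interface and asks its child for one row at a time. The sketch below keeps only a next() method that returns a row or None (real engines also have open() and close()); the Scan, Filter, Project, and run names are assumptions for illustration.

```python
from typing import Callable, Optional

Row = dict


class Scan:
    """Leaf operator: returns rows of an in-memory table, one per next() call."""

    def __init__(self, rows: list[Row]):
        self.rows, self.pos = rows, 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]):
        self.child, self.predicate = child, predicate

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Keeps only the requested columns of each row it pulls."""

    def __init__(self, child, columns: list[str]):
        self.child, self.columns = child, columns

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.columns}


def run(root) -> list[Row]:
    """Drive the plan from the top: call next() until it returns None."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out
```

A batch-at-a-time engine would keep the same tree shape but have next() return a batch of rows (or column vectors), spreading the per-call overhead across many values.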

Handling NULL values
Explicit and implicit type casting
Operator precedence rules in expressions
User-defined types
Semi-structured data types (JSON, XML, etc.)

Data types. Type system. Type casting.

Basic types: numeric, text, time, logical, binary
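Handling NULL means implementing SQL's three-valued logic, where a predicate can be TRUE, FALSE, or UNKNOWN. A minimal sketch, assuming Python's None stands in for UNKNOWN/NULL; the sql_not, sql_and, sql_or, sql_eq, and where names are hypothetical. WHERE keeps a row only when its predicate is TRUE, so UNKNOWN rows are dropped just like FALSE ones.

```python
from typing import Optional

# SQL three-valued logic: None plays the role of UNKNOWN (and of NULL values).
Tri = Optional[bool]


def sql_not(a: Tri) -> Tri:
    return None if a is None else not a


def sql_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # FALSE wins even if the other side is unknown
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True  # TRUE wins even if the other side is unknown
    if a is None or b is None:
        return None
    return False


def sql_eq(a, b) -> Tri:
    """Any comparison with NULL is UNKNOWN, so NULL = NULL is not TRUE."""
    return None if a is None or b is None else a == b


def where(rows: list[dict], predicate) -> list[dict]:
    """Keep only rows whose predicate evaluates to TRUE."""
    return [r for r in rows if predicate(r) is True]
```

This is why a row with x = NULL appears in neither the result of WHERE x = 1 nor that of WHERE NOT (x = 1).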

Concurrency anomalies: dirty reads, phantom reads, and others
Transaction isolation levels: Read Committed, Repeatable Read, Serializable, etc.
2PL (two-phase locking) and deadlock detection
MVCC (multi-version concurrency control) and snapshot isolation

Transactions and concurrency control

ACID transaction properties
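Under MVCC, each write creates a new row version, and a transaction reads from a snapshot instead of taking read locks. The sketch below is loosely modelled on PostgreSQL-style creator/deleter transaction ids (xmin/xmax) and heavily simplified: the Version and Snapshot classes are hypothetical, a snapshot is just the set of transaction ids committed when it was taken, and aborted transactions and garbage collection are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One version of a row in its version chain."""
    value: str
    created_by: int                   # id of the transaction that wrote this version
    deleted_by: Optional[int] = None  # id of the transaction that replaced it, if any


@dataclass(frozen=True)
class Snapshot:
    """What a reading transaction may see (hypothetical, simplified)."""
    my_txid: int
    committed: frozenset[int]  # transactions already committed when the snapshot was taken

    def sees(self, txid: int) -> bool:
        # A transaction sees its own writes and those committed before its snapshot.
        return txid == self.my_txid or txid in self.committed


def visible(version: Version, snap: Snapshot) -> bool:
    """Visible if the creating write is visible and the deleting write is not."""
    if not snap.sees(version.created_by):
        return False
    return version.deleted_by is None or not snap.sees(version.deleted_by)


def read(chain: list[Version], snap: Snapshot) -> Optional[str]:
    """Return the value of the version in the chain that this snapshot can see."""
    for version in chain:
        if visible(version, snap):
            return version.value
    return None
```

For example, if transaction 2 replaces v1 with v2 but commits only after transaction 3 took its snapshot, transaction 3 keeps reading v1, while transaction 2 itself and any later snapshot see v2.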

Horizontal scaling (sharding)
Replication: synchronous and asynchronous; master–slave and multi-leader models
CAP theorem and data consistency models

Distributed databases: scaling and consistency

2PC and consensus protocols (Paxos, Raft)
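Two-phase commit (2PC) makes a transaction that touches several shards commit everywhere or nowhere. A minimal, failure-free sketch: the Participant class and two_phase_commit function are hypothetical, and the durable logging, timeouts, and recovery a real implementation needs are omitted.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory shard; a real one would durably log its vote."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self) -> Vote:
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    if all(v is Vote.YES for v in votes):
        # Phase 2 (commit): the vote was unanimous, so everyone commits.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): a single NO vote aborts the transaction everywhere.
    for p in participants:
        p.abort()
    return False
```

Even this sketch shows the protocol's weak spot: a participant that has voted YES cannot decide on its own and must wait for the coordinator's decision, which is why 2PC is called a blocking protocol and why consensus protocols come up alongside it.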

Serverless databases
Separation of compute and storage layers
Automation and self-driving DBMS
Lakehouse architecture, agentic DBMS, and the Apache Iceberg table format

Open-source databases. Modern architecture and current challenges.

Using ML for indexing, query planning, and tuning

Project presentations and research review.

You've written your own DBMS. Time to present it :)

instructor:

Denys Tsyomenko

Founding Engineer @Embucket

Former Software Engineer @CaspianDB @SingleStore @DataRobot @Microsoft

University lecturer @Kyiv School of Economics

Ready? Take the first step


what awaits
have fun and dive deep

format that works

Constant feedback in Slack.

No superficial slides — just deep dives into real production challenges.

Certificates earned through real results: completed assignments, active discussions, measurable progress.

communication that drives you

Twice weekly on Zoom: Mondays and Wednesdays at 7:00 PM, 1.5 hours each. All lectures are recorded for later review. Lectures are taught in Ukrainian; supplementary materials are in English.

Slack is our hub for discussions, clever test cases, and referrals to top companies.

environment that energizes

We screen applicants carefully, so you'll learn among strong, motivated peers. Skip homework? You're out.

Your instructor is always available and will explain until it clicks, whether that takes a third code review or staying late after a lecture.

That's how we work: learn and grow stronger together.
