database internals
Databases aren't magic — let's open the hood.
start:
Mar 31, 2026
25 classes
$350/mo
what's inside
How do you build a database management system? Why are there so many types of DBMS, with new ones appearing every year? How do primary keys, joins, and ORDER BY actually work under the hood? How is data stored on disk, and what optimizations do modern systems apply? If you want answers to these and other questions about building a DBMS, this course is for you.
Throughout the course, we'll trace the complete journey of building a database — from parsing queries to executing and scaling them. You'll discover the algorithms and concepts that power modern DBMS and build your own from scratch.
This course is valuable for anyone looking to dive deeper into systems programming, and for DBMS users who want to understand how databases work internally and tune them more effectively. You'll get the most out of it with Rust or C++, which let you explore database architecture in more depth and work hands-on with memory management and performance, but this is only a recommendation.
curriculum
prepare yourself
From file systems, hierarchical and network models to relational databases
The Relational Model Revolution and Its Impact on the Industry
The emergence of SQL, the development of object-oriented and NoSQL databases
Current trends: cloud, serverless, lakehouse and edge databases
Basics of query parsing: tokenization, SQL grammar, constructing an AST
Introduction to the course. History of databases. Parsing SQL queries
Tokenization, SQL grammar, AST construction
Relations, attributes, tuples, and keys
Primary and foreign keys. Ensuring data integrity
Basic operations of relational algebra: selection, projection, union, difference, Cartesian product
Joins: inner, outer, natural
Properties of relational algebra: commutativity and associativity. The role of the algebra as a basis for query optimization
Relational model. Relational algebra.
Formal foundations of relational DBMSs and operations underlying SQL
Pages and blocks as basic storage units
Organizing records in pages: fixed and variable length
Row-store and its applications
Data fragmentation and defragmentation methods
Data storage basics.
How data is organized on disk and in memory, and why it affects performance
B-Tree and B+Tree: structure, search, insertion and deletion algorithms
LSM trees: how log-structured merge works
Write/Read amplification and compaction mechanisms
Using Bloom filters to speed up searches
Indexes. B-tree and LSM.
Data structures to speed up searches, inserts, and range scans
Row-oriented vs column-oriented storage
Advantages of the columnar approach for analytical queries
Data compression methods: RLE, dictionary encoding, delta encoding
Hybrid HTAP systems (Hybrid Transactional/Analytical Processing)
PAX format — combining the advantages of row and column approaches
Columnar and hybrid data stores. Compression
Efficient storage of analytical data
Heuristic optimizations: selection and projection pushdown
Cost-based optimization: using statistics and evaluating selectivity
Join strategies: nested loop, hash join, sort-merge join
Query planning and optimization.
Logical plans (relational algebra trees)
Volcano (iterator) execution model
Vectorization: SIMD and block processing of values
JIT compilation of SQL queries
Query execution. Plan vectorization. SQL compilation
Row-at-a-time vs batch-at-a-time approach
Handling NULL values
Explicit and implicit type casting
Rules of precedence in expressions
User-defined types
Semi-structured data types (JSON, XML, etc.)
Data types. Type system. Type casting.
Basic types: numeric, text, date/time, boolean, binary
Concurrency issues: dirty reads, phantom reads, and others
Transaction isolation levels: Read Committed, Repeatable Read, Serializable, etc.
2PL (two-phase locking) and deadlock detection
MVCC (multi-version concurrency control) and snapshot isolation
Transactions and Concurrent Access Management
ACID transaction properties
Horizontal scaling (sharding)
Replication: synchronous and asynchronous; master–slave and multi-leader models
CAP theorem and data consistency models
Distributed databases: scaling and consistency
2PC and consensus protocols (Paxos, Raft)
Serverless databases
Separation of compute and storage layers
Automation and self-driving DBMS
Lakehouse architecture, agentic DBMS and Iceberg storage
Open-source databases. Modern architecture and recent issues.
Using ML for indexing, planning, tuning
Presentation of projects and analysis of research.
You've written your own DBMS. Time to present it :)
instructor:

Denys Tsyomenko
Founding Engineer @Embucket
Former Software Engineer @CaspianDB @SingleStore @DataRobot @Microsoft
University lecturer @Kyiv School of Economics
Ready? Take the first step
reviews
what alumni say
what awaits
have fun and dive deep
format that works
Constant feedback in Slack.
No superficial slides — just deep dives into real production challenges.
Certificates earned through real results: completed assignments, active discussions, measurable progress.
communication that drives you
Twice weekly on Zoom — Mondays and Wednesdays at 7:00 PM, 1.5 hours each. All lectures recorded for later review. Taught in Ukrainian. Supplementary materials in English.
Slack is our hub for discussions, clever test cases, and top company referrals.
environment that energizes
We screen carefully — you'll learn among strong, motivated peers. Skip homework? You're out.
Your instructor is always available. They'll explain until it clicks — whether that's a third code review or staying late after lecture.
That's how we work: learn and grow stronger together.