database internals

Databases aren't magic — let's open the hood.

start:

Mar 31, 2026

25 classes

$350/mo

what's inside

How do you build a database management system? Why are there so many types of DBMS, with new ones appearing every year? How do primary keys, joins, and ORDER BY actually work under the hood? How is data stored on disk, and what optimizations do modern systems apply? If you want answers to these and other questions about building a DBMS, this course is for you.


Throughout the course, we'll trace the complete journey of building a database — from parsing queries to executing and scaling them. You'll discover the algorithms and concepts that power modern DBMS and build your own from scratch.


This course is valuable for anyone looking to dive deeper into systems programming, and for DBMS users who want to understand internal database mechanics and optimize their databases more effectively. The best learning experience comes with Rust or C++, as they let you explore database architecture more deeply and get hands-on with memory management and performance, but this is only a recommendation.

curriculum
prepare yourself

  • From file systems, hierarchical and network models to relational databases
  • The relational model revolution and its impact on the industry
  • The emergence of SQL, the development of object-oriented and NoSQL databases
  • Current trends: cloud, serverless, lakehouse and edge databases
  • Basics of query parsing: tokenization, SQL grammar, constructing an AST

Introduction to the course. History of databases. Parsing SQL queries

Tokenization, SQL grammar, AST construction

  • Relations, attributes, tuples, and keys
  • Primary and foreign keys. Ensuring data integrity
  • Basic operations of relational algebra: selection, projection, union, difference, Cartesian product
  • Joins: inner, outer, natural
  • Algebraic properties: commutativity and associativity. The role of algebra as a basis for query optimization

Relational model. Relational algebra.

Formal foundations of relational DBMSs and operations underlying SQL

  • Pages and blocks as basic storage units
  • Organizing records in pages: fixed and variable length
  • Row-store and its applications
  • Data fragmentation and defragmentation methods

Data storage basics.

How data is organized on disk and in memory, and why it affects performance

  • B-Tree and B+Tree: structure, search, insertion and deletion algorithms
  • LSM trees: how log-structured merge trees work
  • Write and read amplification, and compaction mechanisms
  • Using Bloom filters to speed up searches

Indexes. B-tree and LSM.

Data structures to speed up searches, inserts, and range scans

  • Row-oriented vs column-oriented storage
  • Advantages of the columnar approach for analytical queries
  • Data compression methods: RLE, dictionary encoding, delta encoding
  • Hybrid HTAP systems (Hybrid Transactional/Analytical Processing)
  • PAX format — combining the advantages of row and column approaches

Columnar and hybrid data stores. Compression

Efficient storage of analytical data

  • Heuristic optimizations: selection and projection pushdown
  • Cost-based optimization: using statistics and estimating selectivity
  • Join strategies: nested loop, hash join, sort-merge join

Query planning and optimization.

Logical plans (relational algebra trees)

  • Volcano (iterator) execution model
  • Vectorization: SIMD and block processing of values
  • JIT compilation of SQL queries

Query execution. Plan vectorization. SQL compilation

Row-at-a-time vs batch-at-a-time approach

  • Handling NULL values
  • Explicit and implicit type casting
  • Rules of precedence in expressions
  • User-defined types
  • Semi-structured data types (JSON, XML, etc.)

Data types. Type system. Type casting.

Basic types: numeric, text, time, logical, binary

  • Concurrency issues: dirty reads, phantom reads, and others
  • Transaction isolation levels: Read Committed, Repeatable Read, Serializable, etc.
  • 2PL (two-phase locking) and deadlock detection
  • MVCC (multi-version concurrency control) and snapshot isolation

Transactions and concurrency control

ACID transaction properties

  • Horizontal scaling (sharding)
  • Replication: synchronous and asynchronous; single-leader and multi-leader models
  • CAP theorem and data consistency models

Distributed databases: scaling and consistency

2PC and consensus protocols (Paxos, Raft)

  • Serverless databases
  • Separation of compute and storage layers
  • Automation and self-driving DBMS
  • Lakehouse architecture, agentic DBMS and Iceberg storage

Open-source databases. Modern architecture and current challenges.

Using ML for indexing, planning, tuning

Project presentations and research review.

You've written your own DBMS. Time to present it :)

instructor:

Denys Tsyomenko

Founding Engineer @Embucket

Former Software Engineer @CaspianDB @SingleStore @DataRobot @Microsoft

University lecturer @Kyiv School of Economics

Ready? Take the first step

what awaits
have fun and dive deep

format that works

Constant feedback in Slack.

No superficial slides — just deep dives into real production challenges.

Certificates earned through real results: completed assignments, active discussions, measurable progress.

communication that drives you

Twice weekly on Zoom — Mondays and Wednesdays at 7:00 PM, 1.5 hours each. All lectures recorded for later review. Taught in Ukrainian. Supplementary materials in English.

Slack is our hub for discussions, clever test cases, and referrals to top companies.

environment that energizes

We screen carefully — you'll learn among strong, motivated peers. Skip homework? You're out.

Your instructor is always available. They'll explain until it clicks — whether that's a third code review or staying late after lecture.

That's how we work: learn and grow stronger together.
