The following blog post is the ninth chapter in a primer series by Michael Kaminsky on databases. You can read Chapter Eight if you missed the third lesson about distributed databases, or watch the complete series.
Indexes are used to make queries run faster and they can be found and used in lots of different contexts. Like everything else related to databases, there are trade-offs with using indexes so it’s important to understand how they work.
The benefit to indexes is that they make queries go faster, allowing us to speed up our applications or improve query times for our analytics. The drawback to indexes is that they take up extra space in the database and can make other operations slower.
Indexes in the Physical Word
Database indexes work much like real-life indexes such as card catalogs in libraries and indexes at the backs of textbooks.
Physical card catalogs at a library aren’t very common anymore, but were once used to look up books in the library by topic. The other common place for an index is in the back of a nonfiction book or textbook. In a long textbook that covers a lot of topics, an index in the back can point you toward a certain topic that you want to find again.
Let’s talk through the steps we go through when using a book index. The purpose of this review is to show you how some things that humans do naturally and easily are referred to as “algorithms” when a computer does them. When you use an index, you’re using an algorithm called “binary search” that is very important in computing and makes indexes very powerful.
Let’s imagine we have a dense textbook with 1,000 pages. Our goal is to find a certain topic that the textbook discusses, but we don’t want to flip through every page to see if the topic is there. In the worst-case scenario we’d have to read the whole textbook just to find what we’re looking for! However, if the textbook has an index, the index will point directly to the pages containing the topic we want.
This index is organized by topic, and the topics are sorted alphabetically. The fact that the index is ordered by a known system (i.e. alphabetical) allows us to use binary search to find the topic that we’re looking for.
Binary Search
Binary search is the process of flipping back and forth between entries to “zero in” on the thing that you’re looking for. You might start too low, then flip a bit too high, then flip a bit lower, then a bit higher until you land on what you’re looking for.
Imagine we’re looking for the topic “frogs” in our index.
- If we start out in the index and flip to a random page, maybe we see the entry for “trees.” We know that “F,” for frogs, comes before “T,” for trees, so we know the entry for frogs must be earlier.
- So we flip to an earlier page, and we see the entry for “dogs”. We know that “F” comes after “D,” so we know frogs must be on a later page, but before the page where we found “trees.”
- So we flip to a later page, and we see “lava”. We’re getting closer but we’re not there yet, now we know we need to go back, but not as far back as “dogs.”
We repeat this process until we zero in on the entry for frogs, and then we can look up the correct page in the textbook to learn about the amphibians in question.
This zeroing-in process is what’s known as binary search. It works on the principle that we know how the Latin alphabet is ordered. By contrast, you would not be able to apply the same order to a Russian book in Cyrillic (because you don’t know how the Cyrillic alphabet is sorted!).
There are two major costs to textbook indexes, though:
- The index takes up space. There are a lot of extra pages you have to print and carry around with you!
- The index makes the book more expensive because you have to pay someone to actually create an accurate index.
Database Indexes
Databases use indexes very similarly to how we use indexes in books, so we can use a lot of the same intuition to understand how these work.
Imagine that we have a users table with millions of users in it.
Start for free
Join the thousands of companies using Fivetran to centralize and transform their data.