Heya! We use cookies

Codesphere uses cookies to understand your preferences and make sure you have the best experience on our site. By using Codesphere, you accept our use of cookies.

Understood. Yum!

Hash Tables: How They Work and Why They Matter

20.06.2022

One of the most important data structures that is used professionally is without a doubt the Hash Table.

A lot of data structures that are considered important are never actually used directly by most developers. The same can’t be said about Hash Tables. Being able to understand when and how to use a Hash table is a necessary condition for building effective software.

In this article, we’ll go over how Hash tables work and why they matter for the average developer.


How Do Hash Tables Work?

Hash Tables are a way of storing collections of data through key-value pairs. This effectively works as a dictionary.

For example, if we were using a hash table for a list of contacts, it might look like this:

So what does this actually look like at a low-level? Well first consider that our list of values needs to be stored as an array in the computer:

A hashtable works by allowing us to find the correct index for a particular value from a key, which is much more descriptive than a numerical index.

Hash tables achieve this through the use of what’s known as a hash function.

A Hash function is a function that takes in an input that could have any number of possibilities(Such as a name in our contact book example) and returns a number from a finite set of numbers(Our array index). Hash functions are used in a variety of instances but in the context of hash tables, it transforms our key into a given numerical value, and then maps that numerical value onto an array index, which can then be mapped onto our corresponding value.

The important attribute of a hash function is that it works by performing a computation on the input key. This means the speed at which it can convert the key into a hash is not dependent on how many different items we are storing!

For example, a lookup for our hashtable might look this:

“Jane” -> 41523563 -> 1 -> {Phone: 123-555-3344, Company: Big Oil}


Why Use a Hash Table over a Simple Array

It may not be immediately clear why this is so helpful.

Let’s say we’re building a blog platform and we have a user who just selected the article they need to read, and we now have to retrieve the blog article for them from a list of articles.

Now when we are trying to find the article from a list, it is unlikely that we already have its index for the array. The likely scenario is that we have some kind of identifying value for the article, such as the title of the article.

Let’s say we just had all the articles in an unsorted array. How could we find the article? We’d have to go through the array item by item until we find the article that matches our title. This is going to take an incredibly long time, especially if we have a large number of articles to check.

Now what if we instead had it in a hash table. Well now we could just plug our title into our hash function, and get back the correct index. From there we know exactly where in the array to look for our article information. Due to the hash function, this process is going to take the same amount of time no matter how many articles we have!


This makes Hash Tables one of the most important parts of building any piece of software. This dictionary format of storing and accessing data is most conducive to how applications actually work!


Collisions and Separate Chaining

Okay, now I actually lied a bit when I said that the time it takes to search a hash table doesn’t change based on the size of the table.

A key issue that needs to be considered when working with hash tables are what are known as collisions.

Now remember that a hash function is responsible for mapping an infinite amount of possibilities to a finite amount of indices:


As a result, when we are working with enough data points, it might be that our hash function maps two or more keys to the same array index. This is known as a collision.

So does this make Hash tables useless? Of course not, it is just something that needs to be accounted for. There are a number of methods to resolve collisions, but one of the most common is called separate chaining.


Separate Chaining is a process of storing LinkedLists(Lists of elements that point to each other instead of being indexed) of the values, instead of individual values at each index. Additionally, each entry in our hashtable now also needs to store the key.

Thus after passing our key to the hash function and looking up the correct index, we sift through the (hopefully short) linkedlist to find the entry that corresponds to our key.

While this can make the lookup time of a hash table increase slightly as the amount of key-value pairs grows, it is almost always going to be more efficient than a simple array.


Whats Next?

So how can you get comfortable working with hash tables? One of the best ways is to try and implement a hash table yourself in your favorite language, without the help of any data structure libraries.

Note that Hash Tables aren’t always the best way to organize collections of data. If our data has some natural ordering that makes lookup easy, or if it can be represented well in a tree, hash tables might be unnecessary or even non-optimal.

Being able to think rigorously about how to best represent your data is an important part of software engineering. Being well-versed in data structures and algorithms can help you navigate these kinds of questions.