Written by: Stefan Bradstreet
Data structures are tools within a programming language that a software developer can use to organize their data. This allows the developer and the program to store and retrieve data in a way depending on the developers goals. If a program will need to save and retrieve data often then a software developer may choose to organize that data in a hashtable. If a developer needs to optimize how much memory is being used by their program and still want a binary search time, also known as O(log n) in big O notation speak, then they may organize their data in a Binary Search Tree type data structure.
Both of these data structures are built from other core data structures. A hash-table is a special implementation of an array which is a default data structure in the majority of programming languages. The hash table class also has special functions to insert data by running it through a hashing function and a modulus operator to determine where that data will be inserted. A binary search tree is a special type of graph built of a node data structure which contains a property to store the data and another property to point to the left and right children nodes. The left child is conically smaller than the parent’s data and the right child is larger.
This rest of this article will go through the basic data structures in more detail and these should be understood thoroughly to guarantee a successful career as a software developer.
Data structures overview
Please do study this infographic at your leisure. It contains the core data structures that are the building blocks of all programs and I use these daily as a software engineer and a hobby coding problem solver over on leetcode.
My history with Data Structures at College
During my undergraduate courses I took several classes on this subject alone. I didn’t really think much about it as I just wanted to get my grades and to start my career, I all ready felt out of place being 25 years old in a room full of 19 year olds, most of whom were awesome kids and it was just my own thing being the ‘old guy’ in the room. So i would get the projects such as implement an address book of people, simulate a car wash, write a calculator, and some other cool things; then I would spend a few days coding them while other students spent weeks and go do stuff to waste time than wasn’t deep diving what I was actually doing.Introducing Echo Show 5
While this process was enough to graduate I really don’t recommend it because come to find out in depth knowledge of data structures is really important to the bigger tech companies out there. If your end goal isn’t getting a job at Amazon, Facebook, Google, Microsoft, etc… you’ll probably be just fine learning some basic implementations and then which libraries let you use them out of the box.
But when you have millions of people using your apps it starts to get a little more important if you are being efficient with what you are doing, it’s also important to make it clean enough so that when hundreds of other developers are working with your code they know what’s going on. And while it may take you several extra hours to document a class across the organization it could save other developers hundreds of hours which saves the company lots in the long run, and earns you praise and maybe a few free beers in the meantime from thankful peers. That’s something I didn’t REALLY learn and understand until I got into the field. Click here for more general developer career tips and check out the BeAPythonDev facebook group to be a part of a community of motivated pythonic individuals interested in growing their software development and coding skills.
- A basic collection of items such as numbers, names, objects. Memory must be designated for the array when you declare it.
- When you need to add more entries to an array then new memory has to be designated for a new array of a bigger size and the old items must be iterated through and copied. This can be ineffective if this process must be done lots of times during your runtime
- Can also lead to your program using lots of memory that isn’t needed
- To find data in an array you generally need to search the whole array for what you need unless the data is sorted.
- Real World Example of an Array: A row of seats on a couch where only one person can sit in each seat. A person can sit on any seat that is open. The couch only has so many seats and if you need to add more people you would need to buy a bigger couch and move everyone to the new couch.
- A structure that uses a first in, first out process similar to waiting in a line. For instance a person gets into a line for something, multiple people can line up behind him. During this process we always know who’s at the front of the line and who we need to take of first.
- Real World Example of a Queue: A restaurant drive through. The restaurants biggest priority is the first person that has joined and been in the line the longest. As such they have pointers to who is first and they will be the first ones to leave the line after they have received their food.
- A structure that uses a last in, first out process to manage data. Stacks can be used to process a numerical expression such as 3 * ( 4 + 5 ) by keeping one stack for numbers and one stack for operators. This allows for intricate operations where order matters and we can operate on things as they are added to the stack based on what we are adding.
- The simplest example of an operation that a stack is especially effective at is to reverse a word or collection of items.
- Real World Example of a Stack: A stack of papers on a persons desk. You can always peek at what paper on the top but if you need the paper on the bottom you need to pick up all the other pieces of paper and place them somewhere else on the desk.
- Hash Tables
- A data structure used to effectively store data in memory where it will be looked up many times such as people in an address book based on key value pairs. For example where a key is the persons email address and value could be the persons name
- Made by declaring a larger array then the amount of data we will expect to put into it. For example if are we going to put 5 people in our hash table we will want an array of generous size to prevent collisions.
- Data is inserted by finding a hash of the data, some unique numerical expression of the data created by running it through a hash table. Then by taking the modolus operator of that hash based on the size of the array to determine which slot in the array it should go into. This happens in near constant time based on the hash function.
- If two items have the same hash value there is a Collision and they are added as a list of objects in the same bucket of the array then retrieved by comparing the unique Keys of the data
- Retrieval is done similar where we hash the key and return what is in that array index
- Advantages: Constant time data insertions and retrievals assuming a perfect hash function with no collisions which will greatly speed up the execution of your programs
- Disadvantages: Can be memory inefficient, many collisions could result in worst case lookup times that are greater than finding data in an array
- Real world example of a hash table: This is a bit of a stretch but a persons drawers can be arranged as a hash table where shirts always go in the top draws, underwear in the middle, and pants in the bottom. That way if you are looking for underwear you dont need to go looking through every single drawer. This falls apart if you are looking for a certain pair of underwear but maybe you have another “hash table” or organization method inside your drawer which would simulated nested dictionaries in Python or Json
- Lists (implementation and usage)
- Similar to an array in that it allows storing a ‘linear’ collection of items. For example a list could be 1,3,5,7,11,13
- Different from an array that instead of needing to declare your memory allocation before hand, you just create your data and then point to the next entry in the list so a list can grow indefinitely during your programs execution
- Advantages: Good at dynamically allocating memory as your data grows or shrinks
- Disadvantages: Slow lookup times, traversing a list is generally unidirectional.
- Real World Example of a list: A train. More cars can be easily added to the back but not so much to the middle. If you are looking for a certain person on the car you would need to start at the engine and work all the way through the back to find the particular person.
- A way of sorting linked data in a logical order that makes working with the data generally more effective than a list since you can search faster than looking at every single node in your list
- Starts with a Parent node and can have many children nodes as you get lower in the tree similar to a family tree drawing. Trees are generally represented in this way since humans read top down and this seems to be more comfortable to interpret. If it was bottom -> up it would look really similar to a tree tree
- Advantages: Can look up data faster than a list in generally good cases
- Disadvantages: Many deletions in your data can cause an increase in runtime as the trees need to be relinked and nodes get shifted around
- Real world example of a tree: A filing cabinet could potentially be a tree structure if you have a folder with other folders which have other folders in them. The limitation is that a folder can’t have a folder inside of it that it is already inside of.
- Graphs are similar to trees in that data is linked to other nodes and many nodes can branch to other nodes.
- They differ in that there isn’t usually a set parent and children as a graph could be single directed, bi directed, and can be cyclical
- I’ve never been in an interview where they have asked me to solve an algorithm with a graph. This would probably only come up if you are interviewing for a high level specialized position.
- Real world example of a graph: Cities are naturally laid out in a graph structure where the highways are the pointers that connect a city. There is a way to travel from Chicago to LA to New York and back to Chicago without using the same highway or pointer.
For an awesome link to a cheat sheet on the runtimes of searching and traversing through these data structures and other bigO information check out:
About Stefan Bradstreet
Stefan is a software developer engineer II at Amazon with 5+ years of experience in tech. He is passionate about helping people become better coders and climbing the ranks in their careers.
Thank you for visiting!
Please remember if you found this content useful and would like to see more python and software development articles from us, help us conquer the google algorithm in any of the ways below:
- Comment your thoughts below
- Share this on your social media
- follow me on twitter
- subscribe to our Youtube channel
- Contact me on the service page for coaching
- Shop through our affiliate links: Python crash course book
- Join as a patron https://www.patreon.com/beapythondev for more exclusive content and my personal weekly schedule and notes I use to keep growing as a SDE
If you didn’t find what you expected feel free to use the contact page and let us know ways we can improve or what would help you grow as a developer better! I am honored by every view and it takes a village to really make this great.