Data Structures: A high level overview of what they are and why we need them

Introduction

During my undergraduate courses I took several classes on this subject alone. I didn’t really think much about it as I just wanted to get my grades and to start my career, I all ready felt out of place being 25 years old in a room full of 19 year olds, most of whom were awesome kids and it was just my own thing being the ‘old guy’ in the room. So i would get the projects such as implement an address book of people, simulate a car wash, write a calculator, and some other cool things; then I would spend a few days coding them while other students spent weeks and go do stuff to waste time than wasn’t deep diving what I was actually doing.

Introducing Echo Show 5

While this process was enough to graduate I really don’t recommend it because come to find out in depth knowledge of data structures is really important to the bigger tech companies out there. If your end goal isn’t getting a job at Amazon, Facebook, Google, Microsoft, etc… you’ll probably be just fine learning some basic implementations and then which libraries let you use them out of the box.

But when you have millions of people using your apps it starts to get a little more important if you are being efficient with what you are doing, it’s also important to make it clean enough so that when hundreds of other developers are working with your code they know what’s going on. And while it may take you several extra hours to document a class across the organization it could save other developers hundreds of hours which saves the company lots in the long run, and earns you praise and maybe a few free beers in the meantime from thankful peers. That’s something I didn’t REALLY learn and understand until I got into the field. Click here for more general developer career tips and check out the BeAPythonDev facebook group to be a part of a community of motivated pythonic individuals interested in growing their software development and coding skills.

The structures

  • Arrays
    • A basic collection of items such as numbers, names, objects. Memory must be designated for the array when you declare it.
    • When you need to add more entries to an array then new memory has to be designated for a new array of a bigger size and the old items must be iterated through and copied. This can be ineffective if this process must be done lots of times during your runtime
    • Can also lead to your program using lots of memory that isn’t needed
    • To find data in an array you generally need to search the whole array for what you need unless the data is sorted.
  • Queues
    • A structure that uses a first in, first out process similar to waiting in a line. For instance a person gets into a line for something, multiple people can line up behind him. During this process we always know who’s at the front of the line and who we need to take of first.
  • Stacks
    • A structure that uses a last in, first out process to manage data. Stacks can be used to process a numerical expression such as 3 * ( 4 + 5 ) by keeping one stack for numbers and one stack for operators. This allows for intricate operations where order matters and we can operate on things as they are added to the stack based on what we are adding.
    • The simplest example of an operation that a stack is especially effective at is to reverse a word or collection of items.
  • Hash Tables
    • A data structure used to effectively store data in memory where it will be looked up many times such as people in an address book based on key value pairs. For example where a key is the persons email address and value could be the persons name
    • Made by declaring a larger array then the amount of data we will expect to put into it. For example if are we going to put 5 people in our hash table we will want an array of generous size to prevent collisions.
    • Data is inserted by finding a hash of the data, some unique numerical expression of the data created by running it through a hash table. Then by taking the modolus operator of that hash based on the size of the array to determine which slot in the array it should go into. This happens in near constant time based on the hash function.
    • If two items have the same hash value there is a Collision and they are added as a list of objects in the same bucket of the array then retrieved by comparing the unique Keys of the data
    • Retrieval is done similar where we hash the key and return what is in that array index
    • Advantages: Constant time data insertions and retrievals assuming a perfect hash function with no collisions which will greatly speed up the execution of your programs
    • Disadvantages: Can be memory inefficient, many collisions could result in worst case lookup times that are greater than finding data in an array
  • Lists (implementation and usage)
    • Similar to an array in that it allows storing a ‘linear’ collection of items. For example a list could be 1,3,5,7,11,13
    • Different from an array that instead of needing to declare your memory allocation before hand, you just create your data and then point to the next entry in the list so a list can grow indefinitely during your programs execution
    • Advantages: Good at dynamically allocating memory as your data grows or shrinks
    • Disadvantages: Slow lookup times, traversing a list is generally unidirectional.
  • Trees
    • A way of sorting linked data in a logical order that makes working with the data generally more effective than a list
    • Starts with a Parent node and can have many children nodes as you get lower in the tree similar to a family tree drawing. Trees are generally represented in this way since humans read top down and this seems to be more comfortable to interpret. If it was bottom -> up it would look really similar to a tree tree
    • Advantages: Can look up data faster than a list in generally good cases
    • Disadvantages: Many deletions in your data can cause an increase in runtime as the trees need to be relinked and nodes get shifted around
  • Graphs
    • Graphs are similar to trees in that data is linked to other nodes and many nodes can branch to other nodes.
    • They differ in that there isn’t usually a set parent and children as a graph could be single directed, bi directed, and can be cyclical
    • I’ve never been in an interview where they have asked me to solve an algorithm with a graph. This would probably only come up if you are interviewing for a high level specialized position.

For an awesome link to a cheat sheet on the runtimes of these algorithms and other bigO information check out:

BigO Cheat Sheet

If you don’t know about bigO, I’m assuming your just starting computer science. Which one congratulations and two I definitely suggest following.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s