11 Things Every Serious Python Developer Should Know

At some point you decided to learn Python. You downloaded and installed it, learned how to declare variables and call functions, and worked through several tutorials. You are comfortable building a guess-the-number game, a password generator, and a simple calculator. Maybe you have even written a few classes to store people in an address book, and Python is starting to look pretty simple. But how well do you really know the following 11 topics?

  1. The interpreted nature of Python
  2. Using the Python shell
  3. Owning your IDE and debugging
  4. Testing and documentation
  5. Using built in functions
  6. Advanced function concepts
  7. Context managers
  8. Most useful Python modules
  9. List comprehensions and slicing
  10. Global Interpreter Lock
  11. Generators and why to use them

In this article I’ll go over each of these in more detail and explain how knowing them will help you reach the next level as a Python developer. Also, if you like this kind of coding and career development article, please feel free to connect with me on Twitter!

The interpreted nature of Python

As you may or may not know, Python is an interpreted, dynamically typed language with a human-friendly syntax, and it doesn’t require a separate build and packaging step before you can run your code. This also means that another program, the interpreter, written in a lower-level language, loads your code, creates its own internal representation of your objects, and steps through your code at run time.

The most common interpreter is the C-based CPython, which is what you get when you download and install Python from python.org. Another well-known interpreter is Jython, which uses the Java virtual machine to run your code. CPython has been benchmarked to run faster than Jython, but Jython lets you mix the two languages more seamlessly, so you can write Java directly when you need more performance from your code.

There are a few major drawbacks to Python being an interpreted language. First, Python code is more resource intensive when it comes to both CPU time and memory consumption. Second, you can’t truly take advantage of multi-threaded code because of Python’s global interpreter lock, which I talk about in more detail later in this article. And lastly, there is no compile step to catch errors ahead of time. Since most mistakes can only be detected at run time, you need more tests around your code for when other developers start making changes to your aging code.
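
To make the last point concrete, here is a minimal sketch of an error that only shows up at run time. The function names are made up for this illustration. Python accepts the definition of greet even though it refers to a misspelled variable, and the problem only surfaces when the function is actually called.

```python
def greet(name):
    # Typo on purpose: "nmae" is not defined anywhere.
    # Python still accepts this function definition without complaint.
    return "Hello, " + nmae


def safe_greet(name):
    """Call greet() and report the error that only appears at run time."""
    try:
        return greet(name)
    except NameError as err:
        return "Failed at run time: " + str(err)


print(safe_greet("Ada"))
```

A test that simply calls greet once would have caught this, which is why test coverage matters more in Python than in a compiled language.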

Using the python shell

In my day-to-day work I often go to the Python shell to quickly experiment with new libraries, try out list comprehensions, or even import the requests library and make HTTP GET requests.

Occasionally I have even used the Python shell to run my team’s code on a production server to debug the behavior of a specific class and function, since the log output for that specific instance is then shown right in front of me in the terminal.

To run the shell simply open a terminal or command shell and run the “python” command and you should see something like the below text.

Python 3.7.0 (default, Aug 22 2018, 20:50:05) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

You can even load a shell from the https://www.python.org/shell/ webpage and tinker there.

Owning your IDE and Debugging

When it comes to Python, the three most popular development environments are VS Code by Microsoft, PyCharm by JetBrains, and Jupyter Notebook from Project Jupyter, which grew out of IPython.

Whichever one you choose, you should take some extra time up front to learn the most common shortcuts for cutting and pasting lines, quickly jumping between files, searching, replacing, and refactoring. Taking the initiative to deep dive into the IDE you choose can greatly boost how productively you write code compared to a weekend warrior who downloads an IDE and uses it only to write *.py files.

Another huge advantage of using an IDE is the ability to do line-by-line debugging. You can also hook up remote debugging tools to step through code that is running on a different host. This is far better than dumping print statements to your console because of the fine-grained control you get and the ability to see the contents of all your variables in real time.

Even if your code is working as expected, it is worth stepping through any code of real complexity to understand your data’s behavior in more depth and potentially find optimizations through more intimacy with your own code.

Documentation and Testing

With a human-readable, self-documenting language like Python it is easy to fall into the trap of skipping documentation, but that is exactly the wrong mentality to have when working with Python.

Python gives developers a built-in tool called the docstring to thoroughly document code. Docstrings are accessible at run time through the __doc__ attribute, for example method.__doc__, and tools such as Sphinx can collect the docstrings from all of your Python files to auto-generate complete API documentation.

A developer can also use the docstring to write simple test cases, which can be evaluated using the doctest module. This lets new developers come into your code and quickly understand the objective of each function. An example of this can be seen below.

def add_two(x, y):
    """
    This function takes two integers x and y and adds them together
    
    Arguments:
    x -> int
    y -> int

    >>> add_two(2, 3)
    5
    """
    return x + y

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()

Early in my Python career I would laugh at such a test. I thought unit tests seemed trivial and slowed down how quickly I could release new features. Then my small personal project at work grew from hundreds to thousands of lines with multiple developers working on it, and I saw that a proper testing strategy and a target coverage threshold are crucial to the long-term success of a project. My team brought in the pytest module to cover 80% of the project’s lines, complete with automated test runs and reports on each release.

Unit tests are valuable as additional documentation for tricky bits of code, since they can be run by themselves to help a developer understand a function. Tests written as part of the development process often lead to less error-prone code because they force you to deliberately think through both happy and bad use cases. Tests also make your code resilient to breaking changes. Imagine you are required to update some of your dependencies because of a known security hole. When the developers of that library fixed the security hole, they may also have changed the foo() method used by your package in a way that alters the return data and could break your code. Having a unit test or integration test in place for your critical code paths would catch this before it breaks production and causes downtime for your customers.
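
As a rough sketch of what such tests can look like in the pytest style, the example below covers one happy path and one bad case. The parse_user function and its payload shape are hypothetical and only stand in for code that depends on the structure of a library’s return data.

```python
def parse_user(payload):
    """Hypothetical function that depends on the shape of its input data."""
    return {"id": int(payload["id"]), "name": payload["name"].strip()}


def test_parse_user_happy_path():
    assert parse_user({"id": "7", "name": " Ada "}) == {"id": 7, "name": "Ada"}


def test_parse_user_missing_field():
    # The bad case: a missing key should fail loudly instead of passing silently
    try:
        parse_user({"id": "7"})
    except KeyError:
        pass
    else:
        raise AssertionError("expected a KeyError")
```

If this were saved in a file such as test_users.py, running the pytest command in that directory would discover and run both functions because their names start with test_.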

Built in functions

One of my favorite features of Python is the ability to pull useful functions out of my back pocket for common operations such as printing and type casting, where other languages would have required an import. Below are the built-in functions I find myself using most on a daily basis.

  1. bool() int() str() # functions for converting types
  2. abs() # absolute value
  3. all() # returns True if and only if every item in the iterable is truthy
  4. any() # returns True if at least one item in the iterable is truthy
  5. dict() # Creates a hash table collection object, optionally from keyword arguments, another mapping, or an iterable of key/value pairs
  6. dir() # Lists the names of the attributes and methods available on an object
  7. list() # Creates a resizable array collection object that can serve as a queue, stack, or array and can store any mix of data types
  8. enumerate() # Allows you to iterate over the values and indexes of an iterable at the same time
  9. filter(function, iterable) # Keeps only the items of an iterable for which the function returns a truthy value
  10. sorted(iterable, key=None) # Returns a new sorted list from your iterable. Uses Timsort with an O(n log n) worst-case runtime, and you can control the ordering by passing in a key function
  11. getattr(object, "attribute", default_value) # Gets an attribute from an object by name. Returns default_value if the attribute is missing, or raises AttributeError if no default was given
  12. max() min() # Return the largest or smallest item of an iterable, or of two or more arguments
  13. open() # Used for quick input/output of files
  14. print() # Writes values to the standard output. Can take multiple arguments, which are printed separated by a space
  15. range(start, end, step) # Defines a range of integers for iterating against; end itself is not included
  16. type() # Tells you the class of an object, the same value as its __class__ attribute
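
Here is a short sketch that exercises several of these built-ins together. The sample data is made up for the illustration.

```python
scores = [72, 95, 88, 41]

print(all(s > 40 for s in scores))   # True: every score is above 40
print(any(s > 90 for s in scores))   # True: at least one score is above 90
print(max(scores), min(scores))      # 95 41

# filter keeps only the items for which the function returns True
passing = list(filter(lambda s: s >= 70, scores))
print(passing)                       # [72, 95, 88]

# sorted returns a new list; key picks the value to sort by
names = ["bob", "Alice", "carol"]
print(sorted(names, key=str.lower))  # ['Alice', 'bob', 'carol']

# enumerate yields (index, value) pairs
for index, name in enumerate(names):
    print(index, name)

# getattr falls back to the default when the attribute is missing
print(getattr(scores, "missing_attribute", "fallback"))  # fallback
```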

Advanced function concepts

Beyond the standard use of functions to avoid repeating yourself and to encapsulate functionality, Python also lets you do a few less obvious things with your functions. Let me know if you’d like to see a more detailed article or micro course about the concepts below.

Passing functions

When you declare a function, Python creates a function object and binds it to the function’s name. That reference can then be passed to other functions and used as a callback. These function references can also be handed to threads for simple concurrency, or to the filter and sorted functions to give you more control over how your code interprets its data.

# Toy example of a simple callback 
def even():
    print("I'm even")
def odd():
    print("I'm odd")

def check_number(number, even_f, odd_f):
    if number % 2 == 0:
        even_f()
    else:
        odd_f()

check_number(7,even, odd)
# output: "I'm odd" 
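
The same idea applies to the built-in sorted and filter functions. The sketch below, using made-up data and helper names, passes small functions in to control the ordering and the filtering.

```python
people = [("Ada", 36), ("Linus", 28), ("Grace", 45)]


def by_age(person):
    # Return the value that sorted() should compare
    return person[1]


def is_over_30(person):
    return person[1] > 30


# Note there are no parentheses: we pass the function itself, not its result
oldest_first = sorted(people, key=by_age, reverse=True)
over_30 = list(filter(is_over_30, people))

print(oldest_first)  # [('Grace', 45), ('Ada', 36), ('Linus', 28)]
print(over_30)       # [('Ada', 36), ('Grace', 45)]
```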

Using variable and keyword arguments values

If you find yourself writing dynamic code, eventually you will need a function that accepts a variable number of arguments for growing use cases. The code below shows the basic use of this principle; how you use it will depend on how creative you want to be.

# Using variable args for unlimited arguments
def calculate_sum(*args):
  # Assume args are each ints
  # Named total so the built-in sum() is not shadowed
  total = 0
  for arg in args:
    total += arg
  return total

print(calculate_sum(1,2,3,4,5))
# Output 15

def keywords(**kwargs):
  for key in kwargs:
    print(key," : ",kwargs[key])

keywords(hello="world", foo="bar")
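
The * and ** operators also work in the other direction, at the call site. As a small sketch with a made-up describe function, a list and a dictionary can be unpacked into positional and keyword arguments.

```python
def describe(*args, **kwargs):
    # args is a tuple of positional values, kwargs is a dict of named values
    return "args={} kwargs={}".format(args, kwargs)


numbers = [1, 2, 3]
options = {"sep": "-", "end": "!"}

# * unpacks the list into positional arguments, ** unpacks the dict into keywords
print(describe(*numbers, **options))
# args=(1, 2, 3) kwargs={'sep': '-', 'end': '!'}
```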

Function Decorators

Imagine you are writing several functions and you would like to know how long each of them ran, and then save these values to a metrics file. Rather than writing timer logic at the beginning and end of each of your functions, you can write a single function decorator and then apply it to each of your functions with the @ syntax.

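import functools
import time

# func is automatically passed in by the @ decorator syntax
def calc_latency(func):
  @functools.wraps(func)
  def wrapper(*args, **kwargs):
    start = time.perf_counter()
    try:
      return func(*args, **kwargs)
    finally:
      elapsed = time.perf_counter() - start
      # ... metric persistence logic goes here ...
      print(func.__name__, "took", elapsed, "seconds")
  return wrapper

# Note: no parentheses after the decorator name
@calc_latency
def my_database_reader_func():
  ...  # database business logic that grows over time

@calc_latency
def my_critical_business_logic_func():
  ...  # some time sensitive business logic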
  

Check https://www.geeksforgeeks.org/decorators-in-python/ to see a detailed timer example

Decorators are also commonly used to run authorization checks on methods within web application frameworks such as Django and Flask.
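
To give a feel for the authorization use case, here is a framework-free sketch. The require_role decorator and the dictionaries standing in for users are hypothetical and are not the Django or Flask API; they only illustrate how a decorator that takes an argument can guard a function.

```python
import functools


def require_role(role):
    """Hypothetical decorator factory: only run the function for users with `role`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.get("roles", []):
                raise PermissionError(
                    "{} requires role {!r}".format(func.__name__, role)
                )
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@require_role("admin")
def delete_account(user, account_id):
    return "deleted {}".format(account_id)


admin = {"name": "ada", "roles": ["admin"]}
guest = {"name": "bob", "roles": []}

print(delete_account(admin, 42))  # deleted 42
try:
    delete_account(guest, 42)
except PermissionError as err:
    print("blocked:", err)
```

Because require_role takes an argument, it needs one extra level of nesting: the outer function receives the role, and the function it returns is the actual decorator.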

Nested Functions

Python also gives you the ability to declare functions wherever you may find them useful as in the below example.

def top_function():
  def inner_function(int_to_double):
    print(int_to_double * 2)

  for x in range(0,10):
    inner_function(x)

I often find myself using nested functions for recursive solutions while working on coding challenges. Function decorators also use nested functions to abstract the functions they wrap.
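
As a sketch of the recursive use case, the made-up example below counts the paths through a grid. The inner function does the recursion and shares a cache with the outer function without exposing either to the caller.

```python
def count_paths(rows, cols):
    """Count the paths from the top-left to the bottom-right of a grid,
    moving only right or down."""
    cache = {}

    def walk(r, c):
        # The inner function can read rows, cols and cache from the outer scope
        if r == rows - 1 or c == cols - 1:
            return 1
        if (r, c) not in cache:
            cache[(r, c)] = walk(r + 1, c) + walk(r, c + 1)
        return cache[(r, c)]

    return walk(0, 0)


print(count_paths(3, 3))  # 6
```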

Advanced class concepts: context managers

After working with Python for a while in my job, I eventually found myself asking whether there was a way to run some cleanup code every time I was done with a class. This class would create a directory, move some files into the new directory to use as the input for an external binary, and parse the output, and the directory should always be deleted when we are done.

While researching this I learned about Python context managers and magic methods. Below is an example of such a class.

import shutil

# load_binary and utils are this project's own helper modules
from load_binary import LoadBinary
from utils import create_tmp_dir

class MyFileAutomation(object):
  binary = LoadBinary("foo_bin")

  def __init__(self, *args):
    self.args = args
    self.running = False

  # Runs when the with block is entered
  def __enter__(self):
    self.tmp_dir = create_tmp_dir()
    return self

  # Magic exit function similar to dispose in other languages
  def __exit__(self, exc_type, exc_value, traceback):
    # Parse any logs here, then delete the temp dir
    # (assumes create_tmp_dir() returned the directory path)
    shutil.rmtree(self.tmp_dir, ignore_errors=True)

  # Magic function to create a string representation
  def __repr__(self):
    return "%s status: %s" % (self.binary.name, self.running)

  def run_automation(self):
    self.binary.run(self.args)
    self.running = True


with MyFileAutomation("arg1", "arg2") as autotron:
  autotron.run_automation()

# We're out of the with block now and __exit__ has been called automatically.
# This removes the need for an explicit try/finally, because __exit__
# runs even if the block raises an exception.
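
The class above depends on helper modules from my project, so it will not run on its own. Here is a self-contained sketch of the same pattern that uses only the standard library. The TempWorkspace class name is made up for this illustration, and tempfile.mkdtemp stands in for the project-specific helper.

```python
import os
import shutil
import tempfile


class TempWorkspace:
    """Create a temporary directory on entry and always delete it on exit."""

    def __enter__(self):
        self.path = tempfile.mkdtemp()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        shutil.rmtree(self.path, ignore_errors=True)
        # Returning False lets any exception from the with block propagate
        return False


with TempWorkspace() as workspace:
    saved_path = workspace.path
    with open(os.path.join(saved_path, "input.txt"), "w") as handle:
        handle.write("data for the external binary")
    print(os.path.isdir(saved_path))  # True while inside the block

print(os.path.isdir(saved_path))      # False: __exit__ removed the directory
```

For this exact job the standard library already ships tempfile.TemporaryDirectory, which is itself a context manager. Writing your own class is mostly useful when the setup and cleanup steps are specific to your project.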

  

Python also has many other magic methods you can define in your classes to further customize their behavior. For example, defining __getattr__ and __setattr__ lets attribute access on your class read from and write to an internal collection data structure. You can also define the __gt__ and __lt__ magic methods to compare instances of your classes directly against each other and control how Python performs these comparisons.
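
As a minimal sketch of the comparison methods, the hypothetical Version class below compares version strings number by number rather than as text.

```python
class Version:
    """Hypothetical class comparing version strings such as "1.10.0"."""

    def __init__(self, text):
        self.parts = tuple(int(piece) for piece in text.split("."))

    def __lt__(self, other):
        return self.parts < other.parts

    def __gt__(self, other):
        return self.parts > other.parts

    def __repr__(self):
        return "Version({!r})".format(".".join(str(p) for p in self.parts))


print(Version("1.9.0") < Version("1.10.0"))     # True, compared as numbers
print("1.9.0" < "1.10.0")                       # False, compared as text
print(max(Version("2.0.1"), Version("2.0.0")))  # Version('2.0.1')
```

Once these methods exist, built-ins such as max, min, and sorted work on your objects without any extra code.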

Most useful python modules

Python ships with a large standard library that you can use by simply importing its modules into your code. These modules solve common problems such as checking whether a file exists, running commands in a terminal, creating graphical user interfaces, and so on.

Common modules and what I use them for in my daily work.

  1. os – operating system level functions such as directory walking, checking files, working with file paths
  2. sys – exit python code while setting return code, load arguments, redirect output streams to standard error
  3. shutil – often used for deleting or copying directories
  4. json – serialize and deserialize dictionaries to and from json formatted strings or files
  5. subprocess – Run commands on your operating system. Useful for running other binaries from within your application
  6. random – Generate random numbers and make random choices
  7. unittest – Standard unit testing framework for asserting test cases and mocking dependencies
  8. math – contains methods for calculating ceiling, floor, factorial, trig functions, and also contains constants for pi, tau, and others
  9. datetime – Generate system time stamps with various formats. These datetime objects can also be used to add or subtract using time deltas
  10. tkinter – Used for building simple GUIs
  11. venv – Create virtual environments for installing additional Python dependencies isolated to the project
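
To give a quick taste of a few of these modules, here is a short sketch. The file name and values are made up for the illustration.

```python
import datetime
import json
import math
import os
import random

# os: build a path in a portable way and check whether it exists
config_path = os.path.join(os.getcwd(), "settings.json")
print(os.path.exists(config_path))

# json: round-trip a dictionary through a JSON string
encoded = json.dumps({"retries": 3, "verbose": True})
decoded = json.loads(encoded)
print(decoded["retries"])  # 3

# math: ceiling and floor
print(math.ceil(2.1), math.floor(2.9))  # 3 2

# datetime: add a time delta to a date
release = datetime.date(2020, 1, 30) + datetime.timedelta(days=3)
print(release.isoformat())  # 2020-02-02

# random: pick an integer between 1 and 6 inclusive
roll = random.randint(1, 6)
print(roll)
```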

Useful third-party (3p) libraries. These can all be conveniently installed with pip.

  1. Flask – Web application framework that lets you bring in other useful Flask extensions to control your web app’s behavior
  2. NumPy – Linear algebra based library for manipulating arrays
  3. Matplotlib – Functional graphing library
  4. Requests – A friendlier alternative to the standard urllib library for making HTTP REST requests
  5. SQLAlchemy – A database object relational mapping library commonly used with SQLite, MySQL, and PostgreSQL
  6. scikit-learn – Library for working with simple data analysis and machine learning concepts. Contains common machine learning algorithms such as decision trees and neural networks. Also has standard datasets to experiment with.
  7. Pandas – Library for working with dataframes and other data analysis manipulations. Has built in histogram plotting functionality
  8. Keras – More advanced neural network library
  9. Beautiful Soup – Parses HTML pages for web scraping
  10. pytest – Another Python unit testing framework which provides the ability to create fixtures and generate test reports

List comprehensions and slicing

The list comprehension is a concise way of running operations on the elements of an iterable. Refer to the code below to see how this improves readability.

x = [1,2,3,4,5,6,7]

for i, val in enumerate(x):
  if val < 4:
    x[i] = val * 2

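# The loop above can also be written as a single expression
# (applied to the original x)
x = [val * 2 if val < 4 else val for val in x]
# To keep only the matching values instead: [val * 2 for val in x if val < 4]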

# Reassign X
x = [1,2,3,4,5,6,7]
# Other useful list operations
print(x[-1]) # 7

# Get first two elements
y = x[:2]  # [1, 2]

# slice off first three elements
z = x[3:] # [4,5,6,7]

# Slice with two indexes, the end index is excluded
ww = x[2:4] # [3, 4]

# Get last two
xx = x[-2:] # [6 ,7]

# Use a list as stack
yy = x.pop() # x = [1,2,3,4,5,6]  yy =7

# Use a list as a queue
zz = x.pop(0) # x = [2,3,4,5,6] zz = 1 
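
A few more forms are worth knowing. The sketch below contrasts filtering with conditional transformation, shows a dict comprehension, and uses the optional step value in a slice.

```python
numbers = [1, 2, 3, 4, 5, 6, 7]

# Filter: keep only some items
small = [n for n in numbers if n < 4]               # [1, 2, 3]

# Transform conditionally: keep every item, change some
doubled = [n * 2 if n < 4 else n for n in numbers]  # [2, 4, 6, 4, 5, 6, 7]

# Dict comprehension: map each number to its square
squares = {n: n * n for n in small}                 # {1: 1, 2: 4, 3: 9}

# Slices take an optional step; the end index is always excluded
every_other = numbers[::2]                          # [1, 3, 5, 7]
middle = numbers[2:4]                               # [3, 4]
backwards = numbers[::-1]                           # [7, 6, 5, 4, 3, 2, 1]

print(small, doubled, squares, every_other, middle, backwards)
```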

Global Interpreter Lock

The global interpreter lock (GIL) is a mutex inside the CPython interpreter that allows only one thread to execute Python bytecode at a time. Your application can start many threads, but they take turns rather than running Python code in parallel. Because of this, if you assign 8 CPUs to work concurrently on a large linear algebra problem, you will see a speedup substantially less than the 8x you might expect. In my own experiments I have only observed a speedup of about 1.8x, and it seems to cap at around this value.

If your code is running subprocess commands, HTTP requests, or some other input/output-bound work, then you can still expect threads to give you faster run times, because a thread releases the GIL while it waits on I/O and other threads can run in the meantime.
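
A rough way to see this effect is to simulate I/O waits with time.sleep, which releases the GIL just like a real network or disk wait. This is a minimal sketch with made-up function names, and the exact timing will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(seconds):
    # time.sleep stands in for waiting on a network or disk call
    time.sleep(seconds)
    return seconds


def run_with_threads(delays):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_download, delays))
    return results, time.perf_counter() - start


results, elapsed = run_with_threads([0.2, 0.2, 0.2, 0.2])
print(results)
print("elapsed: {:.2f}s".format(elapsed))
```

Run one after another, the four waits would take about 0.8 seconds. With four threads they overlap, so the total should be close to the length of a single wait.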

To run CPU-bound calculations in parallel, it is best to have the application create multiple Python processes and let the operating system schedule them across cores. The disadvantage is that your code needs to be written in an isolated way, since the processes do not share memory unless you build in a shared cache or pass data between them explicitly.
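
One way to sketch the multi-process approach is with the standard library’s concurrent.futures.ProcessPoolExecutor. The prime-counting function below is just a made-up stand-in for CPU-heavy work. The arguments and results are pickled and sent between processes, which is the isolation cost mentioned above.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % divisor for divisor in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits):
    # Each worker is a separate Python process with its own interpreter and GIL
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(count_in_processes([10, 100, 1000]))  # [4, 25, 168]
```

The if __name__ == "__main__" guard matters here: on platforms that start worker processes by re-importing the script, it stops each worker from trying to launch its own pool.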

Generators and why to use them

If you have a function that would return a list, then rather than calculating the entire list up front you can have the function yield its values, which makes it return a generator. This tells Python to hold off on computing each value until the client code explicitly asks for it. This helps with the memory usage of your application.
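
Here is a minimal sketch of the difference, using made-up function names. The list version builds every value before returning, while the generator version produces values one at a time.

```python
def squares_list(limit):
    # Builds the whole list in memory before returning
    return [n * n for n in range(limit)]


def squares_generator(limit):
    # Produces one value at a time, only when the caller asks for it
    for n in range(limit):
        yield n * n


lazy = squares_generator(1000000)
print(next(lazy))  # 0
print(next(lazy))  # 1
print(next(lazy))  # 4

# Generators work anywhere an iterable is expected
print(sum(squares_generator(5)))  # 0 + 1 + 4 + 9 + 16 = 30
```

Creating the generator for a million values is instant and uses almost no memory, because nothing is computed until next() or a loop asks for the next value.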

You can learn more about generators and iterators here

Conclusion

As you can see, there are many powerful features in Python that allow it to be more than just a scripting language. Python has a robust standard library, many third-party packages distributed through pip, and many advanced language features that allow for extensible and maintainable code.

If you’ve made it this far, I highly appreciate your tenacity to learn and wish you the best in your Python future! Please leave me a message and let me know what else you’d like to see from this site.

