Coding Best Practices - Python Edition

Sat, Mar 25, 2023

Read in 7 minutes

How to know if code is well written, structured, readable and will stand the test of time? Following this general set of rules will ensure that best practices are implemented and so the code will be of high quality.

Coding Best Practices - Python Edition
  1. Create a Code Repository and use Version Control
  2. Create Readable Documentation
  3. Follow Style Guidelines
  4. Correct Broken Code Immediately
  5. Use the PyPI Instead of Doing it Yourself
  6. Use the Right Data Structures
  7. Write Readable Code
  8. Use Virtual Environments
  9. Write Object-Oriented Code
  10. What Not to Do while Programming in Python

1. VCS - Version Control System

Version Control is critical for good tracking of each code modification and it’s really good when working in conjunction with other developers.

Version control is handled with git . The implementation can be local, in-house or cloud.

For cloud the most popular platforms are:

Every project should contain the code files and additional markdown files in an understable structure. The minimum is:

Additionally, and much recommended, is to use a widespread and trustworthy structure with *cookiecutter.*

The documentation can be found here.

An example use for Data Science projects in general goes like this (by cookiecutter data-science):

cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science

And results in a project structure like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

``

2. Documentation

Every project needs some form of documentation. Even inline comments are the bare minimum, but one should expect to fulfill at least some docstrings inside classes and methods.

The recommended use-case is Sphinx or MkDocs, making it auto-deployable using CI/CD tools like Github Actions.

def get_random_ingredients(kind=None):
    """
    Return a list of random ingredients as strings.

    :param kind: Optional "kind" of ingredients.
    :type kind: list[str] or None
    :raise lumache.InvalidKindError: If the kind is invalid.
    :return: The ingredients list.
    :rtype: list[str]

    """
    return ["shells", "gorgonzola", "parsley"]

Links:

3. Style Guidelines

In Python there is one standard above all:

PEP8

Every programmer needs to know how to properly write Python code by following the PEP8 convention.

Python PEP8

There are a number of tools to check that at the testing stage, using pre-commit hooks with black and flake8 will ensure only PEP8 compliant code will get commited, else it will throw an error and point it out.

Pre-commit workflow for Python

Other tools are integrated in major IDEs like PyCharm or VSCode.

4. Correct Broken Code

The code that doesn’t work should never be pushed into the main development branch.

Instead, when a broken path is detected, one should open a PR, branch the code at that point, and fix it ASAP. Then petition to merge it into the main development branch.

This methodology is effective at protecting the main development timeline and is called “Zero defects methodology”.

5. Use Available Packages

The main power of Python is the widespread availability and ease of use of its infinite packages with infinite use-cases.

pypi

So whenever you need some special and repeatable method, first take a look around because chances are that someone faced it before… and packaged the solution.

So don’t waste your time.

Use PyPI, conda, github… but don’t re-write.

6. Data Structures

In python there are several data structures, and each one is better suited for different use-cases. Choosing badly will lead to slow performance or maybe major re-writing of the code.

https://www.edureka.co/blog/wp-content/uploads/2019/10/TreeStructure-Data-Structures-in-Python-Edureka1.png

Always take into account the general data structures as well as user or package created ones:

7. Readable Code

If you follow PEP8 convention then you’re almost there.

Also, the code has to be easy to understand. Without unused imports or methods, only necessary comments, keep all the comments with the same style, maximum line length set at 79 or 80 chars.

Variable names should be descriptive for anyone even without reading the rest of the code.

# Don't write like this
x = 1
y = 2

def my_function(a,b):
    z = a * b
    return z

compute = my_function(x,y)
print("Result is: ",compute)

#
# Should be LIKE THIS
#
heigth = 1
width = 2

def area_rectangle(side_a:float,side_b:float)->float:
    """
    Returns the area of a rectangle in float by
    entering both sides and multiplying them.

    :param side_a: Length of the first side
    :param side_b: Lenght of the second side
    :return: The computed area.
    :rtype: float

    """
    return a*b
  
area = area_rectangle(heigth,width)
print(f"The rectangle has {area} sqm")

And yes, f’strings are very useful for that matter!

8. Virtual Environments

One virtual environment for every project. This is a golden rule. Every project starts with the virtual env creation.

virtual envs venv

There are many options for that matter:

The reason behind virtual environments is avoiding package collisions and keeping configurations local.

This makes the code reproducible and so, reliable.

9. Be Object-Oriented

There is good reason to not have everything in one file with thousands of functions.

Python objects

Python is a very good object-oriented easy to understand language. In order to exploit this feature, one should code taking into account that:

By using this principle it is really easy to make object-oriented code reusable, readable, modular, encapsulated, and inheritable.

10. What to Avoid

Wrapping up

Writing good python code is easy and achievable if one follows a simple set of rules.

In this article I presented my general rules for that purpose, but there may be others.

This is the foundation of the checks needed for MLOps, Data Science, DevOps and general python coding.