I wrote a Python Linting Code Scanning Action

October 10, 2023

A GitHub Action to lint Python code, for GitHub Code Scanning

Recently I ran a training “bootcamp” for a customer on GitHub Advanced Security, and the topic of code quality came up.

Pinning down just what we mean by “code quality” can be a bit tricky: witness the debates over Uncle Bob’s work on “clean code”. It’s fair to say that it usually concerns aspects of the code other than its security, which is the main focus of Advanced Security (the clue is in the name). We’re talking performance, readability, maintainability, and so on.

It isn’t as though there’s been a lack of thought on the topic; there’s even an ISO standard for it. Even so, that doesn’t mean every tool means the same thing by “quality”.

We do have code quality rules for Advanced Security that can be run with CodeQL, but developers are often already familiar with a linting 🧹 tool for their language of choice and want to use that. We encourage integrating third-party tools into Code Scanning.

What’s a linter?

Lint is the computer science term for a static code analysis tool used to flag programming errors, bugs, stylistic errors and suspicious constructs

Thanks Wikipedia!

There is a lot of innovation in the linting space, and cheap approaches can produce good results for linting (in contrast to security results, which can really suffer if produced by a simpler tool). The community is often really great at producing rules and plugins for linters.

Some linters have been integrated in Code Scanning already, but that isn’t the case for Python, other than the Bandit security linter.

In the Python 🐍 world there has been a proliferation of great linters and type checkers, and I wanted to see if I could use them to produce a GitHub Action that would work with Code Scanning. Plenty of GitHub Actions exist to run one or more Python linters, but they don’t work with Code Scanning.

I needed something to run the linters and turn their output into SARIF (Static Analysis Results Interchange Format), which is the standard format for SAST results, and one that GitHub helps to design.

This is the result, started in the airport 🛫 on the way home from the training, and finished over the last couple of weeks:

Python linting Code Scanning Action

If you want to read about how to use it, I suggest you follow that link, read the README, and dive in.

Carry on reading here for a bit more background on the decisions behind it.

How I wrote it

I started by deciding which work I wanted to do, and which to delegate to the linters and type checkers (I’ll just say “linters” from now on).

The first major decision was whether to try to replicate in CodeQL what the linters do, instead of using them directly. Replicating the linters was a non-starter for me, but it’s worth mentioning. The linters would do the work.

Configuration was an obvious choice to delegate. I wanted to use the same configuration as the linters, so that the Action would be familiar to developers who already use the linters, and I didn’t have to design a universal configuration format that would try to account for the variation in the linters. Configuration was to be done using the configuration file for each linter.

I didn’t want to have to fork or change the linters to be able to use them, so I needed to run them and then transform their output into SARIF myself. In some cases the linter already has a JSON output option, so I could transform that reasonably easily; in others I was parsing human-readable text output. One linter, Flake8, has a plugin system, so I wrote my own flake8-sarif-formatter for it. There was an existing flake8-sarif module on PyPI, but its source repo returns a 404, which made me worried about relying on it, so I decided to write my own independently.
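To give a feel for the JSON-to-SARIF step, here is a minimal sketch rather than the Action’s actual code. It assumes findings shaped like the records in Pylint’s JSON output; the sample data, the `to_sarif` function, and the severity mapping are all made up for illustration, and it assumes the input column is 0-based.

```python
import json

# Hypothetical findings, shaped like the records in Pylint's JSON output.
SAMPLE_FINDINGS = [
    {
        "type": "warning",
        "path": "app/main.py",
        "line": 12,
        "column": 4,
        "symbol": "unused-variable",
        "message-id": "W0612",
        "message": "Unused variable 'result'",
    },
]

# Assumed mapping from linter severities to SARIF levels.
SARIF_LEVELS = {"fatal": "error", "error": "error", "warning": "warning"}


def to_sarif(findings, tool_name):
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    rules = {}
    results = []
    for finding in findings:
        rule_id = finding["message-id"]
        # Keep one rule entry per distinct rule ID.
        rules.setdefault(
            rule_id,
            {"id": rule_id, "shortDescription": {"text": finding["symbol"]}},
        )
        results.append(
            {
                "ruleId": rule_id,
                "level": SARIF_LEVELS.get(finding["type"], "note"),
                "message": {"text": finding["message"]},
                "locations": [
                    {
                        "physicalLocation": {
                            "artifactLocation": {"uri": finding["path"]},
                            "region": {
                                "startLine": finding["line"],
                                # Assumes a 0-based input column;
                                # SARIF columns are 1-based.
                                "startColumn": finding["column"] + 1,
                            },
                        }
                    }
                ],
            }
        )
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {"name": tool_name, "rules": list(rules.values())}
                },
                "results": results,
            }
        ],
    }


sarif_log = to_sarif(SAMPLE_FINDINGS, "Pylint")
print(json.dumps(sarif_log, indent=2))
```

A real converter would also need to handle end positions, paths relative to the repository root, and richer rule metadata, all of which this sketch leaves out.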

Picking which linters to support was fairly easy, and I decided to support type checkers as well. I picked the most popular linters and type checkers, and the ones that I was familiar with. I ended up with Pylint, Flake8, Ruff, Fixit 2, Mypy, Pyright and Pytype.

I didn’t include Pyre since I’ve had problems with it in the past; I tried it out for this and again didn’t get on with it. I can’t quite recall why! I also didn’t include Bandit, since it’s a security linter, and I wanted to focus on code quality — there’s an existing Code Scanning Action for Bandit, if you want to use it.

At first I thought that, since I was running Python linters, I would be able to load them as Python modules in a Python script, but that turned out not to be possible. I think one was callable as a Python module, but it didn’t have a maintained API to get results. Instead I called them as command-line tools and parsed their output.
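Calling a linter as a command-line tool and parsing its text output might look something like the sketch below. This is an illustration, not the Action’s code: it assumes Flake8 is installed and uses its default `path:line:col: CODE message` output format, and the function names and the regular expression are my own.

```python
import re
import subprocess
import sys

# Flake8's default output looks like "path:line:col: CODE message".
LINE_PATTERN = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def parse_output(text):
    """Turn linter text output into a list of finding dicts."""
    findings = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            # Skip anything that is not a finding, such as a summary line.
            continue
        findings.append(
            {
                "path": match["path"],
                "line": int(match["line"]),
                "column": int(match["col"]),
                "code": match["code"],
                "message": match["text"],
            }
        )
    return findings


def run_flake8(target):
    """Run Flake8 on a file or directory and return parsed findings."""
    completed = subprocess.run(
        [sys.executable, "-m", "flake8", target],
        capture_output=True,
        text=True,
        # Linters usually exit non-zero when they report findings,
        # so a non-zero exit code is not treated as a failure here.
        check=False,
    )
    return parse_output(completed.stdout)
```

Usage would be along the lines of `run_flake8("src/")`. Parsing human-readable output like this is fragile if the format changes, which is why a JSON option is preferable when a linter offers one.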

Some of the linters gave reasonably helpful rule descriptions in their output, but a few didn’t, so I had to rely on creating my own from the individual alert messages, or from the name of the rule. Rule descriptions are important for Code Scanning, since they are used as the title of the alert, rather than the alert message.
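One way to derive a rule description when the linter doesn’t supply one is sketched below. The helper names and heuristics are hypothetical and not taken from the Action: the rule’s name is preferred, then a generalised form of the alert message with quoted specifics removed, and finally the bare rule ID.

```python
import re


def description_from_rule_name(name):
    """Turn a rule name such as 'unused-variable' into 'Unused variable'."""
    words = [word for word in re.split(r"[-_\s]+", name.strip()) if word]
    text = " ".join(words)
    return text[:1].upper() + text[1:]


def description_from_message(message):
    """Strip quoted specifics so every alert for a rule shares one title.

    This is a rough heuristic: it will mangle messages that contain
    apostrophes which are not used as quotation marks.
    """
    generic = re.sub(r"'[^']*'|\"[^\"]*\"", "", message)
    return " ".join(generic.split()).rstrip(" .:,")


def rule_description(rule_id, name=None, message=None):
    """Pick the best available description, falling back to the rule ID."""
    if name:
        return description_from_rule_name(name)
    if message:
        generic = description_from_message(message)
        if generic:
            return generic
    return rule_id


print(rule_description("W0612", name="unused-variable"))
print(rule_description("W0612", message="Unused variable 'result'"))
```

The fallback order matters because the description becomes the alert title, so a stable, generic phrase is more useful than a message that names one particular variable.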

The result

The result is a GitHub Action that can be used from an Actions workflow to lint Python code, and produce results in Code Scanning.

I’ve tested it on a few repositories, and it seems to work well. I’ve also used it to lint the code in the Action itself, and it found a few issues that I was able to fix.

Hope you find it useful; if you have any feedback, please open an issue or start a discussion on the repo.