New GitHub Secret Scanning Custom Patterns

July 13, 2023

GitHub Secret Scanning gives loads of value off-the-shelf, with highly precise vendor secret detection, but sometimes a customer wants something that isn’t already covered by the built-in patterns.

For that, our custom patterns for Advanced Security are perfect, and I’ve just released some that I’ve written for a couple of customers (and also on a whim).

Here’s the list:

.NET Configuration file, e.g. <add key="password" value="somesecret" />
.NET machineKey, e.g. <machineKey validationKey="..." />
Database connection string, e.g. Server=myServerAddress;Database=myDataBase;User Id=myUsername;Password=myPassword;
MS SQLServer TSQL user creation, e.g. CREATE LOGIN user1 WITH PASSWORD = 'password';
Commonly leaked weak passwords, e.g. creds = "p@ssw0rd"
Sentry tokens, secrets and keys, e.g. SENTRY_API_KEY = '1234567890abcdef1234567890abcdef'
Bearer tokens, e.g. Authorization: Bearer AAAAAAAAAAAA
DataDog API and app keys, e.g. datadog_api_key: 1234567890abcdef1234567890abcdef

You can find them in the GitHub Advanced Security Field Secret Scanning Custom Patterns repository.

DataDog API and app keys

On the face of it, these DataDog keys are a bit hard to spot, since they are just strings of hex characters, 32 or 40 characters long. We do detect them in our vendor patterns, and alert DataDog to them, but we don’t have a user-facing alert or push protection for them, since that would come with a mass of false positives on MD5 hashes and so on.

I decided that I’d take a look into what I can do to help customers spot these keys, and I came up with a custom pattern that looks for them in context.

This custom pattern gives you the option to use push protection for DataDog keys, so a key can be blocked before it ever reaches GitHub. That saves you from having to rotate the secret, and any clean-up happens on the developer’s own machine.

It works by requiring the context that DataDog keys commonly appear in, such as datadog_api_key: <key>, rather than matching any bare hex string.
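To make the idea concrete, here’s a minimal sketch in Python. It is not the published pattern: the key names other than datadog_api_key are my assumptions, and GitHub custom patterns are written as a secret format plus optional before/after context rather than as one Python regex. It just shows how requiring a DataDog-looking key name avoids matching every hex string.

```python
import re

# Hypothetical sketch, not the published pattern. The key names are
# assumptions; only datadog_api_key appears in the post itself.
DATADOG_KEY_IN_CONTEXT = re.compile(
    r"""
    \b(?:datadog|dd)_(?:api|app|application)_key  # assumed key names
    ["']?\s*[:=]\s*["']?                          # assignment separator
    (?P<secret>[0-9a-f]{40}|[0-9a-f]{32})         # 40 or 32 hex characters
    (?![0-9a-f])                                  # not part of a longer hex run
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_datadog_keys(text):
    """Return hex strings that appear next to a DataDog-looking key name."""
    return [m.group("secret") for m in DATADOG_KEY_IN_CONTEXT.finditer(text)]


if __name__ == "__main__":
    print(find_datadog_keys("datadog_api_key: 1234567890abcdef1234567890abcdef"))
    print(find_datadog_keys("checksum: d41d8cd98f00b204e9800998ecf8427e"))
```

The first line is reported because the hex string sits next to a DataDog key name; the second, a bare MD5-style hash, is ignored.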

Sentry tokens, secrets and keys

It’s a similar story with Sentry tokens, secrets and keys. They’re strings of hex characters, 32 and 64 characters long, so they’re hard to spot without a lot of false positives.

We can spot them in context in several config file formats, including .env format, in a plugin for webpack, and in Terraform files.

Commonly leaked passwords

That commonly leaked weak passwords pattern is, um, “fun”. For such a generic pattern, there’s a balance between false positives and false negatives.

I took inspiration from the OWASP SecLists project to come up with a list of passwords that are most commonly leaked, such as “password123” or “football”. I then added some variations so that the pattern catches more of the SecLists passwords, including things like “P@55w0rd123!”, which a naïve user would imagine is hard to predict (it’s not!).

I deliberately restricted it to contexts that look like a variable or key assignment, so that it doesn’t just match these words in plain text, in the middle of a sentence. I didn’t constrain it to just password: <pattern>, though, so there’s still lots of potential for false positives.

You could loosen that restriction and look for these in any context at all, but that would hugely increase the false positive rate.
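Here’s a rough Python sketch of that approach, under some loud assumptions: the two base words, the character substitutions and the optional digits-and-punctuation suffix are all illustrative, and the real pattern is built from a much longer list. It only matches a weak password when it appears as a quoted value in an assignment.

```python
import re

# Illustrative base words only; the real list is far longer.
BASE_WORDS = ["password", "football"]

# Assumed substitutions, e.g. "a" may be written as "@" or "4".
LEET = {
    "a": "[a@4]", "e": "[e3]", "i": "[i1!]", "l": "[l1]",
    "o": "[o0]", "s": "[s5$]", "t": "[t7]",
}


def leet_variants(word):
    """Turn a word into a regex fragment that also accepts common swaps."""
    return "".join(LEET.get(ch, re.escape(ch)) for ch in word)


WEAK = "|".join(leet_variants(word) for word in BASE_WORDS)

# A key or variable name, ":" or "=", then a quoted weak password with an
# optional run of digits and one trailing punctuation mark.
WEAK_PASSWORD_ASSIGNMENT = re.compile(
    r"[A-Za-z_][\w.-]*\s*[:=]\s*[\"']"
    r"(?P<secret>(?:" + WEAK + r")\d{0,4}[!?.]?)"
    r"[\"']",
    re.IGNORECASE,
)


def find_weak_passwords(text):
    """Return weak passwords that appear as quoted assignment values."""
    return [m.group("secret") for m in WEAK_PASSWORD_ASSIGNMENT.finditer(text)]


if __name__ == "__main__":
    print(find_weak_passwords('creds = "p@ssw0rd"'))
    print(find_weak_passwords("password: 'P@55w0rd123!'"))
    print(find_weak_passwords("My favourite sport is football."))
```

The first two lines are reported, while the plain sentence about football is not, because nothing in it looks like an assignment.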

It’s best to use something like this in a dry run to audit for passwords, without flooding developers with results from a published pattern. You can then spot places where hardcoded passwords are being used, and develop a specific pattern for that.

You could tighten it up by adding some constraints to the “before” part of the password, so that it has to be a key that looks “passwordy”, but that comes with the risk of missing some passwords. If you want to do that, you can take inspiration from the existing “generic password” pattern.
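As a sketch of that trade-off (again in Python, with a made-up list of “passwordy” key fragments and a deliberately tiny set of weak passwords, neither taken from the published patterns), compare a loose pattern that accepts any key with a tight one that requires a passwordy key:

```python
import re

# Hypothetical fragments that make a key look "passwordy".
PASSWORDY_KEY = r"[\w.-]*(?:pass(?:word|wd)?|pwd|secret|cred)[\w.-]*"

# A deliberately tiny stand-in for the weak password list.
WEAK_SECRET = r"(?:p[a@4][s5$]{2}w[o0]rd|football)\d{0,4}[!?.]?"

ASSIGN = r"\s*[:=]\s*[\"']"

# Loose: any key name. Tight: the key itself must look passwordy.
LOOSE = re.compile(
    r"[A-Za-z_][\w.-]*" + ASSIGN + "(" + WEAK_SECRET + r")[\"']", re.IGNORECASE
)
TIGHT = re.compile(
    PASSWORDY_KEY + ASSIGN + "(" + WEAK_SECRET + r")[\"']", re.IGNORECASE
)


def compare(line):
    """Return (loose_hit, tight_hit) for one line of text."""
    return bool(LOOSE.search(line)), bool(TIGHT.search(line))


if __name__ == "__main__":
    for line in ['creds = "p@ssw0rd"', 'team = "football"', 'token = "password123"']:
        print(line, compare(line))
```

The tight version drops the likely false positive on team = "football", but it also misses token = "password123", which is exactly the risk described above.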