GitHub Changes Token Format to Improve Identifiability, Secret Scanning, and Entropy

GitHub has recently moved to a new format for all of its tokens, including personal access, OAuth access, user-to-server and server-to-server, and refresh tokens. As GitHub engineer Heather Harvey explains, the new format aims to make tokens more easily identifiable, including when scanning repos for secrets, and to increase their entropy.

GitHub uses a number of different tokens to control access to its APIs: the personal access token, used for authentication in place of a username and password; the OAuth access token, issued to OAuth apps through the OAuth 2.0 protocol; the GitHub App user-to-server token, which lets a GitHub App act on behalf of a user, and the GitHub App server-to-server token, which lets it act as the app installation itself; and the refresh token, used to obtain a new user-to-server token when the current one expires.

From the outside, the changes to the token format appear fairly minor: a new three-letter prefix and an extended character set. According to Harvey, though, those changes bring a couple of desirable properties.

First off, the new three-letter prefix improves token identifiability. For example, the ghp prefix is used for GitHub personal access tokens, while gho prefixes OAuth access tokens. The first two letters of a prefix identify the company that created the token, while the third letter specifies the kind of token. The other prefixes in use at GitHub are ghu for user-to-server tokens, ghs for server-to-server tokens, and ghr for refresh tokens.
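The prefix scheme above can be sketched as a simple lookup. This is a minimal illustration, not GitHub's code: the function name token_kind and the TOKEN_KINDS table are hypothetical, and only the five prefixes named in the article are included.

```python
from typing import Optional

# Prefixes listed in the article, mapped to the kind of token they mark.
TOKEN_KINDS = {
    "ghp": "personal access token",
    "gho": "OAuth access token",
    "ghu": "user-to-server token",
    "ghs": "server-to-server token",
    "ghr": "refresh token",
}


def token_kind(token: str) -> Optional[str]:
    """Return the kind implied by the prefix, or None if it is not recognized."""
    # The prefix is separated from the rest of the token by an underscore.
    prefix, separator, rest = token.partition("_")
    if not separator or not rest:
        return None
    return TOKEN_KINDS.get(prefix)
```

A scanner or log filter could call token_kind on any candidate string and ignore values for which it returns None.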

As Harvey puts it, "With this prefix alone, we anticipate the false positive rate for secret scanning will be down to 0.5%."

Less visibly, GitHub has also decided to use the last six characters of a token for a 32-bit checksum, with the aim of making secret scanning even more reliable:

"We start the implementation with a CRC32 algorithm, a standard checksum algorithm. We then encode the result with a Base62 implementation, using leading zeros for padding as needed."
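As a rough sketch of the checksum scheme described above, the following Python computes a CRC32, encodes it in Base62, and pads it to six characters. Several details are assumptions not stated in the article: the ordering of the Base62 alphabet, the idea that the CRC32 covers only the characters between the underscore and the checksum, and all function names.

```python
import zlib

# Assumed alphabet ordering; the article does not say which one GitHub uses.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CHECKSUM_LENGTH = 6


def base62_encode(number: int, width: int) -> str:
    """Encode a non-negative integer in Base62, left-padded with zeros."""
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    encoded = "".join(reversed(digits))
    return encoded.rjust(width, BASE62_ALPHABET[0])


def checksum_for(payload: str) -> str:
    """Return the six-character Base62 form of the payload's CRC32."""
    crc = zlib.crc32(payload.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return base62_encode(crc, CHECKSUM_LENGTH)


def has_valid_checksum(token: str) -> bool:
    """Check the trailing six characters against the rest of the token body."""
    _prefix, separator, body = token.partition("_")
    if not separator or len(body) <= CHECKSUM_LENGTH:
        return False
    # Assumption: the checksum covers only the part after the underscore.
    payload, checksum = body[:-CHECKSUM_LENGTH], body[-CHECKSUM_LENGTH:]
    return checksum_for(payload) == checksum
```

Six Base62 characters are enough for any 32-bit value, since 62^6 is larger than 2^32 while 62^5 is not. A check like this lets a scanner discard lookalike strings offline, without querying a database.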

Counting the three-letter prefix, the underscore separator, and the checksum, the useful length of a token shrinks by ten characters, which reduces the number of unique tokens that can be created. To counter that, GitHub extended the allowed character set while keeping the token length the same. The old character set could only represent hexadecimal digits; the new one includes lowercase and uppercase letters as well as the digits 0 to 9. As a result of all these changes, GitHub tokens now have higher entropy. For OAuth tokens, for example, entropy went from 160 to 178 bits.
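The entropy figures can be reproduced with a short calculation. The 40-character token length is not stated in the article; it is inferred here from the 160-bit figure for hexadecimal tokens (40 characters at 4 bits each), and the names below are illustrative.

```python
import math

TOKEN_LENGTH = 40        # inferred: 160 bits / 4 bits per hexadecimal character
OVERHEAD = 3 + 1 + 6     # prefix letters + underscore + checksum characters


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)


# Old format: all 40 characters are random hexadecimal digits.
old_bits = entropy_bits(16, TOKEN_LENGTH)

# New format: 30 random characters drawn from a-z, A-Z, and 0-9.
new_bits = entropy_bits(62, TOKEN_LENGTH - OVERHEAD)

print(f"old: {old_bits:.1f} bits, new: {new_bits:.1f} bits")
```

Under these assumptions, the larger alphabet more than compensates for the ten characters given over to the prefix and checksum.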

These changes are completely transparent to users and integrators, although they may want to reset their personal access and OAuth tokens to start using the new format right away. GitHub has also announced plans to support tokens of up to 255 characters after June 1, 2021, which could require integrators to add specific support for them.
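For integrators, one way to prepare for longer tokens is to avoid hard-coding a fixed length when validating or storing them. The sketch below is only an illustration: the pattern, the constant, and the function name are hypothetical, and it assumes that longer tokens keep the same prefixes and character set, which the article does not confirm.

```python
import re

MAX_TOKEN_LENGTH = 255  # upper bound announced by GitHub, per the article

# Assumption: future tokens keep the prefix, underscore, and alphanumeric body.
TOKEN_PATTERN = re.compile(r"gh[pousr]_[A-Za-z0-9]+")


def looks_like_github_token(candidate: str) -> bool:
    """Accept any well-formed token up to the announced maximum length."""
    if len(candidate) > MAX_TOKEN_LENGTH:
        return False
    return TOKEN_PATTERN.fullmatch(candidate) is not None
```

The same reasoning applies to storage: a database column or buffer sized for exactly 40 characters would need to grow to hold the longer tokens.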
