I'm evaluating detect-secrets and I'm not sure why I get different results from detect-secrets and the hook.
Here is a log of a simplification:
$ cat docs/how-to-2.md
AZ_STORAGE_CS="DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net"
$ detect-secrets scan --string $(cat docs/how-to-2.md)
AWSKeyDetector : False
ArtifactoryDetector : False
Base64HighEntropyString: True (5.367)
BasicAuthDetector : False
CloudantDetector : False
HexHighEntropyString : False
IbmCloudIamDetector : False
IbmCosHmacDetector : False
JwtTokenDetector : False
KeywordDetector : False
MailchimpDetector : False
PrivateKeyDetector : False
SlackDetector : False
SoftlayerDetector : False
StripeDetector : False
TwilioKeyDetector : False
$ detect-secrets-hook docs/how-to-2.md
$ detect-secrets-hook --baseline .secrets.baseline docs/how-to-2.md
I would have expected that detect-secrets-hook
would tell me about that Azure storage account key that has high entropy.
more details about the baseline:
$ cat .secrets.baseline
{
"custom_plugin_paths": [],
"exclude": {
"files": null,
"lines": null
},
"generated_at": "2020-10-09T10:06:54Z",
"plugins_used": [
{
"name": "AWSKeyDetector"
},
{
"name": "ArtifactoryDetector"
},
{
"base64_limit": 4.5,
"name": "Base64HighEntropyString"
},
{
"name": "BasicAuthDetector"
},
{
"name": "CloudantDetector"
},
{
"hex_limit": 3,
"name": "HexHighEntropyString"
},
{
"name": "IbmCloudIamDetector"
},
{
"name": "IbmCosHmacDetector"
},
{
"name": "JwtTokenDetector"
},
{
"keyword_exclude": null,
"name": "KeywordDetector"
},
{
"name": "MailchimpDetector"
},
{
"name": "PrivateKeyDetector"
},
{
"name": "SlackDetector"
},
{
"name": "SoftlayerDetector"
},
{
"name": "StripeDetector"
},
{
"name": "TwilioKeyDetector"
}
],
"results": {
".devcontainer/Dockerfile": [
{
###obfuscated###
}
],
"deployment/export-sp.sh": [
{
###obfuscated###
}
],
"docs/pip-install-from-artifacts-feeds.md": [
{
###obfuscated###
}
]
},
"version": "0.14.3",
"word_list": {
"file": null,
"hash": null
}
}
This is definitely peculiar behavior, but after some investigation, I realize that you've stumbled upon an edge case of the tool.
HighEntropyStringPlugin
supports a limited set of characters (not including ;
)HighEntropyStringPlugin
leverages the heuristic that strings are quoted in certain contexts.Therefore, the functionality differs: when run through detect-secrets-hook
, it does not parse the string accordingly due to the existence of ;
. However, when run through detect-secrets scan --string
, it does not require quotes, and breaks the string up.
HighEntropyString tests are pretty noisy, if not aggressively pruned for false positives. One way it attempts to do this is via applying a rather strict regex (source), which requires it to be inside quotes. However, for certain contexts, this quoted requirement is removed (e.g. YAML files, and inline string scanning).
When this quoted requirement is removed, we get the following breakdown:
>>> line = 'AZ_STORAGE_CS="DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net"'
>>> with self.non_quoted_string_regex(is_exact_match=False):
... self.regex.findall(line)
['AZ_STORAGE_CS=', 'DefaultEndpointsProtocol=https', 'AccountName=storageaccount1234', 'AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==', 'EndpointSuffix=core', 'windows', 'net']
When we do so, we can see that the AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==
would trigger the base64 plugin, as demonstrated below:
$ detect-secrets scan --string 'AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A=='
AWSKeyDetector : False
ArtifactoryDetector : False
Base64HighEntropyString: True (5.367)
BasicAuthDetector : False
CloudantDetector : False
HexHighEntropyString : False
IbmCloudIamDetector : False
IbmCosHmacDetector : False
JwtTokenDetector : False
KeywordDetector : False
MailchimpDetector : False
PrivateKeyDetector : False
SlackDetector : False
SoftlayerDetector : False
StripeDetector : False
TwilioKeyDetector : False
However, when this quoted requirement is applied, then the entire string payload is scanned as one potential secret: DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net
This doesn't get flagged because it fails the original base64 regex rules, which doesn't know how to handle ;
.
>>> self.regex.findall(line)
[]
Therefore, this functionality differs, though, not immediately apparent through the invocation pattern described.
This is a much more challenging question, since allowing other characters would change the entropy calculation, and the probability of flagging strings. There has been some discussion around creating a plugin for all characters, but the team has not yet been able to decide on a default entropy limit for this.