pyspark, databricks, azure-databricks

Databricks shows REDACTED on a hardcoded value


I am using Azure Databricks to read the environment value from Azure Key Vault. The stored value is intg.

env = dbutils.secrets.get(scope = "myscrope", key = "environment")

When I print this, it shows as [REDACTED], which is expected.

Now I declare another variable as below.

prm = 'myintgterritoy'

When I print this, it shows as my[REDACTED]territoy because the string contains the intg keyword. I did not expect this behaviour, since this is an entirely different variable. How can I get the value printed as myintgterritoy?

I tried an approach that copies the actual value into a new variable with a space after each character, using the code below.

new_prm = ''
for char in prm:
  new_prm += char + ' '

But when I replace the spaces with an empty string, the result is again my[REDACTED]territoy.

new_prm.replace(' ','')

I am expecting the output as myintgterritoy.


Solution

  • It's not possible. Databricks simply scans the entire output for occurrences of secret values and replaces them with "[REDACTED]".

    The redaction is powerless once the value is transformed. For example, as you already tried, inserting spaces between the characters reveals the value. You can use the same trick with an invisible character instead, such as the Unicode invisible separator (U+2063), which is encoded as 0xE281A3 in UTF-8.

    invisible_sep = bytes.fromhex("E281A3").decode("utf-8")
    secret = dbutils.secrets.get("myscrope", "environment")
    plaintextSecret = secret.replace("", invisible_sep)
    print(secret)  # would print "[REDACTED]"
    print(plaintextSecret)  # would print "intg"
    

    Note that what looks like "intg" is in reality "<sep>i<sep>n<sep>t<sep>g<sep>". The separator is still there, just invisible, and it will remain if you, for example, copy and paste the value.

    Better still, use a secret scope only for values that really are secrets, and keep other variables in a different tool such as Azure App Configuration, Consul KV, or simply a config file.
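
If the workaround is still needed, the invisible-separator trick can be applied to the ordinary variable from the question rather than to the secret itself. The following is a minimal sketch, not an official Databricks feature: the helper names `for_display` and `strip_separators` are hypothetical, and it assumes the redaction only matches the secret as a contiguous substring, as described above.

```python
# U+2063 INVISIBLE SEPARATOR, the same character as bytes E2 81 A3 in UTF-8.
INVISIBLE_SEP = "\u2063"


def for_display(text: str) -> str:
    """Return a copy of text with an invisible separator between characters.

    Hypothetical helper: the result looks the same when rendered, but it no
    longer contains the secret value as a contiguous substring.
    """
    return INVISIBLE_SEP.join(text)


def strip_separators(text: str) -> str:
    """Undo for_display, e.g. after copy-pasting a displayed value."""
    return text.replace(INVISIBLE_SEP, "")


prm = "myintgterritoy"          # the hardcoded value from the question
display_prm = for_display(prm)  # use this copy only for printing

print(display_prm)

# Keep using the original prm in the actual logic; the display copy differs.
assert display_prm != prm
assert strip_separators(display_prm) == prm
```

Use `display_prm` only for printing or logging and keep `prm` for everything else, because the display copy contains extra characters and will not compare equal to the original string.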