Search code examples
google-cloud-platformgoogle-cloud-dlp

How to preserve line breaks in text content de-identified with Data Loss Prevention?


I am using an API call content.deidentify to de-identify text content. It is working as expected, but newline characters get stripped.

API call

curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://dlp.googleapis.com/v2/projects/$PROJECT_ID/content:deidentify \
  -d @text-request.json \
  > text-response.json

Input

Eleanor Rigby 
Pharmacist 
Liverpool Hospital 
[email protected]

Output

{
  "item": {
    "value": "Eleanor Rigby Pharmacist Liverpool Hospital [email-address]"
  },
  "overview": {
    ...
  }
}

Is there any option I can add to the request to preserve the line breaks?

I found setPrettyPrint in the Java client documentation. Can I use this option when calling the API directly?


Solution

  • The issue had nothing to do with DLP.

    I was sending invalid JSON:

    {
        "item": {
            "value": "Eleanor Rigby 
    Pharmacist 
    Liverpool Hospital 
    [email protected]"
        },
        "deidentifyConfig": {
            ...
        }
    }
    

    Replacing the newline characters with \n solved the problem.