ansible.builtin.uri expects a base64 body


I have an Ansible task that calls the Databricks API with curl. Here is what it looks like:

---
- name: Databricks dbfs upload
  shell: "curl -H 'Authorization: Bearer {{ token }}' \
    -F filedata=config-file/application.conf \
    -F path=/destpath/on/dbfs/application.conf \
    https://<instance_name>.azuredatabricks.net/api/2.0/dbfs/put"

I want to replace that curl call with the ansible.builtin.uri module, but I get the following error:

{
  ...
  "json": {
    "error_code": "MALFORMED_REQUEST", 
    "message": "Could not parse request object: Unexpected end of base64-encoded String: base64 variant 'MIME-NO-LINEFEEDS' expects padding (one or more '=' characters) at the end. This Base64Variant might have been incorrectly configured\n at [Source: (ByteArrayInputStream); line: 1, column: 36]\n at [Source: java.io.ByteArrayInputStream@35649c34; line: 1, column: 36]"
  }, 
  "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request"
  ...
}

Here is what my new databricks_upload.yml file looks like:

---
- name: Databricks dbfs upload
  ansible.builtin.uri:
    url: https://<instance_name>.azuredatabricks.net/api/2.0/dbfs/put
    method: POST
    return_content: true
    headers:
      Authorization: "Bearer {{ token }}"
    body_format: json
    body: 
      contents: config-file/application.conf
      path: /destpath/on/dbfs/application.conf

where config-file/application.conf is a local path, relative to the location of the playbook that calls databricks_upload.yml.

Here is what I have tried so far:

  • renaming config-file/application.conf, because the underscore and the dot generate base64 encoding errors as well
  • changing the Ansible version: ansible-core 2.12 and 2.14
  • putting the body in an actual JSON file, like this: body: "{{ lookup('ansible.builtin.file','body.json') }}"

None of these attempts worked.

I don't understand why it would expect that type of encoding there, and I am running out of ideas. What am I missing?


Solution

  • First and foremost, as mentioned by Zeitounator, body_format must be set to form-multipart.

    In addition, the way fields must be passed in the body is strictly codified. In our case, the Databricks API needs two fields: contents and path.

    • contents is the file to be POSTed. Because it is a file, its local path must be passed under a filename key.
    • path is the destination path of the new file in DBFS. The string must be passed under a content key (careful: this has nothing to do with the contents field mentioned above).

    The documentation alludes to this, but only in its examples, which makes it hard to tell whether the rule applies in general or only to those examples.

    Taking all of this into account, here is what the task must look like:

    ---
    - name: Databricks dbfs upload
      ansible.builtin.uri:
        url: "https://<instance_name>.azuredatabricks.net/api/2.0/dbfs/put"
        method: POST
        return_content: true
        headers:
          Authorization: "Bearer {{ token }}"
        body: 
          contents: 
            filename: "config-file/application.conf"
          path: 
            content: "/destpath/on/dbfs/application.conf"
        body_format: form-multipart
    

    EDIT

    I don't fully understand this, but here is what I observed:

    • Without further specification, the uploaded file ends up base64-encoded, so I would have to handle the decoding on DBFS, which is a bit annoying.
    • However, as specified in the Databricks API documentation, contents can be passed as a base64 string. So if I do the encoding myself (see the final snippet below), the Databricks API is able to deal with it and returns the decoded file.

    Finally, syntax-wise, I found that the confusing filename and content keys can be left out by using the ansible.builtin.file lookup. In my tests, the following two bodies behaved the same way:

      body: 
        contents: 
          filename: "config-file/application.conf"
        path:
          content: "/destpath/on/dbfs/application.conf"
    
      body: 
        contents: "{{ lookup('ansible.builtin.file', 'config-file/application.conf') }}"
        path: "/destpath/on/dbfs/application.conf"
    

    So here is my final version:

    ---
    - name: Databricks dbfs upload
      ansible.builtin.uri:
        url: "https://<instance_name>.azuredatabricks.net/api/2.0/dbfs/put"
        method: POST
        return_content: true
        headers:
          Authorization: "Bearer {{ token }}"
        body: 
          contents: "{{ lookup('ansible.builtin.file', 'config-file/application.conf') | b64encode }}"
          path: "/destpath/on/dbfs/application.conf"
        body_format: form-multipart
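
The original error also suggests another route that I have not tested here: with body_format: json, the Databricks API appears to treat contents as a base64 string, and the literal path "config-file/application.conf" is not valid base64, hence the padding complaint. The sketch below keeps the JSON body and encodes the file contents instead. It assumes the file is a small text file (the file lookup reads text and, by default, strips trailing whitespace), and the overwrite field is an optional addition of mine, not something from the question.

```yaml
---
# Sketch only: JSON variant of the upload, assuming the endpoint accepts
# a base64-encoded "contents" field in a JSON body.
- name: Databricks dbfs upload (JSON body, base64-encoded contents)
  ansible.builtin.uri:
    url: "https://<instance_name>.azuredatabricks.net/api/2.0/dbfs/put"
    method: POST
    return_content: true
    headers:
      Authorization: "Bearer {{ token }}"
    body_format: json
    body:
      # Destination path on DBFS, passed as a plain string
      path: "/destpath/on/dbfs/application.conf"
      # Read the local file on the controller and base64-encode its contents
      contents: "{{ lookup('ansible.builtin.file', 'config-file/application.conf') | b64encode }}"
      # Assumption: replace an existing file; drop this line to keep the API default
      overwrite: true
```

If this works in your setup, it avoids the multipart field conventions entirely, at the cost of sending the whole file inline in the request body.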