Tags: java, apache-spark, databricks, databricks-connect

How do I connect to and write a CSV file to a remote Databricks Apache Spark instance from Java?


I'm trying to connect to a remote Databricks instance and write a CSV file to a specific folder in DBFS. I can find bits and pieces here and there, but I can't see how to put it all together. How do I add the file to DBFS on a remote Databricks instance from a Java program running on my local machine?

I'm currently using a Community Edition instance I created here: https://databricks.com/try-databricks

This is the URL for my instance (I'm guessing the "o=7823909094774610" part identifies my instance):
https://community.cloud.databricks.com/?o=7823909094774610

Here are some of the resources I've been looking at to try to resolve this, but I'm still not able to get off the ground:


Solution

  • You could take a look at the DBFS REST API and consider using it from your Java application.

    If a Java solution is not required, you could also take a look at the databricks-cli. After installing it with pip (pip install databricks-cli), you simply have to:

    1. Configure the CLI by running: databricks configure
    2. Copy the file to DBFS by running: databricks fs cp <source> dbfs:/<target>
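For the Java route, here is a minimal sketch of calling the DBFS REST API's `/api/2.0/dbfs/put` endpoint, which accepts the file contents base64-encoded in a JSON body (suitable for files up to about 1 MB; larger files need the create/add-block/close sequence). The `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables, the class name, and the target DBFS path are all assumptions for illustration; note also that the free Community Edition may not let you generate the personal access token this requires. The JSON body is built by string concatenation for brevity, which assumes the path needs no escaping; a real application should use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DbfsUpload {

    // Build the JSON body for the DBFS put endpoint (/api/2.0/dbfs/put).
    // The API expects the file contents base64-encoded.
    static String buildPutBody(String dbfsPath, String csvContent) {
        String encoded = Base64.getEncoder()
                .encodeToString(csvContent.getBytes(StandardCharsets.UTF_8));
        return "{\"path\":\"" + dbfsPath + "\","
                + "\"contents\":\"" + encoded + "\","
                + "\"overwrite\":true}";
    }

    public static void main(String[] args) throws Exception {
        String csv = "a,b\n1,2\n";
        String body = buildPutBody("/FileStore/tables/example.csv", csv);
        System.out.println(body);

        // Hypothetical configuration: workspace URL (e.g. the
        // https://community.cloud.databricks.com base) and a personal
        // access token generated in the Databricks UI.
        String host = System.getenv("DATABRICKS_HOST");
        String token = System.getenv("DATABRICKS_TOKEN");
        if (host != null && token != null) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(host + "/api/2.0/dbfs/put"))
                    .header("Authorization", "Bearer " + token)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Running it with the two environment variables set should create the file under the given DBFS path; without them it only prints the request body it would send.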