
Databricks Repo vs Workspace


I noticed that Databricks has a folder section for 'Workspace' and a folder section for 'Repos'.

I tried researching the difference online, but had no luck. They seem to serve the same purpose: I am able to manage source code in both.

Is there any difference between the two? And are there any best practices for which one I should use, especially if I am working with a team?


Solution

    • Workspace is generally where you create and work on notebooks in different languages, i.e., Python, SQL, Scala, or R. You can also add libraries, new folders, or an MLflow experiment.

    • The Workspace has a Users section that lists the users. Once the users are configured, each user's resources can be managed easily.

    • Refer to the official Microsoft documentation, which has detailed information about the Databricks workspace.

    • Repos are basically used for Git integration. You can add your Git repository there.

    • Use Repos whenever you want to work with your Git repositories; the general Git operations are supported.

    • Refer to the official Microsoft documentation for a complete picture of the capabilities of Databricks Repos.

    • So, as far as I know, you choose Databricks Repos when your work involves development through Git. Anything that does not involve Git integration can be carried out with Databricks Workspace resources.
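To make the distinction concrete, here is a minimal Python sketch of how the two areas might be addressed programmatically. It is not part of the original answer. It assumes the Databricks REST API endpoints `/api/2.0/repos` (to clone a Git repository under `/Repos`) and `/api/2.0/workspace/list` (to list objects in a Workspace folder), authenticated with a personal access token. The host, token, user name, and repository URL are hypothetical placeholders, and the helper function names are made up for illustration. The functions only build the requests; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST API paths; verify them against the current Databricks API reference.
API_REPOS = "/api/2.0/repos"
API_WORKSPACE_LIST = "/api/2.0/workspace/list"


def build_create_repo_request(host, token, git_url, provider, repo_path):
    """Build (but do not send) a request that clones a Git repo under /Repos."""
    body = json.dumps(
        {"url": git_url, "provider": provider, "path": repo_path}
    ).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + API_REPOS,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def build_list_workspace_request(host, token, folder_path):
    """Build (but do not send) a request that lists a Workspace folder."""
    query = urllib.parse.urlencode({"path": folder_path})
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{API_WORKSPACE_LIST}?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical values, for illustration only.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

repo_request = build_create_repo_request(
    HOST,
    TOKEN,
    git_url="https://github.com/example-org/example-repo.git",
    provider="gitHub",
    repo_path="/Repos/someone@example.com/example-repo",
)
workspace_request = build_list_workspace_request(
    HOST, TOKEN, folder_path="/Users/someone@example.com"
)
```

Passing either request to `urllib.request.urlopen` would send it to a real workspace. The point of the sketch is the paths: Git-backed code lives under `/Repos/...`, while notebooks that are not tied to Git live under Workspace folders such as `/Users/...`.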