I was just wondering whether anyone has any thoughts on best practices when working in Databricks. Developing within Databricks is costing us a lot financially, so I would like to know where else it would be best to develop Python code. With collaborative work in mind as well, is there a setup similar to Databricks for collaborative work that is free or low cost to use?
Any suggestions would be greatly appreciated!
The cost of Databricks really comes down to the size of the clusters you are running (1 driver and 1 worker, or 1 driver and 32 workers?), the spec of the machines in the cluster (low RAM and CPU, or high RAM and CPU), and how long you leave them running (always on, or a short time to live, i.e. "Terminate after x minutes of inactivity"). I am also assuming you are not running the always-on High Concurrency cluster mode.
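To see how those three factors multiply together, here is a rough back-of-the-envelope sketch. It is a simplification, not Databricks' actual billing formula: real bills combine DBU charges with cloud VM prices, whereas here a single hypothetical flat `hourly_rate_per_node` stands in for both, and the cluster is assumed to sit idle for the full auto-termination window after the last command.

```python
def estimate_cluster_cost(
    num_workers: int,
    hourly_rate_per_node: float,
    active_hours: float,
    idle_minutes_before_termination: float,
) -> float:
    """Rough cost of one cluster session (hypothetical simplified model).

    Assumptions (not Databricks' pricing formula):
    - every node (driver + workers) costs the same flat hourly rate,
      standing in for VM cost plus DBU charges combined;
    - after the last command the cluster idles for the whole
      auto-termination window before it shuts down.
    """
    nodes = num_workers + 1  # workers plus the single driver
    billed_hours = active_hours + idle_minutes_before_termination / 60
    return nodes * hourly_rate_per_node * billed_hours


if __name__ == "__main__":
    # Same 2 hours of real work, left idling for 120 minutes afterwards
    small = estimate_cluster_cost(1, 0.5, active_hours=2, idle_minutes_before_termination=120)
    large = estimate_cluster_cost(32, 2.0, active_hours=2, idle_minutes_before_termination=120)
    print(f"small: {small:.2f}, large: {large:.2f}")  # small: 4.00, large: 264.00
```

With these made-up rates, the idle window alone doubles the billed time of a 2-hour session, and the larger cluster multiplies that again by node count and machine spec, which is why cluster size and a short auto-termination setting matter so much.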
Some general recommendations would be:
Obviously there is a trade-off between assembling representative samples and making sure your outputs are still accurate and useful, but that's up to you.
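One way to keep a development sample representative is to sample within each group (stratum) rather than across the whole dataset, so small groups don't vanish. The sketch below is plain Python and assumes the data fits in memory as a list of records; the function name and its `key`/`fraction` parameters are hypothetical. On a Spark DataFrame the built-in `DataFrame.sampleBy(col, fractions, seed)` does a similar per-group sample.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=42):
    """Return roughly `fraction` of `rows`, sampled separately within each group.

    `key` maps a row to its group. Sampling per group keeps small groups
    represented, which a plain random sample of a skewed dataset can miss.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        # take at least one row per group so no group disappears entirely
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    rows = [{"region": "north", "value": i} for i in range(900)]
    rows += [{"region": "south", "value": i} for i in range(100)]
    dev = stratified_sample(rows, key=lambda r: r["region"], fraction=0.05)
    print(len(dev))  # 50 (45 north + 5 south)
```

Developing against a sample like this on a laptop or a small cluster, then running the full job once, keeps the expensive compute time short; how small the fraction can go before results stop being useful is the trade-off mentioned above.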