Databricks Cheat Sheet 1
Apr 8, 2023
Cluster Management
- Create a cluster: Clusters > Create Cluster (the sidebar item is labeled Compute in newer workspaces).
- Edit cluster configuration: Clusters > Edit.
- Terminate a cluster: Clusters > Terminate.
Notebook Basics
- Create a notebook: Workspace > Create > Notebook.
- Rename a notebook: Click the notebook’s name and type the new name.
- Delete a notebook: Right-click the notebook and select Delete.
- Run a cell: Click the cell and press Shift + Enter, which runs it and moves to the next cell (Ctrl + Enter runs it in place).
- Add a new cell: Click the + button, or press Esc and then B to insert a cell below (A inserts one above).
- Move a cell: Click the up/down arrow buttons.
- Copy a cell: Click the copy button.
- Delete a cell: Click the delete button.
Data Management
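- Upload a file: Workspace > Upload Data.
- Mount external storage: Run `dbutils.fs.mount(...)` in a notebook; mounts are created from code rather than from a Workspace menu.
- Create a table: Workspace > Create > Table, or `CREATE TABLE` in SQL.
- Browse tables: Data > Tables.
- Create a view: Run `CREATE VIEW` in SQL; existing views can then be browsed under Data.
- Query data: Use SQL or Spark code in a notebook.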
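The table, view, and query steps above can be tried from a Scala notebook cell. The sketch below is a minimal example under stated assumptions: it builds a local `SparkSession` (in Databricks, the predefined `spark` would be used instead), the sample rows and the names `orders` and `large_orders` are hypothetical, and it registers a session-scoped temporary view rather than a persistent table so it can be rerun without touching storage.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local run; master("local[*]") is only needed outside Databricks,
// where notebooks already provide `spark`.
val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for an uploaded file.
val orders = Seq(
  ("o1", "alice", 30.0),
  ("o2", "bob", 12.5),
  ("o3", "alice", 7.5)
).toDF("order_id", "customer", "amount")

// Make the DataFrame visible to SQL for this session only.
// orders.write.saveAsTable("demo_orders") would persist it as a catalog table instead.
orders.createOrReplaceTempView("orders")

// Create a view with SQL (`large_orders` is a hypothetical name).
spark.sql("CREATE OR REPLACE TEMPORARY VIEW large_orders AS SELECT * FROM orders WHERE amount > 10")

// Query the data with SQL.
val totals = spark.sql(
  "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY customer"
)
totals.show()

// Views can be read back like tables.
val largeCount = spark.table("large_orders").count()
```

In a Databricks notebook, `display(totals)` could replace `totals.show()` to get the interactive table view.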
Spark Basics
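- Create a Spark session: `val spark = SparkSession.builder().appName("MyApp").getOrCreate()` (requires `import org.apache.spark.sql.SparkSession`; in Databricks notebooks `spark` already exists and `getOrCreate()` returns it).
- Access the Spark context: `val sc = spark.sparkContext` (Databricks notebooks also predefine `sc`).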
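- Read data: `val df = spark.read.format("csv").option("header", "true").load("file.csv")` (without `.option("inferSchema", "true")` or an explicit schema, every column is read as a string).
- Write data: `df.write.format("csv").mode("overwrite").save("output")`
- Transform data: Use Spark’s DataFrame API (`select`, `filter`, `withColumn`, ...).
- Aggregate data: Use the DataFrame API (`groupBy(...).agg(...)`) or SQL.
- Join data: Use the DataFrame API (`df1.join(df2, ...)`) or SQL.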
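The read, write, transform, aggregate, and join steps above fit together roughly as follows. This is a sketch, assuming Spark SQL is on the classpath and a local session; the sales and customer rows, the temporary directory, and the column names are hypothetical. It writes a CSV first so that the read step has a file to load.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, sum}

// Assumption: a local run; in a Databricks notebook the predefined `spark` can be used instead.
val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()
import spark.implicits._

// Write: save hypothetical sales rows as CSV under a fresh temporary directory.
val path = Files.createTempDirectory("cheatsheet").resolve("sales").toString
Seq(("alice", "books", 12.0), ("bob", "games", 30.0), ("alice", "games", 8.0))
  .toDF("customer", "category", "amount")
  .write.format("csv").option("header", "true").mode("overwrite").save(path)

// Read: the header row supplies column names; inferSchema parses `amount` as a double.
val sales = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(path)

// Transform: keep rows with amount >= 10 and project two columns.
val large = sales.filter(col("amount") >= 10).select("customer", "amount")

// Aggregate: total spend and number of orders per customer.
val perCustomer = sales.groupBy("customer")
  .agg(sum("amount").as("total"), count("*").as("orders"))

// Join: attach hypothetical customer attributes on the shared `customer` column.
val customers = Seq(("alice", "DE"), ("bob", "US")).toDF("customer", "country")
val joined = perCustomer.join(customers, Seq("customer"), "inner").orderBy("customer")
joined.show()
```

Joining on `Seq("customer")` rather than a column expression keeps a single `customer` column in the result instead of one from each side.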
Visualization
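- Plot data: In a Python cell (for example a `%python` cell in a Scala notebook), use the `%matplotlib` magic command or third-party libraries like plotly.
- Show data: Use the `display` function, e.g. `display(df)`, which renders a DataFrame as a table with built-in chart options.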
Machine Learning
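- Import MLlib: `import org.apache.spark.ml._` (this brings in top-level classes such as `Pipeline`; classes in subpackages, e.g. `org.apache.spark.ml.classification.LogisticRegression`, need their own imports).
- Train a model: Use Spark’s MLlib API (call an estimator’s `fit` method on a DataFrame).
- Evaluate a model: Use Spark’s MLlib API (an evaluator from `org.apache.spark.ml.evaluation`).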
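A minimal sketch of training and evaluating a model with the `org.apache.spark.ml` API, assuming spark-mllib is on the classpath and a local session. The toy data is generated for illustration, and `LogisticRegression` with `BinaryClassificationEvaluator` is just one possible estimator/evaluator pair, not the only option.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Assumption: a local run; in a Databricks notebook the predefined `spark` can be used instead.
val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical toy data: x1 separates the classes, x2 is noise, label is 0.0 or 1.0.
val data = (0 until 100)
  .map(i => (i.toDouble, (i % 7).toDouble, if (i >= 50) 1.0 else 0.0))
  .toDF("x1", "x2", "label")

// Hold out roughly 25% of rows for evaluation; the seed makes the split repeatable.
val splits = data.randomSplit(Array(0.75, 0.25), seed = 42L)
val (train, test) = (splits(0), splits(1))

// Train: assemble feature columns into a vector, then fit logistic regression.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(20) // reads "features" and "label" by default
val model = new Pipeline().setStages(Array(assembler, lr)).fit(train)

// Evaluate: area under the ROC curve on the held-out rows.
val predictions = model.transform(test)
val auc = new BinaryClassificationEvaluator().evaluate(predictions) // default metric: areaUnderROC
println(s"Test AUC = $auc")
```

Wrapping the assembler and estimator in a `Pipeline` means the fitted `model` applies the same feature assembly to any new DataFrame passed to `transform`.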
Additional Resources
- Databricks documentation: https://docs.databricks.com
- Databricks Community: https://community.databricks.com