Databricks Cheat Sheet 1

Mayur Saparia
1 min read · Apr 8, 2023


Cluster Management

  • Create a cluster: Clusters > Create Cluster.
  • Edit cluster configuration: Clusters > Edit.
  • Terminate a cluster: Clusters > Terminate.

Notebook Basics

  • Create a notebook: Workspace > Create > Notebook.
  • Rename a notebook: Click on the notebook’s name and type the new name.
  • Delete a notebook: Right-click on the notebook and select Delete.
  • Run a cell: Press Shift + Enter (run and advance) or Ctrl + Enter (run in place).
  • Add a new cell: Hover between cells and click the + button, or press Esc then B to add a cell below (A for above).
  • Move a cell: Click on the up/down arrow buttons.
  • Copy a cell: Click on the copy button.
  • Delete a cell: Click on the delete button.

Data Management

  • Upload a file: Workspace > Upload Data.
  • Mount external storage: Run dbutils.fs.mount in a notebook (mounts are created in code, not from a workspace menu).
  • Create a table: Workspace > Create > Table.
  • Browse tables: Data > Tables.
  • Create a view: Run CREATE VIEW in a SQL cell.
  • Query data: Use SQL or Spark code in a notebook.
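As a sketch of the last two steps, a notebook cell can mount storage and then query a registered table. The storage account, container, secret scope, and table names below are placeholders, not values from this article:

```scala
// Mount an Azure Blob Storage container (all names here are placeholders).
dbutils.fs.mount(
  source = "wasbs://my-container@mystorageacct.blob.core.windows.net/",
  mountPoint = "/mnt/my-data",
  extraConfigs = Map(
    // Read the storage key from a secret scope rather than hard-coding it.
    "fs.azure.account.key.mystorageacct.blob.core.windows.net" ->
      dbutils.secrets.get("my-scope", "storage-key")
  )
)

// Query a registered table with SQL from Spark code.
val salesByRegion = spark.sql(
  "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
salesByRegion.show()
```

Using a secret scope for the storage key keeps credentials out of the notebook source.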

Spark Basics

  • Create a Spark context: val sc = spark.sparkContext (in Databricks notebooks, sc and spark are predefined).
  • Create a Spark session: val spark = SparkSession.builder().appName("MyApp").getOrCreate().
  • Read data: val df = spark.read.format("csv").option("header", "true").load("file.csv").
  • Write data: df.write.format("csv").mode("overwrite").save("output").
  • Transform data: Use Spark’s DataFrame API.
  • Aggregate data: Use Spark’s DataFrame API or SQL.
  • Join data: Use Spark’s DataFrame API or SQL.
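The transform, aggregate, and join steps above can be sketched with the DataFrame API. The file paths and column names (customer_id, amount, id, name) are assumptions for illustration:

```scala
import org.apache.spark.sql.functions._

// Read two CSVs; paths and schemas are placeholders.
val orders = spark.read.format("csv")
  .option("header", "true").option("inferSchema", "true")
  .load("/mnt/my-data/orders.csv")
val customers = spark.read.format("csv")
  .option("header", "true").option("inferSchema", "true")
  .load("/mnt/my-data/customers.csv")

// Transform: project the needed columns and filter rows.
val bigOrders = orders.select("customer_id", "amount").filter(col("amount") > 100)

// Aggregate: total order amount per customer.
val totals = bigOrders.groupBy("customer_id").agg(sum("amount").as("total_amount"))

// Join: attach customer names to the aggregated totals.
val report = totals.join(customers, totals("customer_id") === customers("id"), "inner")
report.show()
```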

Visualization

  • Plot data: Use the %matplotlib magic command or third-party libraries like plotly.
  • Show data: Use the display function.
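A minimal example of the display function, which renders a sortable table with built-in chart options in Databricks (the derived column is just an illustration):

```scala
import org.apache.spark.sql.functions.col

// display is a Databricks notebook function, not part of open-source Spark.
display(spark.range(10).withColumn("squared", col("id") * col("id")))
```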

Machine Learning

  • Import MLlib: import org.apache.spark.ml._.
  • Train a model: Use Spark’s MLlib API.
  • Evaluate a model: Use Spark’s MLlib API.
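A sketch of training and evaluating with MLlib, assuming a DataFrame df with numeric feature columns f1 and f2 and a binary label column (all of these names are placeholders):

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler

// Assemble raw columns into the single vector column MLlib expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
val data = assembler.transform(df)

// Hold out 20% of the rows for evaluation.
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

// Train a logistic regression classifier.
val model = new LogisticRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .fit(train)

// Evaluate on the held-out set (area under the ROC curve by default).
val auc = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .evaluate(model.transform(test))
println(s"Test AUC: $auc")
```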


Mayur Saparia

Data engineering is my profession; making data available for analytics from various sources is my responsibility. Passionate about big data technology and the cloud.