Databricks Cheat Sheet 1

Mayur Saparia
1 min read · Apr 8, 2023


Cluster Management

  • Create a cluster: Clusters > Create Cluster.
  • Edit cluster configuration: Clusters > Edit.
  • Terminate a cluster: Clusters > Terminate.

Notebook Basics

  • Create a notebook: Workspace > Create > Notebook.
  • Rename a notebook: Click on the notebook’s name and type the new name.
  • Delete a notebook: Right-click on the notebook and select Delete.
  • Run a cell: Click on the cell and press Shift + Enter.
  • Add a new cell: Hover above or below a cell and click the + button.
  • Move a cell: Click on the up/down arrow buttons.
  • Copy a cell: Click on the copy button.
  • Delete a cell: Click on the delete button.

Data Management

  • Upload a file: Workspace > Upload Data.
  • Mount external storage: Run dbutils.fs.mount from a notebook (there is no Workspace menu item for mounts).
  • Create a table: Data > Create Table.
  • Browse tables: Data > Tables.
  • Create a view: Run CREATE VIEW ... AS SELECT ... in SQL.
  • Query data: Use SQL or Spark code in a notebook.
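As a minimal sketch of the "query data" bullet above: register a DataFrame as a temporary view, then query it with SQL from the notebook. The view name `events` and its columns are invented for illustration; on Databricks the `SparkSession` is predefined as `spark`, but the sketch builds a local one so it runs standalone.

```scala
import org.apache.spark.sql.SparkSession

// On Databricks, `spark` already exists; a local session is built here only
// so the sketch is self-contained.
val spark = SparkSession.builder().appName("QuerySketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data registered as a temp view named "events".
Seq(("click", 3), ("view", 7)).toDF("event", "cnt").createOrReplaceTempView("events")

// Query the view with SQL, exactly as you would in a %sql or spark.sql cell.
val result = spark.sql("SELECT event, cnt FROM events WHERE cnt > 5")
result.show()
```

The same query could be written with the DataFrame API (`df.filter($"cnt" > 5)`); SQL and DataFrames are interchangeable over the same view.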

Spark Basics

  • Get the Spark context: val sc = spark.sparkContext (in Databricks notebooks, sc is already predefined).
  • Create a Spark session: val spark = SparkSession.builder().appName("MyApp").getOrCreate().
  • Read data: val df = spark.read.format("csv").option("header", "true").load("file.csv").
  • Write data: df.write.format("csv").mode("overwrite").save("output").
  • Transform data: Use Spark’s DataFrame API.
  • Aggregate data: Use Spark’s DataFrame API or SQL.
  • Join data: Use Spark’s DataFrame API or SQL.
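The last three bullets (transform, aggregate, join) can be sketched in one small example. The table and column names (`employees`, `depts`, `salary`, and so on) are made up for illustration; the API calls are standard Spark DataFrame operations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("DFSketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val employees = Seq(("Alice", 1, 5000), ("Bob", 2, 4000)).toDF("name", "deptId", "salary")
val depts     = Seq((1, "Engineering"), (2, "Sales")).toDF("deptId", "deptName")

// Transform: derive a new column from an existing one.
val withBonus = employees.withColumn("bonus", col("salary") * 0.1)

// Aggregate: average salary per department.
val avgByDept = employees.groupBy("deptId").agg(avg("salary").as("avgSalary"))

// Join: enrich employees with department names on the shared key.
val joined = employees.join(depts, Seq("deptId"), "inner")
joined.show()
```

Each of these has a SQL equivalent (`SELECT ... GROUP BY`, `JOIN ... ON`) that can be run against temp views of the same DataFrames.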

Visualization

  • Plot data: Use the %matplotlib magic command or third-party libraries like plotly.
  • Show data: Use the display function.

Machine Learning

  • Import MLlib: import org.apache.spark.ml._.
  • Train a model: Use Spark’s MLlib API.
  • Evaluate a model: Use Spark’s MLlib API.
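A hedged sketch of the train/evaluate bullets using MLlib's DataFrame-based API: fit a logistic regression on a toy dataset, then score it with a binary-classification evaluator. The feature vectors and labels are invented purely for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors

val spark = SparkSession.builder().appName("MLSketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical labeled training data: (label, features).
val training = Seq(
  (1.0, Vectors.dense(0.0, 1.1)),
  (0.0, Vectors.dense(2.0, 1.0)),
  (1.0, Vectors.dense(0.1, 1.2)),
  (0.0, Vectors.dense(2.2, 0.9))
).toDF("label", "features")

// Train: fit an estimator to get a model.
val model = new LogisticRegression().setMaxIter(10).fit(training)

// Evaluate: score predictions with area under the ROC curve.
val predictions = model.transform(training)
val auc = new BinaryClassificationEvaluator()
  .setMetricName("areaUnderROC")
  .evaluate(predictions)
println(s"AUC = $auc")
```

In practice the estimator would sit inside an `org.apache.spark.ml.Pipeline` with feature-engineering stages, and evaluation would use a held-out split rather than the training data.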

Written by Mayur Saparia

Data engineering is my profession; making data available for analytics from various sources is my responsibility. Passionate about Big Data technology and the cloud.
