How to copy data from Azure SQL to Blob Storage as a CSV file

Mayur Saparia
12 min read · Mar 23, 2020


In this blog we are going to copy data from a source, Azure SQL (this could also be your on-premises SQL Server; since I didn’t have one, I used Azure SQL), to a sink, Azure Blob Storage. This is one of the most basic and common tasks a data engineer needs to perform. Microsoft Azure gives you a lot of options to copy data from a variety of sources and lets you land it in the sink of your choice.

This blog is a little longer than usual because it includes a lot of screenshots that will help you do the activity with ease.

So let’s get started!

Outline of the activity that we are going to perform.

  1. We will create an Azure SQL instance with a few sample tables that Azure provides by default.
  2. We will create a storage account that will store our blob files.
  3. Then we will create an Azure Data Factory instance that will help us develop and deploy our Copy data pipeline.
  4. Once all of these steps are done, we will see the CSV file in Azure Blob Storage.

We will divide the blog into 3 parts as described above.

Create an Azure SQL instance with sample tables

  1. Create a new resource under the practice_rg resource group (to learn how to create a resource group, check out the blog here). Click on “+ Add”, highlighted in the image below.

2. After you land on the New resource page, search for “Azure SQL” and click on the Azure SQL resource.

Azure SQL

3. You will land on the Azure SQL page; click on “Create” to create the Azure SQL resource.

Azure SQL create a resource

4. After you click on “Create”, select the “SQL Databases” service.

Select SQL databases

5. Fill in the “Project details”: Subscription, Resource group, Database name, Server (you need to create a server if one doesn’t exist yet; please refer to the screenshot), and Compute + Storage (the default is OK; if you want to change it, switch to “Basic”).

Project details create a new DB server

6. After the project details, you need to fill in the networking details. Here we select “Public endpoint” as the connectivity method. Under Firewall rules, toggle “Allow Azure services and resources to access this server” to “Yes” and “Add current IP address” to “Yes”. After all the details are filled in as per the screenshot, click on “Additional settings”.

Networking details

7. Under “Additional settings”, select “Sample” for “Use existing data”; this will provide you with some sample data. We will be copying this sample data into blob storage as a CSV file. “Tags” are optional, so you can go directly to “Review + create”.

Sample data

8. In the end, you will be shown all the details that you filled in during the previous steps. Press “Create” and your resource will be deployed. Once it is deployed you will get a notification, and you can click “Go to resource”.

Review page for Azure SQL database resource deployed successfully

9. On the resource page, i.e. the database page, you will see multiple options in the left-hand pane. Click on “Query editor”; this will open a new window with the query editor, where you need to provide the login credentials that you set while creating the resource.

Open query editor

10. In the query editor, you can run SQL queries to look at the data as per your need. You can see the sample tables on the left-hand side. (If you prefer to check the same data from outside the portal, a small sketch follows the screenshot below.)

Sample data of DB in the query editor
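If you want to double-check the sample data from outside the portal, here is a minimal sketch using pyodbc. It is an assumption-heavy example: it presumes the “ODBC Driver 17 for SQL Server” is installed locally, the firewall rule for your current IP (added in step 6) is in place, and the server, user and password placeholders are replaced with the values you chose above.

```python
# Minimal sketch: query the sample table from outside the portal with pyodbc.
# Assumes "ODBC Driver 17 for SQL Server" is installed and the firewall rule
# for your current IP address (added in step 6) allows the connection.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"  # placeholder server name
    "Database=practice_db;"                                 # the database created above
    "Uid=<admin-user>;Pwd=<password>;"                      # placeholders; prefer Key Vault in real projects
    "Encrypt=yes;TrustServerCertificate=no;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT TOP (10) ProductID, Name, ListPrice FROM SalesLT.Product;")
for row in cursor.fetchall():
    print(row.ProductID, row.Name, row.ListPrice)
conn.close()
```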

This brings us to the end of part one, i.e. the creation of the SQL server with sample data.

Create a Storage account — blob, file, table, queue.

The storage account will act as the sink in this blog. We will move the data from the Azure SQL table to a CSV file in this storage account.

  1. From the “Dashboard” go to “All resources”, search for “Azure storage” in the search box and click on “Storage account — blob, file, table, queue”. This is similar to how we searched for Azure SQL.
Storage account — blob, file, table, queue

2. Fill in the details on the “Create storage account” page as per the screenshot below. The subscription can be changed to match your own. We will use the same “practice_rg” resource group that we used while creating the Azure SQL database. Everything else can be left at the defaults; we don’t need to change anything else. Click on “Review + create”, which will validate your settings.

Create a storage account

3. Once you receive the “Validation passed” notification, hit “Create” and you should be done with creating the storage account.

Validation passed

4. Once the resource is deployed you will see the screen below. Click on “Go to resource” to get into the “Storage account” page.

Go to resource

5. On the storage account home page, we need to create a container that will hold our blob files. Click on “Containers” to create a new container.

Create a new container

6. On the “Containers” page, click on “+ Container” to add a container that will store our blob file. In the “New container” pane on the right, give the container a “Name” and set the “Public access level” to “Blob (anonymous read access for blobs only)”. Hit “Create” and you should see the container in the list on the same page. (A programmatic equivalent of this step is sketched after the screenshot below.)

Add container
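If you ever need to create the same container without the portal, a rough equivalent using the azure-storage-blob Python package is sketched below. The connection string is a placeholder you would copy from the storage account’s “Access keys” blade, and the container name matches the one used later in this blog.

```python
# Rough equivalent of the portal steps above, using the azure-storage-blob package.
# The connection string is a placeholder; copy the real one from the storage
# account's "Access keys" blade.
from azure.storage.blob import BlobServiceClient

conn_str = "<storage-account-connection-string>"  # placeholder
service = BlobServiceClient.from_connection_string(conn_str)

# Same name and access level as chosen in the portal above.
service.create_container("productblobstorage", public_access="blob")
```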

This brings us to the end of the second part, i.e. setting up the storage account for storing the blob file.

Now that we have successfully set up our source and sink, let’s dive into creating the Copy data pipeline in Azure Data Factory.

What is Azure Data Factory?

Azure Data Factory is the platform that solves data scenarios like this one. It is a cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

Additionally, you can publish your transformed data to data stores such as Azure SQL Data Warehouse for business intelligence (BI) applications to consume. Ultimately, through Azure Data Factory, raw data can be organized into meaningful data stores and data lakes for better business decisions.

I hope this short introduction helped you get a feel for what Azure Data Factory is; if you want more detail, please visit the link.

Let’s begin with the final part of this blog!

Setting up the Azure data factory.

  1. Go to “Dashboard > All resources”, search for Azure Data Factory and click on the resource.
Data Factory

2. After you click on “Data factory” as shown in the screenshot above, another window will open where you need to fill in the details for the “New data factory”. After filling in the details, click on “Create”.

New data factory

3. The Data Factory resource will be deployed almost instantly, without much waiting time. Click on “Go to resource”; this will redirect you to the Data Factory home page.

Data factory resource deployed successfully

4. On the Data Factory resource page, you will see the “Resource group” and “Subscription” details. Below these, there are two options: “Documentation” and “Author & Monitor”. We need to click on “Author & Monitor”; this will take us to another page where we will create our Copy data pipeline.

Select Author & Monitor

5. The page below is your Azure Data Factory home page, from where you can navigate between three options, “Home”, “Author” and “Monitor”, on the left side of the window. For creating a copy data pipeline there is a shortcut we can use directly. Click on “Copy data”; this will take you to a page where we fill in all the details required to get the Copy data pipeline up and running.

Azure data factory home page

6. The Copy data pipeline is a six-stage process that uses the Azure SQL database (source) and the storage account (sink) we created at the start of the blog.

6.1. The first step is to give the pipeline a name, a description, and a task schedule. As we need to run this only once, we will select the “Run once now” option, which runs the pipeline once after it is validated and deployed. Hit “Next” after filling in all the details.

Set properties for the copy data task.

6.2. The source data store stage has two sub-steps: first setting up a connection, and then selecting the dataset.

Why do we need a connection? — Since the data factory is an independent component and is not aware of the data source or the data sink, it needs connection details for both; this is achieved with the help of “linked services”.

Before you create a dataset, you must create a linked service to link your data store to the data factory.

What are “linked services”? — Linked services are much like connection strings, which define the connection information needed for Data Factory to connect to external resources.

What is a “dataset”? — A dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs.

Think of it this way: the dataset represents the structure of the data within the linked data stores, and the linked service defines the connection to the data source. For example, an Azure Storage linked service links a storage account to the data factory, while an Azure Blob dataset represents the blob container and the folder within that storage account that contains the input blobs to be processed.
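To make the distinction concrete, here is a sketch (written as Python dictionaries) of roughly what Data Factory stores behind the scenes for a linked service and a dataset. The names and some property keys are illustrative rather than copied from the service; the Copy data tool will generate the real definitions for us in the steps that follow.

```python
# Illustrative sketch of the JSON-style definitions Data Factory keeps behind the
# scenes, written as Python dicts. Names and exact property keys are illustrative;
# the Copy data tool generates the real definitions in the next steps.

# Linked service: "how do I connect?" - roughly a typed connection string.
azure_sql_linked_service = {
    "name": "AzureSqlDatabaseConn",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Server=tcp:<your-server>.database.windows.net,1433;"
                "Database=practice_db;User ID=<user>;Password=<password>;Encrypt=True;"
            )
        },
    },
}

# Dataset: "which data behind that connection?" - a named view over one table.
sales_product_dataset = {
    "name": "SalesLTProduct",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabaseConn",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"tableName": "SalesLT.Product"},
    },
}
```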

Create new connection
Select Azure SQL Database

After selecting “Azure SQL Database”, we need to provide the connection details for the database; these include the linked service name and the connection string details.

(Note — These are the details we provided while creating the SQL database server, including the user ID and password. A more secure way to handle the user ID and password is the Azure Key Vault service; for this blog we will go ahead with a hardcoded user ID and password for the database.)

After filling in all the details, click on “Test connection”. If you have filled in the details properly, you will see a “Connection successful” message. If the connection is successful, click “Create”; your connection to the SQL database will be established and you will be able to see the data from the database you specified in the connection.

New linked service (Azure SQL Database)

6.3. Since our connection was successful in the previous step, we can now browse the tables from the database specified in the linked service; in our case the DB name was “practice_db”. We will select the sample table “SalesLT.Product”. Once you select the table, you can preview the data at the bottom of the window and also check the “Schema” of the table. Click “Next”; this will take you to the “Apply filter” stage.

Note: We can either copy all the data or use a custom query (for example, SELECT Name, ListPrice FROM SalesLT.Product WHERE ListPrice > 1000) and write only its output records to blob storage; for this blog we will fetch all the records of a single table.

Select the table to copy the data to blob storage

6.4. You can leave “Apply filter” as is and click “Next”; this ends the source setup and moves on to the destination (i.e. the sink).

6.5. Similar to the source linked service and dataset, we need to establish the connection for the destination, i.e. the blob storage account. This is fairly straightforward, since we just need to give the connection details and a landing path where the blob file copied from SQL Server will be written.

Click on “Create new connection” and select “Azure Blob Storage”.

Select Azure Blob Storage

6.6. Fill in the details for the new linked service for Azure Blob Storage as per the screenshot below and test your connection. If the connection is successful, click on “Create”. Once the connection is created, select the blob storage connection and hit Enter.

Create new linked service for Blob storage
Select blob connection “AzureBlobStorageConn”

6.7. In the dataset details, we will set the “File format settings” (the format is a text file, the column delimiter will be a comma (,) and we will tick the “Add header to file” option) and specify the output file or folder details (provide the folder path, i.e. the container name that we created at the start, and the file name that we want in blob storage).

Select the options as per the below screenshots.

File format settings
Choose the output file or folder

6.8. On the Settings page, select “Next” to use the default configurations.

6.9. On the “Summary” page, review all the settings and select “Next”.

Summary

6.10. On the “Deployment complete” page, select “Monitor” to monitor the pipeline that you created.

Deployment page

6.11. The application switches to the “Monitor” tab, where you can see the status of the pipeline. Select “Refresh” to refresh the list, and click the link under “PIPELINE NAME” to view activity run details or rerun the pipeline. (If you prefer to check the status from code, a small sketch follows the screenshot below.)

Monitor tab
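As an alternative to watching the Monitor tab, a hedged sketch using the azure-mgmt-datafactory Python package is shown below. It assumes you have the pipeline run ID (visible in the Monitor tab), that your credentials can read the data factory, and that the placeholder names are replaced with your own; the exact client methods can differ slightly between SDK versions.

```python
# Hedged sketch: check a pipeline run's status from Python instead of the Monitor tab.
# Assumes the azure-identity and azure-mgmt-datafactory packages; method names may
# differ slightly between SDK versions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"       # placeholder
resource_group = "practice_rg"
factory_name = "<your-data-factory-name>"   # placeholder
run_id = "<pipeline-run-id>"                # copied from the Monitor tab

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
run = client.pipeline_runs.get(resource_group, factory_name, run_id)
print(run.pipeline_name, run.status)        # e.g. "Succeeded" once the copy finishes
```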

This brings us to the end of the six-stage process for copying data from source to sink using Azure Data Factory!

Now the only thing left is to check the CSV file that was created with the data from the “SalesLT.Product” table.

7. Go back to portal.azure.com, open “My Dashboard” and select the “Storage account — practicestorageblob”.

My Dashboard

8. The screen will switch to the selected “Storage account”. Under “Containers”, select the container that we created at the start, “productblobstorage”. Inside this container, you will be able to see the CSV file that was created by the “Copy data” pipeline we built in Azure Data Factory.

CSV file created from Azure data factory using Copy data pipeline

9. Select the file and you will be able to download it and check the data present in the file. (A small sketch for reading the file directly from the container follows the screenshots below.)

Select the file and download to view the data
Data in the CSV file
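Downloading through the portal works fine; as an alternative, the sketch below reads the file straight from the container using azure-storage-blob and Python’s csv module. The connection string and the output file name are placeholders; use the name you entered in step 6.7.

```python
# Sketch: read the generated CSV directly from blob storage instead of downloading
# it through the portal. The connection string and file name are placeholders.
import csv
import io

from azure.storage.blob import BlobServiceClient

conn_str = "<storage-account-connection-string>"   # placeholder
service = BlobServiceClient.from_connection_string(conn_str)
blob = service.get_blob_client(container="productblobstorage", blob="<output-file>.csv")

text = blob.download_blob().readall().decode("utf-8")
reader = csv.reader(io.StringIO(text))

header = next(reader)                      # present because "Add header to file" was ticked
print(header)
for row, _ in zip(reader, range(5)):       # print just the first few data rows
    print(row)
```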

10. Last but not least, clean up all the resources that we have deployed. If we keep them up and running they will keep costing us, and since we did this activity for self-learning, it’s recommended to remove all the resources.

Clean-up is very straightforward: go to “My Dashboard”, select the resource group under which all the resources were deployed (practice_rg in our case), and on the resource group page click on “Delete”. This will delete all the resources created under this resource group. (A programmatic alternative is sketched after the screenshot below.)

Deleting resource group
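If you prefer to do the cleanup from code, here is a hedged sketch using the azure-mgmt-resource package. begin_delete removes the resource group and everything inside it, so double-check the name (and your subscription) before running it; method names may vary slightly between SDK versions.

```python
# Hedged sketch: delete the whole resource group (and everything in it) from Python.
# Assumes azure-identity and azure-mgmt-resource; the deletion is irreversible.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-id>"   # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

poller = client.resource_groups.begin_delete("practice_rg")
poller.wait()                           # blocks until the deletion finishes
print("practice_rg deleted")
```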

That’s it; we have reached the end of the blog. I hope it helps you learn how to copy data from Azure SQL to Blob Storage using Azure Data Factory.

Please drop your questions/feedback in the comments section.

Written by Mayur Saparia

Data engineering is my profession; making data available for analytics from various sources is my responsibility. Passionate about big data technology and the cloud.
