This article shows you how to import CSV files from Azure Data Lake Storage Gen 2 using the ABFS(S) protocol.

For more information about importing CSV files with pandas and handling different separators, date formats, and numeric formats, see: Import Local CSV File into Pandas Dataframe.

OK, let’s dive into the steps:

Step 1: Install the adlfs package: https://pypi.org/project/adlfs/. This package is needed to use the abfs(s) protocol and to authenticate with a Storage Account Name and Key.

Step 2: Get the file URL with the abfs(s) protocol using a tool such as Azure Storage Explorer or the Azure Portal.
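With adlfs, the file URL can take two forms: a short form, where the account name is supplied separately via storage_options, and a fully qualified form that embeds the account in the URL (abfss enforces TLS). A small sketch using the container, account, and path from this article's example:

```python
container = "xtremefs"
account = "xtremedlsg2"
path = "Files/Orders.csv"

# Short form: the account name is resolved from storage_options by adlfs
short_url = f"abfs://{container}/{path}"

# Fully qualified form: the account is embedded in the URL itself
full_url = f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

print(short_url)  # abfs://xtremefs/Files/Orders.csv
print(full_url)   # abfss://xtremefs@xtremedlsg2.dfs.core.windows.net/Files/Orders.csv
```

Both forms are accepted by pandas/fsspec as long as adlfs is installed and the credentials in storage_options are valid.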

Step 3: Put the Account Name and Key in a storage_options dictionary. In production code, make sure to store the key in Azure Key Vault! For more information about reading the secret, see: Get Secret from Azure Key Vault in Python Notebook.

storage_options = {
    'account_name': 'xtremedlsg2',
    'account_key': 'z+FNV9o8V1sKdIZJdwouEVtpVdTgR3RpfY08QYACjuU+....'
}  # Store secrets in Azure Key Vault!
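As an intermediate step toward Key Vault, you can at least keep the key out of the notebook source by reading it from environment variables. A minimal sketch; the variable names (ADLS_ACCOUNT_NAME, ADLS_ACCOUNT_KEY) are hypothetical, not an Azure convention:

```python
import os

# Hypothetical environment variables -- in production, populate these from
# Azure Key Vault rather than hardcoding the key anywhere.
os.environ.setdefault("ADLS_ACCOUNT_NAME", "xtremedlsg2")
os.environ.setdefault("ADLS_ACCOUNT_KEY", "<your-account-key>")

storage_options = {
    "account_name": os.environ["ADLS_ACCOUNT_NAME"],
    "account_key": os.environ["ADLS_ACCOUNT_KEY"],
}
```

The resulting dictionary is passed to read_csv() exactly like the hardcoded version.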

Step 4: Call the read_csv() function, passing in the file URL and the storage_options:

!pip install adlfs

import pandas as pd

storage_options = {
    'account_name': 'xtremedlsg2',
    'account_key': 'z+FNV9o8V1sKdIZJdwouEVtpVdTgR3RpfY08QYACjuU+....'
}  # Store secrets in Azure Key Vault!

# Load CSV File from ADLS Gen 2:
fileUrl = 'abfs://xtremefs/Files/Orders.csv'
df = pd.read_csv(fileUrl, storage_options=storage_options, encoding='utf-8', sep=',')
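The encoding and sep parameters behave exactly as they do for local files, so you can verify them without touching ADLS. A quick sketch using an in-memory sample with hypothetical columns:

```python
import io

import pandas as pd

# Small inline CSV standing in for the remote Orders.csv (columns are made up)
sample = io.StringIO("OrderId,Amount\n1,9.99\n2,19.50\n")

# Same parameters as the ADLS call, minus storage_options (not needed locally)
df = pd.read_csv(sample, encoding='utf-8', sep=',')

print(df.shape)  # (2, 2)
```

Once this works locally, swapping the buffer for the abfs:// URL plus storage_options is the only change needed.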