Excel files are extremely popular amongst business users and a lot of important data is saved in them. Importing these types of files can be tricky especially when they are saved in Sharepoint. This article will show you how to import Excel files from OneLake and SharePoint Online into a Pandas Dataframe.
Import from OneLake
Ok let’s dive into the steps to import a Excel sheet form OneLake. This is pretty straightforward:
Step 1: Install the package openpyxl
This package can be found here: https://pypi.org/project/openpyxl/. It is needed to work with the XLSX format.
Step 2: Get the Absolute or Relative fileurl
Save the Absolute or Relative fileurl to the Excel sheet in a variable. For more info on how to get this url: Import OneLake CSV File into Pandas DataFrame from MS Fabric Notebook.
# Relative File Url:
fileUrl= "/lakehouse/default/Files/Customers.xlsx"
Step 3: Read the Excel file into a Pandas Dataframe
Call the read_excel() function passing in the fileurl to convert the Excel sheet into a Pandas Dataframe:
import pandas as pd
# load xlsx file into DataFrame
dfCustomer = pd.read_excel(fileUrl, sheet_name="Customers", engine='openpyxl')
Import from SharePoint
A lot of times Excel sheets will be saved in Sharepoint Online or MS Teams. We need some extra steps and use the Client Side Object Model (CSOM) to download the file from SharePoint.
Step 1: Install the package openpyxl
This package can be found here: https://pypi.org/project/openpyxl/. It is needed to work with the XLSX format.
Step 2: Install the package Office365-REST-Python-Client
You can find more info about the package here: https://pypi.org/project/Office365-REST-Python-Client/. This package makes it easy to work with the CSOM without having to worry too much about the REST API details like OAuth authentication. For more information about this API: Get to know the SharePoint REST service | Microsoft Learn
Step 3: Download the file from SharePoint.
I created a Function for you to call to make this as simple as possible. Please save the credentials in Azure Key Vault, for more info: Get Secret from Azure Key Vault in Fabric Notebook with Python.
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from io import BytesIO
import requests
def getExcelSheetFromSharePoint(site_url, file_url ):
# Authenticate with SharePoint: Save Credentials in Azure KeyVault!!
ctx = ClientContext(site_url).with_credentials(UserCredential("poweruser@xtreme.rocks", "P@ssw0rd"))
# Get the file info from SharePoint
file = ctx.web.get_file_by_server_relative_url(file_url)
print( file.time_last_modified)
# Get the file content from SharePoint
file_stream = file.open_binary_stream()
ctx.execute_query()
file_content = file_stream.value
# Load specific worksheet into pandas DataFrame
excel_file = BytesIO(file_content)
return excel_file
You can call this function from your Notebook passing in the site_url and file_url as parameters:
site_url = "https://arthurgraus.sharepoint.com/sites/xtreme"
file_url = "/SalesDataLibrary/Customers.xlsx"
excel_file = getExcelSheetFromSharePoint(site_url, file_url )
Step 4: Read the Excel file into a Pandas DataFrame.
This step is the same like with the OneLake file:
df = pd.read_excel(excel_file, sheet_name="Customers", engine='openpyxl')