
How to work with OneLake files locally using Python

Last week I shared how you can use the OneLake File Explorer to sync your Lakehouse tables to your local machine. It's a convenient way to get your Parquet and Delta Lake files off the cloud and onto disk, but what do you actually do with them once they're there?

In this post, I'll walk you through how to interact with your locally synced OneLake files using Python. We'll cover two practical approaches, with real code you can drop straight into a notebook.

Where are your files?

When OneLake File Explorer syncs your files, they land in a path that looks something like this:

C:\Users\<you>\OneLake - <workspace name>\<lakehouse name>.Lakehouse\Tables\<table name>

Keep that path in mind; you'll be passing it into every example below. Delta Lake tables are stored as folders containing multiple Parquet files plus a _delta_log/ directory, so make sure you're pointing at the table's root folder, not an individual file.
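Because a Delta table is a folder rather than a single file, a quick sanity check before reading is to confirm the path you have actually contains that _delta_log/ directory. Here's a minimal stdlib-only sketch; the helper name is_delta_table_root is my own, not part of any library, and the demo fabricates a throwaway folder structure so you can run it anywhere:

```python
import tempfile
from pathlib import Path

def is_delta_table_root(path) -> bool:
    """Return True if `path` looks like the root folder of a Delta table."""
    return (Path(path) / "_delta_log").is_dir()

# Quick demo against a throwaway folder structure
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "orders"
    (table / "_delta_log").mkdir(parents=True)   # simulate a synced Delta table
    print(is_delta_table_root(table))            # True
    print(is_delta_table_root(tmp))              # False: parent folder, not the table root
```

If this returns False for a path you expected to be a table, you're probably one folder too high or too deep in the synced hierarchy.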

Reading a Parquet file using pandas

If you just need to load a Parquet file into a DataFrame for analysis or transformation, pandas is the fastest way to get started.

Make sure you have pandas and PyArrow installed:

pip install pandas
pip install pyarrow

Now you can run the following Python code:

import pandas as pd

# Read a single Parquet file
df = pd.read_parquet(r"C:\Users\you\OneLake - MyWorkspace\MyLakehouse.Lakehouse\Tables\orders\part-00001.parquet")

print(df.head())
print(df.dtypes)

Pandas uses PyArrow under the hood for Parquet reading, so you get solid type handling and decent performance for most workloads. For large tables, be mindful of memory — it loads everything into RAM.



Reading Delta tables using delta-rs

If your table is in Delta format (which is the default in Fabric Lakehouses), you'll want delta-rs to take advantage of Delta-specific features like schema enforcement, versioning, and time travel.

First install deltalake:

pip install deltalake

Now you can use this library to query the Delta table:

from deltalake import DeltaTable

table_path = r"C:\Users\you\OneLake - MyWorkspace\MyLakehouse.Lakehouse\Tables\orders"

# Load the latest version
dt = DeltaTable(table_path)
df = dt.to_pandas()

# Check table metadata
print(dt.schema())
print(f"Current version: {dt.version()}")

Where delta-rs really shines is time travel — you can read any historical version of your table:

# Read a specific version
dt_v2 = DeltaTable(table_path, version=2)
df_v2 = dt_v2.to_pandas()

# Or go back to a point in time. The version parameter only takes an
# integer, so datetime-based time travel goes through load_as_version
from datetime import datetime

dt_yesterday = DeltaTable(table_path)
dt_yesterday.load_as_version(datetime(2026, 2, 22))
df_yesterday = dt_yesterday.to_pandas()

This is invaluable when you're debugging a pipeline and need to see what the data looked like before a bad write landed.
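Under the hood, those versions come from the JSON commit files in _delta_log/ (one file per version, named 00000000000000000000.json, then ...001.json, and so on). If you're just curious which versions exist before time traveling, you can list them with nothing but the standard library; the helper name list_delta_versions is my own, and this demo fabricates empty commit files rather than reading a real table (delta-rs itself also exposes a history() method on DeltaTable for richer commit metadata):

```python
import tempfile
from pathlib import Path

def list_delta_versions(table_path) -> list:
    """Return the commit versions recorded in a Delta table's _delta_log folder."""
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are zero-padded version numbers with a .json extension;
    # the isdigit() check skips checkpoint and other auxiliary files
    return sorted(
        int(p.stem)
        for p in log_dir.glob("*.json")
        if p.stem.isdigit()
    )

# Demo against a fake table with three empty commit files
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "_delta_log"
    log.mkdir()
    for v in range(3):
        (log / f"{v:020d}.json").touch()
    print(list_delta_versions(tmp))  # [0, 1, 2]
```

Knowing the highest version number tells you how far back you can travel, and the file timestamps give you a rough idea of when each write happened.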

That’s it!
