
How to work with OneLake files locally using Python

Last week I shared how you could use the OneLake File Explorer to sync your Lakehouse tables to your local machine. It's a convenient way to get your Parquet and Delta Lake files off the cloud and onto disk — but what do you actually do with them once they're there?

In this post, I’ll walk you through how to interact with your locally synced OneLake files using Python. We'll cover two practical approaches, with real code you can drop straight into a notebook.

Where are your files?

When OneLake File Explorer syncs your files, they land in a path that looks something like this:

C:\Users\<you>\OneLake - <workspace name>\<lakehouse name>.Lakehouse\Tables\<table name>

Keep that path in mind; you'll be passing it into every example below. Delta Lake tables are stored as folders containing multiple Parquet files plus a _delta_log/ directory, so make sure you're pointing at the table's root folder, not an individual file.
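If you're ever unsure whether a folder is a Delta table root or just a loose pile of Parquet files, a quick check for the _delta_log directory settles it. Here's a minimal stdlib sketch (the helper name is my own, not part of any library):

```python
from pathlib import Path

def is_delta_table_root(path: str) -> bool:
    """Return True if the folder contains a _delta_log directory."""
    return (Path(path) / "_delta_log").is_dir()
```

A folder of Parquet files without _delta_log can still be read with pandas, but the Delta-specific tooling below won't accept it.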

Reading a Parquet file using pandas

If you just need to load a Parquet file into a DataFrame for analysis or transformation, pandas is the fastest way to get started.

Make sure you have pandas and PyArrow installed:

pip install pandas
pip install pyarrow

Now you can run the following Python code:

import pandas as pd

# Read a single Parquet file
df = pd.read_parquet(r"C:\Users\you\OneLake - MyWorkspace\MyLakehouse.Lakehouse\Tables\orders\part-00001.parquet")

print(df.head())
print(df.dtypes)

Pandas uses PyArrow under the hood for Parquet reading, so you get solid type handling and decent performance for most workloads. For large tables, be mindful of memory — it loads everything into RAM.



Reading Delta tables using delta-rs

If your table is in Delta format (which is the default in Fabric Lakehouses), you'll want delta-rs to take advantage of Delta-specific features like schema enforcement, versioning, and time travel.

First install deltalake:

pip install deltalake

Now you can use this library to query the Delta table:

from deltalake import DeltaTable

table_path = r"C:\Users\you\OneLake - MyWorkspace\MyLakehouse.Lakehouse\Tables\orders"

# Load the latest version
dt = DeltaTable(table_path)
df = dt.to_pandas()

# Check table metadata
print(dt.schema())
print(f"Current version: {dt.version()}")

Where delta-rs really shines is time travel — you can read any historical version of your table:

# Read a specific version

dt_v2 = DeltaTable(table_path, version=2)
df_v2 = dt_v2.to_pandas()

# Or go back to a point in time — the DeltaTable constructor's version
# parameter only takes an int, but load_as_version accepts a datetime
# in recent deltalake releases
from datetime import datetime

dt_yesterday = DeltaTable(table_path)
dt_yesterday.load_as_version(datetime(2026, 2, 22))
df_yesterday = dt_yesterday.to_pandas()

This is invaluable when you're debugging a pipeline and need to see what the data looked like before a bad write landed.

That’s it!
