
Simplifying data movement in Microsoft Fabric

If you have started using Microsoft Fabric, one of the first things you will want to do is get data into the platform. You could, of course, start building your own Data Factory pipelines, but there is a less complicated way to get started: Microsoft Fabric's Copy Job feature. It offers a streamlined, no-code solution that eliminates the need for complex pipeline development.

In this post, we'll explore what Copy Jobs are, why they matter, and how you can leverage them in your data workflows.

What is a Copy Job?

Copy Job is Microsoft Fabric Data Factory's answer to simplified data movement. It's a purpose-built solution designed to move data from various sources to multiple destinations without requiring you to build traditional data pipelines. Whether you're working with databases, cloud storage, or on-premises systems, Copy Job provides an intuitive, guided experience that handles the complexity for you.

At its core, Copy Job addresses a common challenge: moving data efficiently while maintaining reliability and tracking changes over time. The tool automatically manages the state of your data transfers, remembering what was copied in the last successful run and only moving what's new or changed in subsequent runs.

Key features

Multiple copy modes

Copy Job supports two primary delivery patterns, plus CDC-based replication, to match your specific data movement needs:

  • Full Copy: Every execution copies all data from source to destination. This approach is ideal when you need a complete refresh of your data or when working with smaller datasets where incremental tracking isn't necessary.
  • Incremental Copy: After an initial full copy, subsequent runs only transfer new or changed data. This dramatically reduces processing time and resource consumption. For databases, you select an incremental column (typically a timestamp or auto-incrementing ID) that acts as a watermark. For storage systems, Copy Job automatically tracks file modification times. A conceptual sketch of the watermark mechanism follows below.
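
To make the watermark idea concrete, here is a minimal sketch, in plain Python, of the bookkeeping Copy Job performs for you: remember the highest value of the incremental column from the last successful run, and on the next run only read rows above it. The server, table, and column names are hypothetical, and with Copy Job you never write this code yourself.

```python
# Conceptual sketch of the watermark mechanism behind incremental copy.
# All names (server, database, table, column) are hypothetical placeholders;
# Copy Job handles this bookkeeping for you.
from datetime import datetime

import pyodbc

CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-server>.database.windows.net;Database=<your-db>;"
    "Authentication=ActiveDirectoryInteractive"
)


def load_last_watermark() -> datetime:
    # Copy Job stores the watermark of the last successful run as job state;
    # here we simply pretend it was persisted after the previous run.
    return datetime(2024, 1, 1)


def copy_incrementally() -> None:
    watermark = load_last_watermark()
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        # Only rows changed since the last successful run are read.
        cursor.execute(
            "SELECT * FROM dbo.Orders WHERE LastModified > ? ORDER BY LastModified",
            watermark,
        )
        rows = cursor.fetchall()

    if rows:
        # ... write the rows to the destination (Lakehouse, Warehouse, ...) ...
        new_watermark = max(row.LastModified for row in rows)
        # Persist new_watermark so the next run starts where this one stopped.
        print(f"Copied {len(rows)} rows; new watermark: {new_watermark}")


copy_incrementally()
```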

Remark: When you have enabled Change Data Capture (CDC) on your source database, Copy Job can automatically detect and replicate inserted, updated, and deleted rows without requiring you to specify an incremental column.
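
If you want to go the CDC route, change data capture first has to be enabled on the source. As a hedged illustration, the sketch below runs the standard SQL Server / Azure SQL system procedures for a hypothetical dbo.Orders table; supported service tiers and required permissions depend on your source, so check the documentation for your database.

```python
# Enable Change Data Capture on a hypothetical source table (SQL Server / Azure SQL).
# Requires sufficient permissions (typically db_owner); all names are placeholders.
import pyodbc

CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-server>.database.windows.net;Database=<your-db>;"
    "Authentication=ActiveDirectoryInteractive"
)

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    cursor = conn.cursor()
    # 1. Enable CDC at the database level.
    cursor.execute("EXEC sys.sp_cdc_enable_db")
    # 2. Enable CDC for each table that Copy Job should replicate.
    cursor.execute(
        """
        EXEC sys.sp_cdc_enable_table
             @source_schema = N'dbo',
             @source_name   = N'Orders',
             @role_name     = NULL
        """
    )
    print("CDC enabled for dbo.Orders")
```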

User-friendly experience

Copy Job was designed with accessibility in mind. The guided wizard walks you through:

  • Selecting your source and destination connections
  • Choosing which tables, folders, or files to copy
  • Configuring copy mode and incremental settings
  • Setting up schedules or one-time runs
  • Reviewing and validating your configuration

No deep technical expertise is required, making it accessible to data engineers, analysts, and business users alike.

Why choose Copy Job over other options?

Microsoft Fabric offers multiple data movement approaches, so when should you use Copy Job versus alternatives like Mirroring or traditional Copy Activities in pipelines?

Use Copy Job when you:

  • Need to move data without transformations
  • Want a quick, low-code setup for regular data ingestion
  • Require incremental or CDC-based replication
  • Prefer a guided experience over custom pipeline development
  • Need to move data across multiple clouds or between on-premises and cloud

Consider alternatives when you:

  • Need complex transformations during data movement
  • Require orchestration of multiple activities in sequence
  • Want real-time, continuous replication (consider Mirroring for this)
  • Need extensive custom logic or error handling

The recent addition of Copy Job Activity bridges this gap, allowing you to orchestrate Copy Jobs within Data Factory pipelines when you need the simplicity of Copy Job combined with pipeline flexibility.

Setting up your first copy job

Let's walk through a practical example of creating a Copy Job that incrementally copies data from a database to a Lakehouse.

Prerequisites

  • A Microsoft Fabric workspace with appropriate permissions
  • Source data (a database table with an incremental column, or storage with files); an example of such a table is sketched below
  • A configured destination (Lakehouse, Warehouse, or other supported target)
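
If you do not yet have a source table with a usable incremental column, the sketch below shows what such a table could look like. All names are hypothetical; any timestamp or auto-incrementing column that only moves forward can serve as the watermark.

```python
# Minimal example of a source table with a usable incremental column.
# Names are hypothetical placeholders.
import pyodbc

DDL = """
CREATE TABLE dbo.Orders (
    OrderId      INT IDENTITY(1,1) PRIMARY KEY,  -- an auto-incrementing ID also works as a watermark
    CustomerId   INT            NOT NULL,
    Amount       DECIMAL(18, 2) NOT NULL,
    LastModified DATETIME2      NOT NULL
        CONSTRAINT DF_Orders_LastModified DEFAULT SYSUTCDATETIME()  -- incremental column
)
"""

CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-server>.database.windows.net;Database=<your-db>;"
    "Authentication=ActiveDirectoryInteractive"
)

with pyodbc.connect(CONN_STR) as conn:
    conn.execute(DDL)
    conn.commit()
```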

Step-by-step process

  • Create the Copy Job: In your Fabric workspace, select "+ New Item" and choose "Copy job" under the Get data section.

 

  • Configure Source: Select your source data store (for example, Azure SQL Database, Fabric Warehouse, or Azure Data Lake Storage Gen2). Provide connection details and credentials. For secure environments, you can use on-premises or virtual network gateways.

 

  • Select Data: Choose which tables, folders, or files to copy. Use the preview feature to verify you're selecting the right data.

 

  • Choose Copy Mode: Select between Full copy or Incremental copy. For incremental mode with databases, specify the incremental column that tracks changes. For storage, the system automatically uses file modification timestamps.

 

  • Configure Destination: Select or create your destination. You can choose to write to existing items or create new Lakehouses, Warehouses, or other targets.

 

  • Map and Review: Verify column mappings and review the job summary. Configure advanced settings if needed.

 

  • Set Schedule: Choose to run once or on a recurring schedule. You can adjust the frequency to match your data refresh requirements.

 

  • Save and Run: The Copy Job starts immediately if you select "Start data transfer immediately." A quick way to verify the result is sketched below.
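
After the first run completes, it is worth checking that the data actually landed in the destination. Assuming the Copy Job wrote to a Lakehouse table (the table and column names below are hypothetical), a few lines in a Fabric notebook are enough:

```python
# Run in a Microsoft Fabric notebook with the destination Lakehouse attached
# as the default lakehouse; 'spark' is the session the notebook provides.
df = spark.table("Orders")  # the table written by the Copy Job

print(f"Row count after this run: {df.count()}")

# For an incremental copy, the maximum of the incremental column should move
# forward whenever a run picks up new or changed rows.
df.selectExpr("max(LastModified) AS last_watermark").show()
```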
      

Conclusion

Copy Job makes a great starting point for moving data into Fabric. By abstracting away the complexity of pipeline development while maintaining power and flexibility, it enables you to move data faster and with less technical overhead. Whether you're new to data integration or an experienced data engineer looking for efficiency gains, Copy Job offers a compelling solution for your data movement needs.

More information

What is Copy job in Data Factory - Microsoft Fabric | Microsoft Learn
