Being new to Microsoft Fabric, I noticed that you have two options when writing notebooks in Python: run your code with PySpark (backed by a Spark cluster) or with Python (running natively on the notebook's compute). Both options look almost identical on the surface, since you're writing Python syntax either way, but under the hood they behave very differently, and picking the wrong one can cost you time, money, and unnecessary complexity. In this post I try to identify the key differences and give you some heuristics for deciding which engine to reach for.

Python vs PySpark: what's actually different?

When you select PySpark in a Fabric notebook, your code runs on a distributed Apache Spark cluster. Fabric spins up a cluster, distributes your data across multiple worker nodes, and executes transformations in parallel. The core abstraction is the DataFrame (or RDD), and operations are lazy: nothing actually runs until you trigger an action like .show() ...
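
The idea that transformations only describe work, and that an action triggers it, can be illustrated without a cluster. The sketch below is a plain-Python analogy built on generators, not Spark code; the read_rows function and the sample values are hypothetical stand-ins for reading a table. It only shows the ordering (define first, execute on demand) and says nothing about how Spark distributes or optimizes the work.

```python
# Plain-Python analogy for lazy evaluation; this is NOT Spark code.
log = []  # records each moment real work happens


def read_rows():
    # Hypothetical data source standing in for a table read.
    for value in [1, 2, 3, 4, 5, 6]:
        log.append(f"read {value}")
        yield value


# "Transformations": building the pipeline does no work yet.
evens = (v for v in read_rows() if v % 2 == 0)
doubled = (v * 2 for v in evens)

work_before_action = len(log)  # nothing has been read so far

# "Action": consuming the pipeline runs the whole chain at once.
result = list(doubled)
work_after_action = len(log)

print(work_before_action, result, work_after_action)
```

In this analogy, list(doubled) plays the role of an action: until it is called, no row has been read.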
Ever find yourself mapping multiple string values to the same result? Having been a C# developer for a long time, I sometimes forget that C# has evolved, so I still find myself chaining case labels or reaching for a dictionary. With pattern matching this is no longer necessary: you can express things inline, declaratively, and with zero repetition.

A small example

I was working on a small script that should invoke different actions depending on the environment. Our developers were using different variations for the same environment, e.g. "tst" alongside "test" and "prd" alongside "prod". We asked to streamline this a long time ago, but as these things go, we still see variations in the wild. This brought me to the following code, which is a perfect example for pattern matching:

The or keyword here is a logical pattern combinator, not a boolean operator. It matches if either of the specified pattern...