
Posts

MarkItDown with Ollama - Process images inside documents

Yesterday I showed how we can use MarkItDown to convert multiple document types to Markdown to make them easier to consume in an LLM context. I used a simple CV in PDF format as an example. But what if you have images inside your documents? No worries! MarkItDown can process images inside documents as well. Although I couldn’t find a way to do this directly through the command line, it certainly is possible by writing some Python code. First make sure that you have the MarkItDown module installed locally: pip install 'markitdown[all]' Now we can first try to recreate our example from yesterday through code: Remark: The latest version of MarkItDown requires Python 3.10. If that works as expected, we can further extend this code to include an LLM to extract image data. We’ll use Ollama in combination with LLaVA (Large Language and Vision Assistant), a model designed to combine language and vision capabilities, enabling it to process and understand...
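A minimal sketch of what that Python code could look like, assuming Ollama is running locally on its default OpenAI-compatible endpoint and the llava model has been pulled (the base URL, model name, and file name below are my assumptions, not values from the post):

```python
from openai import OpenAI
from markitdown import MarkItDown

# Point an OpenAI-compatible client at the local Ollama server (assumed default port).
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Hand MarkItDown the client and a vision-capable model so it can describe images it finds.
md = MarkItDown(llm_client=client, llm_model="llava")

# Convert a document that contains images; the file name is just an example.
result = md.convert("cv.pdf")
print(result.text_content)
```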
Recent posts

Convert documents to Markdown to build a RAG solution

Context is key in building effective AI-enabled solutions. The most popular way to extend the pretrained knowledge set of a Large Language Model is through RAG, or Retrieval-Augmented Generation. By augmenting LLMs with external data, we ensure that outputs are not only coherent but also factually grounded and up-to-date. This makes it invaluable for applications like chatbots, personalized recommendations, content creation, and decision support systems. What is MarkItDown? For any RAG solution to function effectively, the quality and format of the input data are critical. This is where MarkItDown, a lightweight Python utility created by Microsoft, stands out. It specializes in converting various files into Markdown format, a token-efficient and LLM-friendly structure. From the documentation: MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to textract, ...
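As a quick illustration of what that conversion looks like in practice (the file name is an example of my own, not from the post), a basic MarkItDown conversion only takes a few lines of Python:

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("cv.pdf")   # also handles Word, Excel, PowerPoint, HTML, images, ...
print(result.text_content)      # Markdown text, ready to chunk and embed for RAG
```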

VSCode - Change Python version

After installing the latest Python version on my local machine, I noticed that VSCode was still referring to an old(er) version. In this post I'll show how to fix this. Let's dive in! I installed a new Python version using the official installer: Download Python | Python.org. However, when I tried to run a Python program in VSCode, I noticed that the older version was still being used when I looked at the output in the terminal: & C:/Users/bawu/AppData/Local/Microsoft/WindowsApps/python3.9.exe d:/Projects/Test/MarkItDownImages/example.py Traceback (most recent call last):   File "d:\Projects\Test\MarkItDownImages\example.py", line 1, in <module>     from markitdown import MarkItDown ImportError: cannot import name 'MarkItDown' from 'markitdown' (C:\Users\bawu\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\markitdown\__init__.py) To fix it, open...

Using Bolt.new locally using Bolt.diy and Ollama

Maybe you’ve heard about Bolt.new, the AI solution from StackBlitz that allows you to prompt, edit, and deploy full-stack web and mobile applications in a breeze. It uses an in-browser AI web development agent that leverages StackBlitz’s WebContainers to allow for full-stack application development. The application presents users with a simple, chat-based environment in which one prompts an agent to make code changes that are updated in real time in the WebContainers dev environment. I find it a great way to get a head start when building small(er) web applications. But what if, due to company policies or other reasons, you are not allowed to use Bolt online? In that case I have some good news for you, as the team from StackBlitz also created bolt.diy, the open source version of Bolt.new, which allows you to choose the LLM that you use for each prompt. Installing Bolt.diy The easiest way to install Bolt.diy is through Docker. Start by cloning the repository locally: g...

Tackling Technical Debt - Where to start?

Every software project accumulates technical debt. Like financial debt, it compounds over time if left unaddressed, making future changes increasingly difficult and expensive. But knowing where to begin tackling technical debt can be overwhelming. As our time is limited, we have to choose wisely. I got inspired after watching Adam Tornhill's talk called Prioritizing Technical Debt as If Time & Money Matters. So before you continue reading this post, check out his great talk: Back? Ok, let's first make sure that we agree on our definition of 'technical debt'… Understanding Technical Debt Technical debt isn't just "bad code." It represents trade-offs made during development - shortcuts taken to meet deadlines, features implemented without complete understanding of requirements, or design decisions that made sense at the time but no longer fit current needs. Technical debt manifests in several ways: Code smells: Duplicated code, overly complex methods, an...

Discontinuous improvement

One of the mantras I always preached to my teams was the concept of 'Continuous Improvement'. The idea is simple and appealing: we constantly seek incremental enhancements to our processes, products, and services. This approach, popularized by Japanese manufacturing methodologies like Kaizen, promises steady progress through small, ongoing adjustments rather than dramatic overhauls. However, while reading ‘Leadership is Language’ by L. David Marquet, I started to wonder: what if this widely accepted wisdom is fundamentally flawed? What if true improvement doesn't actually happen continuously at all? The stairway, not the ramp In his book, David explains that improvement doesn't occur as a smooth, uninterrupted climb upward. Rather, it happens in distinct, intentional batches - like climbing stairs instead of walking up a ramp. This is what he calls "discontinuous improvement," and understanding this concept can transform how your team operates. ...

VSCode - Expose a local API publicly using port forwarding

I’m currently working on building my own Copilot agent (more about this in another post). As part of the process, I needed to create an API and expose it publicly so it is accessible through a GitHub app. During local development and debugging I don't want to have to publish my API, so let's look at how we can use the VS Code Port Forwarding feature to expose a local API publicly. Port forwarding Port forwarding is a networking technique that redirects communication requests from one address and port combination to another. In the context of web development and VS Code, here's what it means: When you run a web application or API locally, it's typically only accessible from your own machine at addresses like localhost:3000 or 127.0.0.1:8080. Port forwarding creates a tunnel that takes requests coming to a publicly accessible address and forwards them to your local port. For example, if you have an API running locally on port 3000: Without port forw...
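To make this concrete, here is a minimal sketch (my own example, not from the post) of a local API listening on port 3000 using only Python's standard library; once it runs, forwarding port 3000 from VS Code's Ports view and setting its visibility to public gives you a URL that tunnels straight to it:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a tiny JSON payload.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"message": "hello from localhost:3000"}')

if __name__ == "__main__":
    # Only reachable from this machine until a port forward/tunnel exposes it.
    HTTPServer(("127.0.0.1", 3000), HelloHandler).serve_forever()
```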