
MarkItDown with Ollama–Process images inside documents

Yesterday I showed how we can use MarkItDown to convert multiple document types to Markdown so they are easier to consume in an LLM context. I used a simple CV in PDF format as an example.

But what if you have images inside your documents?

No worries! MarkItDown can process images inside documents as well. Although I couldn't find a way to do this directly from the command line, it is certainly possible with some Python code.

First, make sure the MarkItDown package is installed locally:

pip install 'markitdown[all]'

Now let's first recreate yesterday's example in code:
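A minimal sketch of what that code could look like, using the `MarkItDown().convert()` call and the `text_content` attribute shown in the MarkItDown README. The helper names `markdown_path` and `convert_file` and the `cv.pdf` file name are hypothetical:

```python
import sys
from pathlib import Path


def markdown_path(source: str) -> Path:
    """Return the path of the .md file that sits next to the source document."""
    return Path(source).with_suffix(".md")


def convert_file(source: str) -> Path:
    """Convert a document with MarkItDown and save the Markdown next to it."""
    # Imported inside the function so markdown_path() can be used on its own.
    from markitdown import MarkItDown

    md = MarkItDown()
    result = md.convert(source)
    target = markdown_path(source)
    # text_content holds the generated Markdown.
    target.write_text(result.text_content, encoding="utf-8")
    return target


if __name__ == "__main__":
    # "cv.pdf" is a hypothetical default; pass the path of your own document instead.
    source = sys.argv[1] if len(sys.argv) > 1 else "cv.pdf"
    print(f"Markdown written to {convert_file(source)}")
```

Writing the result to a `.md` file next to the original makes it easy to compare the output of different document types side by side.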

Remark: The latest version of MarkItDown requires Python 3.10 or higher.

If that works as expected, we can extend this code to use an LLM that describes the images. We'll use Ollama in combination with LLaVA (Large Language and Vision Assistant), a model that combines language and vision capabilities, enabling it to process and understand both text and images.

First we need to install an extra package:

pip install openai

Next, we update our code to use this package:
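A minimal sketch, assuming Ollama is running locally on its default port (11434), the model was pulled with `ollama pull llava`, and MarkItDown talks to it through Ollama's OpenAI-compatible endpoint under `/v1`. MarkItDown accepts an OpenAI-style client through its `llm_client` and `llm_model` parameters; the helper names `build_converter` and `main` are hypothetical:

```python
import sys

# Ollama's OpenAI-compatible API on its default local port.
OLLAMA_BASE_URL = "http://localhost:11434/v1"
# Assumption: the model was pulled beforehand with `ollama pull llava`.
VISION_MODEL = "llava"


def build_converter(base_url: str = OLLAMA_BASE_URL, model: str = VISION_MODEL):
    """Create a MarkItDown instance that sends images to a local LLaVA model."""
    # Imported here so this module can be loaded without the packages installed.
    from markitdown import MarkItDown
    from openai import OpenAI

    # Ollama ignores the API key, but the OpenAI client requires a non-empty value.
    client = OpenAI(base_url=base_url, api_key="ollama")
    return MarkItDown(llm_client=client, llm_model=model)


def main(paths: list[str]) -> None:
    """Convert every file passed on the command line and print the Markdown."""
    md = build_converter()
    for path in paths:
        result = md.convert(path)
        print(f"--- {path} ---")
        print(result.text_content)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python describe_images.py puppies.jpg` (both names hypothetical) should print the Markdown for the image, including the description generated by LLaVA. The same script can be pointed at the .docx, .pptx and .pdf files used in the rest of this post.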

Processing images

I first tried a small example image:

And here is the result I got back after processing the image:

# Description:
Three adorable puppies stand together as if they are part of a larger family or pack, looking out with a hint of anticipation and a strong sense of curiosity. They seem to be young pups, possibly Corgis or Terriers given their short legs and distinctive ears. The backdrop is plain and white, directing all attention to the puppies. In front of them on a tabletop, there appears to be an object that could represent knowledge or learning, perhaps indicating that these puppies are in a comfortable, domestic environment designed for their care and education.

OK, that seems to work.

Processing images in docx

Let's now try this with a Word document that contains images:

Here is the Word document I tried:

And here is the result:

Unfortunately, MarkItDown didn't seem to process the images embedded in the docx, although it did keep the document structure intact.
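To double-check that the pictures really are embedded in the file (and not just linked), you can look inside it: .docx and .pptx files are ZIP packages, and embedded pictures normally live under `word/media/` and `ppt/media/` respectively. A small standard-library sketch; the helper names `media_parts` and `list_embedded_media` are hypothetical:

```python
import sys
import zipfile

# Folders where Word, PowerPoint and Excel normally store embedded pictures.
MEDIA_PREFIXES = ("word/media/", "ppt/media/", "xl/media/")


def media_parts(names: list[str]) -> list[str]:
    """Filter a package's part names down to embedded media files."""
    return sorted(
        name
        for name in names
        # Skip explicit directory entries, which end with a slash.
        if name.startswith(MEDIA_PREFIXES) and not name.endswith("/")
    )


def list_embedded_media(path: str) -> list[str]:
    """Open a .docx/.pptx/.xlsx file as a ZIP archive and list its media parts."""
    with zipfile.ZipFile(path) as package:
        return media_parts(package.namelist())


if __name__ == "__main__":
    for name in list_embedded_media(sys.argv[1]):
        print(name)
```

If this lists files such as `word/media/image1.png` while the Markdown output contains no image descriptions, the images were present but skipped by the converter.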

Processing images in pptx

Let’s give it another try, this time with a PPTX:

And here is the result:

That looks better!

Processing images in PDF

Let's do one last try, this time with the docx converted to PDF:

And here is the result:

It didn’t process the embedded image and all the formatting is gone as well. Too bad!

Conclusion

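The tool is still in active development, and support for processing embedded images still looks limited at the moment: in my tests only the standalone image and the PPTX gave usable results, while the images in the docx and the PDF were ignored. I noticed that MarkItDown can also integrate with Azure AI Document Intelligence, which may give better results, but I'll keep that for another post…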

More information

microsoft/markitdown: Python tool for converting files and office documents to Markdown.

Deep Dive into Microsoft MarkItDown - DEV Community
