Yesterday I showed how we can use MarkItDown to convert multiple document types to markdown to make them easier to consume in an LLM context. I used a simple CV in PDF format as an example.
But what if you have images inside your documents?
No worries! MarkItDown can process images inside documents as well. Although I couldn’t find a way to do this directly through the command line, it is certainly possible by writing some Python code.
First make sure that you have the MarkItDown module installed locally:
pip install 'markitdown[all]'
Now we can first try to recreate our example from yesterday through code:
Remark: The latest version of MarkItDown requires Python 3.10.
from markitdown import MarkItDown

# Instantiate the MarkItDown object
md = MarkItDown()
result = md.convert("cvexample.pdf")
print(result.text_content)
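By the way, if you want to keep the result instead of just printing it, you can simply write the text_content to a file. A minimal sketch (the output filename here is just an example):

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("cvexample.pdf")

# Write the generated markdown to disk instead of only printing it
with open("cvexample.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)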
If that works as expected, we can extend this code to use an LLM for extracting image data. We’ll use Ollama in combination with LLaVA (Large Language and Vision Assistant), a model that combines language and vision capabilities, enabling it to process and understand both text and images.
First we need to install some extra packages:
pip install openai
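Before wiring everything together, it’s worth verifying that Ollama is actually serving the LLaVA model. Here is a minimal standalone sketch, assuming Ollama runs locally on its default port and you’ve already pulled the model (for example with ollama pull llava); the prompt and image file are my own:

import base64
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',  # Ollama's local server URL
    api_key='ollama',  # Required by the client, but ignored by Ollama
)

# Encode a local test image as base64, the format the OpenAI-style
# vision API expects for inline images
with open("puppies.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="llava",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)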
Now we update our code to use this module:
from markitdown import MarkItDown
from openai import OpenAI

# Instantiate the OpenAI client with Ollama's local server URL
client = OpenAI(
    base_url='http://localhost:11434/v1',  # Ollama's local server URL
    api_key='ollama',  # Required, but not used for local models
)
model = "llava"

# Instantiate the MarkItDown object
md = MarkItDown(llm_client=client, llm_model=model)
result = md.convert("puppies.jpg")
print(result.text_content)
Processing images
I first tried a small example image:
And here is the result I got back after processing the image:
# Description:
Three adorable puppies stand together as if they are part of a larger family or pack, looking out with a hint of anticipation and a strong sense of curiosity. They seem to be young pups, possibly Corgis or Terriers given their short legs and distinctive ears. The backdrop is plain and white, directing all attention to the puppies. In front of them on a tabletop, there appears to be an object that could represent knowledge or learning, perhaps indicating that these puppies are in a comfortable, domestic environment designed for their care and education.
OK, that seems to work.
Processing images in docx
Let’s now try this with a Word document containing images:
from markitdown import MarkItDown
from openai import OpenAI

# Instantiate the OpenAI client with Ollama's local server URL
client = OpenAI(
    base_url='http://localhost:11434/v1',  # Ollama's local server URL
    api_key='ollama',  # Required, but not used for local models
)
model = "llava"

# Instantiate the MarkItDown object
md = MarkItDown(llm_client=client, llm_model=model)
result = md.convert("examplewithimage.docx")
print(result.text_content)
Here is the Word document I tried:
And here is the result:
Unfortunately, MarkItDown didn’t process the embedded images in the docx, although it did keep the document structure intact.
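As a hypothetical workaround (my own sketch, not a MarkItDown feature), you could pull the embedded images out of the docx yourself and describe each one separately. A docx file is just a zip archive, with its images stored under word/media/:

import zipfile
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
)
md = MarkItDown(llm_client=client, llm_model="llava")

# A .docx file is a zip archive; embedded images live under word/media/
with zipfile.ZipFile("examplewithimage.docx") as docx:
    for name in docx.namelist():
        if name.startswith("word/media/") and name.lower().endswith((".png", ".jpg", ".jpeg")):
            path = docx.extract(name)  # extracts to ./word/media/...
            result = md.convert(path)
            print(f"Description of {name}:")
            print(result.text_content)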
Processing images in pptx
Let’s give it another try, this time with a PPTX:
And here is the result:
That looks better!
Processing images in PDF
Let’s do one last try, this time with the docx converted to a PDF:
And here is the result:
It didn’t process the embedded image and all the formatting is gone as well. Too bad!
Conclusion
The tool is still in active development, but support for processing embedded images looks rather limited at the moment. I noticed that you can integrate it with Azure AI Document Intelligence, so maybe that gives better results, but I’ll keep that for another post…
More information
microsoft/markitdown: Python tool for converting files and office documents to Markdown
Deep Dive into Microsoft MarkItDown - DEV Community