Yesterday I showed how we can use MarkItDown to convert multiple document types to Markdown, making them easier to consume in an LLM context. I used a simple CV in PDF format as an example. But what if you have images inside your documents? No worries! MarkItDown can process images inside documents as well. Although I couldn't find a way to do this directly through the command line, it is certainly possible with a little Python code.

First, make sure that you have the MarkItDown module installed locally:

pip install 'markitdown[all]'

Remark: the latest version of MarkItDown requires Python 3.10 or later.

Now we can first try to recreate our example from yesterday through code. If that works as expected, we can further extend the code to include an LLM that extracts image data. We'll use Ollama in combination with LLaVA (Large Language and Vision Assistant), a model designed to combine language and vision capabilities, enabling it to process and understand both text and images.
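Here is a minimal sketch of what that could look like, assuming the standard MarkItDown Python API and a local Ollama instance serving LLaVA through its OpenAI-compatible endpoint; the file names are placeholders:

```python
from openai import OpenAI
from markitdown import MarkItDown

# Recreate yesterday's example: a plain conversion of a PDF to Markdown
# (the file name "cv.pdf" is just a placeholder).
md = MarkItDown()
result = md.convert("cv.pdf")
print(result.text_content)

# To get image content described as well, MarkItDown accepts an LLM client.
# Ollama exposes an OpenAI-compatible endpoint locally, so we can point the
# OpenAI client at it and use the LLaVA model (pulled with `ollama pull llava`).
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires a value
)
md_llm = MarkItDown(llm_client=client, llm_model="llava")

# Converting an image now yields a Markdown description generated by LLaVA;
# the same MarkItDown instance can then be pointed at documents that contain images.
result = md_llm.convert("photo.jpg")
print(result.text_content)
```

This is a sketch rather than a definitive recipe: which converters actually invoke the LLM for embedded images can depend on the document type and the MarkItDown version you have installed.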
Context is key in building effective AI-enabled solutions. The most popular way to extend the pretrained knowledge of a Large Language Model is RAG, or Retrieval-Augmented Generation. By augmenting LLMs with external data, we ensure that outputs are not only coherent but also factually grounded and up to date. This makes RAG invaluable for applications like chatbots, personalized recommendations, content creation, and decision-support systems.

What is MarkItDown?

For any RAG solution to function effectively, the quality and format of the input data are critical. This is where MarkItDown, a lightweight Python utility created by Microsoft, stands out. It specializes in converting various files into Markdown, a token-efficient and LLM-friendly format. From the documentation:

"MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to textract, but with a focus on preserving important document structure and content as Markdown."
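As a rough illustration of that conversion step (the folder layout and file names are made up for this sketch), a small script can turn a mixed set of source documents into Markdown files ready to be chunked and embedded by a RAG pipeline:

```python
from pathlib import Path
from markitdown import MarkItDown

md = MarkItDown()
out_dir = Path("markdown")
out_dir.mkdir(exist_ok=True)

# Convert every supported document in a (hypothetical) "documents" folder to Markdown.
for source in Path("documents").iterdir():
    if source.suffix.lower() in {".pdf", ".docx", ".pptx", ".xlsx", ".html"}:
        result = md.convert(str(source))
        (out_dir / f"{source.stem}.md").write_text(result.text_content, encoding="utf-8")
        print(f"Converted {source.name}")
```

The resulting .md files keep headings, lists, and tables as plain text structure, which is exactly the kind of token-efficient input an LLM or embedding model handles well.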