Elasticsearch – Speed up your bulk indexing

Newly indexed documents in Elasticsearch are not searchable until a refresh occurs. By default, every shard is refreshed once every second, as defined by the dynamic index-level setting index.refresh_interval. While documents are being indexed continuously, this means Elasticsearch creates a new segment roughly every second.

During bulk indexing it is recommended to increase this value. A longer interval lets Elasticsearch write fewer, larger segments, which reduces later merge pressure. (Replace $INDEX$ with your index name.)

PUT /$INDEX$/_settings
{
    "index" : {
        "refresh_interval" : "60s"
    }
}

Setting index.number_of_replicas to 0 can also help. This is a tradeoff: while replicas are disabled, the loss of any shard causes data loss, but indexing is faster because each document is indexed only once, on the primary shard.

PUT /$INDEX$/_settings
{
    "index" : {
        "number_of_replicas" : "0"
    }
}

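Once the initial loading is finished, set index.refresh_interval and index.number_of_replicas back to their original values. The example below assumes the defaults of 1s and one replica; use your own values if the index was configured differently: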

PUT /$INDEX$/_settings
{
    "index" : {
        "refresh_interval" : "1s",
        "number_of_replicas" : "1"
    }
}
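If you would rather not hard-code the original values, the change-and-restore steps above can be scripted. What follows is a minimal Python sketch, not an official client. It assumes an unsecured cluster reachable over plain HTTP and that the index argument is a concrete index name rather than an alias or pattern. The helper names (extract_originals, call, bulk_mode) are made up for illustration. It reads the current settings first, so the restore step puts back whatever was there before.

```python
import json
import urllib.request
from contextlib import contextmanager

# Settings from this post that favour indexing throughput over search freshness.
BULK_SETTINGS = {"refresh_interval": "60s", "number_of_replicas": 0}


def extract_originals(settings_response, index):
    """Pull the current values out of a GET /<index>/_settings response.

    A setting that was never set explicitly is absent from the response.
    It maps to None here; sending null in the update resets it to its default.
    """
    current = settings_response[index]["settings"]["index"]
    return {key: current.get(key) for key in BULK_SETTINGS}


def call(base_url, method, path, body=None):
    """Send one JSON request to the cluster and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


@contextmanager
def bulk_mode(base_url, index):
    """Apply BULK_SETTINGS for the duration of the block, then restore."""
    path = "/" + index + "/_settings"
    # Assumption: `index` is a concrete index name, so it is also the response key.
    originals = extract_originals(call(base_url, "GET", path), index)
    call(base_url, "PUT", path, {"index": BULK_SETTINGS})
    try:
        yield originals
    finally:
        # Runs even if the bulk load raises, so replicas are not left at 0.
        call(base_url, "PUT", path, {"index": originals})
```

A bulk load would then be wrapped as `with bulk_mode("http://localhost:9200", "my-index"):` followed by your own loading code; both the URL and the index name here are placeholders.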
