Skip to main content

ElasticSearch–Speed up your bulk indexing

New indexed documents in ElasticSearch are not searchable until a refresh occurs. By default, every shard is refreshed once every second ‒ defined by a dynamic index level setting named refresh_interval. This forces Elasticsearch to create a new segment every second.

During bulk indexing it is recommended to increase this value. This allows larger segments to flush and decreases future merge pressure. (Replace $INDEX$ with your index name).

PUT /$INDEX$/_settings

{

    "index" : {

        "refresh_interval" : "60s"

    }

}

What also can help is setting the  index.number_of_replicas to 0. This is a tradeoff, as the loss any shard will cause data loss, but at the same time indexing will be faster since documents will be indexed only once.

PUT /$INDEX$/_settings

{

    "index" : {

        "number_of_replicas" : "0"

    }

}

Once the initial loading is finished, you can set index.refresh_interval and index.number_of_replicas back to their original values:

PUT /$INDEX$/_settings

{

    "index" : {

        "refresh_interval" : "1s",

        "number_of_replicas" : "1"

    }

}

Popular posts from this blog

Kubernetes–Limit your environmental impact

Reducing the carbon footprint and CO2 emission of our (cloud) workloads, is a responsibility of all of us. If you are running a Kubernetes cluster, have a look at Kube-Green . kube-green is a simple Kubernetes operator that automatically shuts down (some of) your pods when you don't need them. A single pod produces about 11 Kg CO2eq per year( here the calculation). Reason enough to give it a try! Installing kube-green in your cluster The easiest way to install the operator in your cluster is through kubectl. We first need to install a cert-manager: kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.5/cert-manager.yaml Remark: Wait a minute before you continue as it can take some time before the cert-manager is up & running inside your cluster. Now we can install the kube-green operator: kubectl apply -f https://github.com/kube-green/kube-green/releases/latest/download/kube-green.yaml Now in the namespace where we want t...

Azure DevOps/ GitHub emoji

I’m really bad at remembering emoji’s. So here is cheat sheet with all emoji’s that can be used in tools that support the github emoji markdown markup: All credits go to rcaviers who created this list.

.NET 9 - Goodbye sln!

Although the csproj file evolved and simplified a lot over time, the Visual Studio solution file (.sln) remained an ugly file format full of magic GUIDs. With the latest .NET 9 SDK(9.0.200), we finally got an alternative; a new XML-based solution file(.slnx) got introduced in preview. So say goodbye to this ugly sln file: And meet his better looking slnx brother instead: To use this feature we first have to enable it: Go to Tools -> Options -> Environment -> Preview Features Check the checkbox next to Use Solution File Persistence Model Now we can migrate an existing sln file to slnx using the following command: dotnet sln migrate AICalculator.sln .slnx file D:\Projects\Test\AICalculator\AICalculator.slnx generated. Or create a new Visual Studio solution using the slnx format: dotnet new sln --format slnx The template "Solution File" was created successfully. The new format is not yet recognized by VSCode but it does work in Jetbr...