A customer contacted me this week because they were seeing very long wait times before a build or release was triggered when using CI. These kinds of triggers are scheduled through the TFSJobAgent queue, so I suspected that, for some unknown reason, the queue was saturated with too much work.
I asked them to collect some data using the TFS administration pages. Here is a snapshot of the data we collected:
These charts clearly point to a problem with the TFVC_RepositoryCodeIndexing job, which is responsible for indexing our source code and making it available for search through ElasticSearch. That strongly suggests something is wrong in ElasticSearch itself.
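If you would rather work from raw numbers than from charts, the same reasoning can be sketched in a few lines of Python. This is only an illustration: it assumes you have exported the job history as (job name, run time in seconds) pairs, and the function name, the sample rows and the two "Hypothetical" job names are made up for the example, not data from this customer.

```python
from collections import defaultdict


def total_runtime_by_job(records):
    """Sum the run time in seconds per job name, largest total first."""
    totals = defaultdict(float)
    for job_name, run_seconds in records:
        totals[job_name] += run_seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows for illustration only; not the customer's data.
sample = [
    ("TFVC_RepositoryCodeIndexing", 1800.0),
    ("HypotheticalCleanupJob", 12.0),
    ("TFVC_RepositoryCodeIndexing", 2100.0),
    ("HypotheticalBuildTriggerJob", 3.5),
]

for job_name, seconds in total_runtime_by_job(sample):
    print(f"{job_name}: {seconds:.0f} s")
```

A job that sits at the top of this list by a wide margin is the one most likely to be keeping the job agent busy while the CI triggers wait their turn.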
Time to dig into the ElasticSearch logs…but we’ll leave that for another post.