More reliable agents
Over the last two weeks, we’ve spent time guiding our agents to perform more advanced workflows.
It was rough. For several days I was truly frustrated, because the results were atrocious.

This is a follow-up to my previous post about Claude Code.
We are building a tool that can autonomously backport upstream git commits into CentOS Stream using AI coding assistants.
I am writing this blog post as Claude Code is working on upsint, a tool we worked on many years back. I haven’t touched upsint’s codebase for some time. It worked just fine all those years, but recently I started getting 401 and 403 errors while creating pull requests, probably because my API token had expired. I never implemented any serious error handling in the tool, so it was hard to diagnose the issue quickly:
requests.exceptions.RetryError: HTTPSConnectionPool(host='api.github.com',
port=443): Max retries exceeded with url: /repos/packit/ai-workflows/pulls
(Caused by ResponseError('too many 403 error responses'))
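For illustration, here is a minimal sketch of the kind of error handling that would have surfaced the real cause sooner. The create_pull_request helper, the GITHUB_TOKEN environment variable, and the endpoint path are assumptions for this example, not upsint’s actual code:

import os

import requests

GITHUB_API = "https://api.github.com"


def create_pull_request(repo: str, payload: dict) -> dict:
    """Create a pull request and fail with a clear message on auth errors.

    Hypothetical helper for illustration, not upsint's real implementation.
    """
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set; cannot authenticate to GitHub.")
    response = requests.post(
        f"{GITHUB_API}/repos/{repo}/pulls",
        json=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    if response.status_code in (401, 403):
        # Surface the real cause instead of letting urllib3 retry its way
        # into an opaque RetryError.
        raise RuntimeError(
            f"GitHub rejected the request with HTTP {response.status_code}: "
            f"{response.text[:200]} -- the API token is probably expired or missing a scope."
        )
    response.raise_for_status()
    return response.json()

Failing fast on 401/403 and naming the token in the message turns a wall of retries into a one-line diagnosis.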
…
The Log Detective service has been live for more than two weeks now. Running an LLM inference server in production is a challenge.
We initially started with llama-cpp-python’s server but switched over to the llama.cpp server because of its parallel execution feature. I still need to benchmark it to see how much speedup we are getting.
This blog post highlights a few common challenges you might face when operating an inference server.
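To put a number on that speedup, a rough sketch like the one below could compare sequential and concurrent request latency. It assumes the llama.cpp server is listening locally on port 8080 with its /completion endpoint; the prompt, request count, and concurrency level are placeholders:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Assumption: llama.cpp server running locally on port 8080.
URL = "http://localhost:8080/completion"
PROMPT = {"prompt": "Explain this build failure in one sentence:", "n_predict": 64}


def one_request() -> float:
    """Send a single completion request and return its wall-clock latency."""
    start = time.perf_counter()
    requests.post(URL, json=PROMPT, timeout=300).raise_for_status()
    return time.perf_counter() - start


def benchmark(n_requests: int, concurrency: int) -> float:
    """Run n_requests at the given concurrency and return total wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: one_request(), range(n_requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    sequential = benchmark(8, concurrency=1)
    parallel = benchmark(8, concurrency=4)
    print(f"sequential: {sequential:.1f}s, 4-way parallel: {parallel:.1f}s")

If the parallel run is not noticeably faster than the sequential one, the parallel slots are not paying off for this model and hardware.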
…In Log Detective, we’re struggling with scalability right now. We are running an LLM inference service in the background using llama.cpp. Since users will interact with it, we need to make sure they’ll get a solid experience and won’t need to wait minutes for an answer, or, even worse, see nasty errors.
What’s going to happen when 5, 15, or 10,000 people try the Log Detective service at the same time?
Let’s start the research.
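One direction this research could take, shown here purely as a sketch: cap the number of in-flight inference calls and fail fast with a clear “busy” message when the cap is exceeded, instead of letting requests pile up until they time out. The MAX_IN_FLIGHT limit and the run_inference placeholder below are assumptions, not the actual Log Detective code:

import asyncio

# Assumption for this sketch: the backend can comfortably serve this many
# requests at once; everything else here is a placeholder.
MAX_IN_FLIGHT = 4
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)


class ServerBusy(Exception):
    """Raised when every inference slot is already taken."""


async def run_inference(prompt: str) -> str:
    # Placeholder for the real call to the llama.cpp backend.
    await asyncio.sleep(2)  # stands in for a multi-second LLM response
    return f"analysis of {prompt!r}"


async def analyze(prompt: str) -> str:
    """Serve the request if a slot is free, otherwise fail fast with a clear error."""
    if _slots.locked():
        raise ServerBusy("All inference slots are busy, please retry in a moment.")
    async with _slots:
        return await run_inference(prompt)


async def main() -> None:
    # Simulate ten users arriving at once: four get answers, the rest get a
    # clean "busy" response instead of a timeout or a traceback.
    results = await asyncio.gather(
        *(analyze(f"log {i}") for i in range(10)), return_exceptions=True
    )
    for result in results:
        print(result if not isinstance(result, Exception) else f"busy: {result}")


asyncio.run(main())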
In the last blog post (Using InstructLab in Log Detective), we went through the installation and setup process for InstructLab. That post finished with knowledge preparation. We’ll continue from there and hopefully end this one with data generated by InstructLab.
We are going to continue in the Log Detective series:
This time we’ll start exploring the use of InstructLab in the Log Detective infrastructure.
In this first post, we’ll obtain InstructLab and start the exploration. We will use the official RHEL AI container image that was recently released: https://www.redhat.com/en/about/press-releases/red-hat-enterprise-linux-ai-now-generally-available-enterprise-ai-innovation-production
This is a follow-up to my previous post, “Running logdetective on an EC2 VM with CUDA”. This time, we’ll run the service and do our first inference!
From the previous post, we already have:
This is a follow-up to my previous blog post about running logdetective on RHOAI with CUDA.
Here we’re starting with a fresh EC2 VM that has an NVIDIA GPU.
We have two challenges ahead of us:
Storage: CUDA takes a lot of space, so we need to think ahead about where we’ll store gigabytes of these binaries.
GCC: Right now CUDA supports GCC from F39, while we have F40 as our host system.
We’ll run an F39 container rootless, with the graphroot stored on an external volume, to address both issues.
…Let’s run Logdetective in Red Hat OpenShift AI using a Jupyter notebook with llama-cpp-python and CUDA.
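As a minimal sketch of what that first notebook cell could look like, assuming llama-cpp-python was built with CUDA support and a GGUF model is already on disk (the model path, prompt, and log snippet are placeholders):

from llama_cpp import Llama

# Assumptions: llama-cpp-python built with CUDA, GGUF model already downloaded.
llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,       # room for the log excerpt plus the answer
)

snippet = "error: linker command failed with exit code 1"  # placeholder log line
output = llm.create_completion(
    prompt=f"Explain the likely cause of this build log error:\n{snippet}\n",
    max_tokens=128,
    temperature=0.2,
)
print(output["choices"][0]["text"])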