Comparing llama-cpp and vllm in model serving
In Log Detective, we’re struggling with scalability right now. We are running an LLM serving service in the background using llama-cpp. Since users will interact with it, we need to make sure they get a solid experience and don’t have to wait minutes for an answer, or, even worse, see nasty errors.
What’s going to happen when 5, 15, or 10,000 people try the Log Detective service at the same time?
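
One way to make that question concrete is a tiny load test: fire N requests at the service at once and record how many succeed and how long they take. Below is a minimal sketch, assuming an OpenAI-compatible `/v1/chat/completions` endpoint on `localhost:8000`; the URL, model name, and prompt are placeholders rather than our actual deployment.

```python
# Minimal concurrent load-test sketch.
# Assumptions: the serving endpoint is OpenAI-compatible and reachable at
# localhost:8000; the model name and prompt below are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/chat/completions"  # hypothetical endpoint
PAYLOAD = {
    "model": "placeholder-model",
    "messages": [{"role": "user", "content": "Why did this build fail?"}],
    "max_tokens": 128,
}


def one_request(_: int) -> tuple[bool, float]:
    """Send a single chat completion request and measure its latency."""
    start = time.perf_counter()
    try:
        resp = requests.post(URL, json=PAYLOAD, timeout=120)
        ok = resp.status_code == 200
    except requests.RequestException:
        ok = False
    return ok, time.perf_counter() - start


def load_test(concurrency: int) -> None:
    """Fire `concurrency` requests at once, report success rate and latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(concurrency)))
    latencies = [t for ok, t in results if ok]
    if latencies:
        avg = sum(latencies) / len(latencies)
        print(f"concurrency={concurrency}: {len(latencies)}/{concurrency} ok, "
              f"avg latency {avg:.1f}s")
    else:
        print(f"concurrency={concurrency}: all requests failed")


if __name__ == "__main__":
    for n in (5, 15, 50):
        load_test(n)
```

Even a crude script like this shows whether requests queue up, time out, or error out as concurrency grows, which is exactly the behavior we want to compare between the two backends.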
Let’s start the research.
…