Lessons learned from running the Log Detective service
The Log Detective service has been live for more than two weeks now. Running an LLM inference server in production is a challenge.
We initially started with llama-cpp-python’s server but switched over to the llama.cpp server because of its parallel execution feature. I still need to benchmark it to see how much speedup we are getting.
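To get a rough idea of that speedup, a benchmark can be as simple as firing the same request at increasing concurrency levels and comparing wall-clock time. Below is a minimal sketch of that idea; it assumes the server listens on localhost:8000 and exposes an OpenAI-compatible /v1/chat/completions endpoint, and the URL, model name, and prompt are placeholders rather than our actual configuration.

```python
# Rough concurrency benchmark sketch for an inference server.
# Assumptions: server at localhost:8000 with an OpenAI-compatible
# /v1/chat/completions endpoint; model name and prompt are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed host/port
PAYLOAD = {
    "model": "log-detective",  # placeholder model name
    "messages": [{"role": "user", "content": "Explain this build error: ..."}],
    "max_tokens": 128,
}


def one_request(_):
    """Send a single completion request and return its latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(URL, json=PAYLOAD, timeout=300)
    resp.raise_for_status()
    return time.perf_counter() - start


def benchmark(concurrency: int, total: int = 8) -> None:
    """Fire `total` requests with `concurrency` workers and report wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(one_request, range(total)))
    wall = time.perf_counter() - start
    print(f"concurrency={concurrency}: wall={wall:.1f}s, "
          f"mean latency={sum(latencies) / len(latencies):.1f}s")


if __name__ == "__main__":
    for c in (1, 2, 4):  # compare serial execution against parallel slots
        benchmark(c)
```

If parallel execution is doing its job, total wall time should grow much more slowly than the request count as concurrency increases.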
This blog post highlights a few common challenges you might face when operating an inference server.
…