The causes of many IT issues can usually be found in your log data. Logs can show you what happened, when, and where. They are extremely versatile and hold a lot of value when it comes to investigating a problem and getting to its root cause.
ESL, a long-time customer of Unomaly, has been evolving alongside us. We interviewed them back in 2018 and decided to check in again. Thomas Pohler (VP of IT) and Felix Fienhals (Site Reliability Engineer) sat down with us and took us through how they are using Unomaly now and how they see the company continuing to use it in the future.
We are making a major shift in how we release product updates to our customers and users: we are moving to weekly releases.
Where there is code, there will be bugs. Where there is change, there will be new bugs. And there is a limit to how much time can be spent making sure code is bug-free. It’s impossible to prevent bugs completely.
Our first hypothesis is that experiments are good. In practice, what does this mean? Over the past several months we’ve designed and tested several experiments including Frequency Anomalies, log transforms and an alternate scoring method.
This post is a summary of a talk on patterns of behavior when remediating incidents, presented on 27 February 2018 at the Stockholm DevOps Meetup.
For the Unomaly 2.28 release, we have completely reworked our tokenizer to pick up on nested structures and key-value pairs in the unstructured log data that we ingest, without any schema specification. To make this understandable, we’ll go through a bit of how Unomaly works and then delve into the technical details of the new structural tokenizer.
Think of logs as a journal for your program, one that should be both readable by humans and parsable by computers. It contains messages that describe what’s going on, along with the relevant context as key-value pairs. Also keep in mind that logs will be used for data mining, so include relevant identifiers such as request IDs, PIDs, user IDs, etc.
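As a rough illustration of that advice, here is a minimal Python sketch that writes a readable message followed by its context as key=value pairs, then parses the pairs back out. The helper names (`format_context`, `log_event`, `parse_context`), the logger name, and the sample identifiers are all hypothetical, and the simple parser assumes values contain no spaces; this is not how Unomaly itself processes logs.

```python
import io
import logging
import os


def format_context(**context):
    """Render context as space-separated key=value pairs, sorted by key."""
    return " ".join(f"{key}={value}" for key, value in sorted(context.items()))


def build_logger(stream):
    """Create a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger("journal_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def log_event(logger, message, **context):
    """Log a human-readable message followed by its key=value context."""
    logger.info("%s %s", message, format_context(**context))


def parse_context(line):
    """Recover key=value pairs from a log line (assumes values have no spaces)."""
    return dict(token.split("=", 1) for token in line.split() if "=" in token)


# Write one event to an in-memory stream, including the identifiers
# that make the line useful for later searching and correlation.
buffer = io.StringIO()
logger = build_logger(buffer)
log_event(
    logger,
    "payment accepted",
    request_id="req-42",  # sample value, made up for this example
    user_id=1001,         # sample value, made up for this example
    pid=os.getpid(),
)

line = buffer.getvalue().strip()
fields = parse_context(line)
print(line)
print(fields)
```

The same line stays readable for a person scanning the journal, while the key=value tail lets a program filter or group events by request ID, user ID, or PID without a custom parser per message.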
The task of producing good software and making it run reliably is associated with a plethora of words and concepts: monitoring, log analysis, pen testing, auditing, metrics, reliability engineering, etc. However, something that is central to all of this is observability.