Interview with ESL: Transition to microservices & more
[Using Unomaly is] like brushing our teeth, it’s an integral part of our workflow and operations... It’s for everything that is super critical, it’s become our entry point to knowing where to look for problems.
Thomas Poehler, Senior VP of IT, ESL
ESL, a long-time customer of Unomaly, has been evolving alongside us. We interviewed them back in 2018 and decided to check in again. Thomas Poehler (Senior VP of IT) and Felix Feinhals (Head Site Reliability Engineer) sat down with us and took us through how they are using Unomaly now and how they see the company continuing to use Unomaly in the future.
Disclaimer: questions and answers have been edited for brevity and clarity.
When we spoke to you previously you mentioned that you wanted to become more proactive - where are you at with this now?
We’re a lot more direct in how we identify issues, but we are still not where we want to be. Unomaly has become our entry point to knowing where to look for problems: it essentially points us in the right direction, and from there we can deal with a problem effectively. In addition, our way of working is a lot better than before, as in the past it sometimes happened that our customers made us aware of issues.
What did your stack used to look like and how has it evolved?
The biggest change has been moving towards microservices and onto a microservices orchestration platform. Back in 1998, the core of the software was built on PHP 3. Since then roughly 60% has been rewritten and moved, but there is still some legacy code that has not been dealt with yet.
Moving logic out of the web server tier has always been a priority. We have good experience with Kubernetes, and we have been restructuring the big legacy monolith into smaller microservices. Today our stack is primarily Scala microservices deployed via GitLab CI/CD to our orchestration platforms based on Mesos/Marathon or Kubernetes.
Before Unomaly how were you monitoring?
We didn’t have any tools that were remotely similar to Unomaly. We run an ELK stack that we use in parallel with Unomaly. Back in the day we had log files on speedy storage and used custom scripts, mainly driven by “grep”, to filter out what we needed. We started using ELK around the same time that we onboarded Unomaly.
What has the transition to Unomaly been like?
There were no real challenges using Unomaly throughout this transition. We used to have physical servers, where our web servers pushed log files to Unomaly. Now our microservices push their own log files to the Logstash endpoint. Each of our microservice applications uses a name tag as a source definition in Unomaly. By using Unomaly’s Logstash plugin, our microservices push log files to both Unomaly and Elasticsearch: Unomaly for anomaly detection and real-time alerts, ELK for archive and analysis.
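To make the flow described above concrete, here is a minimal sketch of the idea: each log event is tagged with the service's name as its source, then delivered to two destinations. It is written in plain Python and is not ESL's actual setup or the Unomaly Logstash plugin. The function names, the `source` field name, the example service name, and the in-memory lists standing in for Unomaly and Elasticsearch are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List

Event = Dict[str, str]


def tag_event(service_name: str, message: str) -> Event:
    """Attach the service's name tag as the event's source (hypothetical field name)."""
    return {"source": service_name, "message": message}


def fan_out(event: Event, sinks: List[Callable[[str], None]]) -> None:
    """Serialize the event once and hand the same payload to every sink."""
    payload = json.dumps(event, sort_keys=True)
    for sink in sinks:
        sink(payload)


# In-memory stand-ins for the two destinations (assumptions, not real clients).
anomaly_sink: List[str] = []   # plays the role of anomaly detection and alerting
archive_sink: List[str] = []   # plays the role of archive and analysis

fan_out(
    tag_event("checkout-service", "payment request timed out"),
    [anomaly_sink.append, archive_sink.append],
)
```

Both lists end up holding the same serialized event, which mirrors the point made in the answer: one log stream, identified per service, feeding two tools with different jobs.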
Are the microservices auto-scaling so that they can exist as multiple instances? Are they completely event driven?
They are long-running processes, but ephemeral. State either lives in the Postgres database, or the services only consume events without needing to persist data. The services are packaged in a container (Kubernetes) or a jar file (Mesos/Marathon), which allows us to easily scale up and down.
How has your usage of Unomaly changed?
In the beginning, we tested Unomaly for intrusion detection and security issues. Now that we also send microservice logs to Unomaly, we use it to quickly identify errors after deployments to staging. Primarily, Unomaly is a tool that gives us insight into the usage of microservices, so whenever there is something wrong, we’ll get an instant notification in Slack.
Is Slack the primary channel of communication within your teams?
Sometimes we check the user interface to look at the broader context of a situation, but usually we act on the Slack notifications. This enables us to collaborate more efficiently between teams. The product owners and team leads check the interface more regularly to obtain feedback on the state of things.
Do you have shared logging practices across your teams?
Yes, our teams use a standardized Scala logging configuration. This standard is rolled out across our organization. It was very much needed when we started breaking up the monolith into microservices, so that all teams speak the same language.
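As an illustration of what a shared logging standard can look like, the sketch below defines one formatter that every service reuses, so all services emit log lines with the same shape. It uses Python's standard `logging` module purely for illustration; the interview refers to a Scala configuration whose details are not given here. The field names, the `build_logger` helper, and the example service name are assumptions.

```python
import io
import json
import logging


class SharedJsonFormatter(logging.Formatter):
    """One formatter for all services, so every log line has the same fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "service": record.name,
                "message": record.getMessage(),
            },
            sort_keys=True,
        )


def build_logger(service_name: str, stream) -> logging.Logger:
    """Return a logger wired to the shared format (hypothetical helper)."""
    logger = logging.getLogger(service_name)
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SharedJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False         # keep output confined to the shared handler
    return logger


buffer = io.StringIO()
build_logger("match-service", buffer).info("match %s started", 42)
line = buffer.getvalue().strip()
```

Because every service goes through the same helper, a tool that learns log structure sees one consistent schema rather than a different layout per team.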
How often do you deploy? What’s the rate of change?
We deploy continuously. Once a deployment has been made, a message is posted in Slack, and we monitor that channel closely to identify issues quickly. Ultimately, quality control is done by the developers themselves.
Is noise a concern you have with Unomaly?
It’s working better than it was in the past. There were situations, such as a deployment changing a logging schema or the launch of an API, which produced false positives. However, since Unomaly’s model has been optimized over time, it’s much better now. We put more microservices and more data into the learning model, which improves the quality over time.
The last time we spoke with your team, you mentioned you were monitoring 300 hosts. Now that you have more microservices, has the number of hosts grown?
It’s shifting. Some of the physical servers are being phased out while new microservices are being added. In the end we have roughly the same number of entities that we are monitoring.
What are your concerns in terms of site reliability? What is the most important service and how do you think about it?
ESL stands for high-quality esport productions. Our users expect the same from our online products. Availability, performance, and security are the core of a good product. If you don’t deliver on that, all features built on top are basically useless. As a consequence we invest a lot in these areas. We employ dedicated teams for site reliability engineering, which support all products across our development teams.
It's a core principle for us to not have ONE important service. Everything is connected and everything is equally important to be functional. We embrace failure to design systems which treat incidents as expected behaviour.
Why is Unomaly valuable for you?
Unomaly gives us an instant view of situations which require attention. It automatically finds the needle in the haystack. The automated postings into our Slack channels integrate Unomaly smoothly into our workflows. Additional tools like the Unomaly-Logstash module make integration into existing infrastructure easy.
Unomaly helped us move from being reactive to proactive. Nowadays issues are noticed very early on, which is different from back in the day, when we didn't know there were issues until the customer was affected.
Do you have any stories of things you’ve discovered or detected with Unomaly?
What was especially remarkable was the incident where we could follow an SQL injection attempt in real time through Unomaly situations popping up. We were very relaxed because we could closely monitor which attack vectors were used against which part of our infrastructure. This helped us to focus on the important areas in our in-depth monitoring. Another incident that comes to mind was a case where a customer used a wrong route on one of our APIs. Unomaly enabled us to instantly help the customer by reaching out to him. He was very impressed by our response.
Today, Unomaly is like brushing your teeth, it’s an integral part of our workflow and operations. Back in the day we would have missed things like these, as going through the logs manually would have taken too much time. At the end of the day it comes down to Unomaly being the core system for security, error alerting, debugging and notifications.
How often do your teams use Unomaly?
Constantly. It is part of our routines. From security to debugging, Unomaly is deeply integrated into our workflows. We still plan to onboard more non-development-related services to Unomaly, like Azure AD logs for security and compliance monitoring.
What does the future look like for ESL?
ESL is the place where everybody can be somebody. We work hard to tighten relationships with our fans and partners to deliver the best esport experience on a global scale. Delivering excellent esport products that define new standards is what we strive for.