
DevConf.US 2022 is the 5th annual, free, Red Hat-sponsored technology conference for community project and professional contributors to Free and Open Source technologies, coming to Boston this August.
Thursday, August 18 • 16:00 - 16:25
Kafka as a Service: The SRE Maturity Journey


Building a managed service is a huge undertaking. The Site Reliability Engineering (SRE) model pairs nicely with designing, building and maintaining a service. In this session, we'll cover the SRE journey of the managed Kafka service at Red Hat. The unique SRE implementation at Red Hat combined with the complexity of building a 'Kafka on demand' service made for a lot of challenges.
How much does a central SRE team need to know about Kafka?
How can that SRE team scale and get involved in building the service?
What can the engineers building the service do to set up for a successful SRE relationship?
These questions will tease out a journey that we hope others can learn from and see a little bit of themselves in.

The journey starts in August 2020 with a large team of engineers from different products, services and backgrounds. The word from on high was to build a managed Kafka service, and it would be the showcase of managed services in Red Hat.

The session focuses on one of the sub teams and its evolution from the 'SRE View' team, to the 'Observability' team and finally to the 'Running the Service' team.
The project starts with an initial 30-day goal, followed by goals for 60 days, 90 days and beyond.
These goals formed the basis of the SRE journey:
* Understand what is needed for an SRE team and start to identify tools and technologies, maybe even get a prototype working
* Make something repeatable, not just a prototype, and start to lay the foundations of monitoring and alerting
* Start running the service at a reduced SLO in preparation for onboarding to the central SRE team

As the onboarding to SRE progresses, new and unforeseen challenges are uncovered. This is where a lot of lessons have been, and continue to be, learned. There are things we would like to be further along with, and that gap now slows us down. Equally, there are things we are quite far along with, but where we perhaps went too far without sufficient focus and depth. This can also slow us down as we retroactively add the depth required.


David Martin

Principal Software Engineer, Red Hat
Software Engineer working on Managed Services on OpenShift.

Thursday August 18, 2022 16:00 - 16:25 EDT
Metcalf Small