Gwen is a product manager at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen is a co-author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is a PMC member on the Apache Kafka project and a committer on Apache Sqoop. When Gwen isn’t building data pipelines or thinking up new features, you can find her pedaling her bike, exploring the roads and trails of California and beyond.
Every business has a central nervous system through which all information flows and around which all decisions are made. Too often, this system is ad hoc and nonstandard, resulting in an architecture that is difficult to reason about and even harder to keep running.
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity.
In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it: how to detect duplicates, catch buggy clients, and triage performance issues; in short, how to keep the business’s central nervous system healthy and humming along, like a Kafka pro.
In a typical project, 80% of the time is spent on data integration: getting the data you want, the way you want it. The problem remains challenging despite 40 years of attempts to solve it. We want a reliable, low-latency system that can handle varied data from a wide range of data management systems. We want a solution that is easy to manage and easy to scale. Is that too much to ask?
In this presentation, we’ll discuss the core challenges of data integration and introduce design and architecture patterns used to tackle them. We will explore how these patterns can be implemented with Apache Kafka and share pragmatic solutions that many engineering organizations have used to build fast, scalable, and manageable data pipelines.