DevConf.US 2022, the 5th annual free, Red Hat-sponsored technology conference for community project and professional contributors to Free and Open Source technologies, is coming to Boston this August!
Friday, August 19 • 10:30 - 10:55
Open Data and AI/ML for Storage System Reliability

Storage system failure is among the main culprits behind data center outages. It can degrade services, leading to poor customer experience, disrupted business operations, lost revenue, and even lost data. Improving storage system reliability is therefore often a top priority for organizations. But how exactly can this be done? We believe that analyzing disk health and predicting disk failure can significantly help in anticipating and mitigating storage system issues.

However, this has historically been a difficult task, especially in open source. Most disk health datasets are sourced from just one or a handful of data centers. Only a few of these datasets are publicly available, and those that are represent very few types of workloads.

But not anymore! We believe these pain points can be addressed with the Ceph Device Telemetry Dataset. This open source dataset contains SMART metrics collected from disks running in thousands of Ceph clusters, spanning more than 20 vendors and more than 2,000 models. Because Ceph is used by many types of organizations and individuals, this dataset captures usage patterns from a wide variety of real-world workloads. Moreover, the dataset is continuously growing, not just in the number of data points but also in the metrics (features) collected. This makes it a go-to dataset for disk failure analysis, a problem of great interest in both academia and industry.

In this talk, we will introduce the Ceph telemetry dataset and show how we visualize cluster and device trends on dashboards. Then, we will show how to set up data science workflows to extract insights on disk behavior, and implement machine learning approaches to anticipate disk failure. Finally, we will describe how you can contribute to this effort as a data scientist, a domain expert, or a Ceph user.
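To give a rough idea of what such a workflow can look like, here is a minimal sketch in plain Python. It is not the speakers' pipeline and does not use the real Ceph telemetry schema: the record layout, the field names `model`, `smart_5`, and `failed`, and the helper functions are all hypothetical. It computes a per-model failure rate and evaluates a naive baseline that flags a disk as at-risk when SMART attribute 5 (Reallocated Sectors Count) exceeds a threshold, the kind of simple heuristic an ML model would be expected to beat.

```python
from collections import defaultdict

# Hypothetical toy records, one per disk. Field names are illustrative
# assumptions, NOT the actual Ceph Device Telemetry Dataset schema.
# smart_5 stands for SMART attribute 5 (Reallocated Sectors Count).
records = [
    {"model": "ModelA", "smart_5": 0, "failed": False},
    {"model": "ModelA", "smart_5": 12, "failed": True},
    {"model": "ModelB", "smart_5": 0, "failed": False},
    {"model": "ModelB", "smart_5": 3, "failed": False},
    {"model": "ModelB", "smart_5": 40, "failed": True},
]


def failure_rate_by_model(rows):
    """Return the fraction of disks of each model that are marked as failed."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        totals[row["model"]] += 1
        failures[row["model"]] += int(row["failed"])
    return {model: failures[model] / totals[model] for model in totals}


def threshold_baseline(rows, threshold=0):
    """Flag a disk as at-risk when its reallocated sector count exceeds threshold."""
    return [row["smart_5"] > threshold for row in rows]


def precision_recall(predicted, actual):
    """Compute precision and recall of boolean predictions against labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    labels = [row["failed"] for row in records]
    print(failure_rate_by_model(records))
    for t in (0, 5):
        preds = threshold_baseline(records, threshold=t)
        print(t, precision_recall(preds, labels))
```

On this toy data, a threshold of 0 catches every failed disk but also flags one healthy disk, while a threshold of 5 does not; on real telemetry, choosing such trade-offs is where learned models come in.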

Karanraj Chauhan

Data Scientist, Red Hat
Karan is a data scientist in the Emerging Technologies team at Red Hat. He works on improving and optimizing cloud operations by applying data science to them, and on building an open source community around this domain.

Friday August 19, 2022 10:30 - 10:55 EDT
East Balcony