DevConf.US 2022 has ended

DevConf.US 2022 is the 5th annual, free, Red Hat-sponsored technology conference for community and professional contributors to Free and Open Source technologies, coming to Boston this August!


Big Data, HPC & Data Science
Friday, August 19
 

10:30 EDT

Open Data and AI/ML for Storage System Reliability
Storage system failure is among the main culprits behind data center failure. It can lead to service degradation, resulting in poor customer experience, hiccups in business operations, loss of revenue, and even loss of data. So improving storage system reliability is often a top priority for organizations. But how exactly can this be done? We believe that analyzing disk health and predicting disk failure can significantly help in anticipating and mitigating storage system issues.

However, this has historically been a difficult task, especially in open source. Most disk health datasets are sourced from just one or a handful of data centers. Of these datasets, only a few are made publicly available. And in the ones that are publicly available, the data is representative of very few types of workloads.

But not anymore! We believe these pain points can be addressed with the Ceph Device Telemetry Dataset. This open source dataset contains SMART metrics collected from disks running in thousands of Ceph clusters, spanning >20 vendors and >2000 models. Since Ceph is used by several types of organizations and individuals, this dataset can capture usage patterns for a wide variety of real world workloads. Moreover, it is a continuously growing dataset not just in terms of number of datapoints, but also in terms of the metrics (features) collected. This makes it a go-to dataset for the disk failure analysis problem that is of great interest in both academia and industry.

In this talk, we will introduce the Ceph telemetry dataset and show how we visualize cluster and device trends on dashboards. Then, we will show how to set up data science workflows to extract insights on disk behavior, and implement machine learning approaches to anticipate disk failure. Finally, we will describe how you can contribute to this effort as a data scientist, a domain expert, or a Ceph user.
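The failure-prediction workflow described above can range from simple heuristics to full ML pipelines. As a toy illustration only (the attribute names are standard SMART fields, but the thresholds here are invented for this sketch and are not derived from the Ceph telemetry dataset), a rule-based health score might look like:

```python
# Minimal sketch: flag at-risk disks from a few SMART attributes.
# The attribute names are real SMART fields, but the weights and
# thresholds are illustrative only, not tuned on real telemetry.

def disk_risk_score(smart: dict) -> float:
    """Return a 0..1 heuristic risk score from raw SMART values."""
    score = 0.0
    if smart.get("reallocated_sector_count", 0) > 0:
        score += 0.4
    if smart.get("current_pending_sector", 0) > 0:
        score += 0.4
    if smart.get("reported_uncorrectable_errors", 0) > 0:
        score += 0.2
    return min(score, 1.0)

healthy = {"reallocated_sector_count": 0, "current_pending_sector": 0}
failing = {"reallocated_sector_count": 12, "current_pending_sector": 3,
           "reported_uncorrectable_errors": 1}
print(disk_risk_score(healthy))  # 0.0
print(disk_risk_score(failing))  # 1.0
```

A trained model replaces these hand-picked rules with weights learned from labeled failure data, which is exactly where a large, diverse dataset like Ceph's pays off.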


Speakers

Karanraj Chauhan

Data Scientist, Red Hat
Karan is a data scientist on the Emerging Technologies team at Red Hat. He works on improving and optimizing cloud operations by applying data science to them, and on building an open source community around this domain.


Friday August 19, 2022 10:30 - 10:55 EDT
East Balcony

11:00 EDT

Modernizing HPC in the hybrid cloud via open source
In today’s environment, enterprises across multiple industries face challenges that include:

* New and more complex calculations due to regulatory demands, product discovery, environmental modeling, etc.
* Explosive data growth and the need to move and analyze this data
* Limited on-premises compute capacity

To help solve these challenges, Marius Bogoevici and Aric Rosenbaum have worked with industry experts and market leaders to design a forward-thinking, open source HPC solution to help enterprises better address their current and future needs.

In this presentation, which is based on real-world engagements with customers in the financial services industry to help them improve their market and credit risk calculations, we will demonstrate an open source HPC solution that:

* Scales the compute platform by leveraging the power of the hybrid cloud to perform new and more complex calculations in less time at a lower TCO
* Automates the setup of a multi-cluster compute environment across multiple datacenters via Ansible
* Uses the replication capabilities of Ceph to ensure proximity of data to compute
* Automatically schedules the computations to the optimal cluster in a transparent manner
* Orchestrates pre/post-calculation steps with a framework such as Apache Airflow
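To make the scheduling idea concrete, here is a deliberately simplified, hypothetical sketch (not the presenters' actual implementation; cluster names and fields are invented) of routing a job to a cluster that already holds a Ceph replica of its input data, falling back to the least-loaded cluster:

```python
# Hypothetical scheduler sketch: prefer clusters with a local Ceph
# replica of the dataset, break ties by current queue depth.

def pick_cluster(clusters, dataset):
    """Return the name of the best cluster for a job on `dataset`."""
    local = [c for c in clusters if dataset in c["replicas"]] or clusters
    return min(local, key=lambda c: c["queued_jobs"])["name"]

clusters = [
    {"name": "on-prem",    "replicas": {"risk-2022q3"}, "queued_jobs": 40},
    {"name": "cloud-east", "replicas": {"risk-2022q3"}, "queued_jobs": 5},
    {"name": "cloud-west", "replicas": set(),           "queued_jobs": 0},
]
print(pick_cluster(clusters, "risk-2022q3"))  # cloud-east
```

A production scheduler would fold in more signals (cost, data transfer time, cluster capacity), but the data-locality-first shape is the same.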


Friday August 19, 2022 11:00 - 11:50 EDT
East Balcony

13:00 EDT

Powering Open Data Hub with Ray
Ray is quickly gaining momentum as a parallel computing environment that provides a scalable cluster model pioneered by tools such as Spark and Flink, yet also supports a lightweight Serverless style workflow designed natively for modern container platforms. Open Data Hub (ODH) is a flexible and customizable federation of open source data science tools that is a great fit for taking advantage of Ray compute clusters. In this talk, Erik will explain how to integrate Ray with Open Data Hub, by configuring ODH profiles that deploy on-demand Ray clusters for Jupyter notebooks. He’ll demonstrate Ray in action as a compute resource for ODH. Along the way he’ll also discuss the logistics of adapting Ray to OpenShift’s security features. Attendees will learn about the basics of Ray’s compute model, how it fits with Open Data Hub’s architecture, and how to leverage the power of ODH customization features.

Speakers

Erik Erlandson

Senior Principal SW Engineer, Red Hat
Erik Erlandson is a Software Engineer at Red Hat’s AI Center of Excellence, where he explores emerging technologies for Machine Learning and Data Science workloads on Kubernetes, and assists customers with migrating their Data Science workloads onto the cloud. Erik is a committer...


Friday August 19, 2022 13:00 - 13:25 EDT
East Balcony

13:30 EDT

When To Stop: Optimize Test Runtimes Using AI4CI
In this era of automation and streamlining systems by removing human involvement, GitHub makes it easy to automate your software development workflows with built-in CI/CD: build, test, and deploy your code right from GitHub. Every new Pull Request with code changes is subjected to an automated set of builds and tests before being merged. Some tests run for longer than expected, and long-running tests are painful because they can block the CI/CD process for lengthy periods of time. How can we optimize the running time of our tests and prevent bottlenecks in our CI/CD pipeline?

By understanding test failure times from historical data, we aim to predict an optimal stopping point for a test or build, helping developers and managers better allocate development resources and bringing efficiency, consistency, and transparency to manual, time-consuming processes. With the help of machine learning and statistical models, we can predict the optimal stopping point for a given test and build.

In this talk, we will demonstrate how you can leverage Operate First, an open source cloud platform consisting of various tools, to collect open CI/CD data from sources such as GitHub, Prow, and TestGrid, analyze it, and visualize key performance indicator metrics on a Superset dashboard to gain greater insights into your software development process. We will use Jupyter notebooks to train an ML model for predicting the optimal stopping point for tests. Finally, we will see how to package our prediction pipeline and deploy it as a service using Seldon Core on OpenShift.
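The core intuition, stripped of the actual ML and statistical models used in the talk, is that if a test is going to fail, it usually fails within a predictable window, so there is little value in waiting past that window. A bare-bones percentile rule over hypothetical historical failure durations illustrates the idea:

```python
import math

# Toy sketch of an "optimal stopping point": the duration by which a
# given fraction of historical *failing* runs had already failed.
# The talk's actual approach uses richer ML/statistical models.

def optimal_stopping_point(failure_durations, pct=0.95):
    """Return the duration by which `pct` of past failures occurred."""
    ordered = sorted(failure_durations)
    k = math.ceil(pct * len(ordered)) - 1
    return ordered[k]

# Hypothetical failure times (seconds) of one test across past CI runs;
# one pathological run hung for 10 minutes before failing.
history = [30, 32, 35, 38, 40, 41, 43, 45, 47, 48,
           50, 52, 55, 58, 60, 62, 65, 70, 80, 600]
print(optimal_stopping_point(history))  # 80
```

Killing this test at ~80 seconds would have caught 95% of historical failures while avoiding the 10-minute hang, which is the kind of trade-off the predicted stopping point makes explicit.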

Speakers

Aakanksha Duggal

Software Engineer, Red Hat
Aakanksha Duggal is a Software Engineer at Red Hat, working in the AI Center of Excellence and Office of the CTO. She is part of the AIOps team and develops open source software that applies AI and machine learning to engineering problems.

Hema Veeradhi

Senior Software Engineer, Red Hat
Hema Veeradhi is a Senior Software Engineer in the Open Services Group at Red Hat, exploring and integrating open source AI operations. Her current work focuses on fostering data-driven development through the lens of data analytics and machine learning. Outside of work, Hema...


Friday August 19, 2022 13:30 - 13:55 EDT
East Balcony
 