AWS Open Source Symposium

January 1, 2021 | 9:00AM - 5:00PM PST

About the Open Source Symposium

This event will focus on Big Data and Observability solutions available in the Cloud Native Computing Foundation (CNCF) ecosystem and most relevant to our strategic customers' use cases. For Big Data, we will discuss Big Data schedulers, Spark on Kubernetes, and Iceberg. For Observability, we invite customers to talk about Kubernetes Observability and Monitoring with events, logs, traces, and metrics, how Observability is utilized to enable data analytics workloads, and how Big Data can help with faster mean time to detect (MTTD) or mean time to recover (MTTR) in case of a failure.

Schedule

9:55am

Welcome and Kickoff

10:00am - 10:35am

AIOps and Customer-Centric Observability at Intuit

Intuit has been increasingly focused on delivering the best customer experience and on gaining better visibility into, and faster response to, problems that customers encounter. Developers at Intuit leverage our Real User Monitoring solution to measure and alert on the availability of key user functionalities such as payments or tax filing. Intuit's AIOps platform powers our alerting capability, which uses real-time anomaly detection to spot anomalies in customer experiences.

In this talk, Venkatesh Rangarajan and Vigith Maurice will share how we apply Big Data and AIOps techniques with Real User Monitoring and real-time tracing data to reduce the time it takes to detect and triage customer-impacting issues to near zero.

Speakers: Venkatesh Rangarajan, Observability Product Manager & Vigith Maurice, Observability Technical Lead

10:45am - 11:20am

Self-service Diagnostics for Developers at Adobe

Developers at Adobe leverage Ethos, a multi-tenant and multi-cloud compute platform based on Kubernetes. Product teams across Adobe use this platform to host their solutions. Over time, the complexity of services has increased, which in turn has required investment in solutions beyond the core observability stack. To address these challenges, Ethos has built capabilities centered around diagnosing issues and identifying root causes. The focus is to provide self-service capabilities that let development teams detect root causes and remediate issues. This has helped reduce MTTR for various incidents and improved overall reliability. These tools have also been integrated with the "Get Help" workflow and are a key capability of Adobe's Internal Developer Platform (IDP). In this session, attendees will learn about Adobe's observability stack and the self-service diagnostics capabilities offered by Ethos.

Speakers: Shibashis Mishra, Senior Engineering Manager & Rohan Kapoor, Product Manager

11:30am - 12:30pm

Lunch

12:30pm - 1:05pm

Integrating an Observability solution for Amazon Aurora Database Into Workday's Centralized Observability Platform

Cloud relational database services such as Amazon Aurora offer robust, native observability features to help customers. Large enterprises like Workday, however, build their own centralized Observability platforms. Continuing to depend on the native observability suite for a cloud database like Aurora can result in a fragmented picture, which makes it challenging to integrate the logging and monitoring of a complex cloud database system into the existing Observability platform and achieve a "single pane of glass". In this session, Sandesh Achar, Director Cloud Engineering, and Nathan Tisuela, Software Engineer, from Workday discuss how they integrated an observability solution for Amazon Aurora with their centralized Observability platform.

Speakers: Sandesh Achar, Director Cloud Engineering & Nathan Tisuela, Software Engineer, Workday

1:05pm - 1:15pm

Break

1:15pm - 1:50pm

Seeing Through Abstractions: A Cloud Performance Methodology

Cloud native solutions have a large number of software layers between the system hardware and the layers at which consumers receive a desired form of a service, whether IaaS, PaaS, or SaaS, running in parallel with many other services. These abstractions make it difficult to understand the performance effects observed at the service delivery level in terms of the underlying performance phenomena in hardware and at each intermediary layer of abstraction or virtualization. This talk will sketch how one may use hardware performance monitoring units (PMUs) in conjunction with other types of monitoring in the software layers to obtain insights into sources of performance loss or opportunities for performance gain. Several factors complicate this exercise in connecting the dots: many operations in hardware and software run in parallel, even logically sequential instructions execute out of program order to emulate data flow machines, and multi-tenant operation (for efficiency and economy of scale) makes it hard to account for the effects of different tenants upon each other's performance. In this presentation, we describe some lessons in putting together the big-picture view by collecting and analyzing key pieces of explanatory detail from the hardware PMUs and correlating them with various other measures of performance in the different software layers.

Speaker: Harshad Sane, Principal Engineer, Intel

2:00pm - 2:35pm

Optimizing Cloud Workload Resources with Observability Metrics

Accurate prediction of cloud workload resource consumption is a crucial tool for optimizing cloud capacity usage and maximizing the value of cloud assets. Estimating resource requirements and setting the service configuration based on experience is a useful approach. However, resource requirements backed by data allow operators and administrators to make informed decisions when setting service configurations. In this talk, Shrey will describe resource tier recommendations for data science workloads in the cloud, using the JupyterHub application as an example. He will demonstrate how to fetch CPU and memory telemetry data from user pods on the Operate First cluster and train a learning algorithm to recommend tiers. He will then discuss the implications of such an approach and how it can be extended to use cases like detecting and forecasting node failures and reducing the service energy footprint. Attendees will learn how to use telemetry data from their clusters to optimize their resource usage and drive decisions with AIOps for cloud environments.

Speaker: Shrey Anand, Data Scientist, Emerging Technologies, Red Hat

2:35pm - 2:55pm

Break

2:55pm - 3:30pm

Interactive Observability Analytics with Trino at Scale

Trino is a distributed SQL engine that is widely used for big data analytics. However, it comes with a set of challenges when run at large scale, particularly when used for batch ETL. In this presentation, we discuss our experiences developing Huron, Salesforce's internal observability platform, which leverages Trino to run analytics across all our telemetry. Huron is used by service owners, SREs, and engineers to obtain insights from their observability data with the goal of enhancing service availability. We describe some of the key considerations involved in running large-scale ETLs with Trino in the cloud, including challenges with scaling writes against object stores, factors impacting cost to serve, and how we made the decision to switch to Trino-Iceberg.

Speakers: Conor McAvoy, Software Engineer, Salesforce & Vincent Poon, Software Architect, Salesforce Monitoring Cloud

3:40pm - 4:15pm

Lessons Learned Performance Testing Spark on Kubernetes at Scale

Pinterest is a company that provides inspiration to build a life you love. Through advanced recommendation engines, ML pipelines, an in-house analytics engine, and an exabyte-scale data lake, we are able to delight our users with highly targeted inspiration for their projects. Over the years, Pinterest Data Engineering has developed several in-house analytics platforms based on the Hadoop ecosystem and YARN. Recently we ran an experiment to explore moving Spark workloads to k8s and to evaluate the performance at scale and the overall viability of using k8s for Spark.

In this presentation, we will talk about our business drivers for considering k8s for batch workloads, the results of running large-scale tests on EKS during our evaluation, lessons learned, and what we’d do differently.

Speakers: Rainie Li, Software Engineer, DataEng & William Tom, Software Engineer, DataEng

4:30pm - 5:30pm

Panel Discussion

Panelists: Netflix - Kelley Yohe, Director Growth Engineering; Pinterest - Dave Burgess, Director, Data; Autodesk - Ben Cochran, VP; HPE - Praveena Patchipulusu, Senior Director, Engineering, GreenLake Cloud Platform

Host: Lorraine Knerr, Sr. Manager Solutions Architects, AWS

5:30pm - 6:30pm

Networking Happy Hour

Agenda

9:55AM - 10:00AM

Welcome and Kickoff

10:00AM - 10:35AM

Keynote: Kubernetes Observability with Open Source and Open Standards

Speaker: Alolita Sharma, Engineering Leader, Co-Chair CNCF Observability, Apple, Inc.

10:45AM - 11:20AM

Kubernetes Observability at Airbnb

Speaker: Saurabh Mehta, Senior Engineering Manager

11:30AM - 12:00PM

Lunch

12:00PM - 12:35PM

Coming Soon

12:45PM - 1:20PM

Self-service Diagnostics for Developers at Adobe

Developers at Adobe leverage Ethos, a multi-tenant and multi-cloud compute platform based on Kubernetes. Product teams across Adobe use this platform to host their solutions. Over time, the complexity of services has increased, which in turn has required investment in solutions beyond the core observability stack. To address these challenges, Ethos has built capabilities centered around diagnosing issues and identifying root causes. The focus is to provide self-service capabilities that let development teams detect root causes and remediate issues. This has helped reduce MTTR for various incidents and improved overall reliability. These tools have also been integrated with the "Get Help" workflow and are a key capability of Adobe's Internal Developer Platform (IDP). In this session, attendees will learn about Adobe's observability stack and the self-service diagnostics capabilities offered by Ethos.

Speakers: Shibashis Mishra, Senior Engineering Manager & Rohan Kapoor, Product Manager

1:20PM - 1:30PM

Break

1:30PM - 2:05PM

Coming Soon

2:15PM - 2:50PM

Cluster-less Spark as a service on AWS

In this session, we will introduce how we built a scalable, cluster-less Spark service for both batch and interactive analytics on AWS. Our design hides most of the infrastructure complexity from users, who do not need to care about resource provisioning, cluster management, and other heavy operations. We provide simple APIs for users to submit and monitor their jobs so that they can focus on building applications. The backend infrastructure is highly automated and scalable, with resources shared by many orgs and teams. The service can scale either in-cluster by adding more instances or horizontally by adding more compute clusters. It is built on top of Amazon EKS and leverages key open source components such as Apache YuniKorn and Skate.

Speakers: Weiwei Yang, Staff Software Engineer, AIML Data Infra, Apple & Tianqi Tong, Senior Software Engineer, AIML Data Infra, Apple

2:50PM - 3:00PM

Break

3:00PM - 3:35PM

Lessons Learned Performance Testing Spark on Kubernetes at Scale

Pinterest is a company that provides inspiration to build a life you love. Through advanced recommendation engines, ML pipelines, an in-house analytics engine, and an exabyte-scale data lake, we are able to delight our users with highly targeted inspiration for their projects. Over the years, Pinterest Data Engineering has developed several in-house analytics platforms based on the Hadoop ecosystem and YARN. Recently we ran an experiment to explore moving Spark workloads to k8s and to evaluate the performance at scale and the overall viability of using k8s for Spark.

In this presentation, we will talk about our business drivers for considering k8s for batch workloads, the results of running large-scale tests on EKS during our evaluation, lessons learned, and what we’d do differently.

Speakers: Rainie Li, Software Engineer, DataEng & William Tom, Software Engineer, DataEng, Pinterest

3:45PM - 4:20PM

Coming Soon

4:30PM - 5:30PM

Panel Discussion

Panelists: Netflix - Kelley Yohe, Director Growth Engineering; Pinterest - Dave Burgess, Director, Data

Host: Lorraine Knerr, Sr. Manager Solutions Architects, AWS & Ben Cochren - VP, Engineering, Autodesk

Session Proficiency Levels Explained

Level 100
Introductory

Sessions will focus on providing an overview of AWS services and features, with the assumption that attendees are new to the topic

Level 200
Intermediate

Sessions will focus on providing best practices, details of service features and demos with the assumption that attendees have introductory knowledge of the topics

Level 300
Advanced

Sessions will dive deeper into the selected topic. Presenters assume that the audience has some familiarity with the topic, but may or may not have direct experience implementing a similar solution

Level 400
Expert

Sessions are for attendees who are deeply familiar with the topic, have implemented a solution on their own already, and are comfortable with how the technology works across multiple services, architectures, and implementations

Speaker

Mackenzie Kosut

Global Startup Evangelist, Amazon Web Services

Mackenzie is the Global Startup Evangelist at AWS. His days are spent traveling the globe to meet startups, share their stories, and connect engineering teams together. Every day there are a large number of startups launching on AWS across every imaginable industry. It’s Mackenzie’s mission to find stories of startups that are helping to improve the world and share these stories with a wide audience.

Timing & Location

July 21, 2020 | 12:30PM - 2:00PM EDT
