Join this talk to see how dunnhumby's data scientists and product teams use Dataproc as a data platform to run ETL and machine learning routines.
We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate workloads. We share a Hive metastore across those many short-lived clusters and isolate workloads according to the principle of least privilege. We also provide JupyterLab and other utilities for data engineers and data scientists to work with.
Come and learn how we do it.
-----
BIO
Jamie Thomson
Jamie has worked for dunnhumby for five years, first building big data solutions on on-premises infrastructure and then moving on to build dunnhumby's cloud platform. Before joining dunnhumby, he spent ten years as a Microsoft Most Valuable Professional (MVP) for SQL Server.
---
TL;DR: Join this online event to see how dunnhumby uses Dataproc, Google's managed Hadoop service, for ML and ETL workloads.
#Dataproc #ML #ETL #Airflow