Episode from the podcast Unleashing Data on Google Cloud

Democratising Dataproc

Released Monday, 20th July 2020

Join this talk to see how dunnhumby's data scientists and product teams use Dataproc as a data platform to run ETL and machine learning routines.

We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate workloads. We share a Hive metastore across those many short-lived clusters and isolate workloads following the principle of least privilege. We provide JupyterLab and other utilities for data engineers and scientists to work with.
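
As a rough illustration of how the pieces in the paragraph above can fit together, the Python sketch below builds a request body for one team's short-lived Dataproc cluster. It is not dunnhumby's implementation: the function name `ephemeral_cluster_spec`, the per-team service-account naming scheme, the `team` label and the 30-minute idle TTL are all hypothetical. The keys follow field names of the Dataproc `Cluster` / `ClusterConfig` API resource, and pointing every cluster at one shared metastore through the `hive:hive.metastore.uris` cluster property is an assumed approach, not one the talk description confirms. The function only builds the body; it does not call the API.

```python
import json
from typing import Any


def ephemeral_cluster_spec(
    project_id: str,
    team: str,
    run_id: str,
    metastore_uri: str,
    idle_ttl_seconds: int = 1800,
    with_jupyter: bool = False,
) -> dict[str, Any]:
    """Build a Dataproc cluster request body (hypothetical conventions)."""
    if idle_ttl_seconds <= 0:
        raise ValueError("idle_ttl_seconds must be positive")

    # Hypothetical naming scheme: one short-lived cluster per team and run.
    cluster_name = f"{team}-{run_id}".lower()

    software_config: dict[str, Any] = {
        # The "hive:" prefix writes the property into hive-site.xml, so each
        # cluster uses the shared external metastore instead of its own.
        "properties": {"hive:hive.metastore.uris": metastore_uri},
    }
    config: dict[str, Any] = {
        "gce_cluster_config": {
            # Hypothetical per-team service account: the cluster gets only
            # the permissions granted to that team (least privilege).
            "service_account": f"{team}-dataproc@{project_id}.iam.gserviceaccount.com",
        },
        "software_config": software_config,
        # Ask Dataproc to delete the cluster after it has been idle this long
        # (Duration written in its JSON string form, e.g. "1800s").
        "lifecycle_config": {"idle_delete_ttl": f"{idle_ttl_seconds}s"},
    }
    if with_jupyter:
        # Optional Jupyter component, reached through Component Gateway.
        software_config["optional_components"] = ["JUPYTER"]
        config["endpoint_config"] = {"enable_http_port_access": True}

    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "labels": {"team": team},
        "config": config,
    }


if __name__ == "__main__":
    spec = ephemeral_cluster_spec(
        "my-project", "pricing", "run-42", "thrift://shared-metastore:9083",
        with_jupyter=True,
    )
    print(json.dumps(spec, indent=2))
```

Because table definitions live in the shared metastore rather than on any single cluster, a cluster built this way can be deleted once its jobs finish without losing the schema the next cluster needs; an orchestrator such as Airflow would typically create the cluster, submit the jobs and then tear it down.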

Come and learn how we do it.

-----
BIO
Jamie Thomson
Jamie has worked for dunnhumby for five years, first building big data solutions on on-premises infrastructure, then moving on to build dunnhumby's cloud platform. Before joining dunnhumby, he spent ten years as a Microsoft Most Valuable Professional (MVP) for SQL Server.

---
TL;DR: Join this online event to see how dunnhumby uses Google's managed Hadoop service, Dataproc, for ML and ETL workloads.

#Dataproc #ML #ETL #Airflow

From The Podcast

Learn and be inspired on ways to use data on Google Cloud through sharing experiences and knowledge.

Download audio file: https://www.buzzsprout.com/1076233/4636988-democratising-dataproc.mp3?blob_id=18317909