Democratising Dataproc

Released Monday, 20th July 2020

Join this talk to see how Dunnhumby's data scientists and product teams use Dataproc as a data platform to run ETL and machine learning routines.

We encourage product teams to spin up clusters autonomously, only when they need them, and to use Apache Airflow to coordinate their workloads. We share a Hive metastore across these many short-lived clusters and isolate workloads following the principle of least privilege. We also provide JupyterLab and other utilities for data engineers and data scientists to work with.

Come and learn how we do it.
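The abstract does not say how each short-lived cluster is configured. As a minimal sketch, assuming the shared metastore is a remote Hive metastore reached over Thrift and that each product team runs under its own service account, the per-team settings described above could be expressed as a Dataproc `ClusterConfig` dictionary. The function name, service-account email and metastore URI are hypothetical, not dunnhumby's actual setup.

```python
from typing import Any, Dict


def ephemeral_cluster_config(
    service_account: str,
    metastore_uri: str,
    idle_delete_seconds: int = 1800,
    with_jupyter: bool = False,
) -> Dict[str, Any]:
    """Build a Dataproc ClusterConfig dict for a short-lived, single-team cluster.

    Hypothetical helper: field names follow the Dataproc v1 ClusterConfig
    resource, but all values are illustrative placeholders.
    """
    config: Dict[str, Any] = {
        # Least privilege: the cluster VMs run as a team-specific service
        # account that is granted only the data that team needs.
        "gce_cluster_config": {"service_account": service_account},
        "software_config": {
            # Point Hive at one remote metastore shared by every cluster, so
            # table definitions outlive any individual cluster (assumption).
            "properties": {"hive:hive.metastore.uris": metastore_uri},
        },
        # Safety net: Dataproc deletes the cluster after this much idle time,
        # even if the orchestrating workflow never tears it down.
        "lifecycle_config": {"idle_delete_ttl": {"seconds": idle_delete_seconds}},
    }
    if with_jupyter:
        # Optional Jupyter component (includes JupyterLab), reached through
        # Component Gateway rather than an open port.
        config["software_config"]["optional_components"] = ["JUPYTER"]
        config["endpoint_config"] = {"enable_http_port_access": True}
    return config
```

In an Airflow DAG, a dictionary like this could be passed as the `cluster_config` argument of the Google provider's `DataprocCreateClusterOperator`, with a `DataprocDeleteClusterOperator` at the end of the DAG; the idle-delete TTL then acts only as a backstop for failed runs.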

-----
BIO
Jamie Thomson
Speaker Bio: Jamie has worked for dunnhumby for five years, first building big data solutions on on-premises infrastructure and then moving on to build dunnhumby’s cloud platform. Prior to dunnhumby he spent ten years as a Microsoft Most Valuable Professional (MVP) for SQL Server.

---
TLDR: Join this online event to see how Dunnhumby uses Dataproc, Google's managed Hadoop service, for ML and ETL workloads.

#Dataproc #ML #ETL #Airflow