What No One Tells You About AI Infrastructure with Hugo Shi

Released Friday, 4th July 2025

In this episode of the ODSC AI Podcast, host Sheamus McGovern, founder of ODSC, sits down with Hugo Shi, Co-Founder and CTO of Saturn Cloud, a platform that gives data scientists and ML engineers the tools and flexibility they need to work the way they want. From his early days as a desk quant during the 2008 financial crisis to founding Saturn Cloud, Hugo brings a wealth of experience across finance, open source, and AI infrastructure.

This conversation dives deep into the realities of building AI infrastructure at scale, advocating for self-service tools for AI practitioners, managing cloud costs, and why flexibility—not control—is the foundation for productive data teams. It’s a must-listen for anyone working with machine learning infrastructure, whether you're a beginner navigating your first platform or a seasoned engineer scaling multi-cloud operations.

Key Topics Covered:

  • Hugo’s career journey: From quant finance to co-founding Anaconda and then Saturn Cloud
  • What working as a desk quant during the 2008 crisis taught him about speed, impatience, and iteration
  • The pivotal role Anaconda played in democratizing Python data science
  • Why Saturn Cloud was founded: common infra pain points across data teams
  • How Saturn Cloud empowers teams through:
      • Interactive compute environments
      • Scheduled jobs
      • Long-running deployments
  • The importance of flexibility vs. opinionated platforms
  • Why data scientists should not suffer in silence over infra pain
  • Hidden cloud costs: compute, storage, and network—and how to manage them
  • Differences between AI cloud providers (CoreWeave, Lambda Labs) and traditional hyperscalers (AWS, Azure, GCP)
  • Scaling AI: lessons from working with massive clusters and thousands of parallel jobs
  • Security best practices in ML platforms, including role-based access and cost attribution
  • Why ML teams should collaborate across IT, product, and data engineering
  • Hard-won lessons from real-world AI infrastructure scaling

Memorable Outtakes:

On infrastructure friction and self-advocacy:

“Data scientists, ML engineers, and AI engineers suffer in silence… They don’t perceive themselves as tech experts, so they think they have to accept infrastructure pain. They shouldn’t.”

On why Saturn Cloud avoids being too opinionated:

“Notebooks are fine—but making them the only way to work? That’s a career trap. People should graduate to full IDEs and better practices.”

On scaling AI operations:

“What can be done, will be done. If it’s possible someone will try it. At scale, low-probability failures become inevitable.”

Sponsored by:

Agentic AI Summit 2025

Join the premier virtual event for AI builders from July 15–30. Gain hands-on skills in designing, deploying, and scaling autonomous AI agents.

🔥 Use code podcast for 10% off any ticket.

Register now: https://www.summit.ai/


ODSC West 2025 – The Leading AI Training Conference

Attend in San Francisco from October 28–30 for expert-led sessions on generative AI, LLMOps, and AI-driven automation.

🔥 Use code podcast for 10% off any ticket.

Learn more: https://odsc.com/california
