[QA] Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

Released Thursday, 14th August 2025

This paper explores filtering dual-use topics from pretraining data to make open-weight AI systems more tamper-resistant, demonstrating significant improvements in resistance to adversarial fine-tuning without degrading unrelated capabilities.

https://arxiv.org/abs/2508.06601

YouTube: https://www.youtube.com/@ArxivPapers

TikTok: https://www.tiktok.com/@arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016

Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

From The Podcast

Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers
