In this episode I'll be speaking with Stephen Casper. Stephen was previously an intern working with me at UC Berkeley, but he's now a Ph.D student at MIT working with Dylan Hadfield-Menell on adversaries and interpretability in machine learning. We'll be talking about his 'Engineer's Interpretability Sequence' of blog posts, as well as his paper on benchmarking whether interpretability tools can find Trojan horses inside neural networks. For links to what we're discussing, you can check the description of this episode, and you can review the transcript at.

Topics we cover: Why interpretability? · Interpretability for engineers · Benchmarking Interpretability Tools (for Deep Neural Networks) (Using Trojan Discovery) · Deceptive alignment and interpretability.

So from your published work, it seems like you're really interested in neural network interpretability.