Nathan Lambert

02/03/25

@ Lex Fridman

DeepSeek's mixture-of-experts implementation innovates by changing the routing mechanism. The new routing supports a high sparsity factor, so only a small fraction of the model's parameters is active for any given token, while still ensuring that all experts are used during training, which improves model performance.
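To make the routing idea concrete, here is a minimal NumPy sketch of sparse top-k routing with a per-expert bias that is nudged after each batch so that underused experts get picked more often. This is an illustration under stated assumptions, not DeepSeek's actual code: the function names (`route_top_k`, `update_bias`, `simulate`), the sign-based update rule, the step size, and the synthetic skewed scores are all hypothetical choices made for this example.

```python
import numpy as np


def route_top_k(scores, bias, k):
    """Return the indices of the k experts chosen for each token.

    scores: (num_tokens, num_experts) raw router scores.
    bias:   (num_experts,) balancing offsets, used only for selection.
    """
    biased = scores + bias
    # Sort descending by biased score and keep the first k experts per token.
    return np.argsort(-biased, axis=1)[:, :k]


def update_bias(bias, chosen, num_experts, step=0.01):
    """Raise the bias of underloaded experts and lower it for overloaded ones.

    Assumption: a fixed-size, sign-based update, chosen for simplicity.
    """
    load = np.bincount(chosen.ravel(), minlength=num_experts)
    target = chosen.size / num_experts  # perfectly even load per expert
    return bias + step * np.sign(target - load)


def simulate(num_tokens=512, num_experts=16, k=2, rounds=200, seed=0):
    """Route synthetic batches and return (first_load, last_load, bias)."""
    rng = np.random.default_rng(seed)
    # Synthetic skew: later experts receive systematically higher raw scores,
    # so an unbalanced router would neglect the early experts.
    skew = np.linspace(0.0, 2.0, num_experts)
    bias = np.zeros(num_experts)
    loads = []
    for _ in range(rounds):
        scores = rng.normal(size=(num_tokens, num_experts)) + skew
        chosen = route_top_k(scores, bias, k)
        loads.append(np.bincount(chosen.ravel(), minlength=num_experts))
        bias = update_bias(bias, chosen, num_experts)
    return loads[0], loads[-1], bias


if __name__ == "__main__":
    first, last, bias = simulate()
    print("active fraction per token:", 2 / 16)
    print("load in first round:", first)
    print("load in last round: ", last)
```

With `k=2` of 16 experts, each token activates one eighth of the experts, which is the sparsity side of the claim. The bias update covers the utilization side: in this toy setup the per-expert load in the last round should be far more even than in the first, because the bias offsets the skew in the raw scores.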

Video

DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters | Lex Fridman Podcast #459
