Y Combinator Cast
02/06/25
@ Y Combinator
DeepSeek's V3 model uses a mixture-of-experts (MoE) architecture that activates only 37 billion of its 671 billion parameters for each token prediction. That requires far less computation per token than a dense model such as the 405-billion-parameter Llama 3, which activates all of its parameters for every token.
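To put those figures in proportion, here is a minimal arithmetic sketch in Python. It uses only the three parameter counts quoted above; the constant and function names are made up for illustration. It also assumes that per-token compute scales roughly with the number of active parameters, which is a common rule of thumb and not a measured benchmark.

```python
# Parameter counts quoted in the summary above, in billions.
DEEPSEEK_V3_TOTAL_B = 671
DEEPSEEK_V3_ACTIVE_B = 37
LLAMA_3_DENSE_B = 405  # dense model: every parameter is used for every token


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of a model's parameters that are used for a single token."""
    return active_b / total_b


def active_ratio(dense_b: float, sparse_active_b: float) -> float:
    """How many times more parameters the dense model uses per token."""
    return dense_b / sparse_active_b


if __name__ == "__main__":
    frac = active_fraction(DEEPSEEK_V3_ACTIVE_B, DEEPSEEK_V3_TOTAL_B)
    ratio = active_ratio(LLAMA_3_DENSE_B, DEEPSEEK_V3_ACTIVE_B)
    print(f"DeepSeek V3 active share per token: {frac:.1%}")
    print(f"Llama 3 (405B) vs DeepSeek V3 active parameters: {ratio:.1f}x")
```

Under these assumptions, DeepSeek V3 uses about 5.5% of its parameters per token, and the dense 405B model uses roughly 11 times as many parameters per token. This is a comparison of parameter counts only; it says nothing about memory footprint, since all 671 billion parameters still have to be stored.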