Mixture of Experts
Image Source: https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts

Basic MoE structure

Experts are FFNNs themselves: instead of passing the input representation to a single dense FFNN, we now have the option to route it to one of several FFNNs. Since most LLMs have several decoder blocks, a given text will pass through multiple experts before the output is generated, and down the line it may use different experts at different blocks (i.e. layers). A routing layer is set up to choose the experts, depending on how many experts are to be selected. MoE are categorized into two, i....
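
To make the structure concrete, here is a minimal sketch of a sparse MoE layer in PyTorch: a small bank of FFNN experts plus a routing layer that scores them per token and sends each token through its top-k experts. The class name SimpleMoE and parameters (num_experts, top_k, d_hidden) are illustrative assumptions, not from the source.

```python
# Minimal sparse MoE sketch (illustrative, not the guide's exact implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Each expert is an ordinary FFNN (two linear layers with a nonlinearity).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The routing layer produces one score per expert for every token.
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        logits = self.router(x)                           # (batch, seq, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Route each token only through its selected experts, weighted by the router.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e               # tokens assigned to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: one MoE block; a full LLM would stack many such blocks, so a token
# can be routed to different experts at different layers.
moe = SimpleMoE(d_model=64, d_hidden=256)
tokens = torch.randn(2, 10, 64)
print(moe(tokens).shape)  # torch.Size([2, 10, 64])
```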