Summary Through Apple Intelligence: Apple developed a technique that lets large language models predict multiple tokens at once, speeding up responses by 2-3x for general tasks and by up to 5x for coding and math. The technique, called “multi-token prediction,” uses mask tokens so the model can speculate on several upcoming tokens while preserving accuracy.

submitted by /u/Fer65432_Plays
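
For anyone wondering how a model can guess several tokens ahead without getting sloppier, here is a rough toy sketch of a speculate-then-verify loop. The summary does not say how accuracy is enforced, so the verification step here is an assumption borrowed from generic speculative decoding, not Apple's actual method. `next_token` and `draft_tokens` are hypothetical stand-ins for a real model and its mask-token guesses, and the check runs position by position for readability, whereas a real system would presumably check all guesses in one batched pass.

```python
from typing import List, Tuple

Token = int
VOCAB = 50  # toy vocabulary size (assumption for illustration)


def next_token(context: List[Token]) -> Token:
    """Hypothetical stand-in for one greedy decoding step of a language model."""
    return (sum(context) * 31 + len(context)) % VOCAB


def draft_tokens(context: List[Token], k: int) -> List[Token]:
    """Hypothetical speculator: guesses k future tokens, deliberately wrong at times."""
    guesses: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = next_token(ctx)
        if len(ctx) % 4 == 3:  # inject a wrong guess at some positions
            tok = (tok + 1) % VOCAB
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def greedy_decode(prompt: List[Token], n: int) -> List[Token]:
    """Baseline: generate n tokens one at a time."""
    out = list(prompt)
    for _ in range(n):
        out.append(next_token(out))
    return out


def speculative_decode(prompt: List[Token], n: int, k: int = 4) -> Tuple[List[Token], int]:
    """Generate n tokens in rounds of up to k guesses; return (tokens, rounds)."""
    out = list(prompt)
    target = len(prompt) + n
    rounds = 0
    while len(out) < target:
        rounds += 1
        for guess in draft_tokens(out, k):
            correct = next_token(out)  # what plain greedy decoding would emit
            out.append(correct)        # equals the guess when the guess was right
            if guess != correct or len(out) >= target:
                break                  # later guesses assumed a wrong context
    return out, rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode([1, 2, 3], 10)
    print(tokens == greedy_decode([1, 2, 3], 10), rounds)
```

Because every emitted token is the one the baseline would have produced, the output matches ordinary greedy decoding exactly. Any speedup comes only from needing fewer rounds than tokens, so it depends on how often the guesses are right.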