I will give a brief, somewhat informal presentation of the paper "On the Limitations of the Transformer Architecture". The goal is to give a mathematical definition of a transformer, or more precisely of the self-attention mechanism, without going into practical implementations. The paper connects this formalism to communication complexity to argue that the architecture has inherent limitations. The talk is expected to last 15–30 minutes, depending on the amount of discussion.