https://www.adamcasson.com/posts/transformer-flops
Counting the number of floating-point operations (FLOPs) in Transformers is a useful way to estimate compute requirements and measure efficiency. As training runs grow larger (and thus more expensive), it becomes more important to understand how many FLOPs we need to perform and how well we utilize our hardware.
Counting FLOPs in Transformers
One commonly used method for counting FLOPs comes from the OpenAI scaling laws paper, which uses
$$C_{\text{forward+backward}} \approx 6N$$

for estimating the number of FLOPs per token during the training of a decoder-only Transformer, where $N$ is the number of non-embedding parameters in the model. T
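As a quick illustration of this rule of thumb, here is a minimal Python sketch: since the estimate is roughly $6N$ FLOPs per token, total training compute for $D$ tokens is about $6ND$. The function name and the example model/token counts below are hypothetical, chosen only to show the arithmetic.

```python
def training_flops_estimate(n_params: float, n_tokens: float) -> float:
    """Estimate total training FLOPs using the ~6N-per-token rule of thumb.

    n_params: N, number of non-embedding parameters
    n_tokens: D, number of training tokens
    """
    return 6 * n_params * n_tokens


if __name__ == "__main__":
    # Hypothetical example: a 125M-parameter model trained on 300B tokens.
    flops = training_flops_estimate(125e6, 300e9)
    print(f"~{flops:.2e} FLOPs")  # ~2.25e+20 FLOPs
```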