https://blog.eleuther.ai/transformer-math/

Speed Estimation:

Accelerator    Peak BF16 (approx.)    VRAM
B200           2250 TFLOPS            180 GB
H100           900 TFLOPS             80 GB
A100           300 TFLOPS             80 GB
TPU-v3-8       500 TFLOPS             128 GB
A6000          150 TFLOPS             48 GB
RTX 4090       300 TFLOPS             24 GB
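
As a rough illustration of how the peak numbers above turn into a speed estimate, the sketch below divides a FLOP count by the achieved throughput. The function name `estimated_seconds`, the `utilization` parameter, and its default of 0.4 are assumptions made for this example and do not come from these notes; real utilization depends heavily on the model, the software stack, and the interconnect.

```python
# Approximate peak BF16 throughput in TFLOPS, copied from the table above.
PEAK_BF16_TFLOPS = {
    "B200": 2250,
    "H100": 900,
    "A100": 300,
    "TPU-v3-8": 500,
    "A6000": 150,
    "RTX 4090": 300,
}


def estimated_seconds(total_flops, accelerator, num_devices=1, utilization=0.4):
    """Rough wall-clock estimate for a workload of `total_flops` FLOPs.

    `utilization` is the assumed fraction of peak throughput actually
    achieved (a hypothetical default, not a measured value).
    """
    peak_flops_per_second = PEAK_BF16_TFLOPS[accelerator] * 1e12
    achieved = peak_flops_per_second * num_devices * utilization
    return total_flops / achieved


# Example: 1e21 FLOPs on 8 H100s at the assumed 40% utilization.
seconds = estimated_seconds(1e21, "H100", num_devices=8)
print(f"{seconds / 86400:.1f} days")
```

Treat the result as an order-of-magnitude figure: peak throughput is never reached in practice, so the assumed utilization dominates the estimate.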

FLOPs Estimation

What is a FLOP?

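A FLOP is one floating-point operation, such as a single addition or multiplication. Multiplying an MxN matrix by an NxP matrix takes about 2MNP FLOPs: each of the MP output entries needs N multiplications and roughly N additions, so half of the total is additions and half is multiplications.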
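
The matrix-multiplication count can be checked with a few lines of Python. This is a minimal sketch; the helper name `matmul_flops` is made up for this example, and it uses the common approximation of N additions per output entry (strictly N - 1).

```python
def matmul_flops(m: int, n: int, p: int) -> int:
    """Approximate FLOPs to multiply an (m x n) matrix by an (n x p) matrix."""
    multiplications = m * n * p  # n multiplies for each of the m * p outputs
    additions = m * n * p        # approximated as n adds per output (strictly n - 1)
    return multiplications + additions


# A (1024 x 4096) by (4096 x 4096) product:
print(matmul_flops(1024, 4096, 4096))  # 34359738368, i.e. 2 * 1024 * 4096 * 4096
```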

Model Training FLOPs