
-
Tanh Function as Drop-In Replacement for LayerNorm
Why LayerNorm is not without alternatives
-
Towards Group Equivariant Self-Attention
-
Group Equivariant Self-Attention
-
Stand-Alone Self-Attention in Vision From Scratch
-
Towards Stand-Alone Self-Attention in Vision
A deep dive into the application of the transformer architecture and its self-attention operation for vision