Accurate tensor shapes and weight matrix dimensions
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Architecture | Pre-LayerNorm Transformer | vocab_size | 128,256 |
| hidden_size | 3072 | num_hidden_layers | 28 |
| num_attention_heads | 24 (GQA: 8 KV heads) | head_dim | 128 |
| intermediate_size | 8192 (SwiGLU MLP) | rope_theta | 500,000 |
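These dimensions are mutually consistent. A few lines of arithmetic (plain Python; the variable names are illustrative, not tied to any library) show how the GQA projection widths in the table below follow from the config:

```python
# Derive the attention projection widths from the config above.
hidden_size = 3072
num_attention_heads = 24   # query heads
num_kv_heads = 8           # GQA: shared key/value heads
head_dim = hidden_size // num_attention_heads   # 3072 // 24 = 128

q_width = num_attention_heads * head_dim    # q_proj output: 24 * 128 = 3072
kv_width = num_kv_heads * head_dim          # k_proj/v_proj output: 8 * 128 = 1024
group_size = num_attention_heads // num_kv_heads  # 3 query heads share each KV head

print(head_dim, q_width, kv_width, group_size)  # 128 3072 1024 3
```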
| Component | Weight Name | Shape | Description |
|---|---|---|---|
| Embedding | embed_tokens.weight | [128256, 3072] | Vocabulary → Hidden |
| Self-Attention | self_attn.q_proj.weight | [3072, 3072] | Query: 24 heads × 128 dim |
| | self_attn.k_proj.weight | [1024, 3072] | Key: 8 heads × 128 dim (GQA) |
| | self_attn.v_proj.weight | [1024, 3072] | Value: 8 heads × 128 dim (GQA) |
| | self_attn.o_proj.weight | [3072, 3072] | Output projection |
| MLP (SwiGLU) | mlp.gate_proj.weight | [8192, 3072] | Gate projection |
| | mlp.up_proj.weight | [8192, 3072] | Up projection |
| | mlp.down_proj.weight | [3072, 8192] | Down projection |
| LayerNorm | input_layernorm.weight | [3072] | RMSNorm (no bias) |
| | post_attention_layernorm.weight | [3072] | RMSNorm (no bias) |
| Final Norm | model.norm.weight | [3072] | Final RMSNorm |
| LM Head | lm_head.weight | [128256, 3072] | Hidden → Vocabulary |
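The shapes above follow the usual `[out_features, in_features]` convention of a PyTorch `nn.Linear` weight. As a sanity check, a short script (plain Python, no framework needed) can total the parameters implied by the table; whether `lm_head.weight` is stored separately or tied to `embed_tokens.weight` changes the total, so both variants are computed — which one applies depends on the model's `tie_word_embeddings` setting, which the table does not specify.

```python
# Parameter count implied by the table above: per-layer shapes × 28 layers,
# plus embeddings, final norm, and (optionally untied) LM head.
shapes_per_layer = {
    "q_proj": (3072, 3072),
    "k_proj": (1024, 3072),
    "v_proj": (1024, 3072),
    "o_proj": (3072, 3072),
    "gate_proj": (8192, 3072),
    "up_proj": (8192, 3072),
    "down_proj": (3072, 8192),
    "input_layernorm": (3072,),
    "post_attention_layernorm": (3072,),
}

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

per_layer = sum(numel(s) for s in shapes_per_layer.values())
embed = 128_256 * 3072        # embed_tokens.weight
final_norm = 3072             # model.norm.weight

total_tied = 28 * per_layer + embed + final_norm  # lm_head shares embed_tokens
total_untied = total_tied + embed                 # separate lm_head.weight

print(f"per layer: {per_layer:,}")     # 100,669,440
print(f"tied:      {total_tied:,}")    # 3,212,749,824
print(f"untied:    {total_untied:,}")  # 3,606,752,256
```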