Eight VPDs, stitched. Integrated gradients on the gates, one prompt at a time.

Each circle is one alive subcomponent firing at one token; lime raises the prediction, copper suppresses it. Edges connect components that co-activate on the same token across adjacent layers. Toggle BASE / +LoRA on a single prompt to watch the topology change — the lit-up network is what we mean when we say routing.

ATTRIBUTION GRAPH

pub use crate::
Predicted next ·
Top-5
lime · raises target logit copper · suppresses target co-attribution edge