There is a race towards language models with longer context windows. But how good are they, and how can we know?
Sick Harry Potter figure