TL;DR

Intro

Pseudo-Line Detection

Can We Generalize?

Here’s a better example of the triangle pattern I mentioned for cluster 3:

https://jazzy-bublanina-350de8.netlify.app/triangle

Overall, I think that with more tweaking and tuning, it may be possible to train a simple CNN to classify attention heads. Based on my preliminary testing, clustering shows less promise, though after significant trial and error it did produce some interesting clusters.

Limitations and Other Things to Try

Further Underdeveloped Sections

Takeaways and Thanks

References & Resources