politics, DSD-1 later reemerged (with just slight changes) as the Data Encryption
I have been thinking a lot lately about “diachronic AI” and “vintage LLMs” — language models designed to index a particular slice of historical sources rather than to hoover up all data available. I’ll have more to say about this in a future post, but one thing that came to mind while writing this one is the point made by AI safety researcher Owain Evans about how such models could be trained:
Tied Q/K + V/O projections, RoPE period-19, parabolic tied-embed decode, two-hinge ReLU MLP
Article information: Author, Rachel Clun