Currently
Thomas Stephan Juzek
Computational linguist · Florida State University
A running note on what I am working on, reading, and excited by right now. I hope to update this page from time to time.
Last updated: June 2026
Right now
Working on
For a recent NYT piece, Vauhini Vara asked me about the what and why of em dash usage in chat models: how much do LLMs use them, and why is that usage so out of line with human writing? When I dug into data from earlier word-choice projects, em dashes turned out to be far messier than word choices. With words, models heavily overuse a fixed set, and this overuse stems to a good degree from post-training. With em dashes, the behaviour depends heavily on the model family, and the roles of the pre-trained model and of developer choices are still open questions. What is clear is that ChatGPT's em dash usage has climbed sharply across generations, so I am following up on this.
Recently out
The 34-languages paper, revised after reviews and now at preprint stage while it finds a venue.
Reading
Reviewing for several venues right now, so reading a lot of papers.
Thinking about
Still impressed by work from Marwa Abdulhai and the team around Natasha Jaques, How LLMs Distort Our Written Language (project page): they show that LLM editing pushes writing toward a homogenised common style even when the model is asked only to fix grammar. That really raises the stakes of AI usage.