Currently

Thomas Stephan Juzek

Computational linguist · Florida State University

A running note on what I am working on, reading, and excited by right now. I hope to update this page from time to time.

Last updated: June 2026

[Photo: me in Lisbon]

Right now

Working on

For a recent NYT piece, Vauhini Vara asked me about the what and why of em dash usage in chat models: how much do LLMs use them, and why is their usage so out of line with human writing? When I dug into data from earlier word-choice projects, em dashes turned out to be far messier than word choices. With words, models heavily overuse a fixed set of items, and this overuse stems to a good degree from post-training. With em dashes, the behaviour depends heavily on the model family, and the roles of the pre-trained model and of developer choices are still open questions. What is clear is that ChatGPT's em dash usage has climbed sharply across generations, so I am following up on this.

Recently out

The 34-languages paper, revised after reviews and now out as a preprint while it looks for a venue.

Reading

Reviewing for several venues right now, so reading a lot of papers.

Thinking about

Still impressed by How LLMs Distort Our Written Language (project page), work by Marwa Abdulhai and the team around Natasha Jaques: they show that LLM editing pushes writing toward a homogenised common style even when the model is asked only to fix grammar. That really raises the stakes of using AI for everyday editing.