Short Thought: Em dashes, and the Great AI Self-Tell
A very short thought on how our usage and diet of language was rotten before LLMs came along.
The popularity of AI models has led to a thrilling new form of witch-hunt that is sweeping online spaces: working out whether a post was made using AI. Many an artist has found themselves under the guillotine because someone can “tell from the pixels that it’s AI”, but the latest victim of this pursuit is a piece of punctuation.
—
The versatility of the em dash is what causes it to appear in so much writing, and the frequency of its use is almost certainly the reason why it is used so extensively by large language models. To make a practical argument, if the em dash were a rare and specific symbol in most online texts (Wikipedia, fiction and non-fiction books), LLM usage of it would decline proportionately because it would also be rare within their training sets.
Where you don’t see the em dash being used is in the world of online “writing”: Twitter posts, YouTube comments, Instagram, Reddit, and so on. This is because people, broadly, do not give a shit about any sort of flourish or complexity in these spaces, instead electing to go with whatever communicates their primordial thought in as few keypresses as possible.
In the generative image and video spaces, the problem has always been that the quality of outputs is usually rock-bottom, as well as being highly uniform. Put another way, AI art usually just sucks to look at, and while iterative development has fixed common problems like subjects having too many fingers, the quality still leaves a lot to be desired.
Not so within the world of writing.
The current “tells” that we have generally rely upon the fact that the average YouTube or Reddit comment is of far lower quality than what an LLM would produce. “Huh? This post doesn’t contain any grammatical errors or typos, and is structured in a way that makes it easy to read? Must be AI”. Use an em dash in a comment, and someone will accuse you of using ChatGPT.
What pisses me off to the point of incandescence is comments like the above.
ALT-0151, that’s the alt-code for the beast. I know that because I’ve typed it out thousands of times, with almost every article I’ve written here containing one because it’s just such a handy piece of punctuation. It replaces the—in my opinion—far uglier colon, and it stops me from filling my articles with parentheses (mostly). It’s great.
“A human would use -”? No, a human would not use a fucking hyphen, a human would use an em dash. We use hyphens when we’re looking to join words together that are linked grammatically or in terms of meaning, such as in the sentence “read a book you goddamned chair-sniffer”.
A huge pile of internet commenters, in opining on whether a post is AI, have inadvertently performed a damning self-tell.
If you hold that a comment is likely to be AI because the em dash is almost never seen in online discourse, that’s a completely sound position, but if you think it’s because “nobody uses it at all”, then you are placing a bullhorn to your lips and loudly bellowing that your textual diet consists entirely of Reddit posts and YouTube comments.
The most terrifying revelation to me from the growing prevalence of LLMs in these spaces is not the number of people that are willing to offload their critical-thinking or writing skills to a model, but the number of people who think that a common piece of punctuation in anything other than Twitch chat is somehow an “AI exclusive” that people can’t actually type.
This is yet another feather in the cap of the “AI is a dark mirror” position, where the objection to this technology so frequently lies in what it shows up about ourselves.
In this case, it’s that the vast majority of text in the spaces that people actually frequent isn’t worth the bytes used to store it, and that a life spent reading nothing but these comments causes something akin to chronic traumatic encephalopathy. If your first introduction to the em dash is in a ChatGPT-generated comment, I am begging you to read something that isn’t social media.
Start with more linguistically nourishing text, such as a pharmacy receipt, or the instruction manual for your fridge.
I stand with you on this. THANK YOU FOR SPEAKING UP. I went looking for em dash posts on Substack, yours came up, and I came away smiling but also concerned re: "dark mirror". It ticks me off too, seeing Reddit comments like that, and worse still is when people aren't educating themselves. While I don't possess your gift of snark, I do wish to let you know I added your post to my compilation, aptly titled: https://torley.substack.com/p/--