Language Is Not Merely a Vessel
The language you prompt in is doing more work than you think.
Eric Prouzet via Unsplash
A coworker shared his screen with our team today and showed us that he typically uses Claude Code in Chinese.
Chinese is his first language. Makes sense.
And another teammate commented, as an aside, that prompting in Chinese apparently uses far fewer tokens than English. Like thirty to forty percent fewer.
Most of our team doesn't speak Chinese. So while everyone found the statistic interesting, we didn't spend much time discussing it, and moved on.
But I felt an itch to dig into this further and began searching online to learn more. I found some LinkedIn posts on the topic and went further down the rabbit hole, and it turns out that token efficiency was perhaps the least interesting part of the story.
The premise of the argument the posts make is sound. Most large language models use a tokenization method called byte-pair encoding (BPE), which breaks text into chunks based on frequency patterns in the training data. Because that training data skews heavily toward English, common English words and subwords get their own tokens, while less common words get split into smaller pieces. "Running" might be one token; "tokenization" might be three.
Chinese works differently. Each character is already a discrete, meaning-bearing unit. For example, 学 means "learn," 习 means "practice," and together, 学习 means "study." Here, BPE has less splitting work to do. The characters map more cleanly, often one to one. The result is that Chinese can express more meaning per token than English in many contexts.
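You can check the gap for yourself. Here's a minimal sketch using OpenAI's open-source tiktoken library; the sample sentences and the cl100k_base encoding are my own illustrative choices, and the exact counts will differ across models and tokenizers.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one of OpenAI's published encodings; other models use other
# tokenizers, so treat the counts below as illustrative rather than universal.
enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Please summarize the following document and list the key action items.",
    "Chinese": "请总结以下文档，并列出关键行动项。",  # rough translation of the English line
}

for language, text in samples.items():
    tokens = enc.encode(text)
    print(f"{language}: {len(tokens)} tokens for {len(text)} characters")

# 学 and 习 each carry meaning on their own, and the compound 学习 ("study")
# stays compact under BPE as well.
print("学习 ->", enc.encode("学习"))
```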
This is worth paying attention to. Token count affects cost, latency, and how much you can fit in a context window. If you're running thousands of prompts a day, a thirty percent reduction compounds.
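As a back-of-the-envelope sketch, with every number below made up rather than quoted from any provider:

```python
# Hypothetical volumes and prices, chosen only to show how the savings add up.
prompts_per_day = 5_000
avg_tokens_per_prompt = 800           # assumed average English input size
savings_rate = 0.30                   # the low end of the claimed reduction
price_per_million_input_tokens = 3.0  # dollars, hypothetical

tokens_saved_per_day = prompts_per_day * avg_tokens_per_prompt * savings_rate
dollars_saved_per_day = tokens_saved_per_day / 1_000_000 * price_per_million_input_tokens

print(f"{tokens_saved_per_day:,.0f} tokens saved per day")
print(f"${dollars_saved_per_day:.2f} per day, roughly ${dollars_saved_per_day * 30:.2f} per month")
```

At larger volumes or with longer prompts, the same percentage buys proportionally more, in dollars and in context-window headroom alike.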
But the thirty to forty percent figure gets thrown around as though it's neatly transferable across all contexts, and it isn't. It's model-dependent, tokenizer-dependent, and highly sensitive to what you're actually saying.
And technical content—code, variable names, domain jargon—often erases the gap entirely.
The efficiency gain is real in the right conditions. The conditions are just narrower than the LinkedIn posts that initially popularized the idea imply.
Fine.
But here's what I find significantly more interesting. My teammate wasn't prompting in Chinese to save money. Rather, he was prompting in Chinese because that's how he thinks. And he was getting output that felt more natural to him—not just cheaper, but different in shape.
And there's a framework for this.
In the 1970s, anthropologist Edward Hall divided cultures into high-context and low-context communicators.
Low-context cultures—American English being the canonical example—make everything explicit. We over-explain. We hedge. We put the argument in the first paragraph and then prove it.
High-context cultures, Chinese among them, rely more on shared understanding, implication, what gets left unsaid. The meaning lives between the words as much as in them.
Language carries this with it. When you write in Chinese, you're not just translating English thoughts into a more token-efficient script. You're working within a different set of communicative paradigms and assumptions—about what needs to be said, what can be implied, how much context the reader is expected to bring.
Whether that shapes what the model gives back is an open question. But it's not an unreasonable one. These models were trained on human language at scale, which means they were trained with all of these cultural assumptions baked in. When you prompt in a high-context language, you might be pulling on a different set of those assumptions.
That's speculative. But it's more interesting than token counting.
There's a catch, though, and I think this is worth dissecting as well.
Most of the work behind the reasoning quality in these models—the RLHF loops, the benchmark tuning, the careful alignment work—was done in English. Not exclusively, but predominantly. Which means that for technical domains especially, English is still the language these models are best at. The nuance is sharper, the domain vocabulary is richer, the chain-of-thought patterns are more reliable.
So the tradeoff is real. You might save tokens prompting in Chinese and get output that feels more natural if you think in Chinese. But if you're debugging a distributed systems problem or writing a performance review or drafting a legal clause, you might also be leaving reasoning quality on the table.
My teammate, however, probably isn't. He's fluent in both English and Chinese. He knows what good output looks like in both languages, and he can tell when something has gone awry. But the person who switches to Chinese purely because a LinkedIn post told them it's cheaper—without the fluency to evaluate what they're getting back—might not notice the difference until the issue snowballs.
The efficiency gain is real. The quality tradeoff is real. They don't cancel each other out.
I keep thinking about how we treat prompts like search queries, or neutral inputs—like levers we pull to extract output from a machine. So our engineering instinct to optimize kicks in immediately: reduce token usage, prompt for faster responses, lower costs overall.
We want to treat the input language as a delivery mechanism and use it as a means to optimize the output of the pipeline.
But the issue is that language was never just a delivery mechanism.
Instead, language informs what you notice, what you reach for, and what feels like a complete thought. It is central to many of our higher-order thinking processes.
The Sapir-Whorf hypothesis—the idea that language influences cognition—has a complicated academic history, and the strong version of it is mostly discredited. But the weak version, that language nudges how you frame things, is difficult to dispute.
For those of you who are bilingual, you've likely felt it as a pain point during translation: when you attempted to explain something in a second language and found yourself saying something slightly different than you meant to, not because you lacked the words but because the structure of the second language pulled you in a different direction.
That pull doesn't disappear when the language in question is used to prompt. Prompting is writing. And writing in a language is not the same as writing through it.
My teammate isn't using Chinese as a workaround. He's thinking in it, which means he's prompting from a different starting point entirely—different assumptions about what needs to be explicit, different instincts about where a thought ends.
The token efficiency is a side effect. The real variable is cognition.
Maybe this is just a reinforcement of something we already know.
Exposure to different languages is good. Not just for communication but for thought as well. Every language you encounter gives you a slightly different set of tools for processing and navigating the world around you. This is why people who grow up bilingual don't just have two vocabularies. They have two sets of instincts.
We've known this about human cognition for a long time. It's just interesting that it might be true about how we work with AI too. The language you bring to the model likely isn't neutral. Different frameworks, different structures, different ways of organizing a thought—they can produce different outputs. Not just cheaper or faster ones.
Different worldviews in, different workflows out.
We've always known that broadening your inputs broadens your thinking. And I think that this might apply here too.