How a Sneaky Unicode Trick Slipped Past Everyone In curl

Date published

May 18, 2025

How a Sneaky Unicode Trick Slipped Past Everyone in curl

You’d think experienced developers and CI pipelines would catch a malicious code change. But a recent incident in the curl project proved otherwise and it was all thanks to a sneaky little Unicode character.

James Fuller, a contributor to curl, submitted what looked like a simple cleanup pull request. Hidden inside it was a small but deliberate change: he swapped a regular ASCII letter in a URL with a visually identical Unicode character. No one noticed. Not a single human reviewer. Not even the automated checks.

Later, during a talk, he revealed the trick. It was a wake-up call.

The Problem With Lookalike Characters

The issue? Unicode characters exist that look exactly like standard Latin letters, but they’re actually different symbols. In the example Fuller used, the lowercase ‘g’ was replaced with the Armenian letter that looks nearly identical.

To the human eye, nothing looked off. But technically, the character had changed. It was enough to cause subtle breakage or worse, redirect URLs or insert malicious behavior without raising any red flags.

Even GitHub’s diff viewer showed a difference, but it didn’t explain what had changed. It just flagged a line as modified without any obvious visual clue. Daniel Stenberg, the author of the blog post and maintainer of curl, flagged the issue to GitHub. He didn’t get much of a response.

Other Platforms Handle It Better

On Mastodon, someone shared a screenshot showing how Gitea, another code hosting platform, actually displays a warning when it detects ambiguous Unicode characters in a pull request. It even labels them as “suspicious.” That’s a step in the right direction.

Stenberg pointed out he’d love more detailed warnings, but at least some platforms are acknowledging the problem.

curl’s Response: Build Our Own Detection

While waiting for GitHub to act, the curl team took matters into their own hands. They created a new CI job that scans every file in the repository and validates all UTF-8 sequences.

Since most of the files in the curl repo are plain ASCII, they were able to whitelist a few specific UTF-8 sequences and files. Anything else triggers a failure in CI.

They also went through all the test files and replaced UTF-8 content with safer alternatives. Some of the previous uses were unintentional and easily fixed. From now on, if someone tries to pull the same Unicode trick, their CI will catch it before a human has to.

Want to Check This Yourself?

There are tools out there to help detect lookalike Unicode characters. The Unicode Consortium even has an official one:

https://util.unicode.org/UnicodeJsps/confusables.jsp

Security is Always Reactive Until It Isn’t

This whole incident is a perfect example of how security often improves: reactively. A problem gets exposed, and then teams build systems to prevent it from happening again. As Stenberg notes, there are probably a hundred other issues waiting to be discovered. The challenge is staying ahead of them.

Bad actors are constantly looking for new attack vectors. Sometimes they’ll exploit gaps we’ve never even imagined. That’s why projects like curl are trying to shift from being reactive to being as proactive as possible.

Still, it’s a race and one where the other side often has the advantage of silence and surprise.

Update: GitHub Finally Responds

Later that day, GitHub reached out to confirm they’ve raised the issue internally as a security concern and are now working on a fix. Whether it will be enough or come fast enough remains to be seen.