Hide secret message with zero-width characters

Zero-width characters are non-printing characters that most applications do not display, hence the name “zero-width.” They are Unicode characters, typically used to mark possible line breaks or to join or separate characters in writing systems that use ligatures.

As they are “invisible,” anyone can use them to co‎​‌​‌​​​​​‌‌​‌‌​​​‌‌​​‌​‌​‌‌​​​​‌​‌‌‌​​‌‌​‌‌​​‌​‌​​‌​​​​​​‌‌‌​​‌‌​‌‌‌​‌​‌​‌‌‌​​​​​‌‌‌​​​​​‌‌​‌‌‌‌​‌‌‌​​‌​​‌‌‌​‌​​​​‌​​​​​​‌‌​‌‌​‌​‌‌‌‌​​‌​​‌​​​​​​‌‌​​​‌​​‌‌​‌‌​​​‌‌​‌‌‌‌​‌‌​​‌‌‌​​‌‌‌​‌​​​‌​​​​​​‌‌​‌​‌‌​‌‌​‌‌‌‌​‌‌​‌‌​‌​‌‌‌​​‌‌​‌‌​​​‌‌​‌‌​‌​​‌​‌‌​​‌‌‌​‌‌‌​‌​‌​‌‌‌‌​​‌​​‌​‌‌‌​​‌‌​​​‌‌​‌‌​‌‌‌‌​‌‌​‌‌​‌‏nceal messages or information within plain text. Don’t believe me? I left a secret message in the first sentence. Read this post to know how it’s possible.

Available zero-width characters

So far I’ve found nine zero-width characters in the Unicode character table.

Character                    Code point
Zero-width space             U+200B
Zero-width non-joiner        U+200C
Zero-width joiner            U+200D
Left-To-Right Mark           U+200E
Right-To-Left Mark           U+200F
Left-To-Right Embedding      U+202A
Right-To-Left Embedding      U+202B
Word joiner                  U+2060
Zero-width no-break space    U+FEFF

There may be more, but nine is more than enough. In theory, two distinct zero-width characters are enough to encode any data: one for bit 0 and one for bit 1. Binary encoding is long, though, since every byte becomes eight invisible characters. Using more of the available zero-width characters as the digits of a larger base shortens the encoded data. With four of them, for example, each byte needs only four characters.
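To make the length argument concrete, here is a minimal sketch (my own illustration, not the tool's code) that writes each UTF-8 byte as a fixed-width base-N number, using N zero-width characters as the digits. The digit alphabet (U+200B, U+200C, U+200D, U+2060) and the helper names are assumptions; any distinct characters from the table would work.

```javascript
// Sketch only: this digit alphabet is an assumption, not the tool's choice.
// U+200E and U+200F are left out because the tool uses them as markers.
const ZW_DIGITS = ["\u200B", "\u200C", "\u200D", "\u2060"];

// How many base-`base` digits it takes to write any byte value (0..255).
function digitsPerByte(base) {
  let digits = 0;
  for (let max = 255; max > 0; max = Math.floor(max / base)) digits++;
  return digits;
}

// Encode the UTF-8 bytes of `text` as fixed-width base-`base` numbers,
// one zero-width character per digit.
function encodeBaseN(text, base) {
  if (base < 2 || base > ZW_DIGITS.length) throw new RangeError("unsupported base");
  const width = digitsPerByte(base);
  let out = "";
  for (const byte of new TextEncoder().encode(text)) {
    for (const d of byte.toString(base).padStart(width, "0")) {
      out += ZW_DIGITS[parseInt(d, base)];
    }
  }
  return out;
}

// Reverse the process: read `width` characters at a time back into a byte.
function decodeBaseN(hidden, base) {
  const width = digitsPerByte(base);
  const bytes = [];
  for (let i = 0; i + width <= hidden.length; i += width) {
    let value = 0;
    for (const ch of hidden.slice(i, i + width)) {
      value = value * base + ZW_DIGITS.indexOf(ch);
    }
    bytes.push(value);
  }
  return new TextDecoder().decode(Uint8Array.from(bytes));
}

console.log(encodeBaseN("hi", 2).length); // 16 characters (8 per byte)
console.log(encodeBaseN("hi", 4).length); // 8 characters (4 per byte)
console.log(decodeBaseN(encodeBaseN("hi", 4), 4)); // hi
```

Base 4 halves the overhead compared with binary. Base 3 also works but still needs six characters per byte (3^5 = 243 is less than 256), which is why powers of two line up more neatly with bytes.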

Fingerprinting

Zero-width characters can be used to fingerprint text. For example, suppose someone on your team is leaking confidential information, but you don’t know who. Send each member a copy of the classified text with their name encoded in it. When the text leaks, extract the name from the leaked copy, and you know who to deal with.
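The fingerprinting workflow could be sketched like this. The helpers `fingerprint` and `extractFingerprint`, the team names, and the choice to hide the name after the first word are all hypothetical. This simple version treats every U+200B/U+200C in the leaked text as a payload bit, so it assumes the original text contains none of its own; the start and end markers described under “How does it work?” avoid that problem.

```javascript
// Hypothetical helpers for illustration; names and placement are assumptions.
// The recipient's name is stored as bits: U+200B = 0, U+200C = 1.
function fingerprint(text, name) {
  let hidden = "";
  for (const byte of new TextEncoder().encode(name)) {
    for (const bit of byte.toString(2).padStart(8, "0")) {
      hidden += bit === "0" ? "\u200B" : "\u200C";
    }
  }
  // Tuck the fingerprint in after the first word (or at the end if there is no space).
  const cut = text.indexOf(" ");
  const at = cut === -1 ? text.length : cut;
  return text.slice(0, at) + hidden + text.slice(at);
}

// Pull the bits back out of a leaked copy and rebuild the name.
function extractFingerprint(leaked) {
  const bits = [...leaked]
    .filter((ch) => ch === "\u200B" || ch === "\u200C")
    .map((ch) => (ch === "\u200B" ? "0" : "1"))
    .join("");
  const bytes = [];
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return new TextDecoder().decode(Uint8Array.from(bytes));
}

const memo = "The Q3 numbers stay inside this team.";
const copies = new Map(["alice", "bob", "carol"].map((n) => [n, fingerprint(memo, n)]));

// Every copy renders identically, but each one is a different string.
const leakedCopy = copies.get("bob");
console.log(extractFingerprint(leakedCopy)); // bob
```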

Unlike other steganography techniques (such as hiding data in the noise of images, video, or audio), zero-width characters generally survive formatting, copying, and pasting. They are hard to detect without special tools, since most text editors don’t render them. In addition, there is no fixed limit on how much data can be encoded. However, editors do count zero-width characters, so hiding a lot of data in a short text inflates its character count and makes it more suspicious.

Tool

To demonstrate the ability to hide secret messages with zero-width characters, I created a tool here.

How does it work?

  • Use TextEncoder to convert the secret message from a String to a Uint8Array, an array of 8-bit unsigned integers holding its UTF-8 bytes.
  • Convert each integer to 8 bits, then convert each bit to a zero-width character:
    • Bit value 0 is encoded as Zero-width space (U+200B)
    • Bit value 1 is encoded as Zero-width non-joiner (U+200C)
  • Hide the encoded string in the middle of the carrier message.
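The three steps above can be sketched in a few lines. This is an illustration, not the tool's actual source: `secretToZeroWidth` and `hideInMiddle` are hypothetical names, and taking the “middle” as the midpoint by code point is an assumption.

```javascript
// Sketch of the three steps; names and the exact insertion point are assumptions.
function secretToZeroWidth(secret) {
  // Step 1: String -> Uint8Array of UTF-8 bytes.
  const bytes = new TextEncoder().encode(secret);
  let out = "";
  for (const byte of bytes) {
    // Step 2: each byte -> 8 bits -> one zero-width character per bit.
    for (const bit of byte.toString(2).padStart(8, "0")) {
      out += bit === "0" ? "\u200B" : "\u200C"; // 0 = ZWSP, 1 = ZWNJ
    }
  }
  return out;
}

function hideInMiddle(carrier, secret) {
  // Step 3: split the carrier by code points so an emoji or other
  // surrogate pair is never cut in half, then splice the payload in.
  const chars = [...carrier];
  const middle = Math.floor(chars.length / 2);
  return chars.slice(0, middle).join("") + secretToZeroWidth(secret) + chars.slice(middle).join("");
}

const stego = hideInMiddle("Nothing to see here", "hi");
console.log(stego);        // displays as "Nothing to see here"
console.log(stego.length); // 35: 19 visible characters + 16 hidden ones
```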

In addition, two other zero-width characters are used to mark the beginning and end of the encoded string:

  • Left-To-Right Mark (U+200E) marks the beginning
  • Right-To-Left Mark (U+200F) marks the end

This makes it easier to locate the encoded string when decoding it.
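A minimal sketch of the marker scheme, assuming the same bit encoding as above: the payload is wrapped in U+200E and U+200F, and decoding reads only what lies between them. `encodeWithMarkers` and `decodeWithMarkers` are hypothetical names, not functions from the tool.

```javascript
// Sketch only: wraps the bit payload in the two marker characters and
// decodes by looking between them. Names are hypothetical.
const MARK_START = "\u200E"; // Left-To-Right Mark
const MARK_END = "\u200F";   // Right-To-Left Mark

function encodeWithMarkers(carrier, secret) {
  let payload = "";
  for (const byte of new TextEncoder().encode(secret)) {
    for (const bit of byte.toString(2).padStart(8, "0")) {
      payload += bit === "0" ? "\u200B" : "\u200C";
    }
  }
  const chars = [...carrier];
  const middle = Math.floor(chars.length / 2);
  return chars.slice(0, middle).join("") + MARK_START + payload + MARK_END + chars.slice(middle).join("");
}

function decodeWithMarkers(text) {
  const start = text.indexOf(MARK_START);
  if (start === -1) return null; // no hidden message
  const end = text.indexOf(MARK_END, start + 1);
  if (end === -1) return null;   // start marker without an end marker
  const payload = text.slice(start + 1, end);
  const bytes = [];
  for (let i = 0; i + 8 <= payload.length; i += 8) {
    let value = 0;
    for (const ch of payload.slice(i, i + 8)) {
      value = value * 2 + (ch === "\u200C" ? 1 : 0);
    }
    bytes.push(value);
  }
  return new TextDecoder().decode(Uint8Array.from(bytes));
}

const message = encodeWithMarkers("Meet me at noon", "Attack at dawn");
console.log(decodeWithMarkers(message));      // Attack at dawn
console.log(decodeWithMarkers("plain text")); // null
```

Because only the characters between the markers are read, a zero-width joiner elsewhere in the carrier (inside an emoji sequence, for instance) does not corrupt the result. A carrier that already contains U+200E, as mixed-direction text sometimes does, would still trip up this simple search.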

Please refer to the source code for more details.

Detect zero-width characters

Use any text editor that supports rendering of zero-width characters.

For a quick test, you can use the Chrome Developer Tools console.
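In the console, a snippet along these lines lists and reveals the nine characters from the table. It is a sketch, not taken from the tool; `findZeroWidth` and `revealZeroWidth` are hypothetical names. To scan the page you are on, pass `document.body.innerText` instead of the sample string.

```javascript
// Matches the nine characters from the table:
// U+200B to U+200F, U+202A, U+202B, U+2060 and U+FEFF.
const ZERO_WIDTH = /[\u200B-\u200F\u202A\u202B\u2060\uFEFF]/g;

// List where each zero-width character sits and which one it is.
function findZeroWidth(text) {
  return [...text.matchAll(ZERO_WIDTH)].map((m) => ({
    index: m.index,
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase(),
  }));
}

// Make them visible by replacing each one with its code point in brackets.
function revealZeroWidth(text) {
  return text.replace(ZERO_WIDTH, (ch) => `[U+${ch.codePointAt(0).toString(16).toUpperCase()}]`);
}

console.table(findZeroWidth("co\u200E\u200B\u200Cnceal"));
console.log(revealZeroWidth("co\u200E\u200B\u200Cnceal")); // co[U+200E][U+200B][U+200C]nceal
```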

This Chrome extension will convert any zero-width characters to emojis.
