Easier digit grouping for East Asia
The Myriad Project
East Asian languages count in ten-thousands.
Our digit grouping should too.
The mismatch
When writing large numbers in Arabic numerals, we put a comma every three digits: 1,000,000,000. In English that number is “one billion,” and the commas land neatly on thousands, millions, and billions. Read the same number in Chinese, Japanese, or Korean, and it’s “ten hundred-millions”—the natural grouping is by ten-thousands, not thousands.
East Asian languages count in multiples of 10,000 (萬). The named units are 萬, 億, 兆—each ten thousand times the last. Three-digit comma grouping cuts across these units at arbitrary points. Readers have to mentally regroup the digits every time they encounter a large number.
For example, 1.23 billion reads as roughly 12 hundred-millions (12億 3000萬).
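To make the regrouping concrete, here is a minimal Python sketch that splits a whole number into four-digit groups and labels each group with the units named above. The helper name `myriad_reading` and the mixed digit-plus-unit output style are illustrative assumptions, not part of the proposal.

```python
# Units for successive powers of 10,000, as named in the text above.
# Index 0 is the unlabeled lowest group (ones through thousands).
UNITS = ["", "萬", "億", "兆"]


def myriad_reading(n: int) -> str:
    """Split a non-negative integer into four-digit groups labeled by unit.

    Hypothetical helper for illustration only.
    """
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "0"
    parts = []
    i = 0
    while n > 0:
        if i >= len(UNITS):
            raise ValueError("number too large for the units in this sketch")
        n, group = divmod(n, 10_000)
        if group:  # skip groups that are all zeros
            parts.append(f"{group}{UNITS[i]}")
        i += 1
    return " ".join(reversed(parts))


print(myriad_reading(1_000_000_000))  # 10億
print(myriad_reading(1_230_000_000))  # 12億 3000萬
```

Each label falls exactly on a four-digit boundary, which is why three-digit commas never line up with it.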
See it side by side
Three-digit commas and four-digit underscores partition the same sequence of digits at different points.
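The same comparison can be reproduced offline with a short Python sketch. The function `group_digits` is a hypothetical helper written for this illustration. It groups digits from the right, so the leftmost group is the short one.

```python
def group_digits(n: int, size: int, sep: str) -> str:
    """Insert `sep` between groups of `size` digits, counting from the right."""
    digits = str(abs(n))
    groups = []
    while digits:
        groups.append(digits[-size:])  # take the rightmost group
        digits = digits[:-size]        # and drop it from the remainder
    sign = "-" if n < 0 else ""
    return sign + sep.join(reversed(groups))


for n in (1_000_000_000, 1234567890, 50_000):
    by_thousands = group_digits(n, 3, ",")
    by_myriads = group_digits(n, 4, "_")
    print(f"{by_thousands:>16}  {by_myriads:>14}")
```

For 1234567890 this prints `1,234,567,890` next to `12_3456_7890`. The digits are identical and only the cut points differ.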
The proposal
Use an underscore (_) as a ten-thousand-place separator.
The underscore passes the boring tests: it is plain ASCII, easy to type, and already tolerated by programmers as a digit separator in languages like Python and Rust. It also does not look like a comma, which is the whole point.
Current: 1,234,567,890
Proposed: 12_3456_7890
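Because Python ignores underscores between digits in numeric literals, the proposed grouping is already valid source code there. A brief sketch (the variable name `population` is arbitrary):

```python
# Underscores in an integer literal are ignored by Python,
# so four-digit grouping is accepted as written.
population = 12_3456_7890
assert population == 1234567890

# The built-in "," format option still groups by thousands.
print(f"{population:,}")  # 1,234,567,890
```

Note that Python's built-in format options group decimal digits in threes, so producing the four-digit form still needs a small custom formatter.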
Coexisting with thousands
The underscore can mark ten-thousand groups while the comma still marks thousands inside them. You can read both scales at once.
Why not four-digit commas?
The obvious alternative is commas every four digits. The comma is already taken, though. Too many people, spreadsheets, price lists, and standards read it as a three-digit separator. Reassigning it would make numbers easier for one group of readers and more error-prone for another.
A new separator is less elegant than reusing the comma, but it is clearer about what has changed.
Other candidates
We considered several other characters before settling on the underscore.
- Middle dot (· U+00B7): Clean on the page and familiar in East Asian typography. Visually preferable, but most keyboards do not make it easy to type.
- Thin space (U+2009): Has standards support through ISO 80000-1, but is too fragile in ordinary text. Many readers cannot tell it from a normal space, and many tools do not preserve it reliably.
- Apostrophe (' U+0027): Easy to type and used in Switzerland for thousand groups. Microsoft Word, Google Docs, and many mobile keyboards silently replace a straight apostrophe with a curly quotation mark, which makes it a bad foundation for casual notation.
- Regular space (U+0020): The easiest to type, but it splits a number into separate tokens. Search, copy-paste, and data parsing all break, and lines can wrap in the middle of a number.
How to start using it
In writing, you can start now. Just type an underscore.
For software and data pipelines, no standard yet handles this notation automatically, so an explicit normalization step is needed: strip the underscores before parsing.
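A minimal sketch of such a normalization step in Python follows. The function name `parse_myriad` and the validation rule (a leading group of one to four digits followed by groups of exactly four) are assumptions made for this example, not part of any standard. Python's own `int()` happens to accept underscores between digits, but it does not check group sizes, and formats such as JSON do not allow underscores in numbers at all, so validating and stripping explicitly is the safer habit.

```python
import re

# Assumed shape: optional sign, then either plain digits or a leading group
# of 1-4 digits followed by one or more "_dddd" groups.
_MYRIAD_INT = re.compile(r"[+-]?(?:[0-9]+|[0-9]{1,4}(?:_[0-9]{4})+)")


def parse_myriad(text: str) -> int:
    """Validate four-digit underscore grouping, strip it, and parse the integer."""
    s = text.strip()
    if not _MYRIAD_INT.fullmatch(s):
        raise ValueError(f"not a myriad-grouped integer: {text!r}")
    return int(s.replace("_", ""))


print(parse_myriad("12_3456_7890"))  # 1234567890
print(parse_myriad("1234567890"))    # 1234567890

try:
    parse_myriad("1_234_567")        # three-digit groups are rejected
except ValueError as err:
    print(err)
```

Rejecting malformed grouping, rather than silently stripping every underscore, catches cases where someone grouped by thousands out of habit.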