Easier digit grouping for East Asia

The Myriad Project

East Asian languages count in ten-thousands.
Our digit grouping should too.

Try reading this number
1,234,567,890 12_3456_7890

The mismatch

When writing large numbers in Arabic numerals, we put a comma every three digits: 1,000,000,000. In English that number is “one billion,” and the commas land neatly on thousands, millions, and billions. Read the same number in Chinese, Japanese, or Korean, and it’s “ten hundred-millions”—the natural grouping is by ten-thousands, not thousands.

East Asian languages count in powers of 10,000 (萬). The named units are 萬 (10^4), 億 (10^8), and 兆 (10^12), each ten thousand times the last. Three-digit commas fall inside these four-digit groups rather than on their boundaries; the two schemes do not coincide until 10^12. Readers have to mentally regroup the digits every time they encounter a large number.

Read aloud

1.23 billion (≈ 12 hundred-millions)

See it side by side

Type any whole number. Watch how three-digit commas and four-digit underscores partition the same sequence of digits.


  • Thousand grouping 1,234,567,890
  • Spoken (English) 1.23 billion
  • Myriad grouping 12_3456_7890
  • Spoken (Chinese) 12億3456萬7890
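
The four readings above can be reproduced with a short Python sketch. The names split_myriads, group_myriad, and spoken_cjk are hypothetical, chosen for illustration. The sketch assumes non-negative whole numbers up to the 兆 range and writes each four-digit group as digits followed by its unit, as in the example above.

```python
UNITS = ["", "萬", "億", "兆"]  # 10**0, 10**4, 10**8, 10**12


def split_myriads(n: int) -> list[int]:
    """Split a non-negative integer into four-digit groups, most significant first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    groups = []
    while True:
        n, low = divmod(n, 10_000)
        groups.append(low)
        if n == 0:
            break
    return groups[::-1]


def group_myriad(n: int) -> str:
    """Render n with an underscore between four-digit groups."""
    groups = split_myriads(n)
    # The leading group is left unpadded; the rest are zero-padded to four digits.
    return "_".join([str(groups[0])] + [f"{g:04d}" for g in groups[1:]])


def spoken_cjk(n: int) -> str:
    """Render n as digits plus 萬/億/兆 unit markers, skipping all-zero groups."""
    groups = split_myriads(n)
    if len(groups) > len(UNITS):
        raise ValueError("number too large for the units in this sketch")
    parts = []
    for i, g in enumerate(groups):
        unit = UNITS[len(groups) - 1 - i]
        if g:
            parts.append(f"{g}{unit}")
    return "".join(parts) or "0"


n = 1234567890
print(format(n, ","))   # 1,234,567,890
print(group_myriad(n))  # 12_3456_7890
print(spoken_cjk(n))    # 12億3456萬7890
```

The thousand grouping comes from Python's built-in format specifier. The myriad grouping needs its own function because the built-in separators only group decimal digits in threes.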

The proposal

Use an underscore (_) as a ten-thousand-place separator.

The underscore passes the boring tests: it is plain ASCII, easy to type, and already tolerated by programmers as a digit separator in languages like Python and Rust. It also does not look like a comma, which is the whole point.

Current

1,234,567,890

Proposed

12_3456_7890
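
In Python 3.6 and later, the proposed form is already valid source code: the language ignores underscores between digits wherever they fall. A minimal sketch, with variable names chosen for illustration:

```python
# Underscores between digits are ignored by Python's integer parser,
# regardless of group size (Python 3.6+).
proposed = 12_3456_7890
conventional = 1_234_567_890

assert proposed == conventional == 1234567890

# int() applies the same rule to strings.
assert int("12_3456_7890") == 1234567890

# For decimal output, the built-in "_" format option groups in threes only;
# four-digit grouping needs custom code.
print(format(proposed, "_"))  # 1_234_567_890
```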

Coexisting with thousands

The underscore can mark ten-thousand groups while the comma still marks thousands inside them. You can read both scales at once.

Combined 1,2_34,56_7,890
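
One way to generate the combined form is to walk the digits and insert a separator based on how many digits remain to the right. The function name group_combined is hypothetical. The sketch also has to pick a rule for positions where both separators are due (every twelve digits); it assumes the underscore wins there, which is a choice made for this example and not something the proposal specifies.

```python
def group_combined(n: int) -> str:
    """Insert ',' every 3 digits and '_' every 4 digits, counting from the right."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = str(n)
    out = []
    for i, ch in enumerate(digits):
        out.append(ch)
        remaining = len(digits) - 1 - i  # digits still to come after this one
        if remaining == 0:
            continue
        if remaining % 4 == 0:
            # Assumption: '_' takes precedence where both separators would fall.
            out.append("_")
        elif remaining % 3 == 0:
            out.append(",")
    return "".join(out)


print(group_combined(1234567890))  # 1,2_34,56_7,890
```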

Why not four-digit commas?

The obvious alternative is commas every four digits. The comma is already taken, though. Too many people, spreadsheets, price lists, and standards read it as a three-digit separator. Reassigning it would make numbers easier for one group of readers and more error-prone for another.

A new separator is less elegant than reusing the comma, but it is clearer about what has changed.

Other candidates

We considered several other characters before settling on the underscore.

  1. Middle dot

    · U+00B7

    Clean on the page and familiar in East Asian typography. Visually preferable, but most keyboards do not make it easy to type.

  2. Thin space

    U+2009

    Has standards support through ISO 80000-1, but is too fragile in ordinary text. Many readers cannot tell it from a normal space, and many tools do not preserve it reliably.

  3. Apostrophe

    ' U+0027

    Easy to type and used in Switzerland for thousand groups. Microsoft Word, Google Docs, and many mobile keyboards silently replace a straight apostrophe with a curly quotation mark—a bad foundation for casual notation.

  4. Regular space

    U+0020

    The easiest to type, but it splits a number into separate tokens. Search, copy-paste, and data parsing all break, and lines can wrap in the middle of a number.
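
The token-splitting problem is easy to reproduce. In the sketch below, Python's whitespace split and the regex word class \w stand in for a generic tokenizer. Real search and parsing tools differ in detail, so treat this as an illustration and not a survey. The sample sentences are made up.

```python
import re

spaced = "total 12 3456 7890 won"
thin_spaced = "total 12\u20093456\u20097890 won"  # U+2009 thin space
underscored = "total 12_3456_7890 won"

# Whitespace splitting breaks the number into three tokens, and Python
# treats the thin space as whitespace too.
print(spaced.split())       # ['total', '12', '3456', '7890', 'won']
print(thin_spaced.split())  # ['total', '12', '3456', '7890', 'won']

# The underscore counts as a word character, so the number stays whole.
print(underscored.split())              # ['total', '12_3456_7890', 'won']
print(re.findall(r"\w+", underscored))  # ['total', '12_3456_7890', 'won']
```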

How to start using it

In writing, you can start now. Just type an underscore.

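For software and data pipelines, support is uneven. Some programming languages, such as Python, accept underscores between digits, but common data formats such as JSON do not, and no standard defines four-digit grouping. Add an explicit normalization step: strip underscores before parsing.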
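
A minimal sketch of such a normalization step in Python. The name parse_myriad and the validation rule (either plain digits, or a leading group of one to four digits followed by groups of exactly four) are assumptions made for illustration, not part of any standard. A more lenient pipeline might simply remove every underscore.

```python
import re

# Either plain ASCII digits, or a leading group of 1-4 digits followed by
# one or more underscore-separated groups of exactly 4 digits.
_MYRIAD = re.compile(r"[0-9]+|[0-9]{1,4}(?:_[0-9]{4})+")


def parse_myriad(text: str) -> int:
    """Parse a whole number that may use underscore myriad grouping."""
    candidate = text.strip()
    if not _MYRIAD.fullmatch(candidate):
        raise ValueError(f"not a myriad-grouped number: {text!r}")
    # Strip the separators, then hand the bare digits to the normal parser.
    return int(candidate.replace("_", ""))


print(parse_myriad("12_3456_7890"))  # 1234567890
```

Validating the group sizes before stripping catches input such as 1_234_567, where the underscores mark three-digit groups and the writer may have meant something else.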