A Four-Character Rule

Hyphenation on the web is usually off. Browsers that support hyphens: auto hand the decision to the operating system, which hands it to a dictionary that may or may not have been trained on the language you are writing in. The result is inconsistent enough that most designers disable it entirely and live with ragged-right prose that occasionally produces a very short first line.

Pilcrow does not disable it. Pilcrow uses Hyphenopoly, a TeX-trained hyphenation library, to insert soft hyphens at syllable boundaries before the typesetting pass runs. The soft hyphen is a hint: a U+00AD character that says here is a legal break point. Hyphenopoly knows English syllable structure; it respects rightmin: 3, which means no post-hyphen fragment shorter than three characters will be suggested. That part works correctly.
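To make the rightmin rule concrete, here is a minimal sketch that checks an already-hyphenated word against it. It does not call Hyphenopoly; the helper names tailLengths and respectsRightMin are hypothetical, and the soft hyphens are written into the sample strings by hand.

```typescript
// U+00AD, the soft hyphen: invisible unless a line actually breaks there.
const SHY = "\u00AD";

// For each soft hyphen in the word, count the characters that would follow
// the break. Hypothetical helper; .length counts UTF-16 code units, which
// is fine for the ASCII samples used here.
function tailLengths(word: string): number[] {
  const parts = word.split(SHY);
  const tails: number[] = [];
  for (let i = 1; i < parts.length; i++) {
    tails.push(parts.slice(i).join("").length);
  }
  return tails;
}

// True when every suggested break leaves at least rightMin characters after it.
function respectsRightMin(word: string, rightMin: number): boolean {
  return tailLengths(word).every((n) => n >= rightMin);
}

console.log(tailLengths(`hy${SHY}phen${SHY}ation`)); // [ 9, 5 ]
console.log(respectsRightMin(`ital${SHY}ics`, 3)); // true: "ics" has three characters
console.log(respectsRightMin(`itali${SHY}cs`, 3)); // false: "cs" has only two
```

The check only describes the hints themselves. It says nothing about what a line breaker later does with them.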

The problem is that pretext, the line-breaking primitive at Pilcrow’s core and the thing reading those hints, is grapheme-aware, not syllable-aware. pretext walks the line counting visible characters and stops when the next character would exceed the measure. When it encounters a soft hyphen that fits the current line, it takes the break. But it also packs as many graphemes from the post-hyphen segment onto that line as will still fit. The post-hyphen segment ics in italics, for instance, might yield ital-i on one line and cs on the next: a two-character fragment that Hyphenopoly would never have permitted but that pretext produces anyway, because it has no visibility into what the hyphenation library intended.
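The packing behaviour is easier to see in a toy model. The sketch below is not pretext's code and uses none of its API; breakWord is a hypothetical function that handles a single word and only reproduces the behaviour described above, with a flag to compare it against a break that stays where the soft hyphen was inserted.

```typescript
const SHY = "\u00AD";

// Toy model (an assumption, not pretext's implementation): break one word
// into at most two lines for a measure given in characters.
function breakWord(word: string, measure: number, packAfterHyphen: boolean): string[] {
  const parts = word.split(SHY);
  const visible = parts.join("");
  if (visible.length <= measure) return [visible];

  // Find the longest head that still leaves room for the visible hyphen.
  let head = "";
  let candidate = "";
  let found = false;
  for (let i = 0; i < parts.length - 1; i++) {
    candidate += parts[i];
    if (candidate.length + 1 <= measure) {
      head = candidate;
      found = true;
    }
  }
  // No usable soft hyphen: this toy leaves the word unbroken.
  if (!found) return [visible];

  const tail = visible.slice(head.length);
  // When packing, spare room after the hyphen is filled with tail characters.
  const room = packAfterHyphen ? measure - (head.length + 1) : 0;
  return [head + "-" + tail.slice(0, room), tail.slice(room)];
}

console.log(breakWord(`ital${SHY}ics`, 6, true)); // [ 'ital-i', 'cs' ]
console.log(breakWord(`ital${SHY}ics`, 6, false)); // [ 'ital-', 'ics' ]
```

With a measure of six, one spare column after the hyphen is enough to pull the i up and leave cs stranded on the next line.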

Two characters on a line are not a break. They are a typographic accident. The eye reads them as a misprint and stalls.

The orphan guard catches this. After pretext computes a paragraph’s lines, the guard inspects every line-end that carries a visible hyphen. If the fragment that follows on the next line is fewer than four characters, the guard strips the soft hyphen that caused the break and re-runs pretext from the paragraph start. Four characters is the threshold. Below four, the fragment reads as error. At four and above, the eye accepts the break as intentional: the line held its shape.
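The guard loop can be sketched as follows. This is an illustration under stated assumptions, not Pilcrow's source: the Line shape, the Breaker type, guardOrphans and toyBreaker are all hypothetical, and toyBreaker is a single-word stand-in for the real line breaker that merely reproduces the packing behaviour.

```typescript
const SHY = "\u00AD";
const MIN_FRAGMENT = 4; // the four-character threshold

// Hypothetical line shape: shyIndex is the position in the source string of
// the soft hyphen that caused this line's break, or null for any other break.
interface Line {
  text: string;
  shyIndex: number | null;
}
type Breaker = (source: string, measure: number) => Line[];

// Length of the word fragment that opens a line (up to the first space).
function leadingFragmentLength(text: string): number {
  const space = text.indexOf(" ");
  return space === -1 ? text.length : space;
}

// Re-break from the start until no hyphenated line is followed by a fragment
// shorter than MIN_FRAGMENT. Each pass strips one soft hyphen, so it terminates.
function guardOrphans(source: string, measure: number, breakLines: Breaker): Line[] {
  let current = source;
  for (;;) {
    const lines = breakLines(current, measure);
    let offender: number | null = null;
    for (let i = 0; i + 1 < lines.length; i++) {
      const shy = lines[i].shyIndex;
      if (shy !== null && leadingFragmentLength(lines[i + 1].text) < MIN_FRAGMENT) {
        offender = shy;
        break;
      }
    }
    if (offender === null || current.charAt(offender) !== SHY) return lines;
    current = current.slice(0, offender) + current.slice(offender + 1);
  }
}

// Toy stand-in breaker (an assumption): one word, break at the last soft
// hyphen that fits, then pack extra characters after the visible hyphen.
function toyBreaker(source: string, measure: number): Line[] {
  const visible = source.split(SHY).join("");
  if (visible.length <= measure) return [{ text: visible, shyIndex: null }];
  let shyIndex = -1;
  let headLength = 0;
  let seen = 0;
  for (let i = 0; i < source.length; i++) {
    if (source.charAt(i) !== SHY) {
      seen++;
    } else if (seen + 1 <= measure) {
      shyIndex = i;
      headLength = seen;
    }
  }
  if (shyIndex === -1) return [{ text: visible, shyIndex: null }];
  const cut = measure - 1; // head, hyphen and packed characters fill the measure
  return [
    { text: visible.slice(0, headLength) + "-" + visible.slice(headLength, cut), shyIndex },
    { text: visible.slice(cut), shyIndex: null },
  ];
}

const texts = (lines: Line[]) => lines.map((l) => l.text);
console.log(texts(guardOrphans(`ital${SHY}ics`, 6, toyBreaker))); // [ 'italics' ]
console.log(texts(guardOrphans(`hyphen${SHY}ation`, 8, toyBreaker))); // [ 'hyphen-a', 'tion' ]
```

In the first call the two-character fragment cs trips the guard, the soft hyphen is stripped, and the toy breaker leaves the word unbroken. In the second, tion is exactly four characters, so the break stands.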

Why four specifically? Three is Hyphenopoly’s own rightmin: the minimum it would accept at the soft-hyphen position, before pretext’s grapheme-packing shortens it further. Four gives one character of margin beyond that, which in practice catches the cases the eye actually objects to. The guard has been running across Pilcrow’s example posts for nine days without a false positive.

The guard is a local mitigation. pretext is the engine; Pilcrow is the editorial layer above it; the gap between syllable-aware insertion and grapheme-aware breaking is a pretext-level behaviour, not something a wrapper can fully address. The right fix was always upstream. I filed it as pretext issue #162, with a minimal repro and the ital-i|cs case spelled out exactly. Cheng Lou, pretext’s author, shipped commit f06fef0 on 2026-05-08. The fix changes pretext’s default behaviour: soft-hyphen breaks now stay at the insertion point, and the post-hyphen segment carries whole to the next line. Hyphenopoly’s rightmin is honoured end-to-end.

Once that commit reaches an npm release of @chenglou/pretext, the orphan guard comes out. Nine days of earned keep, then a clean removal.

There is something worth noting about how this kind of bug gets fixed. The orphan guard exists because someone cared enough about ital-i|cs to find it objectionable rather than acceptable. The upstream fix exists because Cheng Lou received a report with a clear repro and fixed it at the root, rather than suggesting a workaround.