One of the prime tools in the typographer’s toolbox is the hyphen. Typographers use hyphens to divide words across lines, thereby helping to equalise line-lengths, decreasing word-spacing and increasing readability. All books and newspapers of any quality use this technique to ‘justify’ their text, yet it is not a tool available to Web designers in any useful form.
Word division across lines should be available as a CSS property with the algorithms built into browser language packs (in fact the CSS 3 Text Module introduces word-break-inside:hyphenate). Such software, comprising hyphenation dictionaries, has been around for some time in page layout tools such as QuarkXPress and more recently in Adobe InDesign, but has yet to make it into word processing packages, let alone Web browsers.
It’s worth noting that the rules for word division are reasonably straightforward and eminently programmable. In English, one should divide words after a vowel, turning over the consonant. In present participles, take over -ing, as: carry-ing, divid-ing, crown-ing. Generally whenever two consonants come together, put the hyphen between them: splen-dour, forget-ting, tetraphyl-lidea, haemor-rhage. And always try to divide so the first part of the division suggests what is following: e.g. starva-tion not star-vation; re-adjust not read-just; cam-ellia not camel-lia. The rule for division of words is not one of etymology but of sound or pronunciation.
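As a rough illustration of how programmable these rules are, here is a minimal Python sketch of just two of them: splitting a doubled consonant, and taking over -ing. The function name divide and the choice to treat "y" as a vowel are assumptions made for the example. It ignores the rules about sound and sense entirely, so it should be read as a toy rather than a usable hyphenator.

```python
VOWELS = set("aeiouy")  # assumption: "y" counts as a vowel in this sketch


def divide(word):
    """Mark break points with "-" using two of the rules described above."""
    lower = word.lower()
    breaks = set()

    # Rule: where the same consonant is doubled, break between the pair
    # (forget-ting). Keep at least two letters on either side of the break.
    for i in range(2, len(lower) - 1):
        if lower[i] == lower[i - 1] and lower[i] not in VOWELS:
            breaks.add(i)

    # Rule: take over -ing (divid-ing), but only if a vowel precedes it and a
    # doubled consonant has not already claimed the break just before it.
    cut = len(lower) - 3
    if (
        lower.endswith("ing")
        and any(ch in VOWELS for ch in lower[:cut])
        and cut - 1 not in breaks
    ):
        breaks.add(cut)

    return "".join(("-" if i in breaks else "") + ch for i, ch in enumerate(word))


for example in ["forgetting", "dividing", "haemorrhage", "splendour"]:
    print(example, "->", divide(example))
```

Because only doubled consonants are handled, a word such as splendour comes back undivided; covering the general two-consonant rule, and the preference for divisions that suggest what follows, would need a pronunciation dictionary or something like it.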
So without CSS to the rescue just yet, the next best method is to insert hyphen characters either manually or by post-processing the output of a CMS. However, liquid layouts and resizable text mean that, on the Web, one never knows which word will be at the end of a line, the implication being that a special hyphen is required which only shows itself when a word is divided.
Such a character has existed as an entity since HTML 3.2; it is the soft hyphen (&shy; or &#173;). Hyphenation is well explained in HTML 4.0, so why does no-one use it? Not knowing of its existence may be one reason, typographical ignorance or time constraints others, but what’s really stopping us using it now is poor and inconsistent browser support.
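To make the post-processing idea concrete, here is a minimal Python sketch that inserts the soft hyphen character (U+00AD, which is what the entity stands for) into words found in a break-point list. The BREAK_POINTS table and the soften function are hypothetical names invented for this example, and the sketch assumes it is given plain text rather than HTML markup.

```python
import re

SOFT_HYPHEN = "\u00ad"  # the character the soft hyphen entity stands for

# Hypothetical break-point list; a real one would come from a hyphenation
# dictionary rather than being typed in by hand.
BREAK_POINTS = {
    "starvation": "starva-tion",
    "splendour": "splen-dour",
    "forgetting": "forget-ting",
}


def soften(text, break_points=BREAK_POINTS):
    """Insert soft hyphens into listed words; leave every other word alone."""

    def replace(match):
        word = match.group(0)
        pattern = break_points.get(word.lower())
        if pattern is None:
            return word
        # Copy the original letters (keeping their case) and emit a soft
        # hyphen wherever the pattern has "-".
        out, i = [], 0
        for ch in pattern:
            if ch == "-":
                out.append(SOFT_HYPHEN)
            else:
                out.append(word[i])
                i += 1
        return "".join(out)

    return re.sub(r"[A-Za-z]+", replace, text)


softened = soften("Starvation and splendour")
print(softened.replace(SOFT_HYPHEN, "|"))  # show the break points visibly
```

A browser that handles soft hyphens correctly would show nothing at the marked points unless the word falls at the end of a line.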
Test cases reveal that, for once, IE6 gets it right, as does Opera 7. Mozilla hides all soft hyphens – an acceptable degradation but still wrong. Safari, however, displays all soft hyphens, rendering text virtually unreadable. In the words of the typographer Geoffrey Dowding:
If the typesetter has resolved never to divide words, such works would rarely, if ever be of any typographic distinction.
We have still some way to go.
Dunstan wrote:
I’ve often wanted to use a soft hyphen, but as you say, browser support has always stopped me.
Darn things.
Tomas Franzén wrote:
Such software, comprising hyphenation dictionaries, has been around for some time in page layout tools such as QuarkXPress and more recently in Adobe InDesign, but has yet to make it into word processing packages, let alone Web browsers.
I actually think AppleWorks does this. I might be wrong though.
James Craig wrote:
Great topic. Did you report those bugs to Safari and Mozilla? Can’t wait to have that CSS3 property. Cheers.
Phil Baines wrote:
“Not knowing of its existence may be one reason”
Erm, well, yeah.
Nice post, and nice to see that part of CSS 3. I haven’t looked at it in detail yet since B Bos sent the link out over the W3C style list.
Matt Southerden wrote:
There is an error in your first examples:
Here you are dividing before the consonant.
It is something that bugs me about typesetting for the web. Until there are tools available to render text in the manner that you would expect of a professional print media package, I, like yourself, will continue to use left justification.
Regards,
Matt.
Rich wrote:
It’s not an error in my examples, it’s an error in my clarity. The examples for the present participle (-ing) rule are in addition to the consonant rule, not examples of it.
Nic wrote:
Dave Hyatt, one of the Safari developers, has posted to his blog that he’s seen this post and fixed KHTML so that future releases of Safari (and Konqueror) will support soft hyphens the same way IE6 and Opera do.
Nick Richards wrote:
There are also hyphenation dictionaries for OpenOffice.org in loads of different languages (including English!)
Barry Abrahamsen wrote:
Well, two things come to mind – how do search engines deal with the soft hyphen mark-up? Will Google mis-index words if they contain them?
Also, pre-marking long words with soft hyphens for browsers to use if necessary could end up making find-and-replace a hit or miss effort for Web site maintainers.
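One way a maintainer's own tooling might sidestep the find-and-replace problem is to strip soft hyphens before comparing text. The Python sketch below is an illustration only; the function names are hypothetical, and it says nothing about how any search engine actually treats the character.

```python
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text):
    """Remove every soft hyphen so words compare as they are spelled."""
    return text.replace(SOFT_HYPHEN, "")


def contains_word(text, word):
    """Report whether word occurs in text, ignoring soft hyphens in both."""
    return strip_soft_hyphens(word) in strip_soft_hyphens(text)


marked_up = "Good typo\u00adgraphy needs hyphen\u00adation."
print(contains_word(marked_up, "typography"))
```

A plain substring search for "typography" in the marked-up string would fail, which is the hit or miss behaviour described above.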
Ben Hollis wrote:
Perhaps someone could come up with a JavaScript that would dynamically hyphenate text, and re-hyphenate on browser resize. That way you’d have liquid layouts, hyphenation, and you don’t need soft hyphens. However, it has the potential to be a large script, and slow…
Michael S. wrote:
“It’s worth noting that the rules for word division are reasonably straightforward and eminently programmable.”
It’s not quite that easy. For example, “record” is hyphenated differently depending on whether it’s a noun (rec-ord) or a verb (re-cord).
See Knuth:
“[...] computers are notoriously bad at hyphenation. When the typesetting of newspapers began to be fully automated, jokes about “the-rapists who pre-ached on wee-knights” soon began to circulate.
“It’s not hard to understand why machines have behaved poorly at this task, because hyphenation is quite a difficult problem. For example, the word “record” is supposed to be broken as “rec-ord” when it is a noun, but “re-cord” when it is a verb. The word “hyphenation” itself is somewhat exceptional; if “hy-phen-a-tion” is compared to similar words like “con-cat-e-na-tion”, it’s not immediately clear why the “n” should be attached to the “e” in one case but not the other. Examples like “dem-on-stra-tion” vs. “de-mon-stra-tive” show that the alteration of two letters can actually affect hyphens that are nine positions away.”
– Donald E. Knuth, The TeXbook, p. 449
Herbert Schulz wrote:
I’m not quite sure why you think hyphenation rules are fairly simple. Even the same language used in different locations can use different rules; e.g., British English uses different rules than US English.
The best set of rules I know of is part of TeX, and even those have an exception list and must change with location.
Good Luck,
Herb Schulz
Peter Lindberg wrote:
I would like the shy entity to work as stated in the HTML 4.0 specification. When I researched this a while back, I found an article by Jukka Korpela that’s perhaps relevant to this discussion in some way.
Sascha Leib wrote:
Hello all,
I just checked the Soft-Hyphen test in IE 5 for MacOS X and the result was a worst-case scenario:
In this browser, soft hyphens are rendered as breves (i.e. accents on blanks) which is even worse than KHTML’s hard hyphen problem.
The problem from my side is, as long as there is no consistency in the way browsers handle soft hyphens, I must not use them at all. Pity.
/sascha
Pabini Gabriel-Petit wrote:
It seems like the best solution would be for all browsers to incorporate a standard hyphenation algorithm, so no hyphenation-data download or markup would be required. We should have a flag that we can set to turn automatic hyphenation on or off on individual Web pages though. Perhaps just the decision to use justified text would be sufficient.
It is not true that word processors do not have automatic hyphenation algorithms. Both Word and FrameMaker do, and they’re two of the most popular word processors.
BTW… Gorgeous Web site.
Roger S wrote:
Knuth’s hyphenation algorithm for TeX is in the public domain – it includes an exception list to handle the most basic contrary words. QuarkXPress, I believe, is one of the many commercial products that uses it in its base hyphenation. That it’s not implemented in web browsers makes no sense.
Of course, you’ll either get re-cord when you want rec-ord or vice versa, but that’s what the soft hyphen should be for.
Chris Hoess wrote:
The Mozilla bug on soft hyphens is bug 9101, but read comment 73 before you start adding comments.
As people have pointed out, automatic hyphenation is fairly tricky. TeX uses hyphenation tables and a list of exceptions for each language, and there are quite a few languages on the Internet. While obviously this isn’t impossible to implement, there’s a big performance cost involved, and browser developers aren’t going to be falling all over themselves to add code footprint and slow rendering speed for the relatively few web pages using justification. It’s too bad – I’d be ecstatic if we could get TeX-quality rendering in web browsers with CSS – but it will take a while.
Brian Schack wrote:
You are right, Tomas. AppleWorks 6 supports auto-hyphenation. I have heard that this has been around in much older versions of it.
francois wrote:
Here’s a test page for markup-significant entities including the soft hyphen:
http://www.fjordaan.uklinux.net/entities/entities_invisible.html
(The font dropdowns don’t work; my javascript abilities don’t extend that far.)
Yoz wrote:
I’ve put together a nasty but effective Javascript kludge to give Mozilla limited soft hyphen support:
http://cheerleader.yoz.com/archives/001889.html
oliver bunke wrote:
In Safari 2.0 it looks similar to IE6 (with a smaller text size, of course).