Semantic Linefeeds (2012) (rhodesmill.org)
46 points by Tomte on Jan 22, 2023 | 29 comments


I took up a similar style for most of my writing perhaps a year ago, with a strong preference for breaking at punctuation. (I even sometimes do it to a more limited extent in handwriting.) This has influenced my writing style, which naturally inclines to more complex structures, to favour shorter clauses—though certainly I still end up with plenty of non-punctuation-adjacent line breaks.

My greatest annoyance with the style (which honestly is the sole reason I didn’t take it up a few years earlier) is bad handling of em dashes, of which I am fond. In normal English typesetting, there is a line-breaking opportunity before and after the dash, but no space; but in all markup languages that I know of that support soft breaks, the line break is equivalent to a space. Well, actually this is no longer normative for HTML (see https://www.w3.org/TR/css-text-3/#line-break-transform, which shows the problem for CJK, which doesn’t use spaces between words), but for now, all user agents still implement it as “line break → space”. Consequently, when writing in these markup languages, I can’t break the line after an em dash as I would like to, because it would change the output.

One of the features of the lightweight markup language I’ve been designing, then, is the ability to control soft break behaviour in order to correctly handle at least CJK and em dashes that were not otherwise paired with a space. (And I’m curious if anyone has similar cases not well-served by current rules.)

—⁂—

(If you’re not sure what I’m talking of: source:

  Example: an em dash—
  like this.

Expected result:

> Example: an em dash—like this.

Actual result in the likes of HTML, Markdown and reStructuredText:

> Example: an em dash— like this.


> One of the features of the lightweight markup language I’ve been designing, then, is the ability to control soft break behaviour

What does your design for that feature look like?


I’m not certain yet; although I’ve been writing all my own stuff in this language for months now, my actual implementation is mostly unfinished. (The only piece that’s fully finished is list markers, which are based on CSS Counter Styles Level 3 plus parsing, so you can use e.g. cjk-decimal or bengali numbering in the source and it’ll retain that style in the output, correct values and all.)
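To make the "correct values and all" part concrete, here is a rough Python sketch of formatting and parsing a numeric-system counter style. It is not the implementation described above: the function names are hypothetical, the symbol lists are assumed to match the predefined cjk-decimal and bengali styles, and suffixes and negative values are left out.

```python
# Digit symbols, assumed to match the predefined "cjk-decimal" and "bengali" styles.
CJK_DECIMAL = "〇一二三四五六七八九"
BENGALI = "".join(chr(0x09E6 + i) for i in range(10))  # U+09E6..U+09EF


def format_numeric(value: int, symbols: str) -> str:
    """Render a non-negative integer positionally using the given digit symbols."""
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    base = len(symbols)
    if value == 0:
        return symbols[0]
    digits = []
    while value:
        value, remainder = divmod(value, base)
        digits.append(symbols[remainder])
    return "".join(reversed(digits))


def parse_numeric(marker: str, symbols: str) -> int:
    """Inverse of format_numeric; raises ValueError on a symbol outside the style."""
    base = len(symbols)
    value = 0
    for ch in marker:
        value = value * base + symbols.index(ch)
    return value
```

Parsing is just formatting run in reverse, so a list marker written as 一二 in the source can be read back as 12 and re-rendered in the same style.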

For this particular aspect, I’m still undecided on how best to actually implement it. The most likely approach is to hard-code rules (possibly in a couple of groups, e.g. “CJK” and “other”), with the ability to opt into or out of them as part of dialect configuration (which is a bit like how you can define custom roles or change the default role in reStructuredText, but more general, able to change more aspects of the language’s syntax—things like changing *…* to be something other than italic, making ~…~ strikethrough, defining new counter styles). You could also generalise some form of declarative line-break-collapsing rule like “if preceded and succeeded (after whitespace trimming) by a character with Unicode property East_Asian_Width ∈ {Fullwidth, Halfwidth, Wide}” or “if the preceding line matches /(?<! )—$/” or “if preceded by U+200B (ZWSP)” (this last example borrowing from https://www.w3.org/TR/css-text-3/#line-breaking, basically applying the formatting rules in reverse for parsing, similar to what I’m proposing with the em dash and doing with Counter Styles; when laying out, ZWSP introduces a line break opportunity without adding a space, so when parsing ZWSP and a line break, you clearly shouldn’t add a space). But unless I can be convinced of actual value in generalising it, giving just one or two switches is likely to be the most sensible implementation, for complexity (this will probably be an optional feature, incidentally), manageability and performance.
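For illustration only, the three declarative rules could be sketched in Python roughly like this. The names `joiner` and `collapse_soft_breaks` are made up for the example, the em dash rule is written as a negative lookbehind, and every other soft break falls back to the usual "line break becomes a space" behaviour.

```python
import re
import unicodedata

# East_Asian_Width classes treated as CJK-like: Fullwidth, Halfwidth, Wide.
CJK_WIDTHS = {"F", "H", "W"}
ZWSP = "\u200b"
# An em dash (U+2014) at the end of the line that is not preceded by a space.
EM_DASH_NO_SPACE = re.compile("(?<! )\u2014$")


def joiner(prev: str, nxt: str) -> str:
    """Return what replaces the soft break between two non-empty, trimmed lines."""
    if prev.endswith(ZWSP):
        return ""  # ZWSP already marks a break opportunity without a space
    if EM_DASH_NO_SPACE.search(prev):
        return ""  # unspaced em dash: keep it tight against the next word
    if (unicodedata.east_asian_width(prev[-1]) in CJK_WIDTHS
            and unicodedata.east_asian_width(nxt[0]) in CJK_WIDTHS):
        return ""  # CJK text does not put spaces between words
    return " "  # default behaviour: line break collapses to a space


def collapse_soft_breaks(paragraph: str) -> str:
    """Join the lines of one paragraph, applying the rules above at each break."""
    # str.strip() leaves ZWSP alone, since Python does not count it as whitespace.
    lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
    if not lines:
        return ""
    out = lines[0]
    for nxt in lines[1:]:
        out += joiner(out, nxt) + nxt
    return out
```

With these rules, `collapse_soft_breaks("Example: an em dash\u2014\nlike this.")` joins the two lines with no space after the dash, while a dash that already has a space before it still gets a space after it.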


Shouldn't an em dash be surrounded with spaces on each side anyway?


Not in English. The two equivalent English alternatives -- as far as I know -- are the en dash surrounded by spaces---or the em dash with no space. I actually like the look of the em dash with no space better, but for the reasons mentioned in the GP I end up using the en dash anyway.

(Another reason is that Swedish, which I also write, only has the en dash, so it's nice that I can be consistent.)


i've read a lot of books professionally typeset in english that either do or don't space around em dashes — used to denote pauses, in the way we are discussing here. my preference is to space around them, but i prefer using \u2009 'thin space' instead of a full word space. unfortunately hn rewrites that to a full word space.


I don’t think I’ve ever seen a professionally-set work using spaces around em dashes. The two popular conventions are space–en-dash–space (“a – b”) and em-dash (“a—b”). I only recall ever encountering space–em-dash–space a few times (one of which was within the last few days, and the previous was probably years ago), in amateur work. THIN SPACE would certainly make me much more amenable to it, but frankly that suggests a font problem (and I do certainly know of fonts where the em dash spacing is awful and if forced to use that font I would insert THIN SPACE (or very occasionally NARROW NO-BREAK SPACE) every time).


i think space-em-dash-space was pretty common in both journals and monographs on engineering, science, and math in the 01940s to the 01970s, including some that include really excellent typesetting, though i don't have an excerpt ready to hand to post (and hn isn't really a usable medium for discussing visual things anyway)

i don't remember ever seeing it in fiction

i think of space-en-dash-space as just being an error, and i'm pretty sure i wasn't just misidentifying en dashes with spaces around them as em dashes


> 01940s to the 01970s

Out of interest, why leading zeros?


Might the em dash convention be a byproduct of the printing press and fixed-width layouts?


not in a way that is obvious to me


The spaced em dash occupies 10% more space.

The traditional syntax is particularly suited for newspapers who valued typographic density.


oh, i see what you mean now

but wouldn't that still be the case if you were handwriting your newspaper and xeroxing it or something

where does the printing press come in

also though i don't think fine typography is mostly determined by newspapers


"Em and en refer to units of typographic measurement, not to the letters M and N. (Yes, the homophony is confusing. To disambiguate, loud print shops referred to them as mutton and nut.) In a traditional metal font, the em was the vertical distance from the top of a piece of type to the bottom. The en was half the size of the em. Originally, the width of the em and en dashes corresponded to these units."

https://practicaltypography.com/hyphens-and-dashes.html

Handwriting allows one to vary the point (width) of individual letters. Printing presses do not afford that luxury.


sure, but you can still leave horizontal space around your dashes—or not

what we nowadays call 'microtypography' and think ourselves very avant-garde for employing is ubiquitous in medieval illuminated manuscripts; every line is full of subtle variations in letterforms to better fit the available space



If you prefer this in the form of a Creative Commons-licensed spec for some reason, https://github.com/sembr/specification


I have my résumé written in LaTeX and format it like this, mostly so that I can comment or un-comment specific bullet points depending on what’s relevant to a certain job application, rather than maintaining multiple versions.


A trove of links in that, a pdf-hoarder's delight. Some of the linked pages appear to be in Japanese, though scrolling down on those reveals a fair amount of English content, for those who don't grok Japanese.


Or write everything in one line

It is annoying when you search for two words and they are not found, because they are on different lines
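One possible workaround, sketched in Python: build the search pattern so that any run of whitespace, including a line break, can sit between the words. The helper name `phrase_pattern` is invented for this example.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Compile a pattern that matches the phrase's words separated by any whitespace."""
    words = phrase.split()
    return re.compile(r"\s+".join(re.escape(word) for word in words))


text = "breaking lines at semantic\nboundaries keeps diffs small"

# A plain substring search misses the phrase because of the line break.
found_plain = "semantic boundaries" in text
# The whitespace-tolerant pattern finds it.
found_regex = phrase_pattern("semantic boundaries").search(text) is not None
```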

We just need better diff/merge tools that can handle text without line breaks. wdiff is installed everywhere for in-line diffs, but no one seems to maintain it. There have been patches sitting around for years: https://savannah.gnu.org/patch/?group=wdiff


This is something I've attempted to adopt a few times, but I come back to editing the text closer to the format in which I'll read it.

Whenever I try to analyse my reasons for this, I find they're all bogus except perhaps one: with fewer linebreaks, there's a higher density of stuff on screen, which means I can refer back to more text as I'm writing. I don't think this is a particularly good reason in the age of marks, though.


probably in 01974 with an asr-33 printing 10 characters per second†, where you also had the ability to tear off an unlimited amount of scrollback and arrange it on your desk wherever you wish, the ergonomic considerations were somewhat different

______

† previous versions of this comment erroneously stated higher (and implausible) data rates


cf https://vanemden.wordpress.com/2009/01/01/ventilated-prose/

(one of the early hints that Donald Knuth was of the tribe which would become known as "geek" is that as a schoolchild he loved diagramming sentences)


The lengths to which people go to accommodate line-based diffing algorithms instead of creating better diff tools.


Breaking lines at logical boundaries in the "source code" makes it easier for me to see the structure of the text. It also gives me the ability to see the text in two different layouts; this often lets me spot errors which would otherwise slip by.


I picked up the habit of putting a line feed after most punctuation from writing troff (mandoc, actually) manual pages, where it is almost required. That it also works in HTML and Markdown is nice.

The main advantage is that you get really nice diffs.
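A small Python illustration of that point, using `difflib` from the standard library. The sample sentence and the 40-column wrap are made up for the example; the same two-word insertion is applied to a semantically broken copy and to a hard-wrapped copy, and the lines a line-based diff reports are counted.

```python
import difflib


def changed_lines(before: str, after: str) -> int:
    """Count the lines a line-based diff marks as added or removed."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


# One clause per line: the edit touches a single line.
semantic_before = (
    "All human beings are born free,\n"
    "equal in dignity and rights.\n"
    "They are endowed with reason,\n"
    "and should act in a spirit of brotherhood.\n"
)
semantic_after = semantic_before.replace("born free,", "born free and independent,")

# The same text wrapped at 40 columns: the edit reflows every following line.
wrapped_before = (
    "All human beings are born free, equal in\n"
    "dignity and rights. They are endowed\n"
    "with reason, and should act in a spirit\n"
    "of brotherhood.\n"
)
wrapped_after = (
    "All human beings are born free and\n"
    "independent, equal in dignity and\n"
    "rights. They are endowed with reason,\n"
    "and should act in a spirit of\n"
    "brotherhood.\n"
)

semantic_count = changed_lines(semantic_before, semantic_after)  # 2
wrapped_count = changed_lines(wrapped_before, wrapped_after)     # 9
```

The semantic version shows one removed and one added line, while the re-wrapped version reports all four old lines and all five new ones as changed.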


Or just use

  git diff --word-diff
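
For anyone curious what a word-level diff does under the hood, here is a rough Python sketch using `difflib`. It is not how git implements the option; the function name `word_diff` is invented, and the `[-...-]` / `{+...+}` markers are borrowed from the notation wdiff uses.

```python
import difflib


def word_diff(before: str, after: str) -> str:
    """Diff two texts word by word, ignoring how they are broken into lines."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

Because the texts are split on any whitespace before comparing, re-wrapping a paragraph produces no markers at all; only changed words show up.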



As far as I can tell, that's more-or-less her position. Code has semantic breakable-points throughout lines of code, and she's advocating breaking.



