notes

language design

Not sure whether we want to support multiple trailing blocks; probably? They might be useful for certain "environment" situations.

Some care is needed for verbatim environments; I think there's no alternative to explicit begin/end tokens. We can have something like \rawbegin_foo ... \rawend_foo, where foo is just any alphanumeric tag.
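A rough sketch of how such tokens could be scanned, assuming the end token must carry the same tag as the begin token. The name `find_raw_blocks` and the regex are made up for illustration and are not the actual lexer; Python is used only for brevity.

```python
import re

# Hypothetical token shape: \rawbegin_<tag> ... \rawend_<tag>, <tag> alphanumeric.
# (?P=tag) forces the end token to repeat the begin token's tag, and the
# negative lookaheads stop "foo" from matching a prefix of "foobar".
RAW_BLOCK = re.compile(
    r"\\rawbegin_(?P<tag>[A-Za-z0-9]+)(?![A-Za-z0-9])"
    r"(?P<body>.*?)"
    r"\\rawend_(?P=tag)(?![A-Za-z0-9])",
    re.DOTALL,
)

def find_raw_blocks(source):
    """Return (tag, verbatim body) pairs for every raw block in source."""
    return [(m.group("tag"), m.group("body")) for m in RAW_BLOCK.finditer(source)]
```

Since the body is only terminated by the matching tag, a block tagged `x1` can contain `\rawend_y` (or any other markup) untouched.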

on CFF subsetting

We got OpenType-CFF subsetting mostly working, but embedding the whole OTF doesn't work consistently in all viewers. So instead, we only embed the raw CFF data. This follows what TeX does, which means it's probably fine.
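For reference, pulling the raw CFF data out of an OTF is just a walk over the table directory. A minimal sketch, assuming a single-font file (not a collection); `extract_cff_table` is a made-up name, not our actual code.

```python
import struct

def extract_cff_table(otf):
    """Return the raw bytes of the 'CFF ' table from an OpenType-CFF font."""
    # Header: sfntVersion (4 bytes), then numTables, searchRange,
    # entrySelector and rangeShift (uint16 each), for 12 bytes in total.
    sfnt_version, num_tables = struct.unpack_from(">4sH", otf, 0)
    if sfnt_version != b"OTTO":
        raise ValueError("not an OpenType-CFF font")
    # Table records follow the header, 16 bytes each:
    # tag (4 bytes), checksum, offset, length (uint32 each).
    for i in range(num_tables):
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", otf, 12 + 16 * i)
        if tag == b"CFF ":
            return otf[offset:offset + length]
    raise ValueError("no 'CFF ' table found")
```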

on font size optimisation

There are quite a few potential size savings that we haven't taken advantage of yet. In increasing order of complexity:

- remove the vhea/vmtx tables if the font was not used in vertical writing mode
- chop off all glyph ids after the last one that we use, letting us:
  - cut the size of the loca table (esp. for large fonts that need 4 bytes per entry)
  - cut the size of the hmtx table
- renumber the glyphs entirely, which:
  - eliminates the bulk of the cmap table
  - also cuts loca and hmtx dramatically
  - also drastically shrinks the size of the text itself (since we no longer need 2 bytes for glyph ids)
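To get a feel for the difference between chopping and renumbering, here is a back-of-the-envelope sketch. The function names and the example glyph ids are made up; the only hard fact used is that loca holds numGlyphs + 1 offsets of 2 or 4 bytes each.

```python
def truncated_glyph_count(used_gids):
    """Glyph count if we only chop off everything after the last used glyph id."""
    return max(used_gids) + 1

def renumber_glyphs(used_gids):
    """Map old glyph ids onto a dense range, keeping glyph 0 (.notdef) at 0."""
    old_ids = sorted(set(used_gids) | {0})
    return {old: new for new, old in enumerate(old_ids)}

def loca_size(num_glyphs, long_format):
    """Size in bytes of a loca table with the given glyph count."""
    return (num_glyphs + 1) * (4 if long_format else 2)

# Hypothetical set of glyph ids used by a document.
used = [36, 1200, 40000]
mapping = renumber_glyphs(used)
print(loca_size(truncated_glyph_count(used), long_format=True))  # chopping only
print(loca_size(len(mapping), long_format=True))                 # renumbered
```

Chopping only helps when the used glyphs happen to sit near the start of the font, while renumbering makes the table sizes depend only on how many glyphs are used.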

on unicode (de)composition

The problem is that Unicode combining characters let people write arbitrary things that combine with arbitrary other things. Normally, this isn't a problem (if your font doesn't have a glyph for it, then it won't show).

However, combining diacritical marks are (potentially) common, and a font (e.g. Myriad Pro) might not actually contain a glyph for the mark itself, instead having one glyph for each (valid) combination of base character + mark.

This means that, for maximum flexibility, if a font doesn't contain a glyph for a combining character, we should attempt to compose it with its base character into a single codepoint, see if the font contains a glyph for that codepoint, and if so use that instead. The converse is also true: if for some reason the font has the combining character glyph but not one for the composed form, then we should attempt to decompose the codepoint.

Update: it probably makes more sense to maximally compose codepoints first (I think we can do that with utf8proc), since we want to prefer an "actual" glyph, e.g. for an accented character, if that exists. If that fails, then we should try the decomposed version, and if that also fails, it's .notdef time.
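A minimal sketch of that fallback order, using Python's `unicodedata` as a stand-in for utf8proc (NFC for the composed form, NFD for the decomposed one). `resolve_codepoints` and the `font_has_glyph` callback are hypothetical names, and the input is assumed to be a single base character plus its marks.

```python
import unicodedata

def resolve_codepoints(cluster, font_has_glyph):
    """Pick codepoints for one cluster: composed first, then decomposed, else None."""
    for form in ("NFC", "NFD"):
        candidate = unicodedata.normalize(form, cluster)
        # Accept this form only if the font covers every codepoint in it.
        if all(font_has_glyph(ord(ch)) for ch in candidate):
            return [ord(ch) for ch in candidate]
    return None  # caller falls back to .notdef

# A font that only has the precomposed U+00E9 still renders "e" + U+0301.
print(resolve_codepoints("e\u0301", lambda cp: cp == 0xE9))
```

Note that NFC is not strictly "maximal" composition, since Unicode excludes some precomposed characters from it, so this may need refining.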
