Hi, I’m Erika Rowland (a.k.a. erikareads). I’m an Ops-shaped Software Engineer, Toolmaker, and Resilience Engineering fan. I like Elixir and Gleam, Reading, and Design. She/Her.

Published on September 19, 2023

Percent Encoding URLs

I use djot as the markup language for my static site generator. In djot, the link syntax looks like this:

[link text](http://example.com)

This works fine for most links, but it breaks for links whose URLs contain parentheses. This is mostly a problem with Wikipedia, which uses parentheses to disambiguate topics. For example:

[Variety](https://en.wikipedia.org/wiki/Variety_(cybernetics)#Law_of_requisite_variety)

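As written, this breaks the link: djot treats the first closing ) as the end of the URL, so the link points to https://en.wikipedia.org/wiki/Variety_(cybernetics instead of the full address. This is by design: djot parses markup in linear time, without backtracking, so it never goes back to look for a later closing parenthesis.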
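To make the failure concrete, here is a toy Elixir model of that rule, assuming the destination simply ends at the first ) after ](. The module name ToyLinkParser is hypothetical, and this is not djot's actual parser; it only mimics the single left-to-right scan described above.

```elixir
defmodule ToyLinkParser do
  # Hypothetical toy model, NOT djot's real parser: scan once, left to
  # right, and treat the first ")" after "](" as the end of the destination.
  # Assumes a well-formed "[text](destination)" link; raises otherwise.
  def destination(markup) do
    # Drop the link text: everything up to and including "](".
    [_text, rest] = String.split(markup, "](", parts: 2)
    # Keep everything before the first ")"; there is no backtracking to
    # look for a later closing parenthesis.
    [dest | _rest] = String.split(rest, ")", parts: 2)
    dest
  end
end

ToyLinkParser.destination(
  "[Variety](https://en.wikipedia.org/wiki/Variety_(cybernetics)#Law_of_requisite_variety)"
)
# => "https://en.wikipedia.org/wiki/Variety_(cybernetics"
```

The fragment #Law_of_requisite_variety and the real closing ) are left over after the destination, which is why the rendered link goes to the wrong place.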

Percent Encoding Parentheses

This is where percent encoding comes in. Percent encoding is a mechanism defined in the URI specification for representing arbitrary data in a URI: each byte is written as % followed by its value as two hexadecimal digits. (If you would like to read the details, they seem to be specified in RFC 3986, Uniform Resource Identifier (URI): Generic Syntax.)
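As a rough illustration of the mechanism, the sketch below percent-encodes bytes by hand. The module name PercentByte is hypothetical; it simply writes each byte as % plus two uppercase hex digits, the form RFC 3986 recommends, and escapes every byte it is given with no exceptions.

```elixir
defmodule PercentByte do
  # Hypothetical helper for illustration: percent-encode a single byte as
  # "%" followed by two uppercase hexadecimal digits (RFC 3986, section 2.1).
  def encode(byte) when byte in 0..255 do
    "%" <> Base.encode16(<<byte>>)
  end

  # Percent-encode every byte of a binary, with no exceptions.
  def encode_all(binary) when is_binary(binary) do
    for <<byte <- binary>>, into: "", do: encode(byte)
  end
end

PercentByte.encode(?))
# => "%29"
PercentByte.encode_all("é")
# => "%C3%A9"
```

Because it works on bytes, a non-ASCII character like é becomes its two UTF-8 bytes, C3 and A9.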

Specifically, I needed to encode ) as %29. (Technically, ( is also a reserved character, and escaping it as %28 would be consistent, but djot has no trouble with it, so I leave it unescaped here.)

Using my link from before, that looks like this:

[Variety](https://en.wikipedia.org/wiki/Variety_(cybernetics%29#Law_of_requisite_variety)

Rendering this markup gives a working Variety link that goes to the Wikipedia article.

Elixir URI encoding

When I first learned about percent encoding, I quickly found the URI.encode/1 function in Elixir. To my dismay, it didn’t escape parentheses:

iex()> URI.encode("(hello)")
"(hello)"

Frustrated, I continued escaping my links for djot by hand.

Today, I re-read the documentation for URI.encode and found a passage I had previously missed (the full documentation is in Elixir’s URI module docs):

This function also accepts a predicate function as an optional argument. If passed, this function will be called with each byte in string as its argument and should return a truthy value (anything other than false or nil) if the given byte should be left as is, or return a falsy value (false or nil) if the character should be escaped. Defaults to URI.char_unescaped?/1.
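The contract is easiest to see with two extreme predicates: one that keeps every byte and one that escapes every byte. The sample string "a b)" is arbitrary.

```elixir
# A predicate that is always truthy leaves every byte as is.
URI.encode("a b)", fn _byte -> true end)
# => "a b)"

# A predicate that is always falsy (false or nil) escapes every byte,
# even letters that would normally be safe.
URI.encode("a b)", fn _byte -> nil end)
# => "%61%20%62%29"
```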

The documentation for URI.char_unescaped?/1 explains that this default is deliberately permissive: it escapes only what is strictly required and purposely leaves reserved characters unescaped:

Checks if character is allowed unescaped in a URI.

This is the default used by URI.encode/2 where both reserved (char_reserved?/1) and unreserved characters (char_unreserved?/1) are kept unescaped.
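To see why the default leaves parentheses alone, we can ask each predicate about ( and ). Each tuple holds the character, then the results of char_reserved?/1, char_unreserved?/1, and char_unescaped?/1.

```elixir
for char <- ~c"()" do
  # {character, reserved?, unreserved?, kept by the default predicate?}
  {<<char>>, URI.char_reserved?(char), URI.char_unreserved?(char),
   URI.char_unescaped?(char)}
end
# => [{"(", true, false, true}, {")", true, false, true}]
```

Both parentheses are reserved rather than unreserved, and since char_unescaped?/1 accepts either kind, URI.encode/2 keeps them by default.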

The char_unescaped?/1 documentation also hints at a better predicate function for my purposes, URI.char_unreserved?/1, which escapes the parentheses as I need:

iex()> URI.encode("(hello)", &URI.char_unreserved?/1)
"%28hello%29"
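One caveat worth noting: char_unreserved?/1 also escapes reserved characters such as : and /, so it suits a single path segment better than a whole URL. Comparing the two on the Wikipedia link from earlier:

```elixir
# Encoding only the path segment escapes just the parentheses.
URI.encode("Variety_(cybernetics)", &URI.char_unreserved?/1)
# => "Variety_%28cybernetics%29"

# Encoding the whole URL also escapes ":" and "/", so the result no longer
# reads as an absolute URL.
URI.encode("https://en.wikipedia.org/wiki/Variety_(cybernetics)", &URI.char_unreserved?/1)
# => "https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FVariety_%28cybernetics%29"
```

A fragment like #Law_of_requisite_variety would be affected the same way, since # would become %23.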

Takeaways

djot’s linear parsing forces it to assume that the first ) it encounters is the end of a URL.

Using percent encoding, I can encode ) as %29, which avoids the djot issue while the link still takes the browser to the right page.

Elixir’s URI.encode doesn’t escape parentheses by default, but it accepts an alternate predicate function that does. (I wrote a short script that I can call using kakoune’s pipe command. Since it has no dependencies, its startup time is minimal, but I may rewrite it later in a language with faster startup.)
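A minimal sketch of such a script follows. It is an illustration rather than the exact script, and the module name ParenEscape is hypothetical. It keeps everything the default char_unescaped?/1 predicate keeps except ), reads the piped text from standard input, and writes the result back without adding a newline. It assumes Elixir 1.13 or later for IO.read/2 with :eof.

```elixir
defmodule ParenEscape do
  @close_paren ?)

  # Hypothetical sketch: keep everything the default predicate keeps,
  # except ")", which djot would read as the end of the link destination.
  def escape(text) when is_binary(text) do
    URI.encode(text, &(URI.char_unescaped?(&1) and &1 != @close_paren))
  end

  # Read all of standard input, escape it, and write it back.
  # IO.write/1 adds no trailing newline. Note that whitespace, including
  # a trailing newline in the input, is percent-encoded too.
  def main do
    case IO.read(:stdio, :eof) do
      :eof -> :ok
      {:error, reason} -> raise "could not read stdin: #{inspect(reason)}"
      text -> IO.write(escape(text))
    end
  end
end
```

For the Wikipedia URL above, escape/1 returns exactly the encoded URL used in the djot link earlier. Saved to a file, it could be run with something like elixir -r path/to/paren_escape.exs -e 'ParenEscape.main()' from kakoune’s pipe command (|), selecting only the URL so no surrounding whitespace gets encoded.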
