Using Regex in Erlang
Recently, I was helping someone port some code to Erlang that involved a Regex. I hadn’t worked with Regex in Erlang before, so here are some notes on what I learned.
re - Perl-like regular expressions for Erlang
The Erlang standard library has the re module, which supports regular expression matching for Erlang strings and binaries. (Note: both Gleam and Elixir use String to mean an Erlang binary. The default string type in Erlang is sugar over a linked list of code points.)
The re module provides three main functions: replace, run, and split, which all operate on a string-like input and a regex pattern. (Note: Erlang uses the iodata type to represent a generalization of binaries that can be worked with more efficiently. All three of these functions take an iodata or a charlist as the subject.)
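For orientation, here is a minimal sketch of the other two functions, replace and split. The subjects and patterns are made up for illustration. The {return, list} option is passed so that the results come back as plain Erlang strings; without it, replace returns iodata and split returns binaries.

```erlang
%% Hypothetical inputs, chosen only to illustrate the calls.
%% {return, list} asks for plain strings instead of iodata/binaries.

%% Replace the first match of the pattern.
Replaced = re:replace("my string", "my", "your", [{return, list}]).
%% Replaced =:= "your string"

%% Split the subject wherever the pattern matches.
Parts = re:split("a,b,c", ",", [{return, list}]).
%% Parts =:= ["a","b","c"]
```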
Regex matching can be done with run like this:
> re:run("my string", "[myexpression]").
{match,[{0,1}]}
By default, run returns all captured parts of the input as a list of {Offset, Length} pairs. It also returns only the first match, not all matches. Here the pattern is a character class, so the first match is the single character m at offset 0.
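Because the default result is an offset and a length rather than the text itself, a small sketch may help show how the pair maps back onto the subject, and what comes back when nothing matches. The subject and pattern here are assumptions for illustration, and the sketch assumes an ASCII subject. Offsets are zero-based while lists:sublist/3 is one-based, hence the + 1.

```erlang
%% Hypothetical subject and pattern.
Subject = "order 1234 shipped".

%% re:run returns nomatch when the pattern is not found,
%% so both outcomes are handled explicitly.
Found = case re:run(Subject, "[0-9]+") of
    {match, [{Offset, Length}]} ->
        %% Offset is zero-based; lists:sublist/3 counts from 1.
        lists:sublist(Subject, Offset + 1, Length);
    nomatch ->
        not_found
end.
%% Found =:= "1234"
```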
To return the captures as strings, I can use the {capture, all, list} option:
> re:run("my string",
"[myexpression]",
[{capture, all, list}]
).
{match,["m"]}
To get all of the matches, I can use the global option:
> re:run("my string",
"[myexpression]",
[global,{capture, all, list}]
).
{match,[["m"],["y"],["s"],["r"],["i"],["n"]]}
Named Captures
re supports Perl-style named captures, which look like this:
Pattern1 = "(?<myname>capture)".
Note: there isn't a built-in way to associate names with captures. Elixir provides a named_captures function to easily do this. From reading the source code of the Elixir implementation, it should be possible to combine re:inspect with lists:zip to get a list of {Name, Capture} pairs.
A named capture can be used by changing the ValueSpec component of the capture option:
> re:run(
"my capture of my capture",
Pattern1,
[
global,
{capture, ["myname"], list}
]
).
{match,[["capture"],["capture"]]}
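The re:inspect and lists:zip combination mentioned in the note above can be sketched roughly as follows. This is my reading of the re documentation rather than a tested recipe from the Elixir source: it assumes that re:inspect/2 with namelist and the all_names capture spec both list names in alphabetical order, so the two lists line up. The pattern, subject, and variable names are hypothetical.

```erlang
%% Hypothetical pattern with two named groups.
%% re:inspect/2 needs a compiled pattern, so compile first.
{ok, NamedPattern} = re:compile("(?<first>\\w+) (?<second>\\w+)").

%% Names come back as binaries, in alphabetical order.
{namelist, Names} = re:inspect(NamedPattern, namelist).

%% all_names returns one value per named group, in the same order.
{match, Values} = re:run("hello world", NamedPattern,
                         [{capture, all_names, binary}]).

%% Pair each name with its captured value.
Pairs = lists:zip(Names, Values).
%% Pairs =:= [{<<"first">>,<<"hello">>},{<<"second">>,<<"world">>}]
```

Without the global option this covers only the first match; with global, Values would be a list of lists and each inner list would need zipping separately.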
Compiling An Expression
Erlang provides a compile function to compile a regular expression for re-use throughout the lifetime of a program:
Compiling the regular expression before matching is useful if the same expression is to be used in matching against multiple subjects during the lifetime of the program. Compiling once and executing many times is far more efficient than compiling each time one wants to match.
A regex can be compiled into a pattern with compile:
{ok, Pattern2} = re:compile("[myexpression]").
The compiled pattern can then be used like any other regex pattern:
> re:run(
"my string",
Pattern2,
[{capture, all, list}]
).
{match,["m"]}
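To make the "compile once, execute many times" idea concrete, here is a minimal sketch that reuses one compiled pattern across several subjects. The pattern and subjects are made up for illustration. The first capture spec returns only the whole match.

```erlang
%% Hypothetical pattern and subjects.
{ok, Digits} = re:compile("[0-9]+").
Subjects = ["a1", "b22", "none"].

%% The same compiled pattern is reused for every subject.
Results = [re:run(S, Digits, [{capture, first, list}]) || S <- Subjects].
%% Results =:= [{match,["1"]},{match,["22"]},nomatch]
```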
Takeaways
The Erlang regular expression module is a bit difficult to use, but it will be nice if I have need of a zero-dependency regex module when using Erlang.