Hi, I’m Erika Rowland (a.k.a. erikareads). I’m an Ops-shaped Software Engineer, Toolmaker, and Resilience Engineering fan. I like Elixir and Gleam, Reading, and Design. She/Her. Published on October 31, 2023

When is an Erlang process a shell?

Recently, I was trying to port an Erlang function to Elixir, with the goal of answering one question: “When is an Erlang process a shell?”

Answering that question involved source diving Erlang, learning new things about git, and reading mailing lists.

The Original Code

The code I wanted to port was the Erlang function below.

Sidenote: Erlang’s syntax might seem a bit weird if you haven’t been exposed to it before. Erlang was originally implemented in Prolog, which inspired much of the syntax. Variables in Erlang are UpperCase, which makes them easy to pick out of code once you’re used to it, but is the opposite of the norm in Algol-inspired languages. The Prolog inspiration also led to a “sentence-like” structure, with commas, semicolons, and periods. Note that the function ends with a . character.

is_shell(ProcessId) ->
  case erlang:process_info(ProcessId, group_leader) of
    undefined -> false; %% process is dead
    {group_leader, Leader} ->
      case lists:keyfind(shell, 1, group:interfaces(Leader)) of
        {shell, ProcessId} -> true;
        {shell, Shell} ->
          case erlang:process_info(Shell, dictionary) of
            {dictionary, Dict} ->
              proplists:get_value(evaluator, Dict) =:= ProcessId;
            undefined -> false %% process is dead
          end;
        false -> false
      end
  end.

I can follow along here: we’re retrieving something called group_leader from the process’s metadata. The equivalent function call in Elixir would be Process.info(pid, :group_leader).

Sidenote: Erlang and Elixir have the concept of atoms. Atoms are equal only to themselves, and their value is exactly their name. This seems pretty useless until you consider pattern matching: I can use a pattern like {:my_atom, value} to match on :my_atom as a key and assign the second tuple item to the variable value. Note that in Elixir atoms are typically denoted with a prefixed colon, like :this. In Erlang, atoms are lower case with no other markings, like this, which works because variables are upper case in Erlang.

What is a group_leader? There was sparse information about group leaders in the Erlang documentation, but I found a response in a mailing list thread. (Note: the original source uses outdated terminology for origin and peer; the peer module superseded the previous module in OTP 25.0.) The response said this:

Group leaders let you redirect I/O to the right endpoint. It is one of the few properties that is inherited by one process to the other, creating a chain.

By default, an Erlang node has one group leader called ‘user’. This process owns communication with the stdio channels, and all input requests and output messages transit through that one.

Then, each shell you start becomes its own group leader. This means that any function you run from the shell will send all its IO data to that shell process.

A group leader gives us a sense of where each process belongs, and since each shell has its own group leader, is_shell/1 needs to unpack this information to find out whether a process is a shell.
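To make the inheritance described in the quote concrete, here is a minimal sketch in Elixir. The module name GroupLeaderDemo and the helper leader_of/1 are hypothetical names made up for this illustration; Process.info/2, Process.group_leader/0, and spawn/1 are standard Elixir.

```elixir
defmodule GroupLeaderDemo do
  # Hypothetical helper: returns the group leader pid of `pid`,
  # or nil when the process is no longer alive.
  def leader_of(pid) do
    case Process.info(pid, :group_leader) do
      {:group_leader, leader} -> leader
      nil -> nil
    end
  end
end

parent_leader = GroupLeaderDemo.leader_of(self())

# A spawned process inherits the group leader of the process that spawned it.
child =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

child_leader = GroupLeaderDemo.leader_of(child)

IO.inspect(child_leader == parent_leader, label: "child inherits group leader")
send(child, :stop)
```

Run from IEx, both pids should point at the group process that belongs to the shell session.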

This answers the question about group leaders. The next curiosity in the function is group:interfaces, which is called on the group leader’s process id.

`group:interfaces`!?

So the first thing I did was to extract the group leader of my Elixir IEx shell and try to call :group.interfaces on that process id.

Sidenote: In this story I’m switching back and forth between Erlang and Elixir syntax. Know that module:function/arity in Erlang is equivalent to :module.function/arity in Elixir. Elixir has first-class support for calling Erlang functions, and this manifests as similar but subtly different syntax.

Just one problem: IEx raises an UndefinedFunctionError, telling me that the function doesn’t exist. That’s okay, it’s been 11 years since the Erlang code was written; perhaps that function was taken out.

So the next thing I try is to repeat the same experiment directly in the Erlang shell, erl. I open a new terminal, invoke erl, and get the group leader with erlang:process_info(self(), group_leader). I pass this process id to group:interfaces(GroupLeaderPid), and I get a valid list of interfaces!?

I’m confused for a short while, before remembering that I use nix-shell to manage my Elixir version. So when I ran the :group.interfaces invocation in IEx, it was in OTP 26, erts-14.1. However, when I ran group:interfaces in erl, it was in OTP 25, erts-13.2.2. Something must have changed in the group module between OTP 25 and OTP 26.

Sidenote: erts stands for the Erlang Run-Time System Application. Erlang versions its applications separately from the OTP as a whole, leading to the two version numbers I shared.
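A version mismatch like this can be checked from inside the running system rather than inferred from the environment. The sketch below is one way to do that from Elixir; RuntimeCheck and summary/0 are hypothetical names for this illustration, while :erlang.system_info/1, Code.ensure_loaded/1, and function_exported?/3 are standard calls.

```elixir
defmodule RuntimeCheck do
  # Hypothetical helper: report the OTP release, the erts version, and
  # whether group:interfaces/1 exists on the runtime we are actually using.
  def summary do
    # function_exported?/3 only sees modules that are already loaded,
    # so make sure :group is loaded before asking about it.
    Code.ensure_loaded(:group)

    %{
      otp_release: List.to_string(:erlang.system_info(:otp_release)),
      erts_version: List.to_string(:erlang.system_info(:version)),
      group_interfaces?: function_exported?(:group, :interfaces, 1)
    }
  end
end

IO.inspect(RuntimeCheck.summary())
```

Running this in both shells would show the two different releases side by side, along with which of them still exports the function.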

Git Spelunking

In order to figure out what was going on with group, I needed to turn back time. (Thanks to Jeff for the turn of phrase.) group is an internal module in Erlang, and lacks external documentation. So I needed to explore the source code directly.

Thankfully, git gives us an easy way to check out old code. Running git checkout OTP-25.0 on the otp repository drops me into the OTP code as it was at that release.

A quick ripgrep for interfaces/1 leads me to lib/kernel/src/group.erl and the interfaces/1 function shown below.

Sidenote: In the middle of this process, I confused user_drv:interfaces with group:interfaces and convinced myself that it had mysteriously changed, between OTP 25.0 and OTP 25.3, from a function that didn’t do what I needed to a function that worked. I had no basis for why this change might have happened or why it happened so late after the function was created, when it’s clear that 11 years ago, in the function that starts this article, it was used in the same way as in OTP 25.3. Specifically, I think I got lost because I had backgrounded my text editor in user_drv.erl, and when I checked out OTP 25.0 the interfaces function appeared.

interfaces(Group) ->
    case process_info(Group, dictionary) of
	{dictionary,Dict} ->
	    get_pids(Dict, [], false);
	_ ->
	    []
    end.

get_pids([Drv = {user_drv,_} | Rest], Found, _) ->
    get_pids(Rest, [Drv | Found], true);
get_pids([Sh = {shell,_} | Rest], Found, Active) ->
    get_pids(Rest, [Sh | Found], Active);
get_pids([_ | Rest], Found, Active) ->
    get_pids(Rest, Found, Active);
get_pids([], Found, true) ->
    Found;
get_pids([], _Found, false) ->
    [].

interfaces first extracts the process dictionary, then tail-recursively extracts the tuples that contain user_drv or shell as their first element.

Sidenote: Joe Armstrong, in his book “Programming Erlang, 2nd Edition”, says this: “Each process in Erlang has its own private data store called the process dictionary. The process dictionary is an associative array (in other languages this might be called a map, hashmap, or hash table) composed of a collection of keys and values. Each key has only one value.” He continues: “Note: I rarely use the process dictionary. Using the process dictionary can introduce subtle bugs into your program and make it difficult to debug.”

Sidenote: Erlang has ad-hoc polymorphism, which it uses here in get_pids to succinctly pattern match on the shape of the arguments passed in.
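As a reading aid, the same clause-by-clause matching can be transliterated into Elixir. This is a sketch, not OTP code: the module name GroupInterfaces is hypothetical, and the atoms used as values in the example stand in for real pids.

```elixir
defmodule GroupInterfaces do
  # Transliteration of interfaces/1: read the process dictionary of the
  # group process, or return [] when the process is not alive.
  def interfaces(group) do
    case Process.info(group, :dictionary) do
      {:dictionary, dict} -> get_pids(dict, [], false)
      _ -> []
    end
  end

  # A user_drv entry is kept and marks the group as active.
  def get_pids([{:user_drv, _} = drv | rest], found, _active),
    do: get_pids(rest, [drv | found], true)

  # A shell entry is kept; the active flag is passed through unchanged.
  def get_pids([{:shell, _} = sh | rest], found, active),
    do: get_pids(rest, [sh | found], active)

  # Any other dictionary entry is skipped.
  def get_pids([_ | rest], found, active), do: get_pids(rest, found, active)

  # Results are only returned when a user_drv entry was seen.
  def get_pids([], found, true), do: found
  def get_pids([], _found, false), do: []
end

# Atoms stand in for pids here, purely for illustration.
dict = [{:shell, :shell_pid}, {:other, 1}, {:user_drv, :drv_pid}]
IO.inspect(GroupInterfaces.get_pids(dict, [], false))
```

With the sample dictionary, the call returns [{:user_drv, :drv_pid}, {:shell, :shell_pid}]. Without a user_drv entry, the last clause discards whatever was found and returns an empty list.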

In our original function, we only care about the shell, but otherwise I now knew enough to port is_shell/1 into Elixir:

Port of function into Elixir

In this version, I skipped past implementing interfaces and simply pulled the :shell key out of the process dictionary of the group leader. The result follows the sidenotes below.

Sidenote: In Elixir it’s convention to add a ? to the name of a function that returns a boolean.

Sidenote: Joe Armstrong’s comment about the process dictionary implies that it is a keyword list, but most of the Erlang functions I see interacting with it treat it as a proplist. These two data structures are similar; both are built on top of the Erlang list. A keyword list is a list of 2-tuples, with the first element being an atom key and the second being a value for the key. A keyword list is a valid proplist, but a proplist has a shortcut where an atom key by itself stands in for {atom, true}. This shortcut breaks Elixir’s Keyword functions, which presume a well-formed keyword list.

Keyword List Example: [{:key1, "value"}, {:key2, 5}, {:key3, true}]

Proplist Example: [{:key1, "value"}, {:key2, 5}, :key3]

Sidenote: In this code snippet, I use the pin operator ^. In Erlang, variables can only be assigned once, so after their first assignment, you can use them to pattern match. Elixir allows variable shadowing, meaning that you can reuse a variable name. In order to do pattern matching on a variable’s contents, instead of reassigning it, Elixir added the pin operator ^ to “pin” the value of the variable for the pattern match.

def is_shell?(pid) do
  case Process.info(pid, :group_leader) do
    # Process.info/2 returns nil (not :undefined) when the process is dead
    nil ->
      false

    {:group_leader, leader} ->
      {:dictionary, leader_dict} = Process.info(leader, :dictionary)

      case :proplists.get_value(:shell, leader_dict) do
        ^pid ->
          true

        shell when is_pid(shell) ->
          case Process.info(shell, :dictionary) do
            {:dictionary, dict} ->
              :proplists.get_value(:evaluator, dict) === pid

            # Process.info/2 returns nil (not :undefined) when the process is dead
            nil ->
              false
          end

        :undefined ->
          false
      end
  end
end
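The port leans on two details: :proplists.get_value/2 accepts the bare-atom shortcut and returns :undefined for a missing key, and the pin operator matches against an existing value instead of rebinding. Here is a small sketch of both, using the example lists from the sidenote; ProplistDemo and holds?/3 are hypothetical names for this illustration.

```elixir
defmodule ProplistDemo do
  # Hypothetical helper: true when `key` in `list` holds exactly `expected`.
  # The pin (^expected) compares against the argument instead of rebinding it.
  def holds?(list, key, expected) do
    case :proplists.get_value(key, list) do
      ^expected -> true
      _other -> false
    end
  end
end

keyword_list = [{:key1, "value"}, {:key2, 5}, {:key3, true}]
proplist = [{:key1, "value"}, {:key2, 5}, :key3]

# Only the first list is a well-formed keyword list.
IO.inspect(Keyword.keyword?(keyword_list), label: "keyword list?")
IO.inspect(Keyword.keyword?(proplist), label: "proplist is a keyword list?")

# :proplists treats the bare atom :key3 as {:key3, true}.
IO.inspect(ProplistDemo.holds?(proplist, :key3, true), label: "key3 is true")

# A missing key yields :undefined, which is why the port matches on it.
IO.inspect(:proplists.get_value(:missing, proplist), label: "missing key")
```

Note the asymmetry this creates in the port: a missing proplist key comes back as :undefined, while Process.info/2 returns nil for a dead process.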

Refactor to use `with`

The nested case statements in the previous function can be refactored using Elixir’s with statement for a cleaner function:

def is_shell?(pid) do
  with {:group_leader, leader} <- Process.info(pid, :group_leader),
       {:dictionary, leader_dict} <- Process.info(leader, :dictionary),
       {:shell, ^pid} <- :lists.keyfind(:shell, 1, leader_dict) do
    true
  else
    {:shell, shell} ->
      with {:dictionary, dict} <- Process.info(shell, :dictionary) do
        :proplists.get_value(:evaluator, dict) === pid
      else
        _ -> false
      end

    _ ->
      false
  end
end

Shell improvements OTP 26

Earlier, I found that group:interfaces/1 was missing from OTP 26. This seems to be connected to the OTP 26 shell improvements.

It’s worth looking at group:whereis_shell/0 as a comparison with my is_shell?/1 function:

`group:whereis_shell`

The OTP 26 implementation of group:whereis_shell contains this snippet:

GroupPid ->
  {dictionary, Dict} = 
    erlang:process_info(GroupPid, dictionary),
  proplists:get_value(shell, Dict)

This is nearly identical to my pre-with refactor implementation in Elixir, which means I’m probably on the right track.
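For comparison, the lookup in that snippet could be written in Elixir roughly as follows. This is a sketch under assumptions: ShellLookup and shell_of/1 are hypothetical names, and the :shell entry in a group process’s dictionary is an internal detail of OTP that may differ between releases.

```elixir
defmodule ShellLookup do
  # Hypothetical helper mirroring the Erlang snippet: read the :shell entry
  # from a group process's dictionary. Returns the stored value, or
  # :undefined when the key is absent or the process is not alive.
  def shell_of(group_pid) do
    case Process.info(group_pid, :dictionary) do
      {:dictionary, dict} -> :proplists.get_value(:shell, dict)
      nil -> :undefined
    end
  end
end

# Inspect whatever the current group leader has stored under :shell.
IO.inspect(ShellLookup.shell_of(Process.group_leader()))
```

Unlike the Erlang snippet, this version does not crash on a dead group process; it folds that case into :undefined so callers have a single "no shell" value to match on.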

`group:start_shell/1`

I also wanted to call attention to this comment above group:start_shell/1:

%% start_shell(Shell)
%%  Spawn a shell with its group_leader from the beginning set to ourselves.
%%  If Shell [is] a pid the[n] set its group_leader.

This confirms what we learned earlier about group leaders and how they relate to shells.

That would have been the end of the story, until I read the comment above the is_shell/1 function in the original codebase:

%% Theoretically pman_process:is_system_process/1 should say true for
%% the shell.  Well, it doesn't, so this is a workaround until it
%% does.

What is pman_process? Unlike :group.interfaces/1, where only the function is missing, here the whole module is gone. This led me down a rabbit hole that required learning how to bisect a git repo.

What is `:pman_process`?

A search for “Erlang pman” turns up a process viewing tool from an ancient version of Erlang. The screenshots of the interface look like they’re right out of Tab Window Manager. (I briefly used twm when I used OpenBSD on the desktop, but I preferred cwm.)

After a lot of digging, I found that PMan was a process viewing application built into Erlang, whose behavior was absorbed by Observer. PMan was written on the Graphics System (GS) application. GS was superseded by wx for graphical applications in Erlang. (wx is an Erlang port of wxWidgets, a cross-platform GUI library.)

PMan and the GS-related backends were removed in OTP 17.0. If I check out a version of OTP prior to OTP 17, I can finally find what pman_process is doing. Prior to its removal, it was located in lib/pman/src/pman_process.erl.

The module includes a purpose comment:

A front-end to the erlang:process_info() functions, that can handle processes on different nodes in a transparent way.

Also some convenience functions for process info, as well as some application specific functions for process classification.

This purpose explains why it might contain an is_system_process/1 function.

One Last Mystery

pman_process contains a few -define calls that declare module constants that are lists of process names and function signatures that mark something as a “system” process.

Where do these lists come from? Why are they hard-coded? How do you determine what needs to be on them?

git blame gives me no answers: the code for pman_process was added to the OTP git repo in its initial commit in 2009, and it remained untouched until it was removed prior to OTP 17.0.
