Parsing TSVs in Go
I’m building a terminal application to learn Go, and I wanted to persist data using Tab Separated Values. Here are some of the things I learned while figuring out how to do that.
CSV Reader in Standard Library
Go has a CSV reader in its standard library. You can import it like this:
import (
	"encoding/csv"
)
Once it’s imported, you can create a CSV reader by calling csv.NewReader on an io.Reader (I learned about reading CSVs in Go from Go Samples):
file, err := os.Open("my.tsv")
if err != nil {
	log.Fatal(err)
}
// defer closes the file when the surrounding function returns
defer file.Close()
csvReader := csv.NewReader(file)
Since I want to read Tab-Separated Values instead of Comma-Separated Values, I can set the Comma field used by my CSV reader to a tab character:
csvReader.Comma = '\t'
I don’t expect my TSV file to be large, so I can read it all into memory with csvReader.ReadAll():
data, err := csvReader.ReadAll()
if err != nil {
	log.Fatal(err)
}
Higher Order Functions in Go… Kinda
Now I have the TSV loaded in memory, but csvReader.ReadAll() returns a slice of records, each of type []string. If I want to convert these records into structs, I’ll need to parse the records myself.
I’m used to working with functional languages, where I would take the list of records and use map to transform them into structs.
Generics
Go doesn’t have a builtin map function, but we can build one using generics. (I learned how to build a generic map function from ZetCode.) Note that we have to call the function map2, because map is a reserved keyword for Go’s built-in hashmap type.
func map2[InputType, OutputType any](
	data []InputType, f func(InputType) OutputType) []OutputType {
	result := make([]OutputType, 0, len(data))
	for _, element := range data {
		result = append(result, f(element))
	}
	return result
}
This function takes a slice of InputType and a function from InputType to OutputType. It iterates over the slice, calls the function on each element, and appends each result to the slice it returns.
Because Go supports generics, we can declare that the input has some element type without naming it; the compiler fills it in from the data we pass. The same goes for the output type, which is determined entirely by the function passed in as the second argument.
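Prior to learning about this version of map2, I didn’t know that Go had an any type. It is an alias for interface{}, and when used as a type constraint, as it is here, it means InputType and OutputType may be any type at all. It’s perfect for a situation like this, where we don’t care what type the function returns, as long as it’s consistent.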
Slices
One thing that was new to me coming to Go is how Go handles slices. Elixir uses immutable lists, but Go seems to use allocated arrays called slices. (Question: Why are they called slices? Does Go have something called arrays?)
The make function from before takes a type, a starting length, and a capacity, and allocates and returns an initialized value of that type. In the case of map2, that value is a slice of OutputType with length 0 and capacity len(data).
Then we call append, which takes a slice and items of its element type, and appends those items to the end of the slice. If the slice still has sufficient capacity to accommodate the new items, it will reslice to include them. Otherwise, a new array will be allocated.
(The documentation for append refers to arrays, which implies that arrays are distinct from slices. Based on the way this function is written, I would guess that arrays refer to the physical memory that has been allocated, and that slices are a reference to that allocated array, which may not be the whole array. I’ll need to do further reading to confirm.)
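One way to poke at that guess is a small experiment like the sketch below. The appendReports helper and the variable names are made up for illustration. It slices a fixed-size array, appends within the spare capacity, then appends past it.

```go
package main

import "fmt"

// appendReports is a hypothetical helper: it appends items to s and
// reports whether they fit in the capacity s had before the call.
func appendReports(s []int, items ...int) ([]int, bool) {
	fits := len(s)+len(items) <= cap(s)
	return append(s, items...), fits
}

func main() {
	// An array has a fixed length that is part of its type.
	backing := [4]int{1, 2, 3, 4}

	// A slice is a view onto an array: here, its first two elements.
	s := backing[:2]
	fmt.Println(len(s), cap(s)) // 2 4

	// There is spare capacity, so append writes into the same array.
	s, fits := appendReports(s, 99)
	fmt.Println(fits, backing) // true [1 2 99 4]

	// Going past the capacity makes append allocate a new array and
	// copy the elements, so s no longer shares memory with backing.
	s, fits = appendReports(s, 100, 101)
	s[0] = -1
	fmt.Println(fits, backing[0], s[0]) // false 1 -1
}
```

In this run the array is the storage and the slice is a window onto it, which fits the guess above. It also suggests where the name comes from: backing[:2] literally takes a slice of the array.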
Parsing at last
Now that we have our map2 function, we can invoke it like this:
result := map2(data, func(s []string) Bookmark {
	return Bookmark{Name: s[0], LastRead: s[1], Chapter: s[2]}
})
Here we define an anonymous function that takes a record of type []string and returns a struct of type Bookmark. The result here will be a slice of type Bookmark, with each Bookmark containing the data from the corresponding row of the TSV.
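For completeness, here is a sketch of the whole pipeline in one runnable file. The article only shows the Bookmark field names, so the all-string field types are an assumption, and parseBookmarks, toBookmark, and the sample rows are hypothetical. Setting FieldsPerRecord to 3 is an addition: it makes the reader return an error for any row that doesn’t have exactly three fields, so indexing s[2] can’t panic on a short row.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// Bookmark mirrors the struct used above; the string field types
// are an assumption.
type Bookmark struct {
	Name     string
	LastRead string
	Chapter  string
}

func map2[InputType, OutputType any](
	data []InputType, f func(InputType) OutputType) []OutputType {
	result := make([]OutputType, 0, len(data))
	for _, element := range data {
		result = append(result, f(element))
	}
	return result
}

// toBookmark converts one record; it relies on the reader having
// already checked that the record has three fields.
func toBookmark(s []string) Bookmark {
	return Bookmark{Name: s[0], LastRead: s[1], Chapter: s[2]}
}

// parseBookmarks is a hypothetical helper that reads TSV text and
// maps every record to a Bookmark.
func parseBookmarks(tsv string) ([]Bookmark, error) {
	csvReader := csv.NewReader(strings.NewReader(tsv))
	csvReader.Comma = '\t'
	csvReader.FieldsPerRecord = 3 // error on rows without exactly 3 fields
	data, err := csvReader.ReadAll()
	if err != nil {
		return nil, err
	}
	return map2(data, toBookmark), nil
}

func main() {
	// Sample rows invented for this sketch.
	bookmarks, err := parseBookmarks("One Piece\t2024-01-05\t1103\nBerserk\t2023-12-20\t375\n")
	if err != nil {
		log.Fatal(err)
	}
	for _, b := range bookmarks {
		fmt.Printf("%+v\n", b)
	}
}
```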