In the last post I mentioned how I’ve gotten back into
Haskell over the last months. Come December and time for another round of The Advent of Code. I haven’t been participating in this programming challenge series the last few year. However, since this is an excellent chance to apply
Haskell to new problems I decided to start follow the challenges. I have thoroughly enjoyed the ones I have attempted thus far, even though I’ve not been able to keep up with the daily challenges. They often end up taking a surprising amount of time, perhaps also due to me using
One common denominator with all the challenges in the
Advent of Code is the requirement to parse input data. Parsers may be a “solved problem”, but there are many ways to approach parsing when faced with challenges like these, ranging from pragmatic uses of regular expressions and ad-hoc string manipulation, to fully-specified grammars and parser generators.
Coincidentally, parsing is one of the areas in which
Haskell truly shines, especially in the domain of parser combinators. There are multiple parser combinator libraries for Haskell, yet I’m not going to focus on any of them here. For the
Advent of Code, I didn’t want to stray too far away from the default
base package, and to my joy I found that there actually is a quite capable parser combinator module included in
A parser combinator is a higher-order function which accepts parsers as input, returning a new parser as output. A combinator is a higher-order function that uses only function application and earlier defined combinators to define a result from its arguments1.
So, in short, a parser combinator library lets you construct complex parsers by composing smaller, “atomic” parsers. Here’s an example before we get into the details:
This is an example of a
ReadP parser which parses and yields a
String containing only digits.
many1 is an example of a combinator:
λ: :t many1 many1 :: ReadP a -> ReadP [a] λ: :t satisfy satisfy :: (Char -> Bool) -> ReadP Char
many1 repeats a given parser one or more times, yielding a list of the result, while
satisfy yields a single
Char if it passes a predicate function (
isDigit in our example).
Demystifying some types
ReadP is our parser type. Its actual internals are out of scope of this post, but it can help our intuition by imagining it encapsulates stateful “stuff” like the input along with the cursor location within the input, backtracking/lookahead information, and so on. We’ll treat it as an opaque data type, but here’s what
GHCi has to say about it (with some details removed):
λ: :i ReadP newtype ReadP a = Text.ParserCombinators.ReadP.R (...) instance Applicative ReadP instance Functor ReadP instance Monad ReadP
We don’t get to look at what happens within
ReadP, but we need a way to actually run our parser on a piece of input. To do this, we can use the aptly named
λ: :i readP_to_S readP_to_S :: ReadP a -> ReadS a
We don’t get much wiser from its type signature, so let’s recurse yet another turtle2:
λ: :i ReadS type ReadS a = String -> [(a, String)]
ReadS is just a type alias for a function accepting a
String and returning a list of matches along with the remaining, unmatched input. We can expand
ReadS in the type signature of
readP_to_S to illustrate this a bit better:
The reason the return value is a list, is because an input could result in multiple successful parses, and with varying lengths of matching input. This interface can be unexpected if you’re used to parsers either failing or succeeding with a single result.
Let’s give it a spin with the atomic
get parser, consuming a single character from the input:
λ: :t get get :: ReadP Char λ: readP_to_S get "foo" [('f',"oo")]
Since our simple parser only consumes a single character, there’s only one successful parse, and the unprocessed input is returned as the second tuple element. Let’s try out
λ: readP_to_S parseDigits "foo"  λ: readP_to_S parseDigits "123" [("1","23"),("12","3"),("123","")] λ: readP_to_S parseDigits "123foo" [("1","23foo"),("12","3foo"),("123","foo")]
We’re getting multiple successful parses with
many1 since we’re not restricting our parser (
parseDigits in particular) to parse until the end of input. Skipping ahead, here’s how we can use the
Applicative instance of
ReadP and the
<* operator to combine
parseDigits with the
λ: readP_to_S (parseDigits <* eof) "123" [("123","")]
Let’s get Funky!
We’ve already seen glimpses of how we combine predefined parsers from
ReadP into more complex parsers. Several of the
ReadP parsers accept other parsers as input, hence the combinator-part.
parseDigits uses only function application to construct a parser of digits from
satisfy with a predicate. There’s one major flaw with
parseDigits though, it doesn’t actually yield us a number!
One way to go at this, is to do our parse, then map a constructor accepting
String over the parse result:
λ: map (readInt . fst) $ readP_to_S parseDigits "123" [1,12,123]
Unfortunately this is both clunky, and won’t scale well when we want to expand our parser to construct more complex data structures. However, recalling the type of
readP_to_S we can see that indeed, it accepts parsers which are polymorphic in their return values:
This means we can create parsers which directly yield the data types we desire. In order to do that though, we need to familiarize ourselves with some of the
ReadP typeclass instances, mainly:
Applicative. There are also instances for
ReadP has a
Functor instance, our intuition should tell us that it should be quite possible to
read over our parser to convert our parser result type:
λ: :t readP_to_S parseInt "123" readP_to_S parseInt "123" :: [(Int, String)] λ: readP_to_S parseInt "123" [(1,"23"),(12,"3"),(123,"")]
If we’re concerned about the (un)safety of
read, we can choose to use
which gives us:
λ: readP_to_S parseMaybeInt "123" [(Just 1,"23"),(Just 12,"3"),(Just 123,"")] λ: readP_to_S (parseMaybeInt <* eof) "123" [(Just 123,"")]
2D coordinates are a frequent source of input in the
Advent of Code challenges. The way we like to represent a 2D coordinate in code is using a
Tuple of two elements, and the elements being the
y position of the coordinate, or point. Points are often serialized using surrounding parenthesis, a comma separating the two parts, and optional whitespace:
(1,2) (-10, 100) ( 5, -42)
We do not yet have the tools we need in order to create parsers for types which are constructed from multiple arguments, like
Tuple data constructor).
Functor only allows us to map over a (single) parse result to yield another type. Granted, we could give up on type safety and parse our input into substrings, which we again validate piece by piece to construct our values. But we don’t accept compromises like these, do we? No we don’t.
Applicative extends our toolbox with the ability to lift n-ary data constructors and functions into the world of
ReadP, allowing us to construct more complex types:
λ: readP_to_S ((,) <$> parseInt <*> parseInt) "123" [((1,2),"3"),((1,23),""),((12,3),"")]
We’re now getting tuples out of our parser, although we’re not parsing actual tuples yet. Also note that our parser is ambiguous. The result of the parser should be a tuple, but it’s equally valid to create a tuple of the first and second digit, as is splitting
y between the second and third digits. The input format (and thus our parser) must change to specify where one coordinate component ends and the other begins. We use static delimiters “(”, “)”, and “,” for this.
To match static input
λ: :t char char :: Char -> ReadP Char λ: :t string string :: String -> ReadP String
Both of these parsers accept a character (or string), resulting in a parser which yields the same value if the input matched successfully:
λ: readP_to_S (char 'a') "foo"  λ: readP_to_S (char 'f') "foo" [('f',"oo")] λ: readP_to_S (string "bar") "foo"  λ: readP_to_S (string "foo") "foo" [("foo","")]
We can combine
Applicative to parse tuples (in a strict manner, without any whitespace):
and applied to some input:
λ: readP_to_S parseCoord "(1,2)" [((1,2),"")]
*> operators just discard the result of the parsers on the right and left hand side, respectively. The arrows “point” at the part of the sequence whose value will be returned.
There are two issues with our coordinate parser: it’s quite strict in the way it supports no whitespace, and it does not support negative values for the
To add whitespace support, we can use the operators from
Applicative in combination with the provided
skipSpaces parser. Let’s update
parseInt to consume whitespace surrounding a number:
parseCoord should be updated to use
Not much change required at all, besides renaming a function reference! We’re now able to parse coordinates with whitespace:
λ: readP_to_S parseCoord "( 1 , 2 )" [((1,2),"")]
Finally, in order to support negative numbers prefixed with “-” we need to change
parseInt yet again. Using the provided
option parser we can add support of an optional prefix of “-”:
Note that we have to
fmap the list cons operator to prepend the sign to the resulting list of digits.
Int also supports whitespace around the number, which means
option can yield a blank space character if there is no “-” in front of the number.
option takes, along with a parser, a default value which it yields if the parse is not successful:
λ: :t option option :: a -> ReadP a -> ReadP a
Now we can parse negative numbers too!
λ: readP_to_S (parseSignedInt <* eof) "-123" [(-123,"")]
We’ve already mentioned that
ReadP has a
Monad instance, giving us access to
do notation. This allows a bit more flexibility and write parsers in a somewhat more imperative style. Let’s do a rewrite of our tuple parser:
Note that we can use the Monadic sequencing operator
>> and Applicative sequencing operator
*> interchangeably. While in a
do notation it might be more consistent to stick with the monadic operators.
One of the benefits of using the
Monad instance for
ReadP is that it simplifies sequencing parsers where later parts of the parser depends on earlier matched input. For instance, we might want to parse a piece of input which starts with a number of elements to parse, followed by the elements themselves:
We use the
count parser to repeat a given parser
λ: :t count count :: Int -> ReadP a -> ReadP [a]
This parser first reads a number
n stating how many elements to parse, then proceeds to parse
n numbers separated by newline:
λ: readP_to_S parseCountInts "2\n1\n2\n3\n4\n5" [([1,2],"3\n4\n5")] λ: readP_to_S parseCountInts "3\n1\n2\n3\n4\n5" [([1,2,3],"4\n5")] λ: readP_to_S parseCountInts "4\n1\n2\n3\n4\n5" [([1,2,3,4],"5")]
The parser does not proceed to process input beyond the number of elements we specify.
Functional programming in Haskell centers around breaking down problems into smaller, independent pieces. Then using various means of composition to combine these pieces into a working solution. Parser combinators are yet another example of how we can achieve proper reusability in Haskell. Recall how we reused
parseInt (and its derivatives) to create more complex parsers, which again could be composed to create even larger parsers.
I really encourage you to have a go at using either
ReadP or the many parser combinator libraries available. Some focus on performance and speed, others on diagnostics and error reporting.