The ParserComb structure

The ParserComb structure provides parser combinators over character readers. This structure is modeled after the Haskell combinators of Hutton and Meijer. The main difference is that they return a single result, instead of a list of results. This fact means that the or combinator is a committed choice; once one branch succeeds, the others will not be enabled. While this property is somewhat limiting, for many applications it will not be a problem.

Synopsis

signature PARSER_COMB
structure ParserComb : PARSER_COMB

Interface

type ('a, 'strm) parser =
      (char, 'strm) StringCvt.reader -> ('a, 'strm) StringCvt.reader

val result : 'a -> ('a, 'strm) parser

val failure : ('a, 'strm) parser

val wrap : (('a, 'strm) parser * ('a -> 'b)) -> ('b, 'strm) parser

val seq : (('a, 'strm) parser * ('b, 'strm) parser) -> (('a * 'b), 'strm) parser
val seqWith : (('a * 'b) -> 'c)
      -> (('a, 'strm) parser * ('b, 'strm) parser)
	-> ('c, 'strm) parser

val bind : (('a, 'strm) parser * ('a -> ('b, 'strm) parser))
      -> ('b, 'strm) parser

val eatChar : (char -> bool) -> (char, 'strm) parser

val char   : char -> (char, 'strm) parser
val string : string -> (string, 'strm) parser

val skipBefore : (char -> bool) -> ('a, 'strm) parser -> ('a, 'strm) parser

val or : (('a, 'strm) parser * ('a, 'strm) parser) -> ('a, 'strm) parser
val or' : ('a, 'strm) parser list -> ('a, 'strm) parser

val zeroOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
val oneOrMore  : ('a, 'strm) parser -> ('a list, 'strm) parser

val option : ('a, 'strm) parser -> ('a option, 'strm) parser
val join   : ('a option, 'strm) parser -> ('a, 'strm) parser

val token : (char -> bool) -> (string, 'strm) parser

Description

type ('a, 'strm) parser = (char, 'strm) StringCvt.reader -> ('a, 'strm) StringCvt.reader

A parser is a function that takes a character reader and returns reader for the type of values the parser accepts.

val result : 'a -> ('a, 'strm) parser

result v getc strm returns SOME(v, strm); i.e., result v lifts the value v to a parser that returns v without consuming any input.

val failure : ('a, 'strm) parser

failure getc strm returns NONE; i.e. it is the parser that does not accept any input.

val wrap : 'a, 'strm) parser * ('a -> 'b -> ('b, 'strm) parser

wrap parser f composes the function f with parser

val seq : (('a, 'strm) parser * ('b, 'strm) parser) -> (('a * 'b), 'strm) parser

seq (parser1, parser2) is the sequential combination of the two parsers; i.e., a parser that will first parse a value v1 from the input using parser1 and then parse a value v2 using parser2 yielding the pair (v1, v2).

val seqWith : (('a * 'b) -> 'c) -> (('a, 'strm) parser * ('b, 'strm) parser) -> ('c, 'strm) parser

seqWith f (parser1, parser2) is the sequential combination of the two parsers composed with the function f; i.e., a parser that will first parse a value v1 from the input using parser1 and then parse a value v2 using parser2 yielding the result of f(v1, v2). This expression is equivalent to

wrap (seq (parser1, parser2), f)
val bind : 'a, 'strm) parser * ('a -> ('b, 'strm) parser -> ('b, 'strm) parser

bind parser f returns a parser that first uses parser to parse a value v from the input and then continues using the parser that results from f v.

val eatChar : (char -> bool) -> (char, 'strm) parser

eatChar pred returns a parser that parses one character c for which pred c returns true.

val char : char -> (char, 'strm) parser

char c returns a parser that parses the character c.

val string : string -> (string, 'strm) parser

string s`returns a parser that parses the string `s.

val skipBefore : (char -> bool) -> ('a, 'strm) parser -> ('a, 'strm) parser

skipBefore pred parser returns a parser that first skips any prefix of characters that satisfy the predicate pred and then applies parser to the input.

val or : (('a, 'strm) parser * ('a, 'strm) parser) -> ('a, 'strm) parser

or (parser1, parser2) returns the ordered choice of the two parsers; i.e., it returns a parser that first attempts to parse the input using parser1; if parser1 fails on the input, then it uses parser2.

val or' : ('a, 'strm) parser list -> ('a, 'strm) parser

or' parsers returns the ordered choice of a list of parsers. This expression is equivalent to

List.foldr or failure parsers
val zeroOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser

zeroOrMore parser returns a parser that parses a list of zero or more items using parser.

val oneOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser

oneOrMore parser returns a parser that parses a list of one or more items using parser.

val option : ('a, 'strm) parser -> ('a option, 'strm) parser

option parser returns a parser that parses an optional item (i.e., zero or one occurrences) using parser.

val join : ('a option, 'strm) parser -> ('a, 'strm) parser

join parser returns a parser that requires the optional item parsed by parser to be present.

val token : (char -> bool) -> (string, 'strm) parser

token pred returns a parser for a string of characters, where every character satisfies the predicate function pred.

Examples

As noted above, the parser type and combinators are designed around the StringCvt.reader representation of input streams. Thus, the scan functions defined in the {basis-lib-url}/index.html[Basis Library] are compatible with the parser type defined here. For example,

val boolParser : (bool, 'strm) parser = Bool.scan
val intParser  : (int, 'strm) parser = Int.scan StringCvt.DEC

Let us define the abstract syntax of a small expression language with addition, numbers, and let-bound variables.

datatype exp
  = VAR of string
  | NUM of int
  | ADD of exp * exp
  | LET of string * exp * exp

We can use parser combinators to implement a simple parser for this language as follows.

We start by defining a few utility definitions:

structure P = ParserComb

val +> = P.seq
infixr 3 +>

fun skipWS getc = P.skipBefore Char.isSpace getc

We can then define parsers for the atomic expressions (numbers and variables):

fun numParser getc = P.wrap (Int.scan StringCvt.DEC, NUM) getc
fun idParser getc = P.seqWith
      (fn (a, SOME b) => a ^ b | (a, NONE) => a)
      (P.wrap (P.eatChar Char.isAlpha, str),
       P.option (P.token Char.isAlphaNum))
      getc
fun varParser getc = P.wrap(idParser, VAR) getc

We need the separate idParser to parse let-bound identifiers.

We then define three, mutually-recursive, functions to parse expressions.

fun letParser getc = P.wrap (
      P.string "let" +> skipWS(idParser) +> skipWS(P.char #"=") +> expParser
      +> skipWS(P.string "in") +> expParser,
      fn (_, (x, (_, (e1, (_, e2))))) => LET(x, e1, e2)) getc
and expParser getc = P.wrap (
      skipWS (P.seq (
	P.or' [letParser, numParser, varParser],
	addParser)),
      fn (e, es) => List.foldl (fn (a, b) => ADD(b, a)) e es) getc
and addParser getc =
      P.zeroOrMore (skipWS (P.wrap (P.char #"+" +> expParser, #2))) getc

Note that the letParser must appear before the varParser in the list of parsers combined by or' to avoid treating the string "let" as a variable. Another detail is that we use List.foldl with a function that swaps the order of its arguments in order that addition is left associative.

If we evaluate the expression

StringCvt.scanString expParser " let x = 1+2 in x + x ";

we get the expected result

SOME (LET ("x", ADD (NUM 1, NUM 2), ADD (VAR "x", VAR "x")))