The ParserComb
structure provides parser combinators over
character readers. This structure is modeled after the Haskell
combinators of Hutton and Meijer. The main difference is that
they return a single result, instead of a list of results.
This fact means that the or
combinator is a committed choice;
once one branch succeeds, the others will not be enabled. While
this property is somewhat limiting, for many applications it
will not be a problem.
Synopsis
signature PARSER_COMB
structure ParserComb : PARSER_COMB
Interface
type ('a, 'strm) parser =
(char, 'strm) StringCvt.reader -> ('a, 'strm) StringCvt.reader
val result : 'a -> ('a, 'strm) parser
val failure : ('a, 'strm) parser
val wrap : (('a, 'strm) parser * ('a -> 'b)) -> ('b, 'strm) parser
val seq : (('a, 'strm) parser * ('b, 'strm) parser) -> (('a * 'b), 'strm) parser
val seqWith : (('a * 'b) -> 'c)
-> (('a, 'strm) parser * ('b, 'strm) parser)
-> ('c, 'strm) parser
val bind : (('a, 'strm) parser * ('a -> ('b, 'strm) parser))
-> ('b, 'strm) parser
val eatChar : (char -> bool) -> (char, 'strm) parser
val char : char -> (char, 'strm) parser
val string : string -> (string, 'strm) parser
val skipBefore : (char -> bool) -> ('a, 'strm) parser -> ('a, 'strm) parser
val or : (('a, 'strm) parser * ('a, 'strm) parser) -> ('a, 'strm) parser
val or' : ('a, 'strm) parser list -> ('a, 'strm) parser
val zeroOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
val oneOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
val option : ('a, 'strm) parser -> ('a option, 'strm) parser
val join : ('a option, 'strm) parser -> ('a, 'strm) parser
val token : (char -> bool) -> (string, 'strm) parser
Description
type ('a, 'strm) parser = (char, 'strm) StringCvt.reader -> ('a, 'strm) StringCvt.reader
-
A parser is a function that takes a character reader and returns reader for the type of values the parser accepts.
val result : 'a -> ('a, 'strm) parser
-
result v getc strm
returnsSOME(v, strm)
; i.e.,result v
lifts the valuev
to a parser that returnsv
without consuming any input. val failure : ('a, 'strm) parser
-
failure getc strm
returnsNONE
; i.e. it is the parser that does not accept any input. val wrap : 'a, 'strm) parser * ('a -> 'b -> ('b, 'strm) parser
-
wrap parser f
composes the functionf
withparser
val seq : (('a, 'strm) parser * ('b, 'strm) parser) -> (('a * 'b), 'strm) parser
-
seq (parser1, parser2)
is the sequential combination of the two parsers; i.e., a parser that will first parse a valuev1
from the input usingparser1
and then parse a valuev2
usingparser2
yielding the pair(v1, v2)
. val seqWith : (('a * 'b) -> 'c) -> (('a, 'strm) parser * ('b, 'strm) parser) -> ('c, 'strm) parser
-
seqWith f (parser1, parser2)
is the sequential combination of the two parsers composed with the functionf
; i.e., a parser that will first parse a valuev1
from the input usingparser1
and then parse a valuev2
usingparser2
yielding the result off(v1, v2)
. This expression is equivalent towrap (seq (parser1, parser2), f)
val bind : 'a, 'strm) parser * ('a -> ('b, 'strm) parser -> ('b, 'strm) parser
-
bind parser f
returns a parser that first usesparser
to parse a valuev
from the input and then continues using the parser that results fromf v
. val eatChar : (char -> bool) -> (char, 'strm) parser
-
eatChar pred
returns a parser that parses one characterc
for whichpred c
returnstrue
. val char : char -> (char, 'strm) parser
-
char c
returns a parser that parses the characterc
. val string : string -> (string, 'strm) parser
-
string s`returns a parser that parses the string `s
. val skipBefore : (char -> bool) -> ('a, 'strm) parser -> ('a, 'strm) parser
-
skipBefore pred parser
returns a parser that first skips any prefix of characters that satisfy the predicatepred
and then appliesparser
to the input. val or : (('a, 'strm) parser * ('a, 'strm) parser) -> ('a, 'strm) parser
-
or (parser1, parser2)
returns the ordered choice of the two parsers; i.e., it returns a parser that first attempts to parse the input usingparser1
; ifparser1
fails on the input, then it usesparser2
. val or' : ('a, 'strm) parser list -> ('a, 'strm) parser
-
or' parsers
returns the ordered choice of a list of parsers. This expression is equivalent toList.foldr or failure parsers
val zeroOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
-
zeroOrMore parser
returns a parser that parses a list of zero or more items usingparser
. val oneOrMore : ('a, 'strm) parser -> ('a list, 'strm) parser
-
oneOrMore parser
returns a parser that parses a list of one or more items usingparser
. val option : ('a, 'strm) parser -> ('a option, 'strm) parser
-
option parser
returns a parser that parses an optional item (i.e., zero or one occurrences) usingparser
. val join : ('a option, 'strm) parser -> ('a, 'strm) parser
-
join parser
returns a parser that requires the optional item parsed byparser
to be present. val token : (char -> bool) -> (string, 'strm) parser
-
token pred
returns a parser for a string of characters, where every character satisfies the predicate functionpred
.
Examples
As noted above, the parser
type and combinators are
designed around the
StringCvt.reader
representation of input streams.
Thus, the scan
functions defined in the {basis-lib-url}/index.html[Basis Library]
are compatible with the parser
type defined here. For example,
val boolParser : (bool, 'strm) parser = Bool.scan
val intParser : (int, 'strm) parser = Int.scan StringCvt.DEC
Let us define the abstract syntax of a small expression language with addition, numbers, and let-bound variables.
datatype exp
= VAR of string
| NUM of int
| ADD of exp * exp
| LET of string * exp * exp
We can use parser combinators to implement a simple parser for this language as follows.
We start by defining a few utility definitions:
structure P = ParserComb
val +> = P.seq
infixr 3 +>
fun skipWS getc = P.skipBefore Char.isSpace getc
We can then define parsers for the atomic expressions (numbers and variables):
fun numParser getc = P.wrap (Int.scan StringCvt.DEC, NUM) getc
fun idParser getc = P.seqWith
(fn (a, SOME b) => a ^ b | (a, NONE) => a)
(P.wrap (P.eatChar Char.isAlpha, str),
P.option (P.token Char.isAlphaNum))
getc
fun varParser getc = P.wrap(idParser, VAR) getc
We need the separate idParser
to parse let-bound identifiers.
We then define three, mutually-recursive, functions to parse expressions.
fun letParser getc = P.wrap (
P.string "let" +> skipWS(idParser) +> skipWS(P.char #"=") +> expParser
+> skipWS(P.string "in") +> expParser,
fn (_, (x, (_, (e1, (_, e2))))) => LET(x, e1, e2)) getc
and expParser getc = P.wrap (
skipWS (P.seq (
P.or' [letParser, numParser, varParser],
addParser)),
fn (e, es) => List.foldl (fn (a, b) => ADD(b, a)) e es) getc
and addParser getc =
P.zeroOrMore (skipWS (P.wrap (P.char #"+" +> expParser, #2))) getc
Note that the letParser
must appear before the varParser
in the
list of parsers combined by or'
to avoid treating the string "let"
as a variable. Another detail is that we use
List.foldl
with a
function that swaps the order of its arguments in order
that addition is left associative.
If we evaluate the expression
StringCvt.scanString expParser " let x = 1+2 in x + x ";
we get the expected result
SOME (LET ("x", ADD (NUM 1, NUM 2), ADD (VAR "x", VAR "x")))