Object Language Embedding
Standard ML is often used to write programs that manipulate another
language, called the "object language", or OL.
For example, the syntax of the HOL logic in hol90 or the syntax of CCS for the
Concurrency Workbench. Typically, one defines the
abstract syntax of OL by a datatype
declaration. Then useful
functions over the datatype can be defined (such as finding the free
variables of a formula when OL is a logic). Soon afterwards, one
concludes that concrete syntax is easier for humans to read than
abstract syntax, and so writes a parser and prettyprinter for the OL.
In the situation just outlined, ML is called the metalanguage. (Edinburgh/INRIA/Cambridge ML, the precursor to Standard ML, was originally a programming metalanguage for a particular object language, the LCF logic, in the LCF proof assistant.) The purpose of a quotation/antiquotation mechanism is to allow one to embed expressions in the object language’s concrete syntax inside of ML programs, and to mix the object language expressions with ML expressions.
Quotation and Antiquotation
The quote/antiquote mechanism is enabled by setting
Compiler.Control.quotation : bool ref
to true. Then the backquote character ceases to be legal in symbolic identifiers, and takes on a special meaning.
A quotation is a special form of literal expression that represents the concrete syntax of an OL phrase. The backquote character is used to delimit quotations.
For a running example, suppose our OL is a simple propositional logic
with propositions represented as values of abstract type prop
.
We might wish to write propositional expressions such as A/\B/\C
.
The most common approximation to quotation is strings. This is not pleasant at times, especially when dealing with backslashes and newlines. Still, strings are bearable. Strings are not adequate, however, for the following idea.
The ML-OL relationship invites a notion of antiquotation: the temporary abandonment of parsing so that an ML value can be spliced into the middle of a quotation. Operations like this have cropped up under various names in various places: antiquote is due to Milner; Quine had a version called quasi-quotation in his 1940 book; Carnap used a notation much like it. It also closely resembles the Lisp backquote facility.
Using backquote, we write
- val f = `A /\ B \/ C`; val f = [QUOTE "A /\ B \/ C"] : 'a SMLofNJ.frag list
The compiler interprets this as a list of fragments,
using the frag
data type from the SMLofNJ structure:
structure SMLofNJ = struct
...
datatype 'a frag = QUOTE of string | ANTIQUOTE of 'a
end
More commonly, we invoke an OL parser to parse, enforce precedence, etc. By naming the parser something concise, such as “%”, we can use the syntax
- val % = my_proposition_parser;
val % : prop frag -> prop
- val p = % `A /\ B \/ C`;
val p = -: prop
An antiquote is written as a caret ^
character
followed by either an SML identifier or a parenthesized SML expression.
Antiquotation can be used to conveniently express contexts, which
are often used as a descriptive tool for syntax. A context could be
defined as a function taking a prop
and directly placing it at a
location in a quotation.
- fun foo a = % `^a ==> A`;
val foo : prop -> prop
In this case, foo p
would denote the same proposition as
% `(A /\ B \/ C) ==> A`
Antiquotations can have nested quotations (which may contain antiquotes of their own, etc.):
- let val K x y = x
val I x = x
in % `A /\ ^(K (% `B`) (I (% `C`))) \/ C`
end;
gives the same prop
as that denoted by p
. We note in
passing that the power of the OL parser is completely up to its author:
for example, in the framework offered here, one could write an OL
“parser” for Scheme that parses function plus arguments, evaluates the
function on the arguments, and finally prints the returned value.
Implementation of OL parsers
A concrete syntax quotation is mapped by the SML compiler into a
frag list
. Intuitively, a frag
is a contiguous part of a
quotation: A /\ B
maps to [ QUOTE "A /\ B" ]
while ^x /\ ^y
maps to
[QUOTE "", ANTIQUOTE x, QUOTE "/\", ANTIQUOTE y, QUOTE ""]
(Note that the names frag
, QUOTE
, and ANTIQUOTE
need to be qualified by the SMLofNJ structure-id, in practice.)
In this approach, the value of a quotation has type ol frag list
where
ol
is the type of object language expressions; the type
of the OL parser is ol frag list → ol
.
The OL parser (in our example, %
) must handle these lists
and insert the antiquoted ML values in the right places.
CAVEATS
Often one wants to parse stratified languages, such as first order logic, or typed lambda calculus, which requires a trick. Also, there is a bit of trickery when one wants to deal with ML-Yacc and ML-Lex, especially when functorizing the parser.