6 Using the parser

6.1 Parser Structure Signatures

The final structure created will have the signature PARSER:

signature PARSER =
sig
  structure Token : TOKEN
  structure Stream : STREAM
  exception ParseError

  type pos    (* pos is the type of line numbers *)
  type result (* value returned by the parser *)
  type arg    (* type of the user-supplied argument  *)
  type svalue (* the types of semantic values *)

  val makeLexer : (int -> string) ->
                    (svalue,pos) Token.token Stream.stream
  val parse :
      int * ((svalue,pos) Token.token Stream.stream) *
      (string * pos * pos -> unit) * arg ->
        result * (svalue,pos) Token.token Stream.stream
  val sameToken :
      (svalue,pos) Token.token * (svalue,pos) Token.token ->
        bool
end

or the signature ARG_PARSER if you used %arg to create the lexer. This signature differs from PARSER in that it has an additional type lexarg and a different type for makeLexer:

type lexarg
val makeLexer : (int -> string) -> lexarg ->
                  (svalue,pos) Token.token Stream.stream

The signature STREAM (providing lazy streams) is:

signature STREAM =
sig
  type 'a stream
  val streamify : (unit -> 'a) -> 'a stream
  val cons : 'a * 'a stream -> 'a stream
  val get : 'a stream -> 'a * 'a stream
end

6.2 Using the parser structure

The parser structure converts the lexing function produced by ML-Lex into a function which creates a lazy stream of tokens. The function makeLexer takes the same values as the corresponding makeLexer created by ML-Lex, but returns a stream of tokens instead of a function which yields tokens.
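As a minimal sketch of the kind of argument makeLexer expects, the function below builds an int -> string reader over an in-memory string. It assumes the usual ML-Lex convention that the reader returns at most the requested number of characters and returns the empty string at end of input. The names stringReader and MyParser are hypothetical and are used only for illustration.

```sml
(* Hypothetical helper: turn a string into a reader of type int -> string.
   Each call returns at most n characters; "" signals end of input. *)
fun stringReader (s : string) : int -> string =
  let
    val pos = ref 0                       (* index of the next unread character *)
  in
    fn n =>
      let
        val avail = String.size s - !pos  (* characters still unread *)
        val k = Int.min (n, avail)
        val chunk = String.substring (s, !pos, k)
      in
        pos := !pos + k;
        chunk
      end
  end

(* Hypothetical use, assuming a generated parser structure named MyParser
   that matches PARSER:

     val tokens = MyParser.makeLexer (stringReader "1 + 2")

   With ARG_PARSER, makeLexer would take a lexarg value as a second,
   curried argument. *)
```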

The function parse takes the token stream and some other arguments that are described below and parses the token stream. It returns a pair composed of the value associated with the start symbol and the rest of the token stream. The rest of the token stream includes the end-of-parse symbol which caused the reduction of some rule to the start symbol. The function parse raises the exception ParseError if a syntax error occurs which it cannot fix.

The lazy stream is implemented by the Stream structure. The function streamify converts a conventional implementation of a stream into a lazy stream. In a conventional implementation of a stream, a stream consists of a position in a list of values. Fetching a value from a stream returns the value associated with the position and updates the position to the next element in the list of values. The fetch is a side-effecting operation. In a lazy stream, a fetch returns a value and a new stream, without a side-effect which updates the position value. This means that a stream can be repeatedly re-evaluated without affecting the values that it returns. If f is the function that is passed to streamify, f is called only as many times as necessary to construct the portion of the list of values that is actually used.
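The behaviour described above can be sketched with a reference cell that caches each fetched value. This is an illustrative structure intended to match the STREAM signature, written under the assumption that memoization through a ref cell is acceptable; it is not presented as the library's own source. The names LazyStream, counter, and next are hypothetical.

```sml
structure LazyStream =
struct
  (* A cell is either already evaluated (a value plus the rest of the
     stream) or still waiting for the generator function to be called. *)
  datatype 'a cell = Evaluated of 'a * 'a cell ref
                   | Unevaluated of unit -> 'a
  type 'a stream = 'a cell ref

  fun streamify f = ref (Unevaluated f)
  fun cons (x, s) = ref (Evaluated (x, s))

  (* The first fetch calls the generator and caches the result in the
     cell; later fetches from the same stream reuse the cached pair. *)
  fun get (ref (Evaluated pair)) = pair
    | get (s as ref (Unevaluated f)) =
        let val pair = (f (), ref (Unevaluated f))
        in s := Evaluated pair; pair end
end

(* A side-effecting generator: each call yields the next integer. *)
val counter = ref 0
fun next () = (counter := !counter + 1; !counter)

val s0 = LazyStream.streamify next
val (a, s1) = LazyStream.get s0   (* calls next for the first time *)
val (b, _)  = LazyStream.get s1   (* calls next a second time *)
val (a', _) = LazyStream.get s0   (* cached: next is not called again *)
```

Fetching from s0 twice yields the same value both times, and the generator runs only once per position actually reached in the stream.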

The function parse also takes an integer giving the maximum amount of lookahead permitted for the error-correcting parse, a function to print error messages, and a value of type arg. The maximum amount of lookahead for interactive systems should be zero. In this case, no attempt is made to correct any syntax errors. For non-interactive systems, try 15. The function to print error messages takes a tuple consisting of an error message followed by the left and right positions of the terminal that caused the error, matching the type string * pos * pos -> unit. If the %arg declaration is not used, the value of type arg should be a value of type unit.
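A possible error-printing function is sketched below, assuming that pos is int (line numbers). The names formatError, report, MyParser, and reader are hypothetical, and the call to parse is shown only as a comment because it depends on a generated parser structure.

```sml
(* Build the text of one diagnostic; positions are line numbers here. *)
fun formatError (msg : string, left : int, right : int) : string =
  if left = right
  then "line " ^ Int.toString left ^ ": " ^ msg
  else "lines " ^ Int.toString left ^ "-" ^ Int.toString right ^ ": " ^ msg

(* Error-printing function of type string * int * int -> unit. *)
fun report (err : string * int * int) : unit =
  TextIO.output (TextIO.stdErr, formatError err ^ "\n")

(* Hypothetical call, assuming a generated structure named MyParser with
   pos = int and no %arg declaration (so arg = unit), and a reader of
   type int -> string:

     val (result, rest) =
       MyParser.parse (15, MyParser.makeLexer reader, report, ())

   A lookahead of 0 would be used instead of 15 for an interactive
   system. *)
```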

The function sameToken can be used to see if two tokens denote the same terminal, regardless of any values that the tokens carry. It is useful if you have multiple end-of-parse symbols and must check which end-of-parse symbol has been left on the front of the token stream.

The types have the following meanings. The type arg is the type of the additional argument to the parser, which is specified by the %arg declaration in the ML-Yacc specification. The type lexarg is the optional argument to lexers, and is specified by the %arg declaration in an ML-Lex specification. The type pos is the type of line numbers, and is specified by the %pos declaration in an ML-Yacc specification and defined in the user declarations section of the ML-Lex specification. The type result is the type associated with the start symbol in the ML-Yacc specification.