Standard ML of New Jersey Change Log

This file documents changes to the Standard ML of New Jersey system since March of 2000 (around Version 110.26). The change log primarily covers the compiler, the compilation manager (CM), the MLRISC library, and the runtime system. There are occasional entries about other components (e.g., the SML/NJ Library and ML-LPT), but these components have their own change logs that should be consulted.

Version 110.91; 2019/06/20

[2019/06/20]: We added a new primop, REAL_TO_BITS, that casts a floating-point value to the same-size word value. This primop allows the Assembly.logb function to be implemented in SML.

We have also refactored the implementation of the Math structure to share common code across the versions that are specialized for different levels of hardware support.

John Reppy

[2019/06/19]: Rewrote the assembly code for the x86 and AMD64 targets. Previously, there were separate source files for Unix and Windows; these have been replaced by a single common file (one for each architecture). The assyntax.h file has also been replaced by x86-syntax.h, which covers both the x86 and AMD64 on both UNIX and Windows.

The AMD64.prim.asm file now compiles, although there are a few minor issues that will have to be fixed once we have a working code generator. We have also fixed a number of issues in the garbage collector related to the use of the 2-level BIBOP on 64-bit targets.

John Reppy

[2019/06/18]: Some cleanup in the interval-timer code. In keeping with the other time-specific functions, I have switched the runtime-system API to use unsigned 64-bit nanoseconds to specify time values. I have also added an implementation for c-libs/smlnj-runtime/itick.c, which was missing. Lastly, moved the Windows-specific file win32-timers.c from runtime/kernel to runtime/mach-dep.

John Reppy

[2019/06/16]: Added 64-bit implementations of the target-specific Basis Library modules in directory Basis/Implementation/Target64Bit.

John Reppy

[2019/06/16]: Added PackWord64Big and PackWord64Little structures to Basis Library. Note that the implementation of these is target-specific.
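As an illustration of the difference between the two new structures, the following sketch reads the same eight bytes in both byte orders. It assumes a release (110.91 or later) in which both structures are available and LargeWord is 64 bits wide; the names bytes, big, and little are ours.

```sml
(* Eight bytes, 0x01 through 0x08, in increasing address order. *)
val bytes = Word8Vector.fromList
      [0wx01, 0wx02, 0wx03, 0wx04, 0wx05, 0wx06, 0wx07, 0wx08]

(* Big-endian: the first byte is the most significant. *)
val big : LargeWord.word = PackWord64Big.subVec (bytes, 0)

(* Little-endian: the first byte is the least significant. *)
val little : LargeWord.word = PackWord64Little.subVec (bytes, 0)
```

With these inputs, big should be 0wx0102030405060708 and little should be 0wx0807060504030201.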

John Reppy

[2019/06/16]: Added bigEndian flag to the TARGET signature.

John Reppy

Version 110.90; 2019/06/12

[2019/06/12]: Fixed the Concurrent ML library to use 64-bit positions (both the Unix and Windows versions).

John Reppy

[2019/06/11]: Moved the year offset from SML to the runtime system. This change is necessary because Windows uses 1601 as year 0, whereas UNIX uses 1900. We have also switched to using unsigned 64-bit times in nanoseconds as the interface between the Basis code and runtime system. This change is consistent with the other places where time values are communicated between the runtime and SML code.

John Reppy

[2019/06/07]: Fixed a problem with CM’s symbol filtering (see bug #222).

The problem could manifest itself when a library l2.cm imported two symbols A and B from l1.cm and then exported the same A but a different B (which could have been defined in terms of the imported B). Moreover, for the problem to occur both A and B within l1.cm must have come from the same SML source file.

With the above setup, when running

CM.make "l2.cm";

it was possible that instead of seeing the new A defined within l2.cm one would still see the original version that came from l1.cm.

Matthias Blume

[2019/06/04]: Various 64-bit porting changes to the Windows implementation of the Basis Library and runtime system:

Add a target-specific Handle structure to support the HANDLE type, which is a pointer-sized word value.
Changes to support the use of 64-bit file positions.
Replaced pairs of arguments representing time values (seconds and microseconds) with a single 64-bit count of microseconds.

John Reppy

[2019/06/04]: Implemented Basis Library proposal 2019-001 (Correction to the PRIM_IO signature). This proposal changes the return type of the avail function in a reader to be Position.int option, which is necessary to support large files.

John Reppy

[2019/06/03]: Added primop support (PTR_TO_WORD and WORD_TO_PTR) for the c_pointer type that was added in 110.89. These primops are exposed in the new InlineT.Pointer structure. We define a PointerImp structure that is used inside the Basis implementation and a Unsafe.Pointer structure that is visible to users.

John Reppy

Version 110.89; 2019/06/01

[2019/06/01]: Switched the Position structure to be bound to Int64 and updated the runtime system to use 64-bit integers for file offsets and time values (in nanoseconds). This change fixes bugs #33 (Overflow exception with inputLine function) and #36 (Can’t open very large file).
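A minimal sketch of what the wider Position type permits, assuming a release (110.89 or later) in which Position is bound to Int64. The names fiveGB and next are ours; with a narrower Position, the conversion would be expected to raise Overflow.

```sml
(* An offset of 5,000,000,000 bytes does not fit in 32 bits. *)
val fiveGB : Position.int = Position.fromLarge 5000000000

(* Offsets can be combined like values of any other integer type. *)
val next : Position.int = Position.+ (fiveGB, 4096)
```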

John Reppy

[2019/06/01]: Added abstract c_pointer type to the primitive types. This type will be used to represent runtime-system pointers (e.g., the HANDLE values in the Windows implementation).

John Reppy

[2019/05/31]: Removed makefiles and code for architectures and operating systems that are no longer supported (e.g., the DEC Alpha and HPPA architectures).

John Reppy

[2019/05/31]: Switched the FixedInt and LargeWord structure aliases to be 64-bits (i.e., FixedInt is now bound to Int64 and LargeWord is bound to Word64).
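One way to check the new bindings from the REPL is to query the advertised sizes. This is only a sketch; the names fixedBits and largeWordBits are ours, and the values in the comments are what this entry implies for 110.89 and later.

```sml
(* Number of bits in FixedInt.int; SOME 64 once FixedInt is bound to Int64. *)
val fixedBits = FixedInt.precision

(* Number of bits in LargeWord.word; 64 once LargeWord is bound to Word64. *)
val largeWordBits = LargeWord.wordSize
```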

John Reppy

[2019/05/30]: We are now assuming that we have at least C99 support (for practical purposes, this assumption is even true on Windows). With this assumption, the allocation of small objects in the runtime has been switched from macros to inline functions (see runtime/include/ml-objects.h). This change allows a graceful handling of 32-bit integers, which are heap allocated on 32-bit machines, but tagged on 64-bit machines.

John Reppy

[2019/05/29]: Fixed various bugs in the implementation of the Word64 operations. The addition and subtraction operators were using arithmetic right shifts, instead of logical right shifts. Also, the translation of 64-bit shift operations was incorrect because of a typo in the variable names.
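To see why the kind of shift matters, consider 64-bit addition built from 32-bit halves. The sketch below is not the compiler's code; add64 is a hypothetical helper that represents a 64-bit word as a (hi, lo) pair of Word32 values and computes the carry out of the low half with a branch-free bit formula.

```sml
(* Add two 64-bit values, each given as a (hi, lo) pair of 32-bit words. *)
fun add64 ((hi1, lo1) : Word32.word * Word32.word,
           (hi2, lo2) : Word32.word * Word32.word) = let
      val lo = Word32.+ (lo1, lo2)
      (* The top bit of this expression is the carry out of the low half. *)
      val carryBits = Word32.orb (
            Word32.andb (lo1, lo2),
            Word32.andb (Word32.orb (lo1, lo2), Word32.notb lo))
      (* A logical shift (>>) yields 0 or 1.  An arithmetic shift (~>>)
       * would replicate the top bit and yield 0wxFFFFFFFF instead of 1. *)
      val carry = Word32.>> (carryBits, 0w31)
      in
        (Word32.+ (Word32.+ (hi1, hi2), carry), lo)
      end

(* 0x00000000FFFFFFFF + 1 carries into the high half. *)
val sum = add64 ((0w0, 0wxFFFFFFFF), (0w0, 0w1))
```

Here sum evaluates to (0w1, 0w0). Replacing >> with ~>> in the carry computation gives a wrong high word, which is the kind of error this entry describes.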

John Reppy

[2019/05/27]: Created a simplified version of the MLRiscGen functor. This version of the functor, which is in the file CodeGen/main/mlrisc-gen-fn.sml, does not include the memory disambiguation and GC types code. Since the old version (CodeGen/main/mlriscGen.sml) did not use these features by default, there should be no difference in the quality of the generated code.

The purpose of this change is to remove unused code that has 32-bit dependencies.

John Reppy

[2019/05/25]: Added contraction for unsigned REM and NEG operations in CPS/opt/contract-prim.sml.

John Reppy

Version 110.88; 2019/05/15

[2019/05/15]: Moved the compiler/DEVNOTES directory to the dev-notes tree and renamed it old-compiler-notes.

John Reppy

[2019/05/15]: Added 64-bit versions of NumFormat and NumScan. We use the 32-bit version for numbers of 32-bits or less and the 64-bit versions for numbers with up to 64 bits. Thus, on 32-bit machines, the default int and word types use NumFormat32 and NumScan32, while on 64-bit machines they use NumFormat64 and NumScan64. This change also required splitting out some common code into a ScanUtil structure and also splitting out the scanning of real numbers into the ScanReal structure (formatting of reals was already in its own structure).

John Reppy

[2019/05/15]: Reimplemented the 64-bit int and word types to put them on a (mostly) equal footing with the other precisions. In this new implementation, the basic types int64 and word64 are now PRIMITIVE (instead of being ABSTRACT types represented by pairs of boxed 32-bit words). Arithmetic and comparison operations on these types are represented as primops and are preserved as such up to just before closure conversion. At that point, the new Num64Cnv structure (compiler/CPS/opt/numcnv.sml) is used to expand 64-bit operations and constants into 32-bit operations. Most of the 64-bit primops are inline expanded, but multiplication and division operations are converted to calls to library code from the CoreInt64 and CoreWord64 modules (system/smlnj/init).

Because the types are primitive, we were able to change the runtime representation to use packed records (RK_RAWBLOCK) to represent them, which saves space and should also help with performance.

See the dev-notes/num64.md file for more details about the implementation.

John Reppy

[2019/05/09]: Reorganized the Basis Library source files (system/Basis) to isolate dependences on target word size.

In the Basis/Implementation directory, I created subdirectories (e.g., Target32Bit) to hold implementations that are specific to the target. These directories include a bind-structs.sml file that replaces the many bind-*.sml files in Basis/Implementation.

In the Basis/Exports directory, I replaced the many individual files (each with a single module renaming) with bind-common.sml (for target-independent bindings) and a target-specific file (either bind-target-32-bit.sml or bind-target-64-bit.sml).

John Reppy

[2019/05/05]: Some of the CPS optimization modules (Expand and EtaSplit) were written as functors over the machine spec, when, in fact, they never reference their functor argument. Therefore, they have been converted to structures.

John Reppy

[2019/05/04]: We now use the InlineT.identity primop for Fn.id, so the compiler can optimize it.

John Reppy

[2019/05/03]: Fixed pretty-printing regression in 110.87; values of char type were missing their enclosing quotes.

John Reppy

Version 110.87; 2019/05/03

[2019/05/03]: Made the Char.chr operator inline (a primop was added to support this change in 110.86).

John Reppy

[2019/05/03]: Major renaming of the primitive operators in the Inline structure (as described in dev-notes/primop-list.md). Also cleaned up the Basis Library implementation to remove most (but not all) 32-bit dependencies.

John Reppy

[2019/05/03]: Added cases to the top-level pretty printer to handle the new basic types that were added in 110.86 (e.g., word8vector and chararray). Also changed the way that primitive types are handled to use a table keyed by tycons, instead of a sequence of nested conditionals.

John Reppy

Version 110.86; 2019/05/02

[2019/05/01]: Added word8vector and chararray to the primitive types that the compiler knows about. These will be used in the rewriting of the InlineT structure.

John Reppy

[2019/05/01]: Replaced the Primop.primop constructors NUMSUBSCRIPT and NUMUPDATE with

```sml
  | NUMSUBSCRIPT of numkind
  | NUMSUBSCRIPTV of numkind
  | NUMUPDATE of numkind
  | INLNUMSUBSCRIPT of numkind
  | INLNUMSUBSCRIPTV of numkind
  | INLNUMUPDATE of numkind
```

This design matches the naming conventions for polymorphic subscripting and updating.

John Reppy

[2019/05/01]: Added Primop.INLCHR to implement Char.chr as an inline function. This change also required moving the definition of the Chr exception to the Core module so that it is accessible to the translate phase. The inline version of Char.chr will be enabled in the 110.87 release (we need the internal primop before we can use it).

John Reppy

[2019/05/01]

Major overhaul of the representation of primitive operators (both in the Primop and CPS.P structures). The primitive arithmetic and comparison operations are now defined in the ArithOps structure (ElabData/prim/arithops.sml). There are three datatypes defined in this module:

arithop — integer arithmetic operations that may raise overflow
pureop — arithmetic operations that are pure
cmpop — comparison operations

These types are used in both the Primop and CPS.P modules, which makes the translation between representations more direct.

Some details:

inline division and modulo operations were added to the Primop.primop datatype; the expansion of these in the TransPrim module (FLINT/trans/transprim.sml) adds explicit checks for division by zero.
the FSGN operator was added to the Primop.primop datatype, since the new cmpop datatype does not include it (the CPS IR already had FSGN as a separate branch constructor).
unsigned comparison operations are now represented by using the UINT numkind, which is consistent with how they are represented in CPS.
Renamed the primop ROUND to REAL_TO_ROUND.
the encodings for operators were revised in the pickler, resulting in a more compact use of the numeric codes.

John Reppy

[2019/04/28]: Removed unused record kind constructors (RK_SPILL, RK_EXN, and RK_BLOCK) from CPS.record_kind datatype. Also renamed RK_I32BLOCK to RK_RAWBLOCK and RK_FBLOCK to RK_RAW64BLOCK. Various other renamings to remove 32-bit assumptions.

John Reppy

[2019/04/28]: Renamed DTAG_raw32 to DTAG_raw, since the semantics on 64-bit systems will be to require word-size aligned raw data. Also renamed ML_AllocRaw32 to ML_AllocRaw and ML_ShrinkRaw32 to ML_ShrinkRaw for similar reasons.

John Reppy

[2019/04/28]: Removed unused flags from the Control structure; most of these came from Control.CG, where roughly 20 out of 60 flags were no longer used.

John Reppy

[2019/04/27]: Split the contraction of primitive operators out of the Contract structure into its own ContractPrim structure.

John Reppy

[2019/04/27]: Split the translation of primops to PLambda out into its own file (compiler/FLINT/trans/transprim.sml).

John Reppy

[2019/04/27]: Fixed regression: Word32.toInt 0wx8002DE32; would return 187954 instead of raising Overflow. The problem was a mistake in the way that the overflow trap was being generated in MLRiscGen.

John Reppy

[2019/04/26]

Some minor primop cleanup.

Changed the types of Primop.ROUND and Primop.REAL to take bitwidths, instead of numkinds, since the kinds are always the same. Also, the fields are now called from and to (instead of fromkind and tokind) to be consistent with other conversion primops.
Renamed ABS to FABS, since it is only used on floating-point numbers.
Renamed the CPS primitive operator ROUND to REAL_TO_INT and the operator REAL to INT_TO_REAL.
Renamed the Primop.REAL to Primop.INT_TO_REAL so that it is not confused with the other constructors named REAL.

John Reppy

[2019/04/23]: Improvements to the core 64-bit int and word modules in system/smlnj/init. Replaced Int64.+, Int64.-, Word64.+, and Word64.- with versions from Hacker’s Delight that use fewer conditional branches. Also replaced the relational operators (<, <=, etc.) with more direct implementations.

John Reppy

[2019/04/21]: Fix for bug #213 (Int32.div raises Div instead of Overflow when dividing minInt by ~1). Since the compiler generates an explicit test for division by zero, we know that the only arithmetic traps must be caused by other operations. Therefore, we can just map any arithmetic trap to Overflow.
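The case behind bug #213 can be reproduced with a few lines. This is a sketch only; the names minInt and outcome are ours, and the string results exist just to make the raised exception visible.

```sml
(* The most negative 32-bit integer; negating it is not representable. *)
val minInt : Int32.int = valOf Int32.minInt

(* Record which exception, if any, the division raises. *)
val outcome =
      (Int32.div (minInt, ~1); "no exception")
        handle Overflow => "Overflow"
             | Div => "Div"
```

With the fix in place, outcome is "Overflow"; before the fix, the same expression produced "Div".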

Also removed the old SPARC assembly code for multiplication and division. The code generator always uses the native hardware instructions, so the assembly code is not needed.

John Reppy

[2019/04/21]: Yet another attempt to get the implementation of use in the REPL working in a sensible way.

With these changes, use should behave as follows. If an invocation of use encounters a compilation error (either in the initial file or in a nested invocation of use), then the compiler error message will be printed and the call to use will immediately return (). If an invocation of use raises an exception during execution of the compiled code (either in the initial file or in a nested invocation of use), then the exception will be reported at the top-level. Any change to the global state or environment that occurs before an error is encountered will not be rolled back.

Files specified as command-line arguments to the sml command will be treated as if use was invoked on them. If there is an error, then the error will be reported and the sml command will terminate with a non-zero exit status (at least on Unix).

This change fixes bugs #193, #217, and #219. There is a connection between this change and #183, which was fixed in Version 110.82.

John Reppy

[2019/04/21]: Change to the CPS primops: moved the F_SGN operator (which is unary) from the fcmp datatype to the branch datatype (and renamed it FSGN).

John Reppy

[2019/04/21]: Finished conversion of the CPS IR to a form that is compatible with ASDL. Basically, this involved converting the datatype constructor names to upper-case identifiers.

These changes are a step in the plan to eventually switch to an LLVM-based code generator that will be given pickled CPS code as its input.

John Reppy

[2019/04/08]: Starting to migrate the CPS IR toward the ASDL version. Changed the names of the CPS.P.arith and CPS.P.cmpop constructors to be upper-case alpha IDs (many of them were symbolic identifiers). Also split out the various utility functions into the new CPSUtil module (CPS/cps/cps-util.sml). Lastly, moved the literals.sml file from FLINT/main to CPS/main (where it belongs).

Note that the CPS.P.arithop datatype is now identical to the Primop.arithop datatype.

John Reppy

[2019/04/07]: Reorganized the backend of the compiler by moving the CPS-related code into its own directory tree (Compiler/CPS) and replacing the FLINTComp functor with the FLINTOpt structure and the CPSCompFn functor. The conversion from FLINT to CPS is part of the CPSCompFn functor, which takes the program representation all the way from FLINT to machine code segments.

John Reppy

Version 110.85; 2018/12/21

[2018/12/21]: Modified config/install.sh to look for a pre-Mojave SDK when trying to install on macOS 10.14 Mojave.

John Reppy

[2018/12/21]: Updated runtime/objs/cygwin.def so that the runtime system will build on 32-bit Cygwin. Also updated installation script to suggest using the 32-bit version of Cygwin when a user tries to install it on Cygwin64.

John Reppy

[2018/11/10]

Xcode 10.1, which is Apple’s development environment for macOS 10.14 Mojave, does not include the libraries needed to build 32-bit executables, such as the SML/NJ runtime, although 32-bit programs will still run.

To support building on Mojave, I added a new Makefile (mk.x86-darwin18) for the runtime system and modified the config/install.sh script to use this makefile when necessary. This new makefile expects that the MacOSX10.13.sdk directory from Xcode 9 has been copied into the Xcode 10 SDKs directory. Note that updating Xcode from the App Store will likely remove the 10.13 SDK, so you should keep a copy in a safe place.

The Xcode SDKs live in Platforms/MacOSX.platform/Developer/SDKs under the Developer directory. One can determine the path to the current developer directory using the command

% xcode-select -p

John Reppy

[2018/10/10]: Removed several unsupported primitive operators from the compiler. In the CPS IR, these were free, acclink, setpseudo, setmark, and getpseudo. The pseudo-register operations were not supported in the code generator, while the others were no-ops. The corresponding operators GETPSEUDO, SETPSEUDO, SETMARK, and DISPOSE were removed from ElabData/prim/primop.sml and their bindings were removed from Semant/prim/primop-bindings.sml and the InlineT and Unsafe structures.

The AllocProf module in the compiler was also disabled, since it relied on the pseudo registers for recording profile information at runtime. Furthermore, uses of the acclink primitive operation in FLINT/cps/closure.sml when static profiling is enabled were removed.

These changes were committed as revision 4886.

John Reppy

[2018/10/08]: Fix for bug #216 (run-time system fatal error with large top-level value). The problem was in the code for building literals.

John Reppy

[2018/09/23]

Change CPS operators for wrapping/unwrapping integer and float values to be word-size flexible. We now use a single wrap (and unwrap) operator that is parameterized by a numkind value. We also changed the wrap/unwrap operators to box/unbox. The mapping from old operators to new ones is as follows:

wrap ⇒ box
unwrap ⇒ unbox
iwrap ⇒ wrap(INT defaultIntSz)
iunwrap ⇒ unwrap(INT defaultIntSz)
i32wrap ⇒ wrap(INT 32)
i32unwrap ⇒ unwrap(INT 32)
fwrap ⇒ wrap(FLOAT 64)
funwrap ⇒ unwrap(FLOAT 64)

John Reppy

[2018/09/13]: Further cleanup for 64BIT in function atomeq in PEqual. (base/compiler/FLINT/trans/pequal.sml). Added numKind, intEqTy, and uintEqTy functions. The numKind function should be extended once int64 and word64 are treated as primitive types in the compiler.

Dave MacQueen

[2018/09/12]: Fixed 64BIT issue in module MatchComp (base/compiler/FLINT/trans/matchcomp.sml). Added int64Ty and word64Ty cases to function numCon.

Dave MacQueen

[2018/09/12]: Fixed 64BIT issue in module Equal (base/compiler/FLINT/reps/equal.sml). Exports just one function: equal_branch, which is called once in reps/wrapping.sml to type-specialize branches on calls to POLYEQUAL.

Dave MacQueen

[2018/09/12]: The CPS optimizer had a mechanism for checking the CPS against the FLINT types, which required maintaining a mapping from lvars to their FLINT types. This code has long since bit-rotted and cannot even handle a simple expression like 1+2. Therefore, I’ve removed the mapping (a hash table) from the CPS optimizer and the vestigial code that modified it in the various CPS optimization passes.

John Reppy

[2018/09/12]: Modified the InfCnv (now named IntInfCnv) structure to remove 32-bit dependencies.

John Reppy

[2018/09/11]: Modified Pequal (in base/compiler/FLINT/trans/pequal.sml) and Translate (in base/compiler/FLINT/trans/translate.sml) to remove 32-bit dependencies, though further changes will be required to properly handle int64 and word64 types when defaultIntSz = 64.

Dave MacQueen

[2018/09/11]: Reimplemented the Switch module (in base/compiler/FLINT/cps). The new implementation follows the same basic design as before, but the code is better organized and documented, and it now uses the concrete CPS representations, instead of being parameterized over an abstraction of them. It also now uses binary search for boxed (e.g., Int32.int) switches.

John Reppy

Version 110.84; 2018/09/03

[2018/09/03]: Reimplemented the array/vector-slice modules to use a (base, start, length) representation (as does Substring in system/smlnj/init/substring.sml). Also fixed a bug in the slice findi functions, where the index being passed to the predicate function was not adjusted to be slice-relative.
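The findi fix can be illustrated with a slice that does not start at index 0. This is a sketch; the names v, s, and found are ours.

```sml
val v = Vector.fromList [10, 20, 30, 40, 50]

(* A slice covering the elements 30, 40, 50 (starting at index 2 of v). *)
val s = VectorSlice.slice (v, 2, NONE)

(* Look for the element 40 and report the index at which it was found. *)
val found = VectorSlice.findi (fn (_, x) => x = 40) s
```

With slice-relative indices, found is SOME (1, 40): the element sits at index 3 of v but at index 1 of the slice.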

John Reppy

[2018/09/02]: Implemented Basis Library proposal 2018-002 (Additional slice operations).

John Reppy

[2018/09/02]: Improved implementation of CharVectorSlice.map and CharVectorSlice.mapi to not build intermediate list of results.

John Reppy

[2018/08/28]: A beta release of the ASDL library and asdlgen tool has been added to the system. This version of the tool implements SML support, but the C++ support is not complete. There is a CM tool for ASDL, which recognizes the .asdl file suffix.

John Reppy

[2018/08/28]

Two changes to the installer (base/base/system/smlnj/installer):

The build scripts for programs are now named build.sh (instead of build) on Unix systems.
The config action has been added to support module configuration.

John Reppy

[2018/08/27]: Added RENAME extension style to CM tool support. This extension style allows arbitrary file names to be generated from the base name.

John Reppy

[2018/08/19]: Fixed a bug in the implementation of monomorphic buffers: the functions CharBuffer.add1 and Word8Buffer.add1 had an incorrect length test.

John Reppy

[2018/06/15]: Fixed a compiler bug (arg ty lists wrong length) in unifyTy that could occur when one of the type constructors is the ERRORtyc. This bug occurs because the ERRORtyc is equal to any other type constructor, which (incorrectly) implies that the number of type arguments should be equal.

John Reppy

Version 110.83; 2018/06/01

[2018/05/29]: Fixed #206 (Parsing of explicit type variables and val rec is broken). This bug was also bug number 1261 in the old bugs list.

John Reppy

[2018/05/29]: Fixed minor bug in Date.toString (missing leading "0" for day of month). This issue was bug number 1444 in the old bugs list.
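A small check of the day-of-month field, as a sketch; the date is arbitrary and the names d, s, and dayField are ours. Date.toString produces a fixed-width string of the form "Www Mmm dd hh:mm:ss yyyy", so the day occupies the two characters starting at position 8.

```sml
(* A date whose day of month has a single digit. *)
val d = Date.date {
        year = 2018, month = Date.May, day = 5,
        hour = 0, minute = 0, second = 0,
        offset = NONE
      }

val s = Date.toString d

(* Extract the two-character day-of-month field. *)
val dayField = String.substring (s, 8, 2)
```

With the fix, dayField is "05".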

John Reppy

[2018/05/29]: Cleaned up match compiler code (FLINT/trans/matchcomp.sml) and added typing and function comments. Added debugging and printing infrastructure, including new FLINT/trans/mcprint.sml file, and new Control.MC.debugging flag.

Dave MacQueen

[2018/05/29]: Fixed parser to allow parentheses around val rec patterns.

John Reppy

[2018/05/28]: Fixed the scanner to produce the correct error message for bad escape sequences in string literals.

John Reppy

[2018/05/26]: Fixed old bug number 1383: Char.toCString #"\000" returned "\\0", instead of "\\000", which caused String.toCString to produce invalid results.
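For example (the names nul and s are ours), the escape for the NUL character now uses three octal digits, so a digit that follows it in the string cannot be misread as part of the escape:

```sml
(* The four-character string: backslash, 0, 0, 0. *)
val nul = Char.toCString #"\000"

(* NUL followed by the digit 1 stays unambiguous in the C-style output. *)
val s = String.toCString "a\0001"
```

Here nul is "\\000" and s is "a\\0001".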

John Reppy

[2018/05/19]: Fix for bug #201 (The AMD64.cm library is missing).

John Reppy

[2018/05/18]: Added MONO_BUFFER signature, with instances CharBuffer and Word8Buffer, to Basis implementation (Basis Library Proposal 2018-001).

John Reppy

[2018/05/16]: Fix a bug where “0w” was being accepted as a prefix for a hexadecimal word value in Word.fromString/scan (ignoring case, only “0x” and “0wx” are valid prefixes). This change fixes bug number 1375 from the old bugs list.
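The accepted forms can be sketched as follows; the names a through d are ours. The last binding is the case affected by this fix: since "0w" is no longer treated as a prefix, we would expect the scanner to stop after the leading "0" rather than consume the hex digits that follow.

```sml
(* Valid hexadecimal inputs: optional "0wx" or "0x" prefix, any case. *)
val a = Word.fromString "0wx1F"
val b = Word.fromString "0X1f"
val c = Word.fromString "1F"

(* "0w" alone is not a valid prefix for a hexadecimal word. *)
val d = Word.fromString "0w1F"
```

The first three bindings all evaluate to SOME 0wx1F.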

John Reppy

[2018/05/13]: Fixed a bug in the parsing of bindings involving the op keyword. The parser was more restrictive than the definition. This change fixes bug number 1370 from the old bugs list.

John Reppy

[2018/05/12]: The lexer gave an unmatched close comment error on "*)", when it should have scanned it as the tokens "*" ")". This change fixes bug number 330 in the old bugs list.

Note: there is some ambiguity as to what the correct behavior should be here. The Definition of Standard ML (1997) only says that unmatched open comments should be signalled as errors, but the Commentary on the Definition of Standard ML (1991) says otherwise in Appendix D. SML/NJ started signalling an error in version 0.71, but we have chosen to revert to accepting this sequence, to match the 1997 Definition (and the behavior of other systems).
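The most common place this sequence shows up is when the multiplication operator is passed as a value. A minimal sketch (the names mul and product are ours):

```sml
(* "*)" is scanned as the two tokens "*" and ")". *)
val mul : int * int -> int = (op *)

val product = mul (6, 7)
```

On versions that signal the error, the usual workaround is to write a space before the closing parenthesis, as in (op * ).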

John Reppy

[2018/05/07]: The sameSign function returned incorrect results in the Int31 and Int32 modules.
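For reference, the Basis Library defines sameSign (i, j) as equivalent to sign i = sign j. A few sample calls, with names of our own choosing:

```sml
(* Opposite signs. *)
val mixed = Int32.sameSign (1, ~1)

(* Both negative. *)
val bothNeg = Int32.sameSign (~5, ~7)

(* Zero has sign 0, which differs from the sign of a positive number. *)
val zeroPos = Int32.sameSign (0, 3)
```

A correct implementation gives false, true, and false, respectively.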

John Reppy

[2018/05/07]

Fixed various minor parsing and scanning issues:

correct syntax for type variables
signature/structure/functor IDs should always be alpha IDs
the equality ID (=) cannot appear in a binding context. Note that we still allow the syntax val op = = … because it is needed to parse the file system/smlnj/init/built-in.sml.

John Reppy

[2018/05/05]: Completed overhaul of the way that int/word literals are handled in the compiler. We now use IntInf.int to represent the values in all IRs. This change also results in better CPS contraction, since we now perform constant folding for both signed and unsigned values at all sizes. We were also able to get rid of the tricky code that worries about large tagged integer values that might cause overflow during code generation.

John Reppy

[2018/04/21]: Improved the reporting of errors involving literal values. We now use the original source text when describing the value in the error message.

John Reppy

[2018/04/20]: Fix for bug #191 (Compiler crash when handling large reals). We now issue a warning for real literals that will round to zero and an error for real literals that are too large to represent. Some work still needs to be done to support sub-normal literal values (these are currently rounded to zero).

John Reppy

[2018/04/14]: Changed the representation of real literals from strings to RealLit.t.

John Reppy

[2018/04/13]: Removed real patterns from Absyn and FLINT, since they are not allowed by SML'97 and were not present in the AST representation.

John Reppy

[2018/04/12]: Fix for bug #194 (Real.fromString overflows or hangs). There were two issues here. First, the Overflow exception was being raised when scanning large exponents, but it was not being handled by the scanning code. The second issue was that the scaling loop for large exponents did not immediately terminate once infinity (or zero) was reached, so it could take a long time.

John Reppy

[2017/10/22]: Moved the Version-1 literal building code into gc/old-literals.c. This file can be removed once the compiler generates the Version-2 literal bytecode.

John Reppy

[2017/10/16]: Moved the check for whether an int or word literal is in range for its type from the absyn→plambda translation to the overload resolver (compiler/Elaborator/types/overload.sml).

John Reppy

[2017/10/14]: Part 1 of an overhaul of the way that the compiler treats int/word literals. The end goal is to use IntInf.int to represent literals throughout all phases of the compiler. In this step, we changed the representation of literals in the Absyn representation (earlier representations already used IntInf.int).

John Reppy

Version 110.82; 2017/10/16

[2017/10/01]: Fixed unnumbered bug in IntInf.mod and IntInf.rem functions, where the Div exception was not getting raised when both arguments are 0.
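A quick check of the corrected behavior, as a sketch; raisesDiv is a hypothetical helper that reports whether evaluating a thunk raises Div.

```sml
(* Returns true if running f raises Div, and false if it returns normally. *)
fun raisesDiv f = (ignore (f ()); false) handle Div => true

val modRaises = raisesDiv (fn () => IntInf.mod (0, 0))
val remRaises = raisesDiv (fn () => IntInf.rem (0, 0))
```

Both modRaises and remRaises are true once the fix is in place.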

John Reppy

[2017/09/20]: Various bits of cleanup in the handling of primitive operations, such as removing the ptnum mechanism for translating from Absyn to FLINT.

David MacQueen

[2017/09/20]: Added Target module, which specifies the properties of the target (e.g., the size in bits of the default int type). Reworked the generation of the InlineT structure to be target specific.

John Reppy

[2017/09/18]: Removed FLINT primops (and their CPS counterparts) that are not in the InlineT structure and, thus, are never used by the compiler.

John Reppy

[2017/09/18]: Fixed bug #123 (missing nonexhaustive bind warning). The mkVBs function in FLINT/trans/translate.sml was adding a redundant default rule by calling ElabUtil.completeMatch after a default rule had already been explicitly added to the match for let bindings.

David MacQueen

[2017/09/18]: Fixed bug #183 (status code returned by sml REPL). This fix restores the version 110.79 behavior of having sml foo.sml exit with a non-zero status when there is a type-checking error in foo.sml. It also cleans up the error messages associated with use when there is a syntax error.

John Reppy

[2017/08/28]: Fixed bug #185 (Bring command line help text into parity with man page). Added missing options (@SMLversion and @SMLsuffix) to the help message that is printed for the command “sml -h”. Also adjusted the order of options in the help message, and in the man page, so that the orders match.

John Reppy

[2017/08/12]: Changed the way that we test for allocation-space addresses in minor GCs. Instead of using the BIBOP, we now do a pointer range test. On 32-bit systems, this change results in a small (~0.13%) performance boost, but we expect a bigger impact on 64-bit hardware, where the cost of BIBOP probes will be higher and there are more registers available to hold the nursery bounds.

John Reppy

[2017/08/12]: Fixed some issues in build-literals.c. These were mostly false positives in the assertions, but there was also a bug in the way that the available space was tracked that could conceivably result in a crash (but was very unlikely).

John Reppy

[2017/06/07]: Updated _arch-n-os script to recognize macOS 10.13 (High Sierra) as a valid target.

John Reppy

[2017/05/17]: Fixed a bug in the way that JSON string values were being printed. The code previously assumed that C-style escaping will work, but that is not true for "\'" (as well as for control and non-ASCII characters). The new implementation assumes that the string value is UTF-8 and uses the "\\u" escape sequences for characters outside the JSON escapes and printable ASCII characters.

John Reppy

Version 110.81; 2017/05/01

[2017/04/28]: Fixed bug #129 (Symbolic identifiers are allowed as strids).

Dave MacQueen

[2017/04/07]: Fixed bug #179 (ml-ulex writing debug messages to stdOut). Both ml-ulex and ml-antlr now direct their debug and status messages to stdErr (instead of stdOut).

John Reppy

[2017/02/09]: Linux distributions are starting to require that the stack be marked as non-executable in applications. Because the runtime system includes assembly code, this marking was not happening. We’ve added .section directives to the PPC.prim.asm and X86.prim.asm files as per https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart#Patching. Thanks to Daniel Moerner for reporting this issue and for providing a pointer to the fix.

John Reppy

[2016/10/15]: Added --debug command-line option to ml-antlr. This flag causes ml-antlr to generate debug actions that print the left-hand-side non-terminal of the production.

John Reppy

[2016/09/15]: Working on 64-bit support. Changes include making code generation dependent on the target word size and abstracting over the BIBOP representation in the runtime system.

John Reppy

[2016/09/15]

Further cleanup of the separation of FLINT from the front-end. Eliminated all references to ModulePropLists (module-plists.sml) in the front end and in pickling, and moved module-plists from Semant/modules to FLINT/trans. ModulePropLists is now only used in FLINT/trans/translate.sml.

Revision: 4314

Files changed:

compiler/ElabData/modules/modules.sml (cleaned up)
compiler/Elaborator/print/ppmod.sml (cleaned up)
compiler/FLINT/trans/module-plists.sml (moved from Semant/modules)
compiler/Semant/modules/instantiate-param.sml (deleted)
compiler/Semant/pickle/pickmod.sml (no longer mentions property lists)
compiler/Semant/pickle/unpickmod.sml (ditto)
compiler/Semant/statenv/prim.sml
compiler/Semant/types/tp-var-info.sml (deleted)
compiler/core.cm (modified for move of module-plists.sml)

Dave MacQueen

[2016/09/14]

Eliminated dependency of PlambdaType from the front end by adding a type TKind.tkind which is a simplified standin for PlambdaType.tkind for use during elaboration. TKind.tkind values are translated on demand to PlambdaType.tkind in trans/transtypes.sml. Types still has a tycpath type but it is defined using TKind.tkind now. The new structure SigPropList replaces ModulePropLists (Semant/modules/module-plists.sml) for use in instantiate.sml. Instantiate is now defined directly as a structure so the functor application in Semant/modules/instantiate.sml no longer exists.

Files changed:

ElabData/basics/debindex.sig (moved here from Elaborator/basics)
ElabData/basics/debindex.sml (ditto)
ElabData/basics/sig-plist.sml (new)
ElabData/basics/tkind.sml (new)
ElabData/types/types.sig
ElabData/types/types.sml
Elaborator/modules/instantiate.sml
Elaborator/print/ppmod.sml
FLINT/trans/transtkind.sml (new)
FLINT/trans/transtypes.sml
TopLevel/interact/evalloop.sml
ElabData/elabdata.cm
Elaborator/elaborate.cm
core.cm

Dave MacQueen

[2016/09/14]

Added support for Successor ML record-expression-punning syntax. For example, one can now define a function f as

fun f x = {x}

which is equivalent to the definition

fun f x = {x = x}

John Reppy

[2016/09/14]: Fixed a bug in the parser. Asterisk (*) was not allowed as a record label when using the record-pattern-punning syntax.

John Reppy

[2016/09/14]: Added support for the Successor ML do exp syntax.

John Reppy

[2016/09/12]: Fixed bug #153 (Enabling Successor ML features is delayed). We now use a function Control.setSuccML to switch to/from Successor ML mode in the REPL. The function resets the parser, so the next input will be correctly parsed. The Control.succML flag is no longer visible in the REPL.

John Reppy

[2016/09/12]

Fixed bug #149 (Datatype replication exposes hidden constructors). Added boolean field stripped to DATATYPE variant of tyckind in compiler/ElabData/types/types.sml with default value false. stripped is set to true when a datatype is matched with a simple type spec in signature matching, and datatypes with stripped set to true are disallowed in datatype replications.

Files changed:

compiler/ElabData/types/types.sig
compiler/ElabData/types/types.sml
compiler/ElabData/types/typesutil.sml
compiler/ElabData/types/core-basictypes.sml
compiler/Elaborator/types/basictypes.sml
compiler/Elaborator/types/eqtypes.sml
compiler/Elaborator/modules/evalent.sml
compiler/Elaborator/modules/sigmatch.sml
compiler/Elaborator/modules/instantiate.sml
compiler/Elaborator/print/ppabsyn.sml
compiler/Elaborator/print/pptype.sml
compiler/Elaborator/elaborate/elabcore.sml
compiler/Elaborator/elaborate/elabmod.sml
compiler/Elaborator/elaborate/elabtype.sml
compiler/Elaborator/elaborate/elabsig.sml
compiler/Semant/pickle/pickmod.sml
compiler/Semant/pickle/unpickmod.sml
compiler/MiscUtil/print/ppobj.sml
compiler/FLINT/trans/transtypes.sml
compiler/FLINT/trans/pequal.sml

Dave MacQueen

[2016/08/31]: Added %tokentype directive to ml-antlr; this directive allows users to specify the token datatype externally, which is necessary in order to share a lexer with two different ml-antlr parsers.

John Reppy

[2016/08/20]: Change the interface to AMD64Gen in MLRISC; the signBit and negateSignBit callbacks now return an MLTree.rexp (instead of a label).

John Reppy

Version 110.80; 2016/08/19

[2016/08/16]: Fixed bug #151 (Error installing from source on Mac OS X). The fix involves changes to both the config/install.sh script and the mk.x86-darwin makefile. With this fix, we include the SDK argument to /usr/bin/as only when the OS version is 10.10 (Yosemite) or later.

John Reppy

[2016/08/10]: Added the proposed unzipMap, unzipMapi, find, and findi functions to the ListPair module.

John Reppy

[2016/08/10]: Added the proposed mapLeft, mapRight, appLeft, and appRight functions to the Either module.

John Reppy

[2016/08/09]: Fixed bug #145 (Internal exception occurs on bogus annotation instead of typechecking diagnostic). Added missing OVLD_UB case in function failMessage in compiler/Elaborator/types/unify.sml.

Dave MacQueen

[2016/08/04]: Fixed bug #166 (Can’t install SML/NJ in directories containing spaces). Thanks to Eugene Sharygin for the patch.

John Reppy

[2016/06/21]: Fixed incorrect dividend sign extension before 32-bit divide in the amd64 code generator in MLRISC.
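
Why the sign extension matters can be seen in a small model of a two-register signed divide. The sketch below is an illustration under stated assumptions and is not MLRISC code: the helper names are hypothetical, and the dividend is modeled as a high:low pair of 32-bit words, as for the x86 IDIV instruction. The high word must be filled with copies of the low word's sign bit rather than with zeros.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def signed_divide(high, low, divisor):
    """Quotient of the signed 64-bit dividend high:low by a signed 32-bit
    divisor, truncated toward zero."""
    dividend = to_signed((high << BITS) | low, 2 * BITS)
    d = to_signed(divisor, BITS)
    quotient = abs(dividend) // abs(d)
    return -quotient if (dividend < 0) != (d < 0) else quotient


low = -7 & MASK                                   # 32-bit encoding of -7
sign_extended = MASK if low >> (BITS - 1) else 0  # high word = copies of the sign bit
zero_extended = 0                                 # high word wrongly cleared
```

With the sign-extended high word, -7 divided by 2 gives -3; with a zeroed high word the same bits are read as 4294967289 and the quotient is 2147483644.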

Mike Rainey

[2016/06/16]: Fixed bug #150 (Add title to batch script).

John Reppy

[2016/05/11]: Implemented the changes for Basis Library Proposal 2016-001. This proposal added the popCount function to the WORD signature.
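
For readers unfamiliar with the operation, a population count returns the number of 1 bits in a word. A minimal sketch, using a non-negative Python integer as a stand-in for a fixed-width word value (the name `pop_count` is hypothetical and this is not the Basis implementation):

```python
def pop_count(w: int) -> int:
    """Count the 1 bits in the non-negative integer w."""
    count = 0
    while w:
        w &= w - 1      # clear the lowest set bit
        count += 1
    return count
```

The loop runs once per set bit, so `pop_count(0xFF)` takes eight iterations and `pop_count(0)` takes none.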

John Reppy

[2016/05/03]: Fixed bug #156 (sml resumes after SIGSTOP with bogus exception report). The fix is a bit of a hack: I modified the non_bt_hdl function in evalloop.sml to match an IO.Io exception with the appropriate shape for this situation.

John Reppy

[2016/04/07]: Fixed bug #154 (Return code for ml-ulex when there is an error).

John Reppy

[2016/04/07]: Fixed bug #155 (Misleading printing of word literals in error messages).

John Reppy

[2016/04/02]: Fixed a bug in the implementation of the --ml-lex-mode flag for ml-ulex. The \h escape sequence is supposed to map to the character range [\128-\255], but did not.

John Reppy

[2015/11/09]: Fixed bug #147 (Hexadecimal escapes in strings are not supported). We previously did not support Unicode escapes in string literals. We now do so, with non-ascii codepoints being mapped to the UTF-8 encoding with escape values in the range 0..255 being mapped to the corresponding 8-bit character. Values outside that range are flagged as an error.

Revised August 4, 2016

John Reppy

[2015/10/28]: Partial fix for the noisy exception-stack traces on the Error exception. The cases that are handled by this change are applying use to a non-existent file and when there are compilation errors in a program being built by CM.make. What remains to be handled is the situation where CM.make is applied to a non-existent file.

John Reppy

Version 110.79; 2015/10/04

[2015/10/04]: Patched base/compiler/FLINT/clos/closure.sml so that Twelf will build again. Fixes bug #140 (Lookup failure in closure.sml when compiling Twelf).

John Reppy

[2015/09/28]

Added support for a Successor ML tool to CM. This tool allows one to specify that a source file foo.sml is Successor ML source code in the following ways:

foo.sml : succ-ml
foo.sml : sml (succ-ml)
foo.sml (succ-ml)

John Reppy

[2015/09/28]

Added the directory base/old-basis to support backward-compatible views of the Basis Library. You can use these by replacing the line

$/basis.cm

with

$/basis-2004.cm

in your CM files.

[2015/09/28]: New implementation of Date structure in the Basis, which fixes bugs #138 (Incorrect behavior for Date.fromTimeLocal) and #139 (Date.date is broken). Note that some more thought should be given to the correct semantics of Date.date when dealing with offsets. For example, should an offset of +23 hours produce the same date as an offset of -1 hours? Currently our implementation produces different results (by a day) for these two situations.
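
The ambiguity can be reproduced outside SML. The following sketch uses Python's datetime module, whose offset convention is an assumption here and may differ in sign from that of Date.date. It shows only that offsets of +23 and -1 hours give the same wall-clock time on calendar dates one day apart; the chosen instant is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# An arbitrary instant, expressed in UTC.
instant = datetime(2015, 9, 28, 12, 0, tzinfo=timezone.utc)

# The same instant viewed with offsets of +23 hours and -1 hour.
plus23 = instant.astimezone(timezone(timedelta(hours=23)))
minus1 = instant.astimezone(timezone(timedelta(hours=-1)))

same_clock_time = plus23.time() == minus1.time()
days_apart = (plus23.date() - minus1.date()).days
```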

John Reppy

[2015/09/25]

Implemented the changes for Basis Library Proposal 2015-003. This proposal added operations to the following signatures:

signature ARRAY
signature LIST
signature LIST_PAIR
signature MONO_ARRAY
signature MONO_VECTOR
signature OPTION
signature STRING
signature TEXT
signature VECTOR

and the following structures:

structure Array : ARRAY
structure CharArray : MONO_ARRAY
structure CharVector : MONO_VECTOR
structure List : LIST
structure ListPair : LIST_PAIR
structure Option : OPTION
structure Real64Array : MONO_ARRAY
structure Real64Vector : MONO_VECTOR
structure String : STRING
structure Text : TEXT
structure Vector : VECTOR
structure Word8Array : MONO_ARRAY
structure Word8Vector : MONO_VECTOR

While it is very unlikely that these changes will break existing code, there are a couple of scenarios in which the code might break. Namely, when use of open introduces conflicts and when user code implements one of the affected Basis Library signatures. Both of these examples occurred in the SML/NJ source code; the former in the ml-yacc sources and the latter in the MLRISC sources.

John Reppy

[2015/09/25]: Added the optional implementations of PackReal64Big and PackReal64Little. This addition addresses feature request #82 (Implementations of PACK_REAL missing). The implementation uses the approach suggested by Michael Sullivan.

John Reppy

[2015/09/24]: Fixed bug #45 (Compiler bug in specialize phase). This bug was in compiler/FLINT/opt/fcontract.sml and was the result of a bad interaction between eta contraction and inlining. As part of the fix, I cleaned up the code in this part of FLINT a bit.

John Reppy

[2015/09/21]: Improvements to the error messages produced by the ml-ulex lexer generator.

John Reppy

[2015/09/21]: Added Ref structure and REF signature to Basis implementation (Basis Library Proposal 2015-007).

John Reppy

[2015/09/21]: Added Fn structure and FN signature to Basis implementation (Basis Library Proposal 2015-005).

John Reppy

[2015/08/22]: Fixed bug #136 (Incorrect raising of exceptions in Real.fmt and Time.fmt).

John Reppy

[2015/08/14]: Added Either structure and EITHER signature to Basis implementation (Basis Library Proposal 2015-002).

John Reppy

[2015/07/23]: Fixed bug #135 (Fails to build on Linux PowerPC).

John Reppy

[2015/07/08]: Added Linux 4.* kernels to the list of operating systems recognized by the .arch-n-opsys script (fixes bug #134).

John Reppy

[2015/06/11]: Added Mac OS X 10.11 (El Capitan) to the list of operating systems recognized by the .arch-n-opsys script.

John Reppy

[2015/05/27]

Added support for Successor ML lexical extensions. These can be enabled using the command-line option -Cparser.succ-ml=true or by the assignment

Control.succML := true;

at the REPL. The extensions are as follows:

- Underscore (“_”) as a separator in numeric literals; e.g., 123_456, 0wxff_ff_ff_f3, 123_456.1, …
- End-of-line comments, which are denoted using (*). End-of-line comments properly nest into conventional block comments. For example, the following block comment is well formed:
```
(*
fun f x = x (*) my identity function *)
*)
```
- Binary literals for both integers and words; e.g., 0b0101_1110, or 0wb1101.
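
How the new integer and word literal forms are read can be sketched as follows. This is a simplified illustration, not the SML/NJ lexer: the function name `literal_value` is hypothetical, real literals such as 123_456.1 and the ~ sign are left out, and the sketch does not check that each underscore sits between two digits.

```python
def literal_value(lit: str) -> int:
    """Numeric value of an integer or word literal with optional underscores."""
    digits = lit.replace('_', '')            # separators carry no value
    # Longer prefixes are tried first so that "0wx" is not mistaken for "0w".
    for prefix, base in (('0wx', 16), ('0wb', 2), ('0w', 10),
                         ('0x', 16), ('0b', 2)):
        if digits.startswith(prefix):
            return int(digits[len(prefix):], base)
    return int(digits, 10)
```

For example, `literal_value("0b0101_1110")` and `literal_value("0x5e")` denote the same number.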

This change is the beginning of a program to add Successor ML features to SML/NJ; see https://github.com/SMLFamily/Proposed-Definition-of-Successor-ML for more details.

John Reppy

Version 110.78; 2014/12/24

[2014/12/19]: Major revision of the machinery for overloading resolution for both operators (vars) and literals, now using a common mechanism. This fixes bug #52 by improving the error message when an overloaded operator is inconsistent with its context. Updated 23 files, including major changes in overload.sml, types.sml, unify.sml, elabcore.sml, typesutil.sml. The overload declaration is still used in pervasives.sml, where the order of the specified instances of an overloading determines the default interpretation (i.e., the first one).

The SCHEME and LITERAL forms of tyvars are replaced by a new OVLD form that tracks potential instantiations of the type of the overloaded vars or literals.

David MacQueen

[2014/12/18]: Moved base/NOTES/HISTORY file to doc/src/changelog/HISTORY.txt and converted it to ASCIIDOC format. Have also moved the README files from base/READMES to doc/src/release-notes. These changes are part of a general effort to rationalize and improve the documentation of the SML/NJ system.

John Reppy

[2014/12/13]: Preliminary cleanups before changes to overloading

Minor cleanup in Elaborator/elaborate/elabcore (function elabOVERLOADdec) and in ElabData/types/typesutil.sml (function matchScheme). Preparing for a new method of handling type checking of overloaded operators. [Note that there is no reason for the options field of OVLDvar to be a reference — it is never updated. Changing this requires corresponding change in pickling.]

Also added an etopdebugging flag (ElabControl) for debugging in elabtop.sml. Modified elabcontrol.{sml,sig} and elabtop.sml. Also rearranged ast and absyn printing in evalloop.sml.

David MacQueen

[2014/10/23]: Improved error messages in ml-ulex for unclosed strings. Also made documentation improvements.

John Reppy

[2014/10/11]: Added -D_FILE_OFFSET_BITS=64 flag to x86-linux makefile. This flag is necessary to avoid spurious EOVERFLOW errors on some versions of Linux. The problem appears to be limited to large file systems that have more than 2³² inodes.

John Reppy

[2014/09/13]: Added %value directive to ml-antlr; this addition improves the error repair choices by allowing non-nullary tokens to be inserted when making repairs.

John Reppy

Version 110.77; 2014/08/22

[2014/08/21]

Created new doc tree in SML/NJ repository. Currently this tree just holds the sources for UNIX-style manual pages for the command-line tools (fixing bug #35). The documentation is written using the ASCIIDOC format. Use the following svn command to checkout a copy of the documentation tree:

svn co https://smlnj-gforge.cs.uchicago.edu/svn/smlnj/doc/trunk doc

John Reppy

[2014/08/19]: Compiling the runtime system on cygwin was failing because the file exceptions.h was missing. It appears to have been part of previous versions, so a version has been incorporated verbatim in the file runtime/mach-base/cygwin-fault.c (fixes bug #125).

John Reppy

[2014/08/19]: Added the actionToString' and repairToString' functions to the AntlrRepair structure. These functions allow one to specialize the printing of tokens based on whether they are being added or deleted.

John Reppy

[2014/08/17]: Added patches to support OpenBSD on PowerPC. The patches were contributed by Jasper Lievisse Adriaanse (fixes bug #124).

John Reppy

[2014/08/17]: Use mkstemp to implement OS.FileSys.tmpName() on systems that support it (should be all modern versions of Unix). This change fixes bug #128. (Thanks to Johannes 5 Joemann).

John Reppy

[2014/08/17]: Fixed a bug in IntInf.~>>, which did not handle negative arguments correctly (bug #110).
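
For reference, the Basis Library describes ~>> (i, n) as equivalent to floor(i / 2^n), so results for negative i round toward negative infinity. A minimal sketch of that expected behavior, using Python integers as a stand-in for IntInf.int (the function name is hypothetical and this is not the SML/NJ code):

```python
def arithmetic_shift_right(i: int, n: int) -> int:
    """Expected result of shifting i right by n bits: floor(i / 2**n)."""
    return i // (1 << n)
```

Under this definition -5 shifted right by one bit is -3, not -2, and -1 shifted by any amount stays -1.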

John Reppy

[2014/08/14]: Fixed a problem in the CPS contraction phase. An optimization that eliminates construction of a record that already exists was not checking that the existing record was the same record kind (bug #119).

John Reppy

[2014/07/28]: Switch to using MAP_ANONYMOUS to allocate memory on Linux systems. This change avoids problems when "/dev" does not support execute permission (as seems to be the case with some versions of Linux running on ChromeBooks; bug #120).

John Reppy

[2014/06/28]: Fix for bug #127 (Crash on Windows with OS.Process.system).

John Reppy

[2014/06/07]: Fixed a long-standing bug in Socket.recvVec, which prevented the result from being used in a string pattern match (thanks to Vesa Norrman for the fix).

John Reppy

[2014/05/01]: Fixed minor issue in an error message; type variable name should be printed with leading '.

John Reppy

[2013/11/25]: Added PackWord{16,32}{Big,Little} structures to the Unsafe module. This change makes the UNSAFE signature closer to the MLton version, although we still need to add the PackReal structures.

John Reppy

Version 110.76; 2013/07/01

[2013/06/04]: Fix bug #115 (BinPrimIO writer method getPos does not work under CML). Just needed to port the position update from mkReader code to the mkWriter code.

Lars Bergstrom

[2013/06/04]: Fix bug #111 (Socket.acceptNB returns somewhat broken sockets). The problem was that under Win32, sockets returned from accept inherit their parents' non-blocking status, whereas on UNIX they are always blocking.

Lars Bergstrom

[2013/05/20]: Fix bug #117 (BinIO.openAppend raises IO on non-existent file). We were opening the file for append if it existed but not creating it if it did not exist.

Lars Bergstrom

[2013/05/02]: Fix bug #116 (Socket.sameDesc raises Match exception). The problem is that on Windows the iodesc datatype (defined in Basis/Implementation/Win32/pre-os.sml) has both an IODesc constructor and a SockDesc constructor. Updated the code in Win32/os-io.sml to handle the SockDesc constructor.

John Reppy

[2013/04/19]: Fix bug #113 (Socket.select waits exactly twice the indicated timeout)

John Reppy

[2013/01/19]: Fix AMD64 code generator to properly sign-extend arguments to IDIVQ.

Lars Bergstrom

[2012/10/20]: Fix bug #108 (off-by-one error in Util/dynamic-array.sml; iterators crash)

John Reppy

[2012/10/20]: Fix bug #107 (Bogus Int64 comparison operators)

John Reppy

Version 110.75; 2012/10/01

[2012/09/28]: Fixed bug #92. IntInf.scan now handles the “0x” prefix correctly. Also made minor improvements to the NumScan module.

John Reppy

[2012/09/24]: Added Base64 module to SML/NJ Library to support encoding and decoding Word8 vectors as base64 strings.

John Reppy

[2012/09/23]: Additions to the SML/NJ Library. Added exists, existsi, all, and alli functions to ORD_MAP signature and implementations, and added all function to ORD_SET signature and implementations.

John Reppy

[2012/09/21]: Bug fix in ml-antlr to ensure that the generated toString function for tokens is strictly legal SML code (i.e., non-printing characters and UTF8 multibyte sequences are properly escaped).

John Reppy

[2012/09/11]: Added getu function to ULexBuffer as a way to improve ml-ulex performance. This addition allows a fastpath for processing ASCII characters, which improved lexer performance by 3-4%.

John Reppy

[2012/08/02]: Fixed bugs #89 and #96: Build Failure with Xcode 4.3. Also removed build support for MacOS X pre-10.5 (Leopard) on PPC and pre-10.6 (Snow Leopard) on Intel.

John Reppy

[2012/02/05]: Fixed bug #88. The check for valid arcs on Unix systems now allows any character other than slash or nul.

John Reppy

Version 110.74; 2012/01/20

[2012/01/20]: Fixed implementation of Real.signBit on little-endian machines.

John Reppy

[2012/01/19]: 1) Fix for bug #60: recalculate strictness for DEFtycs in functor bodies when functor is applied (Elaborator/modules/evalent.sml)

2) Fix for bug #77: separate ast representations for datatypes and datatype replications in decs and specs (multiple files)

3) set version to 110.74

Details in NOTES/changes/dbm_2012_1.

Dave MacQueen

[2012/01/12]

Change of SourceMap interface, related to the fix of an off-by-one error in the lexer (committed earlier?), and cleanup of noweb code added by Norman Ramsey many years ago (but little used today).
Slight cleanup of match compiler, eliminating compiler/FLINT/tempexpn.sml file that was part of unused implementation of pattern templates (pattern macros).
Modification of type checker to add "culprit tracking" for improved type error messages (printing of additional culprit information is controlled by ElabControl.showTypeErrorCulprits flag, default false, added in revision 3652). The culprit tracking needs to be debugged and improved, and the presentation of the culprits needs to be done better.

Details in NOTES/changes/dbm_2012_1.

Dave MacQueen

[2011/11/25]: Bug fixes for Unsafe.blastRead (bug #76):

- proper error handling when reading from memory and there are not enough bytes.
- pass correct data pointer and length to BlastIn (code was using old macros).

John Reppy

[2011/11/25]: Added hash-table-based implementation of sets to SML/NJ Library.

John Reppy

[2011/10/25]: Better error reporting under 32-bit linux for the missing dpkg support (bug #70). Enable 3.x kernels to build (bugs #80, #81, #83).

Lars Bergstrom

[2011/05/23]: Added new S-expression library to SML/NJ Library (contributed by Damon Wang)

John Reppy

[2011/05/17]: Fixed bug in JSON scanner (SML/NJ Library). It didn’t handle escaped backslash or double quote correctly.

John Reppy

Version 110.73; 2011/05/13

[2011/05/10]

Added boolean literals (true and false) to the conditional-expression syntax in CM. Thus, you can write

#if true
  structure Foo
#endif

in a CM file. This change is meant to make it easier to use autoconf to configure the build process of an SML application.

John Reppy

[2011/05/09]: Added missing String.scan function (bug #69). This also fixes the handling of certain corner cases by String.fromString.

John Reppy

[2011/05/03]: Added the RDTSC and RDTSCP instructions to the amd64 code generator.

Mike Rainey

[2011/04/08]: Added fix for comments in code bug (bug #63). Thanks to Michael Norrish.

John Reppy

[2011/04/08]: Fixed bug in Socket.acceptNB (bug #59)

John Reppy

[2011/03/31]: Fixed syntax error in ml-lex compatibility mode (bug #49)

John Reppy

[2011/03/22]: Update _arch-n-opsys script for Mac OS X Lion (10.7).

John Reppy

[2011/02/18]: Added Barriers module to CML.

John Reppy

[2011/02/10]: Fixed ml-yacc examples to respect the changed signatures with respect to TextIO.inputLine.

Lars Bergstrom

[2010/09/16]: Changed the Win32 implementation of validArc to support directories with extended characters (umlauts, etc.).

Lars Bergstrom

[2010/09/16]: Fixed the Win32 socket and polling implementation to work correctly with CML. Signature of poll was wrong and didn’t handle sockets at all.

Lars Bergstrom

[2010/06/16]: Fixed Real.toString and Real.fmt to include sign for negative zero.

John Reppy

[2010/03/23]: Fixed the bug with Win32 calls to OS.Process.system not quoting the string.

Lars Bergstrom

[2010/02/11]: Applied patch for building on more recent versions of NetBSD (bug #39).

Jon Riehl

Version 110.72; 2010/02/02

[2009/12/20]: Fixed performance bugs in List module by making @ and foldr be tail recursive (bug #51).
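
The entry does not say how the functions were rewritten. One common way to make a right fold run in constant stack space, sketched here as an assumption rather than as the library's actual code, is to reverse the list and then fold from the left:

```python
def foldr(f, init, xs):
    """Right fold computed as a loop over the reversed list, so the stack
    depth does not grow with the length of xs."""
    acc = init
    for x in reversed(xs):
        acc = f(x, acc)
    return acc
```

The result is the same as the directly recursive definition, f(x1, f(x2, ... f(xn, init))), but long lists no longer risk exhausting the stack.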

John Reppy

[2009/12/11]: Fixed the Win32 unable to print long strings bug (bug #37).

Lars Bergstrom

[2009/12/10]: Fixed an overrun during major GC. If the string arena was nearly full, it was possible for alignment padding added during copy to the to-space to overrun the allocated size.

Lars Bergstrom

[2009/11/18]: The ml-antlr and ml-ulex programs have been ported to build under mlton.

John Reppy

[2009/11/17]: Added %header directive to the ml-ulex scanner generator. Also updated the documentation.

John Reppy

[2009/11/17]: Added @SMLsuffix flag to sml command. This flag can be used to get the suffix for heap files.

John Reppy

[2009/11/17]: Added --strict-sml flag to ml-ulex for MLton compatibility.

John Reppy

[2009/11/10]: Added %header directive to the ml-antlr parser generator. Also updated the documentation.

John Reppy

Version 110.71; 2009/09/16

[2009/09/13]: Changes to support compiling the runtime system on Mac OS X 10.6 (aka Snow Leopard).

John Reppy

[2009/08/19]: Fixed a bug in the register-spill generator that is part of the MLRISC register allocator. The problem was that the code in RASpillWithRenaming functor assumed incorrectly that dedicated registers would appear in def/use information generated by ClusterRA. Thanks to Allen Leung for helping with this bug.

Mike Rainey

[2009/07/09]: Removed redundant implementations of various top-level operations by consolidating them in base/system/smlnj/init/pervasive.sml. This change also fixes a bug in that the top-level version of round was incorrect.

John Reppy

Version 110.70; 2009/06/15

[2009/06/12]

Corrected problem in config/actions that led to the so-called "unpickling bug" which appeared in version 110.68.
Provided fix for the 64-bit pattern match bug.

Matthias Blume

[2009/03/21]: Fixed bug in Int32.fmt when the argument was the minimum int and the radix was something other than DEC.

John Reppy

[2009/02/21]: Fixed bugs in how ml-antlr parsed ML types in %tokens specifications.

Aaron Turon

[2009/01/13]: Picking up some additional fixes for 110.69, including a fix for spaces in CM file paths.

Jon Riehl

Version 110.69; 2008/12/22

[2008/12/06]: Re-enabled some CPS optimizations (first_contract and eta). The most important effect of this change is to make uses of SMLofNJ.Cont.capture be properly tail recursive.

John Reppy

[2008/12/03]: New concurrency-related instructions for x86 and amd64 code generators.

PAUSE: Notify the CPU that the program is spin waiting.
MFENCE: memory fence for reads and writes.
SFENCE: memory fence for writes.
LFENCE: memory fence for reads.

Mike Rainey

[2008/12/02]: Added makefile and other support for building runtime on OpenBSD.

John Reppy

Version 110.68; 2008/08/13

[2008/08/11]: Minor fix to ml-ulex backend for regexps that match any character and perform a single action. Previously the emitted code would not allow the regexps to match any character at all.

Aaron Turon

[2008/08/05]: Added build support for OpenBSD (thanks to Brian O’Hanlon).

John Reppy

[2008/07/12]: Fixed Int64.fromString to use base-10.

John Reppy

[2008/04/12]: Various updates to the ml-lpt tree. The documentation has been updated; bugs in the parsing of negation and character classes in ml-ulex have been fixed; and changes have been made to make the ml-lpt tools more compatible with MLton (and other SML implementations). Thanks to Matthew Fluet and Aaron Turon for their patches.

John Reppy

[2008/04/20]: Implemented timer-based profiling on Windows, with behavior as close to the *nix ITIMER-based profiling as possible.

Lars Bergstrom

[2008/07/04]: Finished off the Windows subset of the basis library. Added process support and various configuration and system identification utils.

Lars Bergstrom

[2008/03/18]: Major changes to the RegExp library: see smlnj-lib/CHANGES for details.

John Reppy

[2008/02/14]: Added the Windows Status structure

Lars Bergstrom

[2008/02/14]: Added the Windows DDE structure

Lars Bergstrom

[2008/02/05]: Added the Windows Config structure

Lars Bergstrom

[2008/01/31]: Added outline of the Windows basis library and the basic registry functionality.

Lars Bergstrom

[2008/01/23]: Fixed the amd64 code generator to compile with the current MLRISC.

Mike Rainey

[2007/11/26]: nlffi was updated to work on Windows. It needed to pass in the correct value for the name of the kernel32 DLL to obtain 'base' bindings. Additionally updated the README for the most basic nlffi sample with what you need to do on MacOSX and on Windows to make it work.

Lars Bergstrom

[2007/11/21]: Overwrite the SMLNJ_HOME environment variable on installation. Properly change the package code so that subsequent version installations prompt for uninstall (instead of 'repair/remove').

Lars Bergstrom

[2007/11/14]: Cleaned up WININSTALL file for new MSI-based setup.

Lars Bergstrom

Version 110.67; 2007/11/13

[2007/11/12]: Ensure that the size of the allocation space is at least 128K.

John Reppy

[2007/11/05]: Fixed type error in ml-lpt library that occurs when compiling against a basis that was compiled with the USE_64_BIT_POSITIONS symbol set. (Thanks to Johannes Joemann)

John Reppy

[2007/11/03]: Fixed the amd64 code generator to compile with the current MLRISC.

Mike Rainey

[2007/11/02]: Made sml.bat more resilient to either not having run the installer and having no SMLNJ_HOME set or having just shuffled the directory around.

Lars Bergstrom

[2007/11/01]: A collection of bug fixes for machine.sml in the Reactive library. (Thanks to Timothy Bourke)

John Reppy

[2007/10/28]: Patches for Mac OS X 10.5 (Leopard).

John Reppy

[2007/10/28]: Fixed some bugs in the AMD64 floating-point spilling code.

Mike Rainey

[2007/10/25]: Added support for the atomic XCHG instruction.

Mike Rainey

[2007/10/22]: Added AMD64 support for floating-point negation.

Mike Rainey

[2007/10/22]: Fixed ^C handling in Windows. Added a Windows installer. Made it possible to build for Windows on a mapped drive from Parallels.

Lars Bergstrom

[2007/10/22]: Added AMD64 support for the atomic fetch and add instruction.

Mike Rainey

[2007/10/18]: The GAS output now favors p2align over align, since the former is guaranteed to be consistent over multiple architectures and the latter is not.

Mike Rainey

[2007/10/18]: Fixed a bug in register spilling.

Mike Rainey

[2007/10/02]: Added demo support for AMD64 for quick testing and fixed support for 64-bit label constants.

Mike Rainey

[2007/09/20]: Added support for the MLRISC COND instruction and the x86-64 CMOVCC instruction.

Mike Rainey

[2007/09/17]: Fixed an instruction-selection bug when loading 64-bit labels.

Mike Rainey

[2007/09/14]: Fixed Elaborator and Translate performance bugs

George Kuan

[2007/09/12]: Fixed xorl memory argument bug.

Mike Rainey

[2007/07/27]: Added f64sgn (for Real64.signBit) as a primop defined in MLRiscGen. signBit(~0.0) not handled correctly.

George Kuan

[2007/06/21]: Fixed bug in {TextIO,BinIO}.StreamIO.endOfStream that would incorrectly signal end of stream.

John Reppy

[2007/06/12]: Eliminated config/allsources. The information is now drawn directly from config/actions.

Matthias Blume

Version 110.65; 2007/06/07

[2007/06/06]

Aaron: fixed a number of bugs in ml-ulex.
Matthias:
- added CM control cm.force-tools; this is false by default; when set to true, then tools like ml-yacc, ml-lex, ml-ulex, etc. will be forced to run regardless of whether or not their targets are up-to-date
- changed installer code so that config/install.sh will re-build heap images for all tools even if those heap images already existed

Matthias Blume

[2007/06/04]: After Aaron Turon’s bug fix for ml-ulex (handling the ^ character in legacy mode), re-ran the lexer generator on all lex input files and committed the results.

This should fix the problem with ckit and nlffi that was reported by Vesa A. Norrman.

Matthias Blume

Version 110.64; 2007/05/31

[2007/05/31]

3rd merge of base from primop-branch-3 into the trunk. Additional bug fixes included:

Timer.cpu_timer, etc. type printing corrected (by making Timer have opaque sig constraint in basis/Implementation/timer.sml)
Infinite loop in FLINT (tests/typing/tests/25.sml) (fix by Stefan Monnier)

Dave MacQueen

[2007/05/31]: Fixed some bugs in new Div code in FLINT/trans/translate.sml.

Also, changed the handling of "no core access": When translate.sml needs access to a core exception at a time when the core has not been set up yet (this only happens when compiling system/smlnj/init/*), then don’t bother generating the corresponding tests.

The old scheme was to generate a bogus value to be used in place of the exception. Unfortunately, that confuses the plambda type checker. Moreover, it does not do any good, because at runtime we don’t expect such an exception to be ever raised. (The code in system/smlnj/init/* has to be written very carefully with this in mind!)

Matthias Blume

[2007/05/29]: Added FSQRT instructions for the AMD64 code generator.

Mike Rainey

[2007/05/29]: FLINT/trans/translate.sml now wraps all DIV/MOD/QUOT/REM operations with an explicit test for zero division. This should fix several regressions and makes it possible for downstream optimization phases to treat these operations as "pure" when they are applied to unsigned operands.
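
The shape of the wrapping can be sketched as follows. This is a hypothetical model, not the code that translate.sml generates: the exception class and function names are invented for the example, and Python's // operator stands in for the underlying division primop.

```python
class Div(Exception):
    """Stand-in for the SML Div exception."""


def raw_div(a, b):
    """Models the primitive operation; callers guarantee that b is nonzero."""
    return a // b


def checked_div(a, b):
    """Explicit zero test placed in front of the primitive operation."""
    if b == 0:
        raise Div()
    return raw_div(a, b)
```

Because the test is a separate step, the primitive itself never sees a zero divisor, which is what lets later phases treat it as free of effects in the cases the entry describes.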

Matthias Blume

[2007/05/29]: Added the new MLRISC code generator for the AMD64. This version, in contrast to the previous one, uses SSE registers and instructions for all floating-point computations.

Mike Rainey

[2007/05/29]: A number of fixes related to the formatting of dates. These include fixes for bugs #1415 and #1416. We also now correctly handle format characters that lie outside the specified set.

John Reppy

[2007/05/23]

CMB (and CM) now automatically defines the CM "preprocessor" symbol NO_PLUGINS during "makeml -rebuild" or when CM operates in "slave" mode.

(In addition, CMB_REBUILD_MODE is defined for `makeml -rebuild`,
 CM_SLAVE_MODE is defined in attached slaves while running CM.make
 or CM.rebuild, and CMB_SLAVE_MODE is defined in attached slaves
 while running `CMB.make`.  The point is that the single symbol
 NO_PLUGINS is defined in all three cases.)

I changed the three locations within the sources that get compiled during CMB.make where ml-yacc or ml-lex input is processed: When NO_PLUGINS is defined, the use of the mlyacc and mllex tools is bypassed.

When bootstrapping new versions of the compiler, there can be situations where the plugin tools for ml-yacc and ml-lex (or ml-ulex) are not available or otherwise not operational. In this case one can manually define the NO_PLUGINS symbol prior to running CMB.make(). To do so, the following command should be issued at the interactive prompt:

   #set (CMB.symval "NO_PLUGINS") (SOME 1);

A CMB.make with NO_PLUGINS defined relies on the existence of the files that normally would be generated by ml-yacc and ml-lex. (Copies of these files are in the repository.)

Matthias Blume

[2007/05/23]: Second merge of base from primop-branch-3 into the trunk. Additional bug fixes included:

Date.scan and Date.fromString fixed;
Overloading resolution fixed and some type printing problems corrected.

Dave MacQueen

[2007/05/23]: Changed the installation mechanism for CM tool plugins. These are just libraries and now get installed like ordinary libraries.

There are now a number of new installation targets that give some fine-grain control over what classes and suffixes are known, and what they will map to. See config/targets for details.

The code that caused plugin installation as part of running a tool’s "build" script has been removed. (The build script is for building, not for installing.)

Matthias Blume

[2007/05/22]: Added a boolean control named cm.tolerate-tool-failures (env. variable name: CM_TOLERATE_TOOL_FAILURES). The default is false and makes CM fail if a shell tool reports a non-success exit status. If the control is set to true, then CM will press on after tool failures in the event that the target files exist (even though they are considered outdated). Turning the control to true can be useful for bootstrapping.

Matthias Blume

[2007/05/19]

Merge of ml-lpt revisions for 110.64.

The names of several ml-lpt-lib modules have changed:

Repair ⇒ AntlrRepair
StreamPos ⇒ AntlrStreamPos
ErrHandlerFn ⇒ AntlrErrHandler
EBNF ⇒ AntlrEBNF

The ml-antlr specification format has changed: declarations such as %tokens and nonterminal definitions can occur multiple times in the same specification. The semantics are such that each new declaration extends the previous ones. This does not apply to %start or %name, of course.

Importing a grammar via %import now includes all declarations from the imported grammar, except for %name, %entry, and %start. Tokens and nonterminals can be dropped using the new %dropping clause of the %import directive; the separate %drop and %extend have been removed.

We now allow optional type annotations on nonterminals, using the %nonterms directive as in ml-yacc.

The refcell construct is now implemented using SML’s regular reference cells, so the :== and !! notation has been deprecated.

The ml-antlr tool now does much more checking of specifications, and its error messages have been greatly improved. Error repair for generated parsers has been completely rewritten, and is now both much faster and more accurate.

ml-ulex is now more lenient with escape codes (non-SML-standard escape codes are now interpreted literally, so e.g., \| denotes “|”). Also, character classes may now include a “-” character at the beginning as is standard in most other regexp tools.

All of these changes are documented in the user guide, which has been updated and improved with this merge.

Aaron Turon

[2007/05/03]: Merge of the primop3 branch (base) into the trunk to create 110.63.1. Significant changes in FLINT and the front end, mostly having to do with a reorganized system for handling primops. Various bug fixes and improvements in printing signatures.

David MacQueen

[2007/05/02]: Preliminary commit of large ml-lpt revisions (more to come for 110.64). The ml-antlr specification format has changed: declarations such as %tokens and nonterminal definitions can occur multiple times in the same specification. The semantics are such that each new declaration extends the previous ones. Grammar extension constructs have also changed. We now allow type annotations on nonterminals. Finally, the refcell construct is now implemented using SML’s regular reference cells, so the :== and !! notation has been deprecated. All of these changes will appear in the 110.64 user guide for ml-lpt.

The ml-antlr tool now does much more checking of specifications, and its error messages have been greatly improved. There has also been some work on the error repair process for generated parsers, but this will be further improved in 110.64.

Aaron Turon

[2007/04/24]: More Basis fixes: The Char.fromString (etc.) functions did not handle the “\uxxxx” escape sequence. There is still an outstanding bug with String.fromString when the tail is a format escape. I added a comment to this effect in Basis/Implementation/string.sml. Thanks to Andreas Rossberg.
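
A small check of the “\uxxxx” form as the Basis specification describes it (a sketch; the binding name is illustrative, and the SML string literal "\\u0041" denotes the six characters \u0041):

```sml
(* The six-character source text \u0041 should scan as the character A. *)
val fromEscape : char option = Char.fromString "\\u0041"
```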

John Reppy

[2007/04/24]: Added next function to Fifo and Queue modules in the SML/NJ Library.

John Reppy

[2007/04/23]: More Basis fixes: Time.fmt dropped the leading “~” for negative time values that had no whole part.
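
For illustration, a negative time value with no whole part (a sketch; the binding names are illustrative only):

```sml
(* Half a second before the epoch: negative, with a zero whole part. *)
val halfSecondAgo = Time.fromMilliseconds ~500

(* Format with three fractional digits; the leading "~" must be kept. *)
val formatted = Time.fmt 3 halfSecondAgo
```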

John Reppy

[2007/04/23]: More Basis fixes: the OS.Path module did not include the InvalidArc exception and did not do sufficient argument checking. Thanks to Stephen Weeks and Adam Chlipala.

John Reppy

[2007/04/16]: Fixed an unbound functor bug in the AMD64 CM file.

Mike Rainey

[2007/04/12]: Substantially changed the signature and implementation of AMD64 SVID. It now looks similar to the ia32 SVID, yet uses staged allocation.

Mike Rainey

[2007/04/07]: More Basis fixes: the WORD signature was missing {to,from}Large. Thanks to Andreas Rossberg.
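
A minimal sketch of the two conversions, shown here on Word8 (the binding names are illustrative only):

```sml
val w : Word8.word = 0w200

(* Widen to LargeWord.word and narrow back again. *)
val widened = Word8.toLarge w
val roundTrip = Word8.fromLarge widened

(* Narrowing keeps only the low-order bits: 0wx1FF becomes 0wxFF. *)
val truncated = Word8.fromLarge 0wx1FF
```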

John Reppy

Version 110.63; 2007/03/22

[2007/03/19]: Fixed bogus operand sizes in AMD64 instruction spilling.

Mike Rainey

[2007/03/19]: Fixed a number of inconsistencies between the Posix.TTY structure and the Basis specification. Thanks to Adam Chlipala.

John Reppy

[2007/02/26]: Added preliminary support in MLRISC for Staged Allocation, a technique for specifying calling conventions (see http://www.eecs.harvard.edu/~nr/pubs/staged-abstract.html). Initially, we plan to use this code to generate C calls for the AMD64.

The staged allocation code base resides in MLRISC/staged-allocation, and specialized calling conventions go in MLRISC/ARCH/staged-allocation.

Mike Rainey

[2007/02/20]: Bug fix: when SaveCState was called with two values to save, a subsequent GC could cause the RestoreCState to fail because the saved state had been promoted to a tagless pair.

John Reppy

[2007/02/20]

Fixed bug triggered by:

    val a1 = Word8Array.array(1, 0w0);
    val _  = Word8Array.update(a1, 0, 0w128);

The x86MCEmitter crashed when the immediate operand to MOVB was outside of the range -128 … 127. I’ve changed the code so that the range check is disabled. Only the low order 8 bits of the immediate operand are now significant.
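
The "only the low-order 8 bits are significant" rule can be pictured with Word8.fromInt, which performs the same truncation at the source level (a sketch; this is not the emitter's code, and low8 is a hypothetical name):

```sml
(* Word8.fromInt keeps the low-order 8 bits of the two's-complement value. *)
val low8 : int -> Word8.word = Word8.fromInt

val fromUnsigned = low8 128     (* the value from the bug report *)
val fromSigned = low8 ~128      (* same bit pattern *)
val wrapped = low8 256          (* bits above the low byte are discarded *)
```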

Allen Leung

[2007/02/15]: Eliminated any mention of lexgen, which was an early precursor to ml-ulex.

You should update your admin directory, so the shell scripts for maintaining your local copy of the repository reflect this change.

Matthias Blume

[2007/02/14]: Fixed bug in CM’s parallel make facility that failed to have the master re-link modules after letting slaves compile them.

Matthias Blume

[2007/02/12]: Fixed typo in ml-build script that prevented library anchors from being registered.

Matthias Blume

Version 110.62; 2007/02/02

[2007/01/31]: Brought ml-lpt manual up-to-date with the code. Many minor improvements to the tools, and a few minor bugfixes. Prepared ml-lpt-lib for integration with new UTF8 structure. Changed the interface for creating streams in ml-ulex (we now support stream creation from several kinds of sources). Overall, the tools are now quite stable, and their interfaces are unlikely to change in a way that would break compatibility. More work, however, is needed in the documentation.

Aaron Turon

[2007/02/01]: Implemented library installer. Moved CM plugin code for ml-burg, ml-lex, and ml-yacc out of CM source tree and into their respective trees. Implemented CM plugin for ml-ulex and ml-antlr. Used library installer for ml-burg, ml-ulex, and ml-antlr.

For ml-yacc and ml-lex we continue to have permanently "plugged-in" CM tools. (It turns out to be too messy to do otherwise because there is too much code that during installation relies on the presence of these tools — resulting in a tricky ordering problem.)

Matthias Blume

[2007/01/30]: Added SMLofNJ.shiftArgs which is like a shell’s "shift" command. Modified CM’s startup code to use shiftArgs as it processes command line arguments. This way, the init code in each .sml-file or library that is mentioned at the top level will see only those arguments that have not yet been processed at this point. In other words, the init code can "seize control" and process the remaining command line.

Matthias Blume

[2007/01/30]: Added fromList function to the ORD_SET interface and lookup to the ORD_MAP interface. See the SML/NJ Library CHANGES file for details.

John Reppy

[2007/01/28]: Added the UTF8 structure and signature from the Moby compiler to the SML/NJ library (and the CML library). These modules will replace the version in the ml-lpt-lib.

John Reppy

[2007/01/26]: Added entries to handle ml-lpt-lib.cm in installer.

Matthias Blume

Version 110.61.1; 2006/12/15

[2006/12/15]: Fixed brown-paper-bag bug with CM’s pathname handling, which made installation fail under Win32.

This supersedes the pre-brown-paper-bug release (see below).

Matthias Blume

Version 110.61; 2006/12/14

[2006/14/06]: Fixed the code in runtime/c-libs/posix-tty/{tcgetattr,tcsetattr}.c to get the c_cc termios data copied correctly. Also moved the allocation of the string to avoid problems if it caused a GC.

Thanks to Timothy Bourke for the bug report and fix.

John Reppy

[2006/12/08]: Added code to CM’s "standard shell tool" implementation which causes it to tolerate (with a warning) the situation where the shell command fails (e.g., due to the shell command’s non-existence) as long as all target files exist.

This makes it possible to, e.g., build ml-yacc from sources even if svn checkout messed up the time stamps on files in such a way that yacc.grm is younger than yacc.grm.sml or yacc.grm.sig. (Ml-yacc would be needed to re-process yacc.grm, but obviously it might not yet be available at that time.)

Matthias Blume

[2006/12/06]: Fixed the types of recvVecFrom, recvVecFrom', recvVecFromNB, and recvVecFromNB' in the SOCKET signature. This error is actually in the SML Basis specification too.

John Reppy

[2006/12/05]: CM now reports undefined anchors as errors and aborts execution rather than silently pressing on using bogus values.

Matthias Blume

[2006/11/29]: Use Say.vsay for printing the “[autoloading]” message, so #set CM.Control.verbose false (or -Ccm.verbose=false) can be used to suppress it.

Matthias Blume

[2006/11/10]: Fixed bug in CM where “with:” specifications that affect compilation (as opposed to parsing) were ignored.

Matthias Blume

Version 110.60; 2006/11/09

[2006/11/09]

This is the first subversion-hosted release. There are also changes to the directory layout. Here is a table:

config ⇒ config
src/cm ⇒ base/cm
src/compiler ⇒ base/compiler
src/runtime ⇒ base/runtime
src/system ⇒ base/system
src/cm/pgraph ⇒ pgraph
src/READMES ⇒ base/READMES
src/$note ⇒ base/NOTES/$note (for note in BOOT CVSNOTES CYGWININSTALL HISTORY INSTALL MACOSXINSTALL WININSTALL)
src/smlnj-lib ⇒ smlnj-lib
src/MLRISC ⇒ MLRISC
ckit ⇒ ckit
src/cml ⇒ cml
src/eXene ⇒ eXene
src/heap2asm ⇒ heap2asm
src/lexgen ⇒ lexgen
src/ml-burg ⇒ ml-burg
src/ml-lex ⇒ ml-lex
src/ml-yacc ⇒ ml-yacc
src/ml-nlffi-lib ⇒ nlffi/lib
src/ml-nlffigen ⇒ nlffi/gen
src/smlnj-c ⇒ smlnj-c
src/tools/TraceDebugProf ⇒ trace-debug-profile

(All pathnames are relative to the SML/NJ "root" directory.)

In addition, there is also a new

  ml-lpt

directory containing two new program generator tools: ml-ulex and ml-antlr (a lexer generator that handles unicode and an ANTLR-inspired LL(k) parser generator). These tools are currently "beta-quality".

The latest versions of the sources can now be obtained anonymously via subversion. For this, it is useful to first check out

  svn://smlnj-gforge.cs.uchicago.edu/smlnj/admin

and put the resulting directory on your shell’s PATH. This provides access to three shell scripts: checkout-all.sh, stat-all.sh, and refresh-all.sh.

To create a freshly checked-out copy of the sources, do

  checkout-all.sh [dir]

where dir is the optional SML/NJ root directory (default is ".").

This creates the above directory layout. Each subdirectory of the root is under individual subversion control. The stat-all.sh and refresh-all.sh scripts apply "svn stat" or "svn update" to each of these subtrees.

Matthias Blume

[2006/11/02]

Reorganized directory layout.

This is a temporary solution; more reorganization is to come.

The basic idea is to have a number of toplevel trees, each corresponding to a well-defined part of the overall system. Each part can be maintained individually, even in separate source repositories, although currently we still serve everything out of the main smlnj-gforge tree.

Installer and scripts have been updated to reflect the new layout. The installer (base/src/system/smlnj/installer) is now "scriptable" to avoid burning too much knowledge about the layout into SML source code. The main script used by the installer is in config/actions.

The main change is that many of the subdirectories of what used to be known as "src" have moved to the toplevel. The "src" directory itself has moved down into a subtree called "base". (We may eventually get rid of the extra level of indirection represented by "src".)

The layout is now as follows:

toplevel tree name    default repository (using svn://smlnj-gforge.cs.uchicago.edu/smlnj for $gf)
config                $gf/config/trunk
base                  $gf/sml/trunk
smlnj-lib             $gf/smlnj-lib/trunk
MLRISC                $gf/MLRISC/trunk
ml-yacc               $gf/ml-yacc/trunk
ml-lex                $gf/ml-lex/trunk
ml-burg               $gf/ml-burg/trunk
lexgen                $gf/lexgen/trunk
heap2asm              $gf/heap2asm/trunk
cml                   $gf/cml/trunk
eXene                 $gf/eXene/trunk
ckit                  $gf/ckit/trunk
nlffi                 $gf/nlffi/trunk
smlnj-c               $gf/smlnj-c/trunk

In $gf/admin there are a few useful shell scripts for checking out and maintaining the entire collection of trees:

admin/checkout-all.sh [dir]: optionally creates dir and checks out all trees from their default repositories; if dir is missing, checks out into the current working directory.
admin/refresh-all.sh [dir]: looks at all trees (from the above list) in dir (default: .) and runs “svn update” if the tree exists and is under subversion control; non-existing or non-subversion trees are skipped
admin/stat-all.sh [dir]: like refresh-all.sh, but runs “svn stat” instead

Matthias Blume

[2006/10/05]: Merged code for AMD64 backend (Mike Rainey’s work). Everything is hooked up but untested.

Matthias Blume

Version 110.59; 2006/05/17

[2006/05/17]

I am freezing 110.59. Changes other than the version-number increase:

eXene

committed changes to eXene from Alley Stoughton: "fixed bugs in X authorization and resource handling, as well as in the pile and viewport widgets"

Runtime

fixed linking problem with NetBSD 3.x.

Lexgen

lexgen tool handles non-ascii characters in 7-bit mode the same way that ml-lex does
lexgen propagates exceptions the same way that ml-lex does

CML

Fixed a bug in the SyncVar polling functions (iGetPoll, mTakePoll, and mGetPoll) that could lead to livelock.

Matthias Blume

[2006/05/12]: Implemented ml-makedepend (i.e., CM.sources) in a better (more robust) way. This should hopefully fix the ml-makedepend problem permanently.

Matthias Blume

[2006/05/12]: Fixed long-standing bug with ml-makedepend where it would output a spurious dependency to a non-existing file. (This is a simple fix. It might need further looking into.)

Matthias Blume

[2006/04/20]: Committed patches received from Johannes 5 Joemann (joemann@befree.free.de) that enable heap2exec under Linux and FreeBSD.

Matthias Blume

[2006/04/14]

MLRISC changes:

renamed GAS_PSEUDO_OPS to AS_PSEUDO_OPS and put it in its own file.
added support for NOTB and XORB operators in pseudo-op expressions
added DarwinPseudoOp functor that supports Darwin’s assembler syntax.
added support for 64-bit integer literals

John Reppy

Version 110.58; 2006/03/03

[2006/03/01]: Incorporated several bugfixes to lexgen. Compiler now compiles to fixpoint when using lexgen instead of ml-lex.

Matthias Blume

[2006/02/28]: Removed ml-flex and added lexgen instead, using Aaron Turon’s newly provided tarball. The generated lexers still have problems.

Matthias Blume

[2006/02/26]: Removed ml-flex's dependency on regexp-lib.cm. Turned installation of ml-flex on by default.

Matthias Blume

[2006/02/24]: Added ml-flex sources. Partially integrated, but should not be turned on yet! (Read: leave it commented-out in config/targets!)

Matthias Blume

[2006/02/23]: Changes to support building on x86-64 systems (using the 32-bit mode). Also cleaned up signal handling on Linux. Support for pre-2.2 Linux kernels dropped.

John Reppy

[2006/02/22]: CM has changed. Updated the script for rebuilding the MLRISC generated files.

Allen Leung

[2006/02/14]: Hooked code for Darwin-specific Intel ABI into main compiler. (This is a temporary solution which relies on the fact that the compiler itself does not use NLFFI. Eventually we need to divorce intel mac from generic x86 unix code and make separate sets of binaries.)

Matthias Blume

[2006/02/13]: Changed MLRISC x86 CCalls for partial support of Mac OS X ABI.

John Reppy

[2006/02/06]: Changes to support Mac OS X on Intel hardware. The C-calls support in MLRISC must also be updated to support the Mac OS X ABI.

John Reppy

[2005/12/16]: Improved error reporting and handling in CM.

Matthias Blume

[2005/11/21]: Enabling $/html-lib.cm in config/preloads did not work. This is fixed now. (Since the anchor mapping for html-lib.cm is not yet in effect at the time when config/preloads is processed, the library has to be referred to by another name. In 110.57 this would be $SMLNJ-LIB/HTML/html-lib.cm. I arranged for $smlnj/smlnj-lib/html-lib.cm to be valid as well.)

Thanks to Todd Wilson (Fresno) for alerting me to this issue.

Matthias Blume

Version 110.57; 2005/11/19

[2005/11/19]: Fixed a problem in config/install.sh where it tries to "fish" the name of the CM metadata directory from the wrong place (because the physical location of basis.cm has changed). Also, corrected the path anchor for $/html-lib.cm. (Thanks to M. Fluet for pointing out these problems.)

Matthias Blume

Version 110.57; 2005/11/16

[2005/11/16]: Fixed problem with bogus exception message when using back-trace facility.

Matthias Blume

[2005/11/15]: Added simple implementation of Array2.copy. (Warning: mostly untested.)

Matthias Blume

[2005/11/15]: Reversed change to src/system/smlnj/internals/versiontool.cm. This file gets loaded as a tool — by the equivalent of CM.make during the run of CMB.make. Thus, CMB’s path configuration is meaningless for it. Instead, it has the status of "user code", so it should use $/basis.cm to refer to the Basis library. (At least that’s true for the purpose of bootstrapping the previous change. In the future it might make sense to have versiontool.cm refer to $smlnj/basis/basis.cm, i.e., the version of the Basis that the compiler itself uses.)

Also patched src/system/testml to have it activate those extra anchor bindings in config/extrapathconfig.

Matthias Blume

[2005/11/15]

This change affects the way the following libraries are tied into the system:

   $/basis.cm
   $/smlnj-lib.cm
   $/pp-lib.cm
   $/controls-lib.cm
   $/html-lib.cm
   $/ml-yacc-lib.cm

These libraries are now internally (as seen from the source code of the implementation itself) known by the following names:

   $smlnj/basis/basis.cm
   $smlnj/smlnj-lib/smlnj-lib.cm
   $smlnj/smlnj-lib/pp-lib.cm
   $smlnj/smlnj-lib/controls-lib.cm
   $smlnj/smlnj-lib/html-lib.cm
   $smlnj/ml-yacc/ml-yacc-lib.cm

This makes it possible to work with code that requires different versions of these libraries, and which refers to these libraries using their "default" names (i.e., the first set of names above). In other words, one can un-define or re-define those default names without compromising the proper functioning of the compiler itself.

A similar procedure had already been performed for several of the MLRISC libraries that are linked into the compiler. I did some cleanup on this code.

A new file in the config directory (named extrapathconfig) is responsible for setting up path anchors that the compiler itself does not need, but that are typically required by user code.

Matthias Blume

[2005/11/07]: Fixed erroneous out-of-bounds test in the “update” function of various *ArraySlice modules. (Thanks to Vesa A. Norrman for pointing out the problem.)
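
The intended bounds behavior, sketched on the polymorphic ArraySlice structure (the entry concerns the various *ArraySlice modules; the names below are illustrative only):

```sml
val arr = Array.fromList [10, 20, 30, 40]

(* A two-element slice covering arr[1] and arr[2]. *)
val sl = ArraySlice.slice (arr, 1, SOME 2)

(* Index 1 is the last valid index of the slice and must be accepted. *)
val () = ArraySlice.update (sl, 1, 99)

(* Index 2 is past the end of the slice and must raise Subscript. *)
val rejected = (ArraySlice.update (sl, 2, 0); false) handle Subscript => true
```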

Pushed some Basis changes through ML-Lex, CML, and eXene.

Matthias Blume

[2005/11/07]: Fixed a Basis incompatibility: The deprecated function Substring.all was removed (use Substring.full instead).
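
Code that called Substring.all can switch to Substring.full directly; a minimal sketch (the binding names are illustrative only):

```sml
(* Substring.full views the whole string as a substring. *)
val whole = Substring.full "hello world"

(* Take the first five characters and convert back to a string. *)
val firstWord = Substring.string (Substring.slice (whole, 0, SOME 5))
```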

John Reppy

[2005/11/05]: Tweaked interval set API in SML/NJ library; see the CHANGES file for details.

John Reppy

[2005/11/02]

Runtime system bootstrap code now accepts hex digits in BOOTLIST in either upper case or lower case format.
Pushed changes to names of Pack<N>{Big,Little} structures through CML and eXene.

Matthias Blume

[2005/11/02]: Fixed a Basis incompatibility: Pack<N>{Big,Little} structures should be named PackWord<N>{Big,Little}.
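
With the corrected names, reading a 32-bit value from a byte vector looks roughly like this (a sketch that assumes the PackWord32Big and PackWord32Little structures are available, as they are in SML/NJ; the binding names are illustrative only):

```sml
val bytes = Word8Vector.fromList [0wx12, 0wx34, 0wx56, 0wx78]

(* The index counts 32-bit elements, so 0 selects bytes 0..3. *)
val big = PackWord32Big.subVec (bytes, 0)
val little = PackWord32Little.subVec (bytes, 0)
```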

John Reppy

[2005/10/28]: Fixed a minor Basis incompatibility: hex digits should be upper case.
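
For example, the following conversions are expected to produce upper-case digits (a sketch; the binding names are illustrative only):

```sml
val fromInt = Int.fmt StringCvt.HEX 255     (* hexadecimal, no prefix *)
val fromWord = Word.toString 0wxab          (* Word.toString is always hex *)
```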

John Reppy

Version 110.56; 2005/10/25

[2005/10/25]: Added interval sets to utility library (signatures INTERVAL_DOMAIN and INTERVAL_SET, and functor IntervalSetFn).

John Reppy

[2005/10/14]: Add Zhong Shao’s fix for datatype equality functions.

John Reppy

[2005/10/14]: Fixed a bug found by Carl Hauser. There was a typo in the reload code for FCMP in x86SpillInstr.sml.

Allen Leung

[2005/10/14]: Removed some debugging code in file x86Asm.sml. The function emit_operand was printing out debugging output.

Allen Leung

[2005/07/27]: Fixed ml-lex to recognize “\r” as representing carriage return.

John Reppy

[2005/07/27]: Fixed ml-yacc to work on files with non-native end-of-line encodings (e.g., Windows text file on a Unix system).

John Reppy

[2005/07/20]: Added changes from Dominic Evans (oldmanuk (at) gmail (dot) com) to support HPUX 11.

John Reppy

[2005/07/06]: Changes to the SML/NJ Library. See smlnj-lib/CHANGES for details.

John Reppy

[2005/07/06]: Fixed reversed logic for deciding whether to "copy up" or "copy down" in *-array-slice.sml.
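
The copy direction only matters when source and destination overlap within the same array. A sketch of both cases using the polymorphic ArraySlice structure (the entry concerns the *-array-slice.sml implementations; the names below are illustrative only):

```sml
fun toList a = Array.foldr (op ::) [] a

(* Shift elements 0..3 one position to the right (destination above source). *)
val up = Array.fromList [1, 2, 3, 4, 5]
val () = ArraySlice.copy {src = ArraySlice.slice (up, 0, SOME 4), dst = up, di = 1}

(* Shift elements 1..4 one position to the left (destination below source). *)
val down = Array.fromList [1, 2, 3, 4, 5]
val () = ArraySlice.copy {src = ArraySlice.slice (down, 1, NONE), dst = down, di = 0}
```

With the direction chosen correctly, the results are [1, 1, 2, 3, 4] and [2, 3, 4, 5, 5]; copying in the wrong direction would overwrite source elements before they are read.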

Matthias Blume

[2005/05/31]: Fixed a typo in the Cygwin code.

Allen Leung

[2005/05/31]: Updated Cygwin’s fault/signal handling to match the Windows version. Updated the export list.

Allen Leung

Version 110.54; 2005/05/18

[2005/05/18]: Added support scripts for Mac OS X PackageMaker and modified config/install.sh so that it supports re-dumping a heap image after customization.

Matthias Blume

[2005/05/18]: Un-overloaded / to work around bug in overloading resolution code.

Matthias Blume

[2005/05/16]

Added mechanism for re-creating a heap file for the interactive system after configuration variables have been changed.

   CM.redump_heap : string -> unit

This is much like SMLofNJ.exportML, but starting from the resulting heap does not return to the caller of CM.redump_heap but restarts the interactive system from scratch. The original call of CM.redump_heap does not return but ends the interactive session. Thus, CM.redump_heap is a lot like SMLofNJ.exportFn.

Internally, redump_heap winds the dynamic execution context back to the point where the original heap image was created and re-executes the heap image generation code in the boot code.

Matthias Blume

[2005/05/09]: Added a hack to the existing hack known as Word64 to make fromString behave correctly. I am still not sure whether Word64.scan will work as specified with respect to the interaction of radix and prefix.

Matthias Blume

[2005/05/04]: Added a gc protocol checking phase. This phase is enabled with the flag "check-gc". "debug-check-gc" turns on the verbose mode.

Allen Leung

[2005/05/04]: Fixed a bug in the implementation of div and mod for IntInf. Thanks to Neophytos Michael for reporting the problem.
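
The entry does not say which inputs were affected. As a reminder of the semantics that div and mod must satisfy (rounding toward negative infinity, unlike quot and rem), here is a small sketch with illustrative names:

```sml
val n : IntInf.int = ~7
val d : IntInf.int = 2

val q = IntInf.div (n, d)        (* rounds toward negative infinity *)
val r = IntInf.mod (n, d)        (* has the sign of the divisor *)

val q' = IntInf.quot (n, d)      (* rounds toward zero *)
val r' = IntInf.rem (n, d)       (* has the sign of the dividend *)
```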

Matthias Blume

[2005/05/04]: Added the join combinator to the ParserComb module in the SML/NJ Library.

Matthias Blume

[2005/02/28]: Fixed serious bug (brown paper bag variety) in new implementation of structure Atom in CML. (I had accidentally used a mailbox instead of an mvar, leaving the door open for races.)

Matthias Blume

Version 110.53; 2005/02/25

[2005/02/25]: Brought back SMLofNJ.Susp. The underlying suspension type is the one implemented in Core, which means that it is the same as the one used by the lazy extension.
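
A minimal usage sketch, assuming the usual delay/force interface of SMLofNJ.Susp (SML/NJ only; the binding names are illustrative):

```sml
(* Wrap a computation in a suspension; nothing is evaluated yet. *)
val suspended = SMLofNJ.Susp.delay (fn () => 6 * 7)

(* Forcing runs the computation and yields its result. *)
val first = SMLofNJ.Susp.force suspended
val second = SMLofNJ.Susp.force suspended
```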

Matthias Blume

[2005/02/24]: Simpler and at the same time more general implementation of structure Atom in CML.

Matthias Blume

[2005/02/15]: Created new “tools” directory under “src” and moved “TraceDebugProf” there.

Matthias Blume

[2005/02/10]: Implemented “long long” arguments and results for NLFFI. (Only the PPC/MacOS implementation is complete, the other backends still need to be updated.)

Matthias Blume

[2005/01/24]: Minor cleanup in ML-Yacc rule printing mechanism. This should fix a problem with certain "as" patterns which previously got rendered using incorrect syntax.

Matthias Blume

[2005/01/18]: Made time profiling code (interrupt handler) in runtime system aware of new array representation.

Matthias Blume

[2005/01/14]

Implemented new (but still experimental) heap2exec facility. This is tested under Mac OS X and should work under Linux (will test shortly). It will probably also work on the Sparc (will test some time later). Also removed old “HACKED_STANDALONE” hack from runtime.

To be able to test heap2exec, uncomment the request for “heap2asm” in config/targets prior to installation. (Notice that this is different from "heap2exec" mentioned below. Not a typo.)

To perform an actual test, run the command

$ bin/heap2exec heapfile execfile

(You can put heap2exec on your shell’s path.)

For example, run

$ bin/heap2exec bin/.heap/ml-yacc.ppc-darwin mly

This will create a standalone executable called “mly” that you can then invoke directly as a command.

Matthias Blume

[2005/01/07]: fixed off-by-one error in ML_STRING macro (globals.c)

Matthias Blume

[2004/12/23]: Made ml-build script "smarter" (but only very little).

Matthias Blume

[2004/12/21]

Implemented access to signed and unsigned long long data in NLFFI. (The parameter-passing part of the picture is not complete. But data structure access seems to work.)
Fixed CM's incorrect assumption that the PPC is little-endian. (On the Mac, it is big-endian. And that’s currently our only PPC platform.)

Matthias Blume

[2004/12/21]: Some cleanup in the $c/memory.cm library: separated some concerns by moving allocation code and memory access code each into their own files.

Matthias Blume

[2004/12/17]: The Unix I/O library of SML/NJ on Cygwin does not understand Windows-style pathnames, so problems arise when SMLNJ_HOME is set to a Windows-style pathname. The _run-sml script now converts SMLNJ_HOME to a POSIX pathname on Cygwin.

Allen Leung

[2004/12/16]: Last-minute changes incorporated into 110.52. Release tag moved.

The changes:

HashString.hashString' → HashString.hashSubstring
bug fix in UnivariateStats

Matthias Blume

[2004/12/15]

HashString.hashString' → HashString.hashSubstring
corresponding changes in atom.sml
"de-compressed" (aka. un-obfuscated) code for UnivariateStats and added some comments

Matthias Blume

Version 110.52; 2004/12/15

[2004/12/15]: More on the space problem (this time for Win32).

Matthias Blume

[2004/12/14]: Hacked some of the scripts (in particular: the installer) to cope with spaces in filenames a bit better. But beware: the current "solution" is likely still full of bugs and inherently incomplete. (We need to do away with those shell scripts for a comprehensive solution.)

Matthias Blume

[2004/12/13]: Fixed bug in code for ml-makedepend.

Matthias Blume

[2004/12/09]: Added two simple but potentially useful statistics modules to SML/NJ Library. (See CHANGES file there.)

Matthias Blume

[2004/12/01]

Updates to SML/NJ Library

Added function HashString.hashString' for substrings.
Hand-inlined CharVector.foldl into HashString (for speed).
Modified implementation of structure Atom to avoid extracting strings from substrings unless necessary.

(Also see CHANGES file for smlnj-lib.)

Matthias Blume

[2004/11/24]: Made sure CML compiles when the Position structure is Int64.

Matthias Blume

[2004/11/24]

The compiler can now be compiled in a mode that makes structure Position equal to Int64. The default, however, is unchanged (Position is Int31) for the time being.

To enable 64-bit positions, use the following procedure:

Start sml
Autoload $smlnj/cmb.cm (if not already autoloaded)

Type

#set (CMB.symval "USE_64_BIT_POSITIONS") (SOME 1);

Run CMB.make() as usual.

This is barely tested. The only test so far was a little SML program counting the number of characters in an 8-gigabyte file by reading it character-by-character. That test was successful.
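
The test program itself is not recorded here. A character counter of that general shape might look as follows (a sketch under assumptions: the name countChars is hypothetical, and IntInf is used for the counter so that the count cannot overflow regardless of the default int size):

```sml
(* Count characters by reading the stream one character at a time. *)
fun countChars (ins : TextIO.instream) : IntInf.int = let
      fun loop (n : IntInf.int) = (case TextIO.input1 ins
             of NONE => n
              | SOME _ => loop (n + 1)
            (* end case *))
      in
        loop 0
      end

(* Small in-memory example; a real run would use TextIO.openIn on a file. *)
val five = countChars (TextIO.openString "hello")
```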

In support of 64-bit positions, a number of new functions have been added to the runtime system.

Matthias Blume

[2004/11/23]: Fixed a problem with unhelpful error messages related to problems with .cm or .sml files that appear as part of the sml command line.

Matthias Blume

Version 110.51; 2004/11/18

[2004/11/18]: Enabled dlopen and friends for FreeBSD (as recommended by Johannes 5 Joemann).

Matthias Blume

[2004/11/17]: Added support for MLTree constructs LIVE and KILL to all the architectures.

Allen Leung

[2004/11/13]

Stripped down the versiontool: It now only handles the version number. The date string is generated at bootstrap time (during makeml).
In a previous commit, fixed a minor issue with how polyequal is being translated. In particular, the code now "looks through" abstractions. This results in slightly fewer polyEqual warnings and hopefully slightly more efficient code. Important examples for where this matters are the new int64 and word64 types.

Matthias Blume

[2004/11/12]: Structure Int64 fully hooked in. (The implementation is not very efficient, though.)

Matthias Blume

[2004/11/11]: All the pieces of Word64 are now there, with the exception of the conversions from and to LargeWord. (Eventually these need to be identities, but for the time being they don’t even make sense because LargeWord is 32 bits wide.)

Also started to add similar support for Int64, but major pieces of that are still missing.

Matthias Blume

[2004/11/11]: Structure Word64 is now (almost) complete, word literals and patterns seem to work. There are a few odd pieces missing. In particular, I didn’t do the {from,to}LargeWord parts because LargeWord is still Word32 at the moment.

Making Word64 official would mean that LargeWord becomes Word64. But this requires extreme care because most word-word conversions have to go through LargeWord, so making a mistake means loss of efficiency or worse. Eventually there will be a solution similar to (but actually simpler than) what I did with IntInf.

Matthias Blume

[2004/11/10]: More 64-bit hacking (but still not even half-way there yet). Also, some assorted improvements to the handling of 8-bit words.

Matthias Blume

[2004/11/09]: Started adding some infrastructure for supporting 64-bit int- and word-types. (Still in its very early stages.)

Matthias Blume

Version 110.50; 2004/10/28

[2004/10/28]

Changed config/srcarchiveurl from a file just containing the URL string into a file containing shell script code. The code has access to the $VERSION variable.
Made corresponding changes to config/install.sh and config/unpack.
Default contents of config/srcarchiveurl uses $VERSION and normally does not have to be edited to reflect a version change.

(As a result, a version change can be done by just editing config/version; the rest is now automatic.)

Matthias Blume

[2004/10/27]: BackTrace.monitor now also reports the source of the exception that triggered the trace.

Matthias Blume

[2004/10/27]

This is the HISTORY entry for two earlier commits, both concerning the x86 c-calls code in MLRISC:

added a missing LOAD in the code that deals with struct arguments
made sure the caller does not add the wrong number of bytes to the stack pointer after a call of a function returning a struct (the callee already pops the implicit argument which points to the space reserved for the result)

Matthias Blume

[2004/10/24]

John discovered a bug in the syntax of fucomip. The opcodes FU?COMIP? have been changed to

fu?comip? %st(i), %st

Allen Leung

[2004/10/20]

Added a mechanism for getting back-trace information from standalone programs. Here is how it works:

The part of the program from which you want to get backtrace information (usually the whole program) should be wrapped with BackTrace.monitor. This is a (unit→'a)→'a function, and your main program could be modified from something like
```
fun main (pgm, args) = ...
```
to
```
fun main (pgm, args) = BackTrace.monitor (fn () => ...)
```
To be able to access BackTrace.monitor, you have to add the library $smlnj-tdp/plugins.cm to the .cm file that contains your main function.
Remove all compiled code (i.e., all the .cm/ subdirectories that CM might have created in the past for your project).

Build the system using this command line:

ml-build -Ctdp.instrument=true \$smlnj-tdp/back-trace.cm myprog.cm MyProg.main myprog

instead of the usual

ml-build myprog.cm MyProg.main myprog

I changed the library name $/trace-debug-profile.cm to $smlnj-tdp/plugins.cm, and added the following new libraries:

$smlnj-tdp/back-trace.cm: when loaded causes the back-trace plugin to be installed
$smlnj-tdp/coverage.cm: when loaded causes the coverage plugin to be installed

Matthias Blume

[2004/10/18]: Added an "obsolete" warning for the "group owner" syntax to CM's parser.

Eliminated group owner specs from .cm files throughout the source tree.

Matthias Blume

[2004/10/15]

Test coverage tool added!
Further reorganization of tracing-, debugging-, and profiling support:
- moved original BTImp — now called BackTrace — into a separate library called $/trace-debug-profile.cm
- eliminated all mentions of BTrace from SMLofNJ.Internals
- only the instrumentation mechanism is now left in the compiler proper
- BackTrace module is a plugin which is NOT plugged in by default
- Coverage module is another such plugin

To get the benefits of any of these plugin modules, the code in question must be compiled with TDP instrumentation turned on. This can be done by setting SMLofNJ.Internals.TDP.mode to true. (The ref cell is also controlled via the -Ctdp.instrument=… switch.)

Plugins are selected at link time. (Pre-compiled instrumented code can be re-loaded with different plugins in effect.) When an instrumented module is linked, whatever plugins are at that time enabled will come into effect for that module.

To enable the back-trace plugin, load library $/trace-debug-profile.cm and invoke BackTrace.install() (e.g., from the interactive prompt). To enable the coverage plugin, load the same library and invoke Coverage.install().

Back-traces are generated automatically on uncaught exceptions and when the code in question explicitly invokes BackTrace.trigger().

Coverage (and execution frequency-) information must be queried explicitly by calling Coverage.not_covered and Coverage.hot_spots.

Matthias Blume

[2004/10/14]: Snapshot of a significant overhaul of how the trace/debug/profile support is hooked into the system (specifically: Core and SMLofNJ.Internals).

Matthias Blume

[2004/10/13]

Some rationalization of names:

structure BTrace -> structure TDPInstrument
etc.

This is in preparation for using the original back-trace instrumentation for other purposes. "TDP" stands for Trace/Debug/Profile.

The control flag controlling whether instrumentation is on or off is now registered under a different name, so instead of running sml as

sml -Cinstrument.btrace-mode=true

one has to say

sml -Ctdp.instrument=true

Matthias Blume

[2004/10/11]: Made some minor modifications to elabcore.sml to have source regions be propagated more tightly — resulting in better (i.e., smaller) regions being reported in error- and debug messages.

Matthias Blume

[2004/10/08]: Fixed handling of keywords in .cm files: After seeing "is" the lexer treats subsequent occurrences of "group", "library", "source", "is", "*", and "-" as ordinary identifiers rather than keywords.

Most seriously, this fixes a problem with CM’s "shell" tool. The tool is supposed to accept a tool argument called "source", but this did not work because of the clash with the keyword.

Matthias Blume

[2004/10/07]

Assorted cleanup work:

got rid of intstrmap in favor of using the library’s hash table implementation
threw out most of the pathnames stuff, as it was not used anyway
simplified tokentable implementation
fixed some minor spelling errors

Matthias Blume

[2004/10/06]: Cleaned up the absyn to reflect the invariant that HANDLE always carries a FNexp as part of the type definition. This eliminates some superfluous sanity checks at runtime down the road.

Some minor cleanup of the btrace code.

Matthias Blume

[2004/10/01]: Added hack to make slave mode work in the presence of the version tool. (Still, since the master does two passes over the code for CMB.make, the release number gets bumped twice when slaves are attached. I don’t know if this is worth fixing…)

Matthias Blume

[2004/09/30]

Moved the "version" magic into its own little library under src/system/smlnj/internal. This avoids expensive reconstruction of a stable src/compiler/core.cm.
At the same time, structure CompilerVersion is now known as structure SMLNJVersion.
Arranged for the version tool to NOT kick in when rebuilding the system (makeml -rebuild, fixpt). Otherwise one would never reach a fixpoint. Also, loading the versiontool does not work when rebuilding the system because CM is not properly initialized at that time.

Matthias Blume

[2004/09/29]: Implemented some CM magic to have file src/compiler/TopLevel/main/version.sml generated automagically. The version is taken from two files: config/version and config/release. The first is expected to contain a two-part version number such as 110.49. The second should contain a single number, but it may be missing.

If the environment variable VERSIONTOOL_BUMP_RELEASE is defined at the time the version tool is loaded (which is the first time you say CMB.make), then the tool will increment the value stored in config/release every time CMB.make is invoked.

The binfile format is now insensitive to anything beyond the first two components of a version number, so bumping the release does not render binfiles incompatible. Auto-bumping can be used to keep track of versions during development without invalidating existing binfiles.

In any case, every CMB.make updates the date information in version.sml. (This is the date that is printed in the banner.)

Matthias Blume

[2004/09/28]: Some cleanup of the controls code.

Matthias Blume

[2004/09/27]

Added two pieces of functionality to the Controls interface:

val save'restore: 'a control -> unit -> unit

grabs the current value of the control in stage 1 and restores it in stage 2.

val set' : 'a control * 'a -> unit -> unit

stores the given value into the control in stage 2 (i.e., delayed) but does all error checking in stage 1. (This is for string controls that need to parse their argument, something that might fail. In some cases, notably in CM, one already knows the intended argument but wants to delay the actual assignment until a time when error recovery would be more difficult.)
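
The two-stage shape of these functions can be illustrated with a small standalone sketch. It is not the Controls implementation: a control is modeled here as a plain ref cell, the names saveRestore and setDelayed are hypothetical, and setDelayed takes a string plus a parser (rather than an already-typed value) purely to make the stage-1 error check visible.

```sml
(* Hypothetical stand-ins: a "control" is just a ref cell in this sketch. *)

(* Stage 1 reads the current value; stage 2 (the returned thunk) writes it back. *)
fun saveRestore (cell : 'a ref) : unit -> unit =
  let val saved = !cell
  in fn () => cell := saved
  end

(* Stage 1 parses and may fail; stage 2 only performs the assignment. *)
fun setDelayed (parse : string -> 'a option) (cell : 'a ref, text : string)
    : unit -> unit =
  case parse text of
      NONE => raise Fail ("cannot parse control value: " ^ text)
    | SOME v => (fn () => cell := v)

val quotations = ref false
val restore = saveRestore quotations                       (* stage 1 *)
val apply = setDelayed Bool.fromString (quotations, "true") (* stage 1 *)
val () = apply ()     (* stage 2: quotations is now true *)
val () = restore ()   (* stage 2: quotations is false again *)
```

Because parsing happens when the thunk is built, a malformed argument is reported early, and running the thunk later cannot fail.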

Also changed the handling of controls in tool arguments to classes “sml” and “lazysml”:

use Controls.save'restore as a more robust way of restoring the old value (in particular: without having to re-parse the string)
use controls to handle the “overload” keyword in the init group (I believe this change actually fixes a long-standing obscure bug.)

Matthias Blume

[2004/09/27]

Added a new tool class called “lazysml” to CM’s tool chest. The only difference to “sml” is that compilation is done with Control.lazysml set to true. A source of class “lazysml” is automatically recognized by a file name suffix of “.lml”.

In addition to the above feature, the original class “sml” now also supports a tool argument “lazy” which has the same effect. As a result, the following three lines are equivalent:

    foo.sml : lazysml
    foo.sml : sml (lazy)
    foo.sml (lazy)

The setting goes into effect both during parsing and during compilation. The original setting is restored right after parsing and after compilation, respectively.

In addition to all the above, there is also a general mechanism to set ANY of the "controls" that are available at the command line via “-C…” on a per-sml-file basis. The same rules that apply for “lazy” apply as well. (In fact, “lazy” is implemented as a special case of the general mechanism.)

The .cm file syntax uses a new keyword tool argument called “with”. There are several ways of indicating the desired settings:

    foo.sml (with:parser.quotations=true)
    foo.sml (with:(name:parser.quotations value:true))
    foo.sml (with:(name:name1 value:value1 name:name2 value:value2 ...))
    foo.sml (with:(name1=value1 name2=value2 ...))
    foo.sml (with:(name1=value1 name:name2 value:value2 name3=value3 ...))

Another possible abbreviation is to leave out the =v or value:v part if the name refers to a boolean control (in which case the value is taken to be true). Thus, one could get lazy sml also by saying:

    foo.sml (with:parser.lazy-keyword=true)
    foo.sml (with:parser.lazy-keyword)
    foo.sml (with:(name:parser.lazy-keyword value:true))
    foo.sml (with:(name:parser.lazy-keyword))

Matthias Blume

[2004/09/24]: Turned message about "emiting long form of branch" off by default. Added a control flag to turn it back on when desired.

Matthias Blume

[2004/09/24]: Applied patch for setting rounding modes under Mac OS X. Thanks to Melissa O’Neill for providing the code!

Matthias Blume

[2004/09/23]

1. Changed definition of type ControlRegistry.registry_tree to include control_info (i.e., the name of the controlling environment variable).
2. Added command-line flags -e and -E to print the names of environment variables that can be used to control internal settings. (This uses the new API mentioned in 1.)

Matthias Blume

Version 110.49; 2004/09/13

[2004/09/13]: Put target “mlrisc” back into the default list. (There is no harm in having it, and some users have expressed their wish to have “mlrisc” included by default.)

Matthias Blume

[2004/09/13]: Fixed the signal masking code to properly nest mask/unmask operations on a per-signal basis.

John Reppy

[2004/09/08]: Bumped the heap magic number to 0x09082004 to account for the changed layout of the ML frame under Mac OS X.

Matthias Blume

[2004/09/03]: Added a patch to _arch-n-opsys to enable the Cygwin runtime. The Cygwin runtime is turned on by setting the environment variable SMLNJ_CYGWIN_RUNTIME to 1.

Allen Leung

[2004/08/31]: Added some exports to src/compiler/core.cm upon request by J. Joemann.

Matthias Blume

[2004/08/30]

Upon request by Johannes Joemann:

improved ML code of installer to fall back to copying when renaming fails (i.e., when source and target are on different file systems); the code compiles but has yet to be tested in anger
removed mlrisc from list of default targets (config/targets)

Matthias Blume

[2004/08/27]: Added ptreql primop to structure InlineT (upon request from Larry Paulson).

Matthias Blume

[2004/08/15]

Another bug fix from Carl Hauser:

diff /net/niflab/smlnj48/src/MLRISC/graphs/udgraph.sml udgraph.sml
48c48
<              | rmv((e as (k,_))::es,L) = rmv(es,if k = i then es else e::L)
---
>              | rmv((e as (k,_))::es,L) = rmv(es,if k = i then L else e::L)

Without this, any deletion of an edge in an undirected graph does severe violence to the graph.
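
To see why the one-token change matters, here is a standalone sketch built around the patched clause. Only the second rmv clause comes from the patch; the wrapper removeKey, the base case, and the sample data are assumptions made for illustration.

```sml
(* Standalone sketch; removeKey and the base case are assumptions. *)
fun removeKey (i : int) (edges : (int * 'a) list) : (int * 'a) list =
  let
    (* L accumulates the edges that are kept *)
    fun rmv ([], L) = L
      | rmv ((e as (k, _)) :: es, L) =
          rmv (es, if k = i then L else e :: L)
  in
    rmv (edges, [])
  end

val kept = removeKey 2 [(1, "a"), (2, "b"), (3, "c")]
(* kept = [(3, "c"), (1, "a")]: the accumulator reverses the order *)
```

In the old code the matching case passed es (the unprocessed remainder) as the accumulator, which discarded every edge kept so far and caused the remaining edges to be visited with themselves already in the result.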

Allen Leung

[2004/08/10]: The IBM/MacOS syntax switch on PPC was incorrectly swapped.

Allen Leung

Version 110.48; 2004/08/10

[2004/08/09]: Bug fix from Carl Hauser:

single_source_shortest_paths in dijkstra.sml was observed to get wrong answers (by comparing to single_source_shortest_paths in bellman-ford.sml).

The problem is that following the expression A.update(dist,s,Num.zero) it is necessary to update the priority queue using Q.decreaseWeight(Q,s).

Allen Leung

[2004/08/06]

Fiddled with handling of command-line options:

sml now quits after processing the command line if -H, -S, -h<n>, or -s<n> appears as the last command-line argument
a new option -q terminates the session when encountered on the command line; subsequent arguments will be ignored
bug fixes: short (erroneous) arguments are no longer ignored completely

Matthias Blume

[2004/08/04]

Added minimal IBM assembly syntax support for PowerPC.
Cygwin: manually changed the file cygwin.def. Some exported symbols have been altered in the runtime. We need an automatic way to keep the file in sync.

Allen Leung

Version 110.47; 2004/08/04

[2004/08/03]: Added low-level support for choosing C calling conventions by twiddling the type of rawccall. (See src/compiler/Semant/types/cproto.sml for details.)

Matthias Blume

[2004/08/02]: Backed out of change to win32-filesys.c. The earlier patch to get_file_time caused CM to produce files with the wrong time stamp.

Matthias Blume

[2004/08/02]: Added NLFFI support for Win32, adapted from a patch provided by David Hansel. This is currently completely untested. Also, the issue concerning stdcall vs. ccall is still unresolved.

Matthias Blume

[2004/07/30]

Gearing up towards 110.47…

various minor bugfixes to ml-nlffigen
a beginning of a manual for nlffi
eliminated 'export name=value' in config/install.sh as this does not work with certain versions of /bin/sh (Thanks to David King at Motorola for catching this.)
several bugfixes provided or suggested by David Hansel at Reactive Systems:
- added a test for tm==NULL to gmtime.c and localtime.c
- applied patch for incorrect GetFileTime under win32
- toSeconds → toMilliseconds in Win32/win32-process.sml

Matthias Blume

[2004/07/21]

Fixed minor issue in ml-nlffigen: Now generate structure T_foo for a typedef to an incomplete type, but leave out the “typ” member. (This is just for consistency.)
Started to produce what is supposed to become better (i.e., comprehensive) documentation of what ml-nlffigen does and produces.

Matthias Blume

[2004/07/14]: Added C_UNION to c-calls/c-types.sml and updated the machinery (ml-nlffigen, cproto.sml) that conveys C function interface information to the code generator.

However, the actual architecture-specific implementation of function arguments and results that are C unions is still not implemented.

Matthias Blume

[2004/07/14]: Added these instructions to the PowerPC architecture: LBZU(X), LHZU(X), LWZU(X), STWU(X), STFDU, STFSU, etc…

Note: I haven’t added their instruction encoding into the description.

Allen Leung

[2004/07/13]: Added the two instructions LWARX and STWCX to the PowerPC instruction set.

An (untested) rewrite of loop-structure.sml. The old version is completely broken.

Allen Leung

[2004/07/13]

use paramAlloc to report c-calls with too many arguments (for PPC version where parameter area is pre-allocated)
added ccall_maxargspace to machspec (to implement the above)
made "make" command in CM’s "make" tool configurable
added option (default: on) for passing the name of SML/NJ’s "bin" directory to "make"; the call looks like this:
```
make <options> SMLNJ_BINDIR=<dir> <target>
```
This can be used by the Makefile to, e.g., pick the "right" version of ml-nlffigen.
minor code tweaks

Matthias Blume

Version 110.46.1; 2004/07/12

[2004/07/12]

NLFFI under Mac OS X now working (sort of). This is largely untested, though.

Note:

You have to make a new, clean build of the runtime system.
There are new BOOTFILES, you have to use them! (Doing the bootstrap process yourself would be very painful! If you absolutely have to do it, build the system under a different architecture and then cross-compile.)

Version bumped to 110.46.1 to account for runtime data format changes.

Matthias Blume

[2004/06/18]: Changed the implementation of structure Unix so that the same stream is returned every time one of the {text,bin}{In,Out}streamOf functions is invoked on the same proc. This is not what the spec currently says, although IMO it arguably should. (See discussion below.)

Matthias Blume

Version 110.46; 2004/06/17

[2004/06/17]

Changed the interface of structures Timer and Unix to match the most recent Basis spec.

In the case of Unix there still seems to be an open/weird issue:

The {text,bin}{In,Out}streamOf functions are supposed to create
fresh streams whenever they are called -- as opposed to have them
return the same stream every time.  This design is supposed to
prevent space leaks caused by proc values hanging on to streams.

The reap function, on the other hand, is supposed to close the
streams.  This cannot be done without having a handle on the
stream in proc after all...

I took the liberty to implement the following stopgap solution:

The proc value hangs on to the most recently created stream(s).
Reap closes those.  If either or both of the two streams hadn't
been created at all yet, then reap will close the corresponding
file descriptors directly.

PS: I don’t understand the original space leak argument anymore. If a proc hangs on to the imperative stream, then I/O operations on those will advance the state of the cached stream and avoid the space leak.

Matthias Blume

[2004/05/28]: Added signature PACK_REAL and exported functor PrimIO.

Matthias Blume

[2004/05/25]: CM now ignores (but still accepts) the "owner" information in group descriptions. The owner of a group is its next enclosing library. Each group must have a unique owner. (There is a virtual "toplevel" library that owns groups which are not nested within a real library.) Previously, each group had to explicitly declare its owner, and CM would check that such a declaration is correct. The new scheme is to have CM check that for each group there is precisely one owning library.

The advantage of the new scheme is that the programmer no longer needs to maintain the somewhat annoying owner information. The downside is that CM cannot enforce the ownership rule across multiple runs of CM.make. Fortunately, enclosing the same group in two different libraries A and B which are not part of the same program does not cause real problems.

Matthias Blume

[2004/05/20]: Made the win32 version work again. (Strangely, a misplaced comma had slipped into win32-process.c which prevented the runtime from being compiled correctly.)

Also, included a minor addition to ml-build.bat analogous to what was done in blume-20040519-ml-build.

Matthias Blume

[2004/05/19]: Arranged for ml-build to clean up after itself a little bit better. The script generates a temporary SML source file and compiles it using CM, so CM generates metadata (GUID, SKEL, objectfile) for it. It now gets rid of those at the end, so they don’t accumulate under .cm.

This required a minor change to install.sh because the name of the metadata directory (default: .cm) is actually configurable at installation time.

Matthias Blume

[2004/05/18]: Added Posix.IO.mk{Bin,Text}{Reader,Writer} by lifting their respective implementations from internal modules PosixBinPrimIO and PosixTextPrimIO.

Matthias Blume

[2004/05/11]

Added previously missing support for many socket-related functions under win32. Thanks to David Hansel <hansel@reactive-systems.com> for the voluminous patch!

(I have not tested this patch under win32 yet.)

Here is David’s e-mail:

Hi,

Attached to this email you find a diff against sml/nj 110.45 that will enable socket support under Windows.

To apply the patch (using unix or cygwin):

1) gunzip runtime.diff.gz
2) "cd" into "src/runtime" in the source tree of a fresh 110.45 installation.
3) patch -p 1 < [your/path/to]runtime.diff

The code compiles fine but has NOT yet been extensively tested. I only ran a few tests for basic socket client functionality (which worked fine). Especially the functions that use ioctl are not tested at all and might not work (see below).

I implemented this since we want to move to a newer version of sml/nj but need socket support in order to use it. This is the first time I even had a look at the sml/nj source, so please review my changes before making this part of the distribution! Here are a few issues that I think might be better for someone to solve who is more familiar with the sml/nj source (and socket programming):

getnetbyaddr.c and getnetbyname.c will raise a "not implemented" exception since I could not figure out what the windows equivalent of these functions is
In sockets-osdep.h there are some #include statements that are only used in a few files that include sockets-osdep.h
In smlnj-sock-lib.c, function init_fn() calls WSAStartup() but does not process its return value since I don’t know how to report an error upwards.
It would probably be good to have a call to WSACleanup() when the library is unloaded (if there is such a possibility). Otherwise I think Windows will take care of this automatically when the process finishes.
I used ioctlsocket() as a replacement for ioctl() but I have no idea if that is actually the proper replacement on Windows.
All these issues are marked in the code by "FIXME" comments.

We use sml/nj extensively in our products and are quite happy with it. I hope this contribution will help you.

Keep up the good work!

David

Matthias Blume

[2004/05/11]: Fixed two bugs in installml script. (Thanks to Vesa A. Norrman for the patch.)

Matthias Blume

[2004/05/11]: Added support for nlffi under netbsd. (Thanks to Vesa A. Norrman for the patch.)

Matthias Blume

[2004/05/11]: As per request by Adam Chlipala <adam@hcoop.net>, extended various export lists in compiler-related .cm-files.

Matthias Blume

[2004/05/11]: The installer now honors the "src-smlnj" target again, although its meaning has changed from "all sources required for the compiler" to "all sources the installer knows about". In other words, if you enable "src-smlnj" in the "targets" file, then the installer will pull in sources for everything. (Notice that this refers to source code only. Compiled code is still only installed for modules that were requested explicitly or which are required for other modules that were requested explicitly.)

Matthias Blume

[2004/04/23]: Fixed IEEEReal.scan (and .fromString) so that if there is an overflow in the exponent calculation we get INF or ZERO (depending on the mantissa and the sign of the exponent).

Matthias Blume

[2004/04/23]: The ml-build script now terminates with a non-0 status when something goes wrong.

Matthias Blume

[2004/04/22]: Made exception Option to be the same as exception Option.Option (as it should be).
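
A quick way to observe the effect is to catch the exception raised by Option.valOf under both names. This is only an illustrative check, not part of the change itself.

```sml
(* Both handlers name the same exception, so each one catches valOf's failure. *)
val a = Option.valOf (NONE : int option) handle Option => ~1
val b = Option.valOf (NONE : int option) handle Option.Option => ~2
```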

Matthias Blume

[2004/03/19]

Fixed the runtime so that ml-nlffi-lib runs on the cygwin version of SML/NJ. The problem is that

lib = dlopen(NULL, ...)
f   = dlsym(lib, "malloc");

does not work on Windows unless we explicitly export symbols such as 'malloc' during linking. We fixed this by explicitly exporting the required symbols with the magic gcc incantation:

-Wl,--export-all cygwin.def

where cygwin.def is a file containing all the symbols that we wish to export.

I suspect this is a Windows problem and we’ll have to do the same (somehow with windows compilers) when we build the native win32 version with the system calls LoadLibrary/GetProcAddress.

Allen Leung

[2004/03/04]: Fixed problem with IntInf.fmt (sign would show up on the right instead of on the left for BIN, OCT, and HEX).
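
A small check of the corrected behavior, assuming nothing beyond the Basis signatures for IntInf.fmt and String.isPrefix:

```sml
(* Format a negative value in hexadecimal and test where the sign ended up. *)
val s = IntInf.fmt StringCvt.HEX (IntInf.fromInt ~255)
val signFirst = String.isPrefix "~" s   (* true once the sign is on the left *)
```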

Matthias Blume

[2004/03/04]: Fixed problem with installer script (unix only) where bin/ml-yacc and friends pointed (via symlinks) to absolute locations instead of just .run-sml. This was reported by Vesa A Norrman.

Matthias Blume

Version 110.45; 2004/02/13

[2004/01/26]: Improved handling of exceptions at the interactive toplevel.

Matthias Blume

[2004/01/26]: Type of top-level "app" corrected. Added code for setting vp_limitPtrMask to Win32-specific runtime.

Matthias Blume

[2003/11/18]

changed Timer interface to what might become the spec
POSIX_FLAGS → BIT_FLAGS according to spec
some other minor discrepancies wrt. spec eliminated

Matthias Blume

Version 110.44; 2003/11/06

[2003/11/04]: Eliminated the "dont_move_libraries" directive in config/targets. (The mechanism was broken and could not be fixed easily. Moreover, there does not seem to be any reason not to move all libraries into lib during installation. I originally implemented this directive as a backward-compatibility feature when I first introduced the new CM. Now that things have been stable for a long time and going back to the old CM is not an option, there is no reason to keep it around.)

Matthias Blume

[2003/11/03]: Made installer honor INSTALLDIR variable again. (Thanks to Chris Richards for pointing out the problem and providing the solution.)

Matthias Blume

[2003/10/01]: MLRISC bug fix from Lal.

Matthias Blume

[2003/09/30]

Added openVector, nullRd, and nullWr to PRIM_IO.
Improved .bat files (for Win32 port) to make things work under Win95. (thanks to Aaron S. Hawley for this one)

Matthias Blume

[2003/09/26]: Added missing wrapper for privilege "primitive" in $smlnj/viscomp/core.cm.

Matthias Blume

Version 110.43.3; 2003/09/26

[2003/09/26]: I modified the read-eval-print loop so that the autoloader gets invoked whenever the prettyprinter tries to look up a symbol that is not currently defined in the toplevel environment but which appears in CM’s autoload registry. As a result, we see far fewer of those ?.Foo.Bar.xxx names in the prettyprinter’s output.

In addition to this I tried to clean up some pieces of the Basis implementation (e.g., Socket, Word8Array) in order to prevent other instances of these ?.Foo.Bar.xxx names from being printed.

The mechanism that picks names for types still needs some work, though. (Right now it seems that if there is a type A.t which is defined to be B.u, but B is unavailable at toplevel, then A.t gets printed as "?.B.u" although the perhaps more sensible solution would be to use "A.t" in this case. In other words, the prettyprinter should follow a chain of DEFtycs not farther than there are corresponding toplevel names in the current environment.)

Matthias Blume

[2003/09/24]

Another installer tweak: All the ML code for the installer is now compiled during CMB.make and put into a little library called $smlnj/installer.cm. The installation then simply invokes

   sml -m $smlnj/installer.cm

and everything happens automagically.

Win32: ML code senses value of environment variable SMLNJ_HOME. Unix: ML code senses values of environment variables ROOT, CONFIGDIR, and BINDIR.

The new scheme guarantees that the ML code responsible for the installation is in sync with the APIs of the main system. Also, the installer is somewhat faster because the installer script is precompiled.

Matthias Blume

[2003/09/24]: Added a signature SYNCHRONOUS_SOCKET to basis.cm. This is like SOCKET but excludes all non-blocking operations. Defined SOCKET (in Basis) and CML_SOCKET in terms of SYNCHRONOUS_SOCKET. Removed superfluous implementations of non-blocking operations from CML’s Socket structure.

Matthias Blume

[2003/09/24]

Fixed SOCKET API and implementation to match Basis spec. This required changing the internal representation of sockets to one that remembers (for each socket file descriptor) whether it is currently blocking or non-blocking. This state is maintained lazily (i.e., a system call is made only if the state actually needs to change).
OS-specific details of sockets were moved into separate files, thus making it possible to unify the bulk of the socket implementations between Unix and Win32.
CML’s socket API changed accordingly. (Note that we need to remove non-blocking functions from this API since they are redundant in the case of CML!)
CML’s socket implementation now makes use of non-blocking functions provided by Basis, thus removing all OS-dependent code from this part of CML.
Changed Real64.precision from 52 to 53. Minor cleanup in Real64 code.
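
The lazily maintained blocking state described in the first item above can be pictured with the following toy model. Everything in it is hypothetical (the type sock, the function ensureMode, and the counter standing in for the system call); it only shows the idea of skipping the call when the recorded state already matches.

```sml
(* Hypothetical model, not the actual socket representation. *)
datatype mode = Blocking | NonBlocking

type sock = { fd : int, mode : mode ref }

val modeSwitches = ref 0   (* counts simulated system calls *)

fun ensureMode ({ mode, ... } : sock, wanted : mode) : unit =
  if !mode = wanted
    then ()                                   (* state already matches *)
    else (modeSwitches := !modeSwitches + 1;  (* stand-in for the system call *)
          mode := wanted)

val s : sock = { fd = 3, mode = ref Blocking }
val () = ensureMode (s, Blocking)      (* no switch needed *)
val () = ensureMode (s, NonBlocking)   (* one switch *)
val () = ensureMode (s, NonBlocking)   (* no further switch *)
```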

Matthias Blume

Version 110.43.2; 2003/09/22

[2003/09/22]: Made a new interim version and bootfiles for developer’s bootstrapping convenience.

Matthias Blume

[2003/09/19]

new-install.sh → install.sh
changed default CM "metadata" directory name to ".cm" (instead of "CM")
tweaked installer so that another name instead of .cm can be chosen at install time (by setting the CM_DIR_ARC environment variable during installation); once installation is complete, the name is fixed

Matthias Blume

Version 110.43.1; 2003/09/18

[2003/09/18]: Made a new interim version and bootfiles for developer’s bootstrapping convenience.

Matthias Blume

[2003/09/18]

Exported fractionsPerSecond etc. from TimeImp (but not from Time as this seems to be controversial at the moment) and used those in Posix.ProcEnv.times.
Added Time.{from,to}Nanoseconds to Time.
Improved Real.{from,to}LargeInt by avoiding needless calculations. For example, fromLargeInt never needs to look at more than 3 "big digits" to get its 53 bits of precision.
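
A short round trip through the new nanosecond conversions, shown only as a usage sketch; the value names are arbitrary.

```sml
val oneSecond = Time.fromSeconds 1
val ns = Time.toNanoseconds oneSecond     (* a LargeInt.int *)
val back = Time.fromNanoseconds ns
val same = (Time.compare (oneSecond, back) = EQUAL)
```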

Matthias Blume

[2003/09/17]: Added an entry to the primitive environment (compiler/Semant/statenv/prim.sml) for int32→real64 conversion and added code to compiler/CodeGen/main/mlriscGen.sml to implement it.

Removed some of the "magic" constants in real64.sml and replaced them with code that generates these values from their corresponding integer counterparts.

Made all(?) the slice-related changes to the Basis and made everything compile again…

Matthias Blume

[2003/09/15]: Fixed bug in Real.fromLargeInt.

Matthias Blume

[2003/09/13]: Minor bugfix in config/libinstall (set anchor with path to standalone tool after installing it, otherwise libraries that need ml-lex or ml-yacc won’t compile the first time the installer runs).

Matthias Blume

[2003/09/12]

fixed bug in Real.toLargeInt
fixed bug in Posix.ProcEnv.times
changed inputLine functions to return an option
minor installer improvements / bugfixes
changed default @SMLalloc parameter for x86/celeron to 64k
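
As an illustration of the option-returning inputLine, a line-reading loop can now terminate on NONE instead of testing for an empty string. This is a minimal sketch; countLines is a hypothetical helper, not part of the Basis:

```sml
(* Count the lines in a text stream using the option-returning
 * TextIO.inputLine.  countLines is a hypothetical helper. *)
fun countLines (strm : TextIO.instream) : int =
  let
    fun loop n =
      case TextIO.inputLine strm of
          NONE => n                 (* end of stream *)
        | SOME _ => loop (n + 1)    (* got one more line *)
  in
    loop 0
  end
```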

Matthias Blume

Version 110.43; 2003/09/09

[2003/09/09]

Rewrote large parts of config/install.sh in SML (config/libinstall.sml). Modified config/install.bat to take advantage of it. Also modified config/install.sh (and called it config/new-install.sh) to take advantage of it on Unix systems. (The SML code is (supposed to be) platform-independent.)

The installer can now install everything under Win32 as well as under *nix as long as it compiles.

Other changes:

made CML compile again under Win32
made eXene compile under Win32 (by providing a fake structure UnixSock and by using OS.Process.getEnv instead of Posix.ProcEnv.getenv)
fixed a bug in nowhere: it assumed that type OS.Process.status is the same as type int; under Win32 it isn’t
fixed some slice-related problems in the win32-specific parts of CML
added a functor argument "sameVol" to os-path-fn.sml in the Basis (under Win32, the volume name is case-insensitive, and the OS.Path code compares volume names for equality)

Matthias Blume

[2003/09/08]: Made Win32 version of OS.FileSys.fullPath return current directory when given an empty string. This is what the spec says, and incidentally, CM depends on it. (CM otherwise goes into an infinite loop in certain cases when presented with the name of a non-existing .cm file.)

Matthias Blume

[2003/09/04]

1. Changed interface to vectors and arrays in Basis to match (draft) Basis spec.
2. Added signatures and implementations of slices according to Basis spec.
3. Edited source code throughout the system to make it compile again under 1. and 2. (In some cases code had to be added to have it match the new signatures.)
4. MLRISC should be backward-compatible: the copies of the originals of files that needed to change under 3. were retained, the .cm files check the compiler version number and use old versions when appropriate.
5. Changed type of OS.FileSys.readDir and Posix.FileSys.readdir to dirstream → string option (in accordance with Basis spec).
6. When generating code that counts lines, ml-lex used function CharVector.foldli, taking advantage of its old interface. This has been replaced with the corresponding code from CharVectorSlice. (html-lex must be re-lexed!)
7. BitArray in smlnj-lib/Util has been extended/modified to match the new MONO_ARRAY signature. (Do we need BitArraySlice?)
8. Removed temporary additions (fromInternal, toInternal) from the (now obsolete) IntInf in smlnj-lib/Util.
9. Cleaned up structure Byte.
10. Added localOffset, scan, and fromString to Date (according to spec). Cleaned/corrected implementation of Date. (Still need to check for correctness; implement better canonicalizeDate.)
11. Added "scan" to signature IEEE_REAL.
12. Some improvements to IntInf [in particular: efficiency-hack for mod and rem when second operand is 2 (for parity checks).]
13. Changed representation of type Time.time, using a single IntInf.int value counting microseconds. This considerably simplified the implementation of structure Time. We now support negative time values; scan and fromString handle signs.
14. Functor PrimIO now takes two additional arguments (VectorSlice and ArraySlice).
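
The single-integer representation of time values mentioned above can be sketched as follows. This is only an illustration: the structure name MicroTime and its functions are hypothetical and are not the code in the Basis implementation, and rounding toward zero in toSeconds is a choice made for this sketch.

```sml
(* Toy model of a time type stored as one signed IntInf.int
 * counting microseconds.  All names here are hypothetical. *)
structure MicroTime =
struct
  type time = IntInf.int                      (* may be negative *)
  val usPerSec : IntInf.int = 1000000

  fun fromSeconds (s : IntInf.int) : time = s * usPerSec
  fun fromMicroseconds (us : IntInf.int) : time = us

  (* Whole seconds, rounded toward zero (an assumption of this sketch). *)
  fun toSeconds (t : time) : IntInf.int = IntInf.quot (t, usPerSec)

  (* Arithmetic is plain integer arithmetic; no field carrying needed. *)
  fun add (a : time, b : time) : time = a + b
  fun sub (a : time, b : time) : time = a - b
end
```

With one integer there is no separate seconds/fraction pair to normalize, and negative values need no special case.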

Matthias Blume

[2003/08/28]: This is a major update which comes with a version number bump (110.42.99 — yes, we are really close to 110.43 :-), NEW BOOTFILES, and an implementation of IntInf in the Basis.

There are a fairly large number of related changes and updates throughout the system:

Basis:
- Implemented IntInf.
- Made LargeInt a projection of IntInf (by filtering through INTEGER).
- Added some missing Real64 operations, most notably Real.toLargeInt.
- Added FixedInt as a synonym for Int32.

compiler:
- Added support for a built-in intinf type:
  - literals
  - pattern matching
  - conversion shortcuts (Int32.fromLarge o Int.toLarge etc.)
  - overloading on literals and operations

  This required adding a primitive type intinf, some additional primops, and implementations for several non-trivial intinf operations in Core. (The intinf type is completely abstract to the compiler; all operations get delegated back to the Core.)
- Intinf equality is handled by polyequal. However, the compiler does not print its usual warning in this case (since polyequal is the right thing to do there).
- Improved the organization of structure InlineT.
- A word about conversion primops: If conversions involving intinf do not cancel out during CPS contract, then the compiler must insert calls to Core functions. Since all core access must be resolved already during the FLINT translate phase, it would be too late at the time of CPS contract to add new Core calls. For this reason, conversion primops for intinf carry two arguments: 1. the numeric argument that they are supposed to convert, and 2. the Core function that can help with this conversion if necessary. If CPS contract eliminates a primop, then the associated Core function becomes dead and goes away. Intinf conversion primops that do not get eliminated by CPS contract get rewritten into calls of their core functions by a separate, new phase.

interactive system:
- Control.Print.intinfDepth controls max length of intinf constants being printed. (Analogous to Control.Print.stringDepth.)
- Cleanup in printutil and pputil: got rid of unused stuff and duplicates; replaced some of the code with code that makes better use of library functionality.

CM: Bugfix: parse-errors in init group (system/smlnj/init/init.cmi) are no longer silent.

CKIT: Fixed mismatched uses of Int32 and LargeInt. I always decided in favor of LargeInt, which is now the same as IntInf. CKIT-knowledgeable people should check whether this is what’s intended and otherwise change things back to using Int32 or FixedInt.

Throughout the code: Started using IntInf.int literals and built-in operations (e.g., comparison with 0) where this seems appropriate.

Matthias Blume

[2003/08/13]: Merging changes from the mcz-branch development branch into trunk. These changes involve replacement of the emulated old prettyprinter interface with direct use of the SML/NJ Lib PP library, and fixing of a couple of bugs (895, 1186) relating to error messages. A new prettyprinter for ast datatypes (Elaborator/print/ppast.{sig,sml}) has been added.

Dave MacQueen

Version 110.42.9; 2003/08/11

[2003/08/11]

This patch restores SML/NJ’s ability to run under win32. There are a number of changes, including fixes for several bugs that had gone unnoticed until now:

uname "CYGWIN_NT*" is recognized as win32 (This is relevant only when trying to run the win32 version from within cygwin.)
There are a number of simple .bat scripts that substitute for their corresponding Unix shell-scripts. (See below.)
The internals of ml-build have been modified slightly. The main difference is that instead of calling ".link-sml" (or link-sml.bat) using OS.Process.system, the ML process delegates this task back to the script. Otherwise problems arise in mixed environments such as Cygwin where scripts look and work like Unix scripts, but where OS.Process.system cannot run them.
In CM, the srcpath pickler used native pathname syntax — which is incorrect in the case of cross-compilation. The new pickle format is independent of platform-specific naming conventions.
Path configuration files (such as lib/pathconfig) can now choose between native and standard syntax. Placing a line of the form
```
standard!
```
into the file causes all subsequent paths to be interpreted using CM standard pathname syntax (= Unix conventions); a line
```
native!
```
switches back to native style. This was needed so that path config files can be written portably, see src/system/pathconfig.
Runtime system:
- win32-filesys.c: get_file_time and set_file_time now access modification time, not creation time.
- I/O code made aware of new array representation.
- Bug fixes in X86.prim.masm.
- src/system/makeml made aware of win32. (For use under cygwin and other Unix-environments for windows.)
- In Basis, fixed off-by-one error in win32-io.sml (function vecF) which caused BinIO.inputAll to fail consistently.
.bat scripts (these assume that SMLNJ_HOME is defined):
- sml.bat, ml-yacc.bat, ml-lex.bat: Driver scripts for standalone applications (sml, ml-yacc, ml-lex).
- ml-build.bat: analogous to ml-build.
- config\install.bat: Analogous to config/install.sh. This requires that SMLNJ_HOME is set and that Microsoft Visual C is ready to use. (nmake etc. must be on the path, and vcvars32 must have been run.) Moreover, sources for ml-lex and ml-yacc need to exist under src, and the bootfile hierarchy must have been unpacked under sml.boot.x86-win32. The script is very primitive and does a poor job at error checking. It only installs the base system, ml-lex, and ml-yacc. No other libraries are being installed (i.e., you get only those that are part of the compiler.)
- link-sml.bat: analogous to .link-sml, but not currently used
Unrelated bug fixes:
- ml-nlffigen now exports structures ST_* corresponding to incomplete types.
- Added getDevice to PP/src/pp-debug-fn.sml. (Would not compile otherwise.)

Matthias Blume

[2003/06/17]: Modified compiler/Elaborator/print/pptype.sml to fix bug 895. Tag will be used for new development branch (mcz-branch) for use by MacQueen, (Lucasz) Zairek, and (George) Cao at uchicago.

Dave MacQueen

[2003/05/27]: Tried to eliminate most cases of polymorphic equality.

Matthias Blume

[2003/05/21]

Two changes:

Added a flag for controlling whether non-exhaustive bindings will be treated as errors (default is false).
Cleaned up the entire source tree so that CMB.make goes through without a single non-exhaustive match- or bind warning.

Matthias Blume

[2003/05/17]

Added cases for IF, WHILE, ANDALSO, and ORELSE to Absyn.

This mainly affects the quality of error messages. However, some of the code is now more straightforward than before. (Treatment of the above four constructs in translate.sml is much simpler than the "macro-expansion" that was going on before. Plus, the match compiler no longer gets invoked just to be able to compile an if-expression.)
The ErrorMsg.Error exception is now caught and absorbed by the interactive loop.

Matthias Blume

[2003/05/16]: Ported the runtime system to cygwin, which uses the unix x86-unix bin files. Missing/buggy features:

getnetbyname, getnetbyaddr: these functions seem to be missing in the Cygwin library.
Ctrl-C handling may be flaky.
Windows system calls and Windows I/O are not supported.

A new set of binfiles is located at:

http://www.dorsai.org/~leunga/boot.x86-unix.tgz

This is only needed for bootstrapping the cygwin version of smlnj. Other x86 versions can use the existing binfiles.

Allen Leung

[2003/04/08]

Added a target 'mlrisc' to installer.
Added missing elements to structure ListPair.

Matthias Blume

[2003/01/07]: Fixed a bug in Int.rem(x,y) where y is a power of 2 on x86. The arguments to the SUBL instruction were swapped.

Allen Leung

[2002/12/12]: Fixed a serious bug in the rewrite code for FP spilling/reloading that sent the RA into an infinite loop when floating point registers get spilled. (Because of this bug, e.g., nucleic stopped compiling between 110.37 and 110.38.) There was another set of potential problems related to the handling of MLRISC annotations (but those did not yet cause real problems, apparently).

Matthias Blume

[2002/12/06]: Added a call of SrcPath.sync at the beginning of Parse.parse (in CM). This fixes the problem of CM getting confused by files that suddenly change their identity (e.g., by getting unlinked and recreated by some text editor such as vi). There might be a better/cheaper/cleaner way of doing this, but for now this will have to do.

Matthias Blume

[2002/10/28]: Exported structure Typecheck from $smlnj/viscomp/core.cm.

Matthias Blume

Version 110.42.1; 2002/10/17

[2002/10/17]: In good old tradition, there has been a slight hiccup so that we have to patch 110.42 after the fact. The old release tag has been replaced (see below).

The change solves a problem with two competing approaches to the configuration problem regarding MacOS 10.1 vs. MacOS 10.2, which got in each other’s way.

This change only affects the runtime system code and the installer script. (No new bootfiles.)

Matthias Blume

Version 110.42; 2002/10/16

[2002/10/10]: The mltree operator DIVS must be implemented with an overflow check on the PPC because the hardware indicates divide-by-zero using "overflow" as well.

Matthias Blume

[2002/07/23]: Sml now senses the SMLNJ_HOME environment variable. If this is set, then the bin dir is assumed to be in $SMLNJ_HOME/bin and (unless CM_PATHCONFIG is also set), the path configuration file is assumed to be in $SMLNJ_HOME/lib/pathconfig. This way one can easily move the entire tree to some other place and everything will "just work".

(Companion commands such as ml-build and ml-makedepend also sense this variable.)

Matthias Blume

[2002/07/12]: Exported two useful "step" functions from liveness module (MLRISC).

Matthias Blume

Version 110.41; 2002/07/05

[2002/07/05]: Exported structure BTImp from $smlnj/viscomp/debugprof.cm so that other clients can set up backtracing support.

Matthias Blume

[2002/06/25]: Fixed a bug in translation of INLMAX (and INLMIN) for the floating-point case. (The sense of the isNaN test was reversed — which made min and max always return their first argument.)

Matthias Blume

[2002/06/11]: Back-ported OS.Path.{from,to}UnixPath from idlbasis-devel branch.

Matthias Blume

[2002/06/10]: I back-ported my implementation of IEEEReal.fromString from the idlbasis-devel branch so that we can test it.

Another small change is that ppDec tries to give more information than just "<sig>" in the case of functors. However, this code is broken in some mysterious way if the functor’s body’s signature has not been declared by ascription but gets inferred from the implementation. This needs fixing…

Matthias Blume

[2002/05/31]: Resurrected SMLofNJ.Internals.BTrace.mode. (It accidentally fell by the wayside when I switched over to using Controls everywhere.)

Matthias Blume

[2002/05/23]: Labels are now displayed in the graphical output to make the fall-through and target blocks obvious.

Lal George

[2002/05/22]: John tweaked yesterday’s fix for 1131 to handle an out-of-memory situation that comes up when allocating huge arrays.

Matthias Blume

Version 110.40; 2002/05/21

[2002/05/21]: John Reppy fixed GC bug 1131.

Matthias Blume

[2002/05/21]: CM documentation update.

Matthias Blume

[2002/05/21]

John tweaked runtime to be silent on heap export (except when GC messages are on).
I added a few more things (cross-compiling versions of CMB) to config/preloads (as suggestions).

Matthias Blume

[2002/05/20]

Added ControlUtil structure to control-lib.cm.
Use it throughout.
Used Controls facility to define MLRISC controls (as opposed to registering MLRISC control ref cells with Controls after the fact)
Fixed messed-up controls priorities.
Removed again all the stuff from config/preloads that one wouldn’t be able to preload at the time the initial heap image is built. (Many libraries, e.g., CML, do not exist yet at this time. The only libraries that can be preloaded via config/preloads are those that come bundled with the bootfiles.)

Matthias Blume

[2002/05/20]: Added a lot of commented-out suggestions for things to be included in config/preloads.

Matthias Blume

[2002/05/18]

Made the mdl tool stuff compile and run again.
I’ve disabled all the stuff that depends on RTL specifications; they are all badly broken anyway.

Allen Leung

[2002/05/17]

John Reppy made several modifications to the SML/NJ Library. In particular, there is a shiny new controls-lib.cm.
Pushed new controls interface through compiler so that everything compiles again.
Added FormatComb and FORMAT_COMB to the CML version of the SML/NJ Library (so that CML compiles again).
Modified init scripts because XXX_DEFAULT environment variables are no longer with us. (Boot-time initialization is now done using the same environment variables that are also used for startup-time initialization of controls.)

Matthias Blume

[2002/05/15]: All pseudo-ops emitted before the first segment declaration such as TEXT, DATA, and BSS directives are assumed to be global declarations and are emitted first in the assembly file. This is useful in a number of situations where one has pseudo-ops that are not specific to any segment, and also works around the constraint that one cannot have client pseudo-ops in the TEXT segment.

Because no segment is associated with these declarations it is an error to allocate any space or objects before the first segment directive and an exception will be raised. However, we cannot make this check for client pseudo-ops.

These top level declarations are a field in the CFG graph_info. In theory you can continue to add to this field after the CFG has been built — provided you know what you are doing;-)

Lal George

[2002/05/13]

A few minor bugfixes:

Stopgap measure for bug recently reported by Elsa Gunter (ppDec). (Bogus printouts for redefined bindings still occur. Compiler bug should no longer occur now. We need to redo the prettyprinter from scratch.)
CM pathname printer now also adds escape sequences for ( and )
comment and documentation fixes for ml-nlffi

Matthias Blume

[2002/05/10]

Applied the following bugfix provided by Emden Gansner:

Output is corrupted when outputSubstr is used rather than output.

The problem occurs when a substring

ss = (s, dataStart, dataLen)

where dataStart > 0, fills a stream buffer with avail bytes left.
avail bytes of s, starting at index dataStart, are copied into the
buffer, the buffer is flushed, and then the remaining dataLen-avail
bytes of ss are copied into the beginning of the buffer. Instead of
starting this copy at index dataStart+avail in s, the current code
starts the copy at index avail.

  Fix:
  In text-io-fn.sml, change line 695 from
val needsFlush = copyVec(v, avail, dataLen-avail, buf, 0)
  to
val needsFlush = copyVec(v, dataStart+avail, dataLen-avail, buf, 0)
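
The off-by-dataStart error can be reproduced with a toy model of the split copy. This sketch does not use the stream code from text-io-fn.sml; splitFixed and splitBuggy are hypothetical names, and each returns the two pieces that would be written before and after the flush:

```sml
(* Toy model: a substring (s, dataStart, dataLen) is written in two
 * pieces because only `avail` bytes fit before the buffer is flushed.
 * Both function names are hypothetical. *)

(* Second piece starts at dataStart + avail, as in the fix. *)
fun splitFixed (s, dataStart, dataLen, avail) =
  (String.substring (s, dataStart, avail),
   String.substring (s, dataStart + avail, dataLen - avail))

(* Second piece starts at avail, ignoring dataStart, as in the bug. *)
fun splitBuggy (s, dataStart, dataLen, avail) =
  (String.substring (s, dataStart, avail),
   String.substring (s, avail, dataLen - avail))
```

The two versions agree only when dataStart = 0, which matches the report that plain output (a whole string) was unaffected.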

Matthias Blume

[2002/04/12]

Grabbed newer assyntax.h from the XFree86 project.
Fiddled with how to compile X86.prim.asm without warnings.
(Very) Minor cleanup in CM.

Matthias Blume

[2002/04/01]: Added full support for div/mod/rem/quot on the x86, using the machine instruction’s two results (without clumsily recomputing the remainder) directly where appropriate.

Some more extensive power-of-two support was added to the x86 instruction selector (avoiding expensive divs, mods, and muls where they can be replaced with cheaper shifts and masks). However, this sort of thing ought to be done earlier, e.g., within the CPS optimizer so that all architectures benefit from it.

The compiler compiles to a fixed point, but changes might be somewhat fragile nevertheless. Please, report any strange things that you might see wrt. div/mod/quot/rem…

Matthias Blume

[2002/03/29]: Fixed my broken div/mod logic. Unfortunately, this means that the inline code for div/mod now has one more comparison than before. Fast paths (quotient > 0 or remainder = 0) are not affected, though. The problem was with quotient = 0, because that alone does not tell us which way the rounding went. One then has to look at whether remainder and divisor have the same sign… :(
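
The sign test described above can be sketched in source-level SML. This is only an illustration of the arithmetic, not the inline code that the compiler generates; divMod is a hypothetical name:

```sml
(* Derive round-to-negative-infinity division (div, mod) from the
 * round-to-zero results (quot, rem).  divMod is a hypothetical helper.
 * When the remainder is zero, or has the same sign as the divisor,
 * rounding toward zero and toward negative infinity agree.  Otherwise
 * the quotient is one too large and the remainder is off by y. *)
fun divMod (x : int, y : int) : int * int =
  let
    val q = Int.quot (x, y)
    val r = Int.rem (x, y)
  in
    if r = 0 orelse (r < 0) = (y < 0)
    then (q, r)
    else (q - 1, r + y)
  end
```

For example, x = ~1 and y = 2 give q = 0 and r = ~1; the quotient alone does not reveal the rounding direction, but the sign mismatch between r and y does.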

Anyway, I replaced the bootfiles with fresh ones…

Matthias Blume

Version 110.39.3; 2002/03/29

[2002/03/29]

Primops have changed. This means that the bin/boot-file formats have changed as well.

To make sure that there is no confusion, I made a new version.

CHANGES:

removed REMT from mltree (remainder should never overflow).
added primops to deal with divisions of all flavors to the frontend
handled these primops all the way through so they map to their respective MLRISC support
used these primops in the implementation of Int, Int32, Word, Word32
removed INLDIV, INLMOD, and INLREM as they are no longer necessary
parameterized INLMIN, INLMAX, and INLABS by a numkind
translate.sml now deals with all flavors of INL{MIN,MAX,ABS}, including floating point
used INL{MIN,MAX,ABS} in the implementation of Int, Int32, Word, Word32, and Real (but Real.abs maps to a separate floating-point-only primop)

TODO items:
Hacked Alpha32 instruction selection, disabling the selection of REMx instructions because the machine instruction encoder cannot handle them. (Hppa, PPC, and Sparc instruction selection did not handle REM in the first place, and REM is supported by the x86 machine coder.)
Handle DIV and MOD with DIV_TO_NEGINF directly in the x86 instruction selection phase. (The two can be streamlined because the hardware delivers both quotient and remainder at the same time anyway.)
Think about what to do with "valOf(Int32.minInt) div ~1" and friends. (Currently the behavior is inconsistent both across architectures and wrt. the draft Basis spec.)
Word8 should eventually be handled natively, too.
There seems to be one serious bug in mltree-gen.sml. It appears, though, as if there currently is no execution path that could trigger it in SML/NJ. (The assumptions underlying functions arith and promotable do not hold for things like multiplication and division.)

Matthias Blume

[2002/03/27]

Added support for all four division operations (ML’s div, mod, quot, and rem) to MLRISC. In the course of doing so, I also rationalized the naming (no more annoying switch-around of DIV and QUOT), by parameterizing the operation by div_rounding_mode (which can be either DIV_TO_ZERO or DIV_TO_NEGINF).

The generic MLTreeGen functor takes care of compiling all four operations down to only round-to-zero div.

Missing pieces:

Doing something smarter than relying on MLTreeGen on architectures like, e.g., the x86 where hardware division delivers both quotient and remainder at the same time. With this, the implementation of the round-to-neginf operations could be further streamlined.
Remove inlining support for div/mod/rem from the frontend and replace it with primops that get carried through to the backend. Do this for all int and word types.

Matthias Blume

[2002/03/25]

I improved (hopefully without breaking them) the implementation of Int.div, Int.mod, and Int.rem. For this, the code in translate.sml now takes advantage of the following observations:

Let  q = x quot y      r = x rem y
     d = x div  y      m = x mod y

where "quot" is the round-to-zero version of integer division that hardware usually provides. Then we have:

r = x - q * y        where neither the * nor the - will overflow
d = if q >= 0 orelse x = q * y then q else q - 1
                     where neither the * nor the - will overflow
m = if q >= 0 orelse r = 0 then r else r + y
                     where the + will not overflow

This results in substantial simplification of the generated code. The following table shows the number of CFG nodes and edges generated for fun f (x, y) = x OPER y, where OPER is one of div, mod, quot, rem:

OPER | nodes(old) | edges(old) | nodes(new) | edges(new)
--------------------------------------------------------
div  |     24     |     39     |     12     |     16
mod  |     41     |     71     |     12     |     16
quot |      8     |     10     |      8     |     10
rem  |     10     |     14     |      8     |     10

Matthias Blume

[2002/03/25]: Fixed a bug in cproto (c prototype decoder).

Matthias Blume

[2002/03/25]: I did some cleanup to Allen’s new primop code and replaced yesterday’s bootfiles with new ones. (But they are stored in the same place.)

Matthias Blume

[2002/03/24]: Made the bootfiles that Allen asked for.

Matthias Blume

[2002/03/23]

Changes to FLINT primops:

    (* make a call to a C-function;
     * The primop carries C function prototype information and specifies
     * which of its (ML-) arguments are floating point. C prototype
     * information is for use by the backend, ML information is for
     * use by the CPS converter. *)
  | RAW_CCALL of { c_proto: CTypes.c_proto,
                   ml_args: ccall_type list,
                   ml_res_opt: ccall_type option,
                   reentrant : bool
                 } option
   (* Allocate uninitialized storage on the heap.
    * The record is meant to hold short-lived C objects, _i.e._, they
    * are not ML pointers.  With the tag, the representation is
    * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
    *)
  | RAW_RECORD of {tag:bool,sz:int}
  and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR

These CPS primops are now overloaded:

       rawload of {kind:numkind}
       rawstore of {kind:numkind}

The one argument form is:

         rawload {kind} address

The two argument form is:

         rawload {kind} [ml object, byte-offset]

RAW_CCALL/RCC now takes two extra arguments:
1. The first is whether the C call is reentrant, i.e., whether ML state should be saved and restored.
2. The second argument is a string argument specifying the name of library and the C function.
These things are not yet handled in the code generator.
In CProto,

An encoding type of "bool" means "ml object" and is mapped into C prototype of PTR. Note that "bool" is different from "string", even though "string" is also mapped into PTR, because "bool" is assigned a CPS type of BOGt, while "string" is assigned INT32t.
Pickler/unpickler

Changed to handle RAW_RECORD and newest RAW_CCALL
MLRiscGen,
1. Changed to handle the new rawload/rawstore/rawrecord operators.
2. Code for handling C Calls has been moved to a new module CPSCCalls, in the file CodeGen/cpscompile/cps-c-calls.sml
Added the conditional move operator
```
         condmove of branch
```
to cps. Generation of this is still buggy so it is currently disabled.

Allen Leung

[2002/03/22]

Implemented the Ball-Larus branch prediction-heuristics, and incorporated graphical viewers for control flow graphs.

Ball-Larus Heuristics:

See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.

By design it uses the Dempster-Shafer theory for combining probabilities. For example, in the function:

fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)

the Ball-Larus heuristics predict that n=0 is unlikely (OH-heuristic), and that the 'then' branch is unlikely because of the RH-heuristic. Combining the two with the Dempster-Shafer theory gives the 'then' branch an even lower probability.

Finally, John Reppy’s loop analysis in MLRISC further lowers the probability of the 'then' branch because of the loop in the else branch.
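
For two independent estimates of the same branch, the Dempster-Shafer combination rule commonly used with these heuristics reduces to a one-line formula. The sketch below shows that rule only; whether cpsBranchProb.sml uses exactly this form is an assumption, combine is a hypothetical name, and the probabilities in the comment are made-up values, not the ones the heuristics assign:

```sml
(* Combine two estimates p1 and p2 (each in the open interval (0,1))
 * that the same branch is taken.  Two estimates below 0.5 reinforce
 * each other and yield a value lower than either; an estimate of 0.5
 * carries no information and leaves the other one unchanged.
 * Example with made-up inputs: combine (0.3, 0.3) is about 0.155. *)
fun combine (p1 : real, p2 : real) : real =
  let
    val both = p1 * p2
    val neither = (1.0 - p1) * (1.0 - p2)
  in
    both / (both + neither)
  end
```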

Graphical Viewing:

I merely plugged in Allen’s graphical viewers into the compiler. The additional code is not much. At the top level, saying:

Control.MLRISC.getFlag "cfg-graphical-view" := true;

will display the graphical view of the control flow graph just before back-patching. daVinci must be in your path for this to work. If daVinci is not available, then the default viewer can be changed using:

Control.MLRISC.getString "viewer"

which can be set to "dot" or "vcg" for the corresponding viewers. Of course, these viewers must be in your path.

The above will display the compilation unit at the level of clusters, many of which are small, boring, and un-interesting. Also setting:

Control.MLRISC.getInt "cfg-graphical-view_size"

will display clusters that are larger than the value set by the above.

Lal George

[2002/03/21]: Changed the interface to the KMP routine in PreString and fixed a minor bug in one place where it was used.

Matthias Blume

[2002/03/21]: Fixed a potential problem in cfg edge splitting.

Allen Leung

[2002/03/21]

Recoded the buggy parts of x86-fp.
1. All the block reordering code has been removed. We now depend on the block placement phases to do this work.
2. Critical edge splitting code has been simplified and moved into the CFG modules, where it belongs.
  Both of these were quite buggy and complex. The code is now much, much simpler.
X86 backend.
1. Added instructions for 64-bit support. Instruction selection for 64-bit has not been committed, however, since that requires changes to MLTREE which haven’t been approved by Lal and John.
2. Added support for FUCOMI and FUCOMIP when generating code for PentiumPro and above. We only generate these instructions in the fast-fp mode.
3. Added cases for JP and JNP in X86FreqProps.

CFG

CFG now has a bunch of methods for edge splitting and merging.

Machine description.

John's simplification of MLTREE_BASIS.fcond broke a few machine
description things:

rtl-build.{sig,sml} and hppa.mdl fixed.

NOTE: the machine description stuff in the repository is still broken.
      Again, I can't put my fixes in because that involves
      changes to MLTREE.

Allen Leung

[2002/03/20]: Implemented Knuth-Morris-Pratt string matching in PreString and used it for String.isSubstring, Substring.isSubstring, and Substring.position.

(Might need some stress-testing. Simple examples worked fine.)
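
For readers unfamiliar with the algorithm, a compact Knuth-Morris-Pratt search looks roughly like this. It is a textbook sketch and not the code in PreString; failureTable and kmpFind are hypothetical names and their interface is an assumption:

```sml
(* fail[i] = length of the longest proper prefix of pat[0..i]
 * that is also a suffix of pat[0..i]. *)
fun failureTable (pat : string) : int array =
  let
    val m = String.size pat
    val fail = Array.array (m, 0)
    fun loop (i, k) =
      if i >= m then ()
      else if String.sub (pat, i) = String.sub (pat, k)
      then (Array.update (fail, i, k + 1); loop (i + 1, k + 1))
      else if k > 0 then loop (i, Array.sub (fail, k - 1))
      else (Array.update (fail, i, 0); loop (i + 1, 0))
  in
    loop (1, 0); fail
  end

(* Index of the first occurrence of pat in s, if any.  The text
 * index i never moves backwards, so the scan is linear in the
 * combined length of pat and s. *)
fun kmpFind (pat : string) (s : string) : int option =
  let
    val m = String.size pat
    val n = String.size s
    val fail = failureTable pat
    fun loop (i, k) =
      if k = m then SOME (i - m)
      else if i >= n then NONE
      else if String.sub (s, i) = String.sub (pat, k)
      then loop (i + 1, k + 1)
      else if k > 0 then loop (i, Array.sub (fail, k - 1))
      else loop (i + 1, 0)
  in
    loop (0, 0)
  end
```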

Matthias Blume

[2002/03/19]: Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.

This implements a generic mechanism for changing constness qualifiers anywhere within big C types without resorting to outright "casts". (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you modify the constness at the outermost level.) The implementation of "convert" is based on the idea of "witness" values — values that are not used by the operation but whose types "testify" to their applicability. On the implementation side, "convert" is simply a projection (returning its second curried argument). With cross-module inlining, it should not result in any machine code being generated.
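
The witness idea can be illustrated with a toy model that is much smaller than the real library. Everything below is hypothetical (the structure Witness, the tags ro and rw, the cell type) and does not reflect the actual C.W interface; it only shows how a value that is ignored at run time can license a type change while convert stays a plain projection:

```sml
structure Witness :> sig
  type ro and rw                    (* phantom constness tags *)
  type 'c cell                      (* constness tracked in 'c *)
  type ('a, 'b) witness             (* evidence that 'a may become 'b *)
  val toRo : (rw, ro) witness
  val new : int -> rw cell
  val get : 'c cell -> int
  val set : rw cell * int -> unit
  val convert : ('a, 'b) witness -> 'a cell -> 'b cell
end = struct
  datatype ro = RO
  datatype rw = RW
  type 'c cell = int ref            (* same representation for every tag *)
  type ('a, 'b) witness = unit      (* carries no run-time information *)
  val toRo = ()
  fun new n = ref n
  fun get c = !c
  fun set (c, n) = c := n
  fun convert () c = c              (* returns its second curried argument *)
end

(* Usage: view a writable cell as read-only without a cast. *)
val c = Witness.new 1
val r = Witness.convert Witness.toRo c
```

Because the signature exports no witness of type (ro, rw) witness, a client cannot convert r back to a writable cell, and Witness.set (r, 0) is rejected by the type checker.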

Matthias Blume

[2002/03/15]

Provided (preliminary?) implementations for

{String,Substring}.{concatWith,isSuffix,isSubstring}

and

Substring.full

Those are in the Basis spec but they were missing in SML/NJ.

Matthias Blume

[2002/03/14]

Controls:

Factored out the recently-added Controls : CONTROLS stuff and put it into its own library $/controls-lib.cm. The source tree for this is under src/smlnj-lib/Controls.
Changed the names of types and functions in this interface, so they make a bit more "sense":
```
module -> registry
'a registry -> 'a group
```
The interface now deals in ref cells only. The getter/setter interface is (mostly) gone.
Added a function that lets one register an already-existing ref cell.
Made the corresponding modifications to the rest of the code so that everything compiles again.
Changed the implementation of Controls.MLRISC back to something closer to the original. In particular, this module (and therefore MLRISC) does not depend on Controls. There now is some link-time code in int-sys.sml that registers the MLRISC controls with the Controls module.

CM:
- One can now specify the lambda-split aggressiveness in init.cmi.

Matthias Blume

[2002/03/13]

Bug fix for:

> leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
> Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
> - fun f(x,(y,z)) = Real.~ y;
> [autoloading]
> [autoloading done]
>	fchsl	(%eax), 184(%esp)
> Error: MLRisc bug: X86MCEmitter.emitInstr
>
> uncaught exception Error
>   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19

The problem was that the code generator did not generate any fp registers in this case, and the ra didn’t know that it needed to run the X86FP phase to translate the pseudo fp instruction. This only happened with unary fp operators in certain situations.

Allen Leung

[2002/03/13]

Added _overload as a synonym for overload for backward compatibility. (Control.overloadKW must be true for either version to be accepted.)
Fixed bug in install script that caused more things to be installed than what was requested in config/targets.
Made CM aware of the (_)overload construct so that autoloading works.

Matthias Blume

[2002/03/12]: Forgot to update BOOT and srcarchiveurl.

Matthias Blume

Version 110.39.2; 2002/03/12

[2002/03/12]

Yet another version number bump (because of small changes to the binfile format). Version number is now 110.39.2. NEW BOOTFILES!

Changes:

The new pid generation scheme described a few weeks ago was overly
complicated.  I implemented a new mechanism that is simpler and
provides a bit more "stability":  Once CM has seen a compilation
unit, it keeps its identity constant (as long as you do not delete
those crucial CM/GUID/* files).  This means that when you change
an interface, compile, then go back to the old interface, and
compile again, you arrive at the original pid.

There now also is a mechanism that instructs CM to use the plain
environment hash as a module's pid (effectively making its GUID
the empty string).  For this, "noguid" must be specified as an
option to the .sml file in question within its .cm file.
This is most useful for code that is being generated by tools such
as ml-nlffigen (because during development programmers tend to
erase the tool's entire output directory tree including CM's cached
GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
revert to the old, broken behavior of SML/NJ), but in specific cases
where there is no danger of interface confusion, its use is ok
(I think).

ml-nlffigen by default generates "noguid" annotations.  They can be
turned off by specifying -guid in its command line.

Matthias Blume

[2002/03/12]: Integrated jump chaining and static block frequency into the compiler. More details and numbers later.

Lal George

[2002/03/11]

Tested the jump chain elimination on all architectures (except the hppa). This is on by default right now and is profitable for the alpha and x86; however, it may not be profitable for the sparc and ppc when compiling the compiler.

The gc test will typically jump to a label at the end of the cluster, where there is another jump to an external cluster containing the actual code to invoke gc. This is to allow factoring of common gc invocation sequences. That is to say, we generate:

f:
   testgc
   ja	L1	% jump if above to L1

L1:
   jmp L2

After jump chain elimination the 'ja L1' instructions are converted to 'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end up being implemented in their long form (if L2 is far away) using:

   jbe	L3	% jump if below or equal to L3
   jmp	L2
L3:
   …

For large compilation units L2 may be far away.
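
The retargeting step can be illustrated on a toy representation. This is only a sketch, not the MLRISC implementation: labels are plain strings, and the hypothetical `chains` list records which labels name a block that consists solely of an unconditional jump.

```sml
(* Toy sketch of jump chain elimination; not the MLRISC implementation.
 * A chain entry (l, t) says: the block at label l is just "jmp t". *)
type label = string

(* Look up the target of a jump-only block, if there is one. *)
fun lookup (chains : (label * label) list, l : label) : label option =
  case List.find (fn (l', _) => l' = l) chains of
      SOME (_, t) => SOME t
    | NONE => NONE

(* Follow jump-only blocks until reaching a label that is not one.
 * The fuel bounds the walk so that a cycle of jumps cannot loop forever. *)
fun resolve (chains, l) =
  let
    fun go (l, 0) = l
      | go (l, fuel) =
          (case lookup (chains, l) of
               SOME t => go (t, fuel - 1)
             | NONE => l)
  in
    go (l, length chains)
  end

(* L1 is "jmp L2", so a branch to L1 can be retargeted to L2. *)
val chains = [("L1", "L2")]
val target = resolve (chains, "L1")
```

Whether the retargeted branch is profitable then depends on how far away the final label is, which this sketch does not model.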

Lal George

[2002/03/11]: A functor parameter was missing.

Matthias Blume

[2002/03/11]

The representation of the empty string now points to a
legal null-terminated C string instead of unit.  It is now possible
to convert an ML string into a C string with InlineT.CharVector.getData.
This compiles into one single machine instruction.

Allen Leung

[2002/03/10]

Added machine code generation for the CALL instruction (relative displacement mode).

Allen Leung

Version 110.39.1; 2002/03/08

[2002/03/08]

Entrypoints: non-zero offset into a code object where execution should begin.

Added the notion of an entrypoint to CodeObj.
Added reading/writing of entrypoint info to Binfile.
Made runtime system bootloader aware of entrypoints.
Use the address of the label of the first function given to mlriscGen as the entrypoint. This address is currently always 0, but it will not be 0 once we turn on block placement.
Removed the linkage cluster code (which was The Other Way(tm) of dealing with entry points) from mlriscGen.

Matthias Blume

[2002/03/07]

Bug fixes for CMOVcc on x86.

Added machine code generation for CMOVcc
CMOVcc is now generated in preference over SETcc on PentiumPro or above.
CMOVcc cannot have an immediate operand as argument.

Allen Leung

[2002/03/07]

This is a very large but mostly boring patch which makes (almost) every tuneable compiler knob (i.e., pretty much everything under Control.* plus a few other things) configurable via both the command line and environment variables in the style CM did its configuration until now.

Try starting sml with '-h' (or, if you are brave, '-H')

To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which implements the underlying generic mechanism.

The interface to some of the existing such facilities has changed somewhat. For example, the MLRiscControl module now provides mkFoo instead of getFoo. (The getFoo interface is still there for backward-compatibility, but its use is deprecated.)

The ml-build script passes -Cxxx=yyy command-line arguments through so that one can now twiddle the compiler settings when using this "batch" compiler.

TODO items:

We should go through and throw out all controls that are no longer connected to anything. Moreover, we should go through and provide meaningful (and correct!) documentation strings for those controls that still are connected.

Currently, multiple calls to Controls.new are accepted (only the first has any effect). Eventually we should make sure that every control is being made (via Controls.new) exactly once. Future access can then be done using Controls.acc.

Finally, it would probably be a good idea to use the getter-setter interface to controls rather than ref cells. For the time being, both styles are provided by the Controls module, but getter-setter pairs are better if thread-safety is of any concern because they can be wrapped.
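
The difference between the two styles can be sketched as follows. The type and function names here are hypothetical and are not the actual Controls interface; the point is only that a getter/setter pair can be wrapped, while a bare ref cell cannot.

```sml
(* Hypothetical sketch; these are not the actual Controls signatures. *)
type 'a getset = { get : unit -> 'a, set : 'a -> unit }

(* Any ref cell can be exposed through a getter/setter pair. *)
fun ofRef (r : 'a ref) : 'a getset =
  { get = fn () => !r, set = fn v => r := v }

(* A pair can be wrapped, e.g. to record every update (or to take a lock). *)
fun logged (name : string, log : string list ref, c : 'a getset) : 'a getset =
  { get = #get c,
    set = fn v => (log := name :: !log; #set c v) }

val verbose = ref false
val log : string list ref = ref []
val verboseCtl = logged ("verbose", log, ofRef verbose)
val () = #set verboseCtl true
```

Code that writes to `verbose` directly bypasses the wrapper, which is why the two styles are hard to mix safely.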

One bug fix: The function blockPlacement in three of the MLRISC backpatch files used to be hard-wired to one of two possibilities at link time (according to the value of the placementFlag). But (I think) it should rather sense the flag every time.
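
The distinction between the two behaviors, in a minimal sketch. The two placement functions are stand-ins invented for illustration and are not the MLRISC backpatch code.

```sml
(* Hypothetical sketch of the two behaviors; not the MLRISC backpatch code. *)
val placementFlag = ref false

fun default  (blocks : int list) = blocks
fun weighted (blocks : int list) = rev blocks   (* stand-in for a real reordering *)

(* Hard-wired: the flag is read once, when this binding is evaluated. *)
val placeAtLinkTime = if !placementFlag then weighted else default

(* Sensing: the flag is read on every call. *)
fun placeEachTime blocks =
  (if !placementFlag then weighted else default) blocks

val () = placementFlag := true
val a = placeAtLinkTime [1, 2, 3]   (* still uses default *)
val b = placeEachTime [1, 2, 3]     (* sees the updated flag *)
```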

Other assorted changes (by other people who did not supply a HISTORY entry):

the cross-module inliner now works much better (Monnier)
representation of weights, frequencies, and probabilities in MLRISC changed in preparation of using those for weighted block placement (Reppy, George)

Matthias Blume

[2002/03/07]: Tested the weighted block placement optimization on all architectures (except the hppa) using AMPL to generate the block and edge frequencies. Changes were required in the machine properties to correctly categorize trap instructions. There is an MLRISC flag "weighted-block-placement" that can be used to enable weighted block placement, but this will be ineffective without block/edge frequencies (coming soon).

Lal George

[2002/03/05]: In order to support the block placement optimization, a new cluster is generated as the very first cluster (called the linkage cluster). It contains a single jump to the 'real' entry point for the compilation unit. Block placement has no effect on the linkage cluster itself, but all the other clusters have full freedom in the manner in which they reorder blocks or functions.

On the x86 the typical linkage code that is generated is:

   ----------------------
        .align 2
   L0:
        addl    $L1-L0, 72(%esp)
        jmp     L1

        .align 2
   L1:
   ----------------------

72(%esp) is the memory location for the stdlink register. This must contain the address of the CPS function being called. In the above example, it contains the address of L0; before calling L1 (the real entry point for the compilation unit), it must contain the address for L1, and hence

        addl    $L1-L0, 72(%esp)

I have tested this on all architectures except the hppa. The increase in code size is of course negligible.

Lal George

[2002/03/03]: Added #[ … ] expressions to mlrisc tools

Allen Leung

[2002/02/27]

made types in structure C and C_Debug to be equal
got rid of code duplication (c-int.sml vs. c-int-debug.sml)
there no longer is a C_Int_Debug (C_Debug is directly derived from C)

Matthias Blume

[2002/02/26]

Fixed a minor bug in CM’s "noweb" tool: If numbering is turned off, then truly don’t number (i.e., do not supply the -L option to noweb). The previous behavior was to supply -L'' — which caused noweb to use the "default" line numbering scheme. Thanks to Chris Richards for pointing this out (and supplying the fix).
Once again, I reworked some aspects of the FFI:
1. The incomplete/complete type business:
  - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are gone!
  - ML types representing an incomplete type are now equal to ML types representing their corresponding complete types (just like in C). This is still safe because ml-nlffigen will not generate RTTI for incomplete types, nor will it generate functions that require access to such RTTI. But when ML code generated from both incomplete and complete versions of the C type meet, the ML types are trivially interoperable.
    
    NOTE: These changes restore the full generality of the translation (which was previously lost when I eliminated functorization)!
2. Enum types:
  - Structure C now has a type constructor "enum" that is similar to how the "su" constructor works. However, "enum" is not a phantom type because each "T enum" has values (and is isomorphic to MLRep.Signed.int).
  - There are generic access operations for enum objects (using MLRep.Signed.int).
  - ml-nlffigen will generate a structure E_foo for each "enum foo".
    
    The structure contains the definition of type "mlrep" (the ML-side representation type of the enum). Normally, mlrep is the same as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec", then mlrep will be defined as a datatype — thus facilitating pattern matching on mlrep values. ("-ec" will be suppressed if there are duplicate values in an enumeration.)
    
    Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep will be generated for each C enum constant xxx.
    
    Conversion functions m2i and i2m convert between mlrep and MLRep.Signed.int. (Without "-ec", these functions are identities.)
    
    Conversion functions c and ml convert between mlrep and "tag enum".
    
    Access functions (get/set) fetch and store mlrep values.
  - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed enumerations are merged into one single enumeration represented by structure E_'.

Matthias Blume
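
To make the enum description above concrete, here is a hand-written analogue of what an "-ec" style structure could look like for `enum color { RED, GREEN, BLUE }`. It is a sketch under assumptions, not actual ml-nlffigen output: plain `int` stands in for MLRep.Signed.int, the c/ml and get/set functions are omitted, and raising Domain for out-of-range integers is a choice made for this example only.

```sml
(* Hand-written analogue of an "-ec" style structure for
 *   enum color { RED, GREEN, BLUE };
 * Plain int stands in for MLRep.Signed.int; this is not ml-nlffigen output. *)
structure E_color = struct
  datatype mlrep = e_RED | e_GREEN | e_BLUE

  (* mlrep -> integer representation *)
  fun m2i e_RED = 0
    | m2i e_GREEN = 1
    | m2i e_BLUE = 2

  (* integer representation -> mlrep; this sketch fails outside the enum *)
  fun i2m 0 = e_RED
    | i2m 1 = e_GREEN
    | i2m 2 = e_BLUE
    | i2m _ = raise Domain
end

(* Pattern matching on mlrep values is what the datatype form makes convenient. *)
fun name E_color.e_RED = "red"
  | name E_color.e_GREEN = "green"
  | name E_color.e_BLUE = "blue"
```

Without "-ec" the same names would be integer values rather than constructors, so a function like `name` would have to compare against them instead of matching.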

[2002/02/25]

This is a new implementation of the CPS spill phase. The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml. In case of problems, replace it with the old file spill.sml.

The current compiler runs into some serious performance problems when constructing a large record. This can happen when we try to compile a structure with many items. Even a very simple structure like the following makes the compiler slow down.

    structure Foo = struct
       val x_1 = 0w1 : Word32.word
       val x_2 = 0w2 : Word32.word
       val x_3 = 0w3 : Word32.word
       ...
       val x_N = 0wN : Word32.word
    end

The following table shows the compile time, from N=1000 to N=4000, with the old compiler:

       N
    1000  CPS 100 spill     0.04u   0.00s   0.00g
          MLRISC ra         0.06u   0.00s   0.05g  (spills = 0 reloads = 0)
          TOTAL             0.63u   0.07s   0.21g

    1100  CPS 100 spill     8.25u   0.32s   0.64g
          MLRISC ra         5.68u   0.59s   3.93g  (spills = 0 reloads = 0)
          TOTAL            14.71u   0.99s   4.81g

    1500  CPS 100 spill    58.55u   2.34s   1.74g
          MLRISC ra         5.54u   0.65s   3.91g  (spills = 543 reloads = 1082)
          TOTAL            65.40u   3.13s   6.00g

    2000  CPS 100 spill   126.69u   4.84s   3.08g
          MLRISC ra         0.80u   0.10s   0.55g  (spills = 42 reloads = 84)
          TOTAL           129.42u   5.10s   4.13g

    3000  CPS 100 spill   675.59u  19.03s  11.64g
          MLRISC ra         2.69u   0.27s   1.38g  (spills = 62 reloads = 124)
          TOTAL           682.48u  19.61s  13.99g

    4000  CPS 100 spill  2362.82u  56.28s  43.60g
          MLRISC ra         4.96u   0.27s   2.72g  (spills = 85 reloads = 170)
          TOTAL          2375.26u  57.21s  48.00g

As you can see, the old CPS spill module suffers from serious performance problems. Since I cannot fully decipher the old code, I am reimplementing it with a different algorithm instead of patching it. The new code is more modular, smaller when compiled, and substantially faster (O(n log n) time and O(n) space). Timing of the new spill module:

    4000  CPS 100 spill     0.02u   0.00s   0.00g
          MLRISC ra         0.25u   0.02s   0.15g  (spills=1 reloads=3)
          TOTAL             7.74u   0.34s   1.62g

Implementation details:

As far as I can tell, the purpose of the CPS spill module is to make sure the number of live variables at any program point (the bandwidth) does not exceed a certain limit, which is determined by the size of the spill area.

When the bandwidth is too large, we decrease the register pressure by packing live variables into spill records. How we achieve this is completely different than what we did in the old code.

First, there is something about the MLRiscGen code generator that we should be aware of:

MLRiscGen performs code motion!

In particular, it will move floating point computations and
address computations involving only the heap pointer to
their use sites (if there is only a single use).
What this means is that if we have a CPS record construction
statement

       RECORD(k,vl,w,e)

we should never count the new record address w as live if w
has only one use (which is often the case).

We should do something similar to floating point, but the transformation
there is much more complex, so I won't deal with that.

Secondly, there are now two new cps primops at our disposal:

rawrecord of record_kind option
   This pure operator allocates some uninitialized storage from the heap. There are two forms:

      rawrecord NONE [INT n]      allocates a tagless record of length n
      rawrecord (SOME rk) [INT n] allocates a tagged record of length n
                                  and initializes the tag.

rawupdate of cty
      rawupdate cty (v,i,x)
   Assigns x to the ith component of record v. The storelist is not updated.

We use these new primops for both spilling and incremental record construction.

Spilling.

This is implemented with a linear scan algorithm (but generalized
to trees).  The algorithm will create a single spill record at the
beginning of the cps function and use rawupdate to spill to it,
and SELECT or SELp to reload from it.  So both spills and reloads
are fine-grain operations.  In contrast, in the old algorithm
"spills" have to be bundled together in records.

Ideally, we should sink the spill record construction to where
it is needed.  We can even split the spill record into multiple ones
at the places where they are needed.  But CPS is not a good
representation for global code motion, so I'll keep it simple and
am not attempting this.

Incremental record construction (aka record splitting).

Long records with many component values which are simultaneously live
(recall that single use record addresses are not considered to
 be live) are constructed with rawrecord and rawupdate.
We allocate space on the heap with rawrecord first, then gradually
fill it in with rawupdate.  This is the technique suggested to me
by Matthias.

Some restrictions on when this is applicable:
a. It is not a VECTOR record.  The code generator currently does not handle
   this case. VECTOR record uses double indirection like arrays.
b. All the record component values are defined in the same "basic block"
   as the record constructor.  This is to prevent speculative
   record construction.
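
The allocate-then-fill idea can be modeled at the source level. This is a toy sketch, not the CPS transformation: a mutable int array stands in for a raw heap record, and the functions named rawrecord and rawupdate below are illustrative stand-ins for the primops (unlike the real rawrecord, the array is zero-initialized).

```sml
(* Toy model: a mutable int array stands in for a raw heap record.
 * rawrecord/rawupdate here are illustrative stand-ins, not the CPS primops. *)
fun rawrecord (n : int) : int Array.array = Array.array (n, 0)
fun rawupdate (r, i, x) = Array.update (r, i, x)

(* Allocate first, then fill in one component at a time, so that only
 * one component value needs to be live at any point. *)
fun buildRecord (n : int, component : int -> int) =
  let
    val r = rawrecord n
    fun fill i =
      if i < n then (rawupdate (r, i, component i); fill (i + 1)) else ()
  in
    fill 0; r
  end

val r = buildRecord (4, fn i => i * i)
```

Constructing the same record in one step would require all four component values to be available at once, which is exactly the register pressure that splitting avoids.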

Allen Leung

[2002/02/22]: Minor bug fixes in the parser and rewriter

Allen Leung

[2002/02/21]: Regenerated the peephole files. Some contained typos in the specification and some didn’t compile because of pretty printing bugs in the old version of 'nowhere'.

Allen Leung

[2002/02/19]

Minor bug fixes to the mlrisc-tools library:

Fixed up parsing colon suffixed keywords
Added the ability to shut the error messages up
Reimplemented the pretty printer and fixed up/improved the pretty printing of handle and -> types.
Fixed up generation of literal symbols in the nowhere tool.
Added some SML keywords to sml.sty

Allen Leung

[2002/02/19]

A wild mix of changes, some minor, some major:

All C FFI-related libraries are now anchored under $c:
    $/c.cm      --> $c/c.cm
    $/c-int.cm  --> $c/internals/c-int.cm
    $/memory.cm --> $c/memory/memory.cm
"make" tool (in CM) now treats its argument pathname slightly differently:
1. If the native expansion is an absolute name, then before invoking the "make" command on it, CM will apply OS.Path.mkRelative (with relativeTo = OS.FileSys.getDir()) to it.
2. The argument will be passed through to subsequent phases of CM processing without "going native". In particular, if the argument was an anchored path, then "make" will not lose track of that anchor.
Compiler backends now "know" their respective C calling conventions instead of having to be told about it by ml-nlffigen. This relieves ml-nlffigen from one of its burdens.
The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
Export C_DEBUG and C_Debug from $c/c.cm.
C type encoding in ml-nlffi-lib has been improved to model the conceptual subtyping relationship between incomplete pointers and their complete counterparts. For this, ('t, 'c) ptr has been changed to 'o ptr — with the convention of instantiating 'o with ('t, 'c) obj whenever the pointer target type is complete. In the incomplete case, 'o will be instantiated with some "'c iobj" — a type obtained by using one of the functors PointerToIncompleteType or PointerToCompleteType.
Operations that work on both incomplete and complete pointer types are typed as taking an 'o ptr while operations that require the target to be known are typed as taking some ('t, 'c) obj ptr.
voidptr is now a bit "more concrete", namely "type voidptr = void ptr'" where void is an eqtype without any values. This makes it possible to work on voidptr values using functions meant to operate on light incomplete pointers.
As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has been vastly simplified.
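
For the "make" tool change above, the effect of OS.Path.mkRelative can be tried in isolation. The sketch assumes Unix-style paths and uses made-up directory names; a fixed string stands in for the result of OS.FileSys.getDir () so that the example does not depend on where it is run.

```sml
(* Unix-style paths assumed; the directory names are made up. *)
val cwd = "/home/user/project"   (* stands in for OS.FileSys.getDir () *)

(* A path below the working directory loses its absolute prefix. *)
val rel = OS.Path.mkRelative
            { path = "/home/user/project/tools/Makefile", relativeTo = cwd }

(* A path outside the working directory climbs with parent arcs. *)
val up = OS.Path.mkRelative
           { path = "/home/user/other/Makefile", relativeTo = cwd }
```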

Matthias Blume

[2002/02/19]: Applied Chris Okasaki’s bug fix for priority queues.

Matthias Blume

Version 110.39; 2002/02/15

[2002/02/15]: Added EnvRef.listBoundSymbols and CM.State.showBindings. Especially the latter can be useful for exploring what bindings are available at the interactive prompt. (The first function returns only the list of symbols that are really bound, the second prints those but also the ones that CM’s autoloading mechanism knows about.)

Matthias Blume

[2002/02/15]

Two improvements to ml-nlffigen:

Write files only if they do not exist or if their current contents do not coincide with what’s being written. (That is, avoid messing with the time stamps unless absolutely necessary.)
Implement a "repository" mechanism for generated files related to "incomplete pointer types". See the README file for details.
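
The first improvement amounts to comparing before writing. A minimal sketch of that idea using the Basis TextIO structure follows; it is not the ml-nlffigen source, and the function names are invented for illustration.

```sml
(* Sketch of "write only if changed"; not the ml-nlffigen source. *)
fun readAll (path : string) : string option =
  let
    val ins = TextIO.openIn path
    val s = TextIO.inputAll ins
  in
    TextIO.closeIn ins; SOME s
  end
  handle IO.Io _ => NONE   (* a missing or unreadable file counts as "no contents" *)

(* Returns true if the file was (re)written, false if it was left alone. *)
fun writeIfChanged (path : string, contents : string) : bool =
  if readAll path = SOME contents
    then false
    else
      let val outs = TextIO.openOut path
      in
        TextIO.output (outs, contents);
        TextIO.closeOut outs;
        true
      end
```

Leaving an unchanged file alone keeps its time stamp, so tools that rely on time stamps do not see a spurious modification.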

Matthias Blume

[2002/02/14]: Added a type 't t_' to tag.sml (in ml-nlffi-lib.cm). This is required because of the new and improved tag generation scheme. (Thanks to Allen Leung for pointing it out.)

Matthias Blume

[2002/02/14]

Fixed the MLRISC bug sent by Markus Wenzel regarding the compilation of Isabelle on the x86.

From Allen:

* I’ve found the problem:

In ra-core.sml, I use the counter "blocked" to keep track of the true number of elements in the freeze queue. When the counter goes to zero, I skip examining the queue. But I’ve messed up the bookkeeping in combine():

         else ();
         case !ucol of
           PSEUDO => (if !cntv > 0 then
                 (if !cntu > 0 then blocked := !blocked - 1 else ();
                                    ^^^^^^^^^^^^^^^^^^^^^^^
                  moveu := mergeMoveList(!movev, !moveu)
                 )
              else ();

combine() is called to coalesce two nodes u and v. I think I was thinking that if the move counts of u and v are both greater than zero, then after they are coalesced one node is removed from the freeze queue. Apparently I was thinking that both u and v are of low degree, but that’s clearly not necessarily true.

* 02/12/2002:

Here’s the patch. HOL now compiles.

I don’t know how this impacts performance (compile time or runtime). This bug caused the RA (especially on the x86) to go through the potential spill phase when there are still nodes on the freeze queue.

Lal George

[2002/02/13]: Fixed a bug in ml-nlffigen that was introduced with one of the previous updates.

Matthias Blume

[2002/02/13]: Added new priority queue export symbols (which have just been added to smlnj-lib.cm) to CML’s version of smlnj-lib.cm. (Otherwise CML would not compile and the installer would choke.)

Matthias Blume

[2002/02/13]

More tweaks to ml-nlffigen:
- better internal datastructures (resulting in slight speedup)
- "-match" option requires exact match
- "localized" gensym counters (untagged structs/unions nested within other structs/unions or within typedefs get a fresh counter; their tag will be prefixed by a concatenation of their parents' tags)
- bug fixes (related to calculation of transitive closure of types to be included in the output)
Minor Basis updates:
- added implementations for List.collate and Option.app
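
Both functions are part of the Standard ML Basis Library; a short usage sketch (the values are arbitrary):

```sml
(* List.collate lifts an element ordering to a lexicographic list ordering. *)
val c1 = List.collate Int.compare ([1, 2, 3], [1, 2, 4])
val c2 = List.collate Int.compare ([1, 2], [1, 2, 0])   (* a proper prefix sorts first *)
val c3 = List.collate String.compare (["a", "b"], ["a", "b"])

(* Option.app runs an effect on the contents of SOME and does nothing for NONE. *)
val seen : int list ref = ref []
val () = Option.app (fn x => seen := x :: !seen) (SOME 7)
val () = Option.app (fn x => seen := x :: !seen) NONE
```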

Matthias Blume

[2002/02/11]: Added a "-gensym" option to command line of ml-nlffigen. This can be used to specify a "stem" — a string that is inserted in all "gensym’d" names (ML structure names that correspond to unnamed C structs, unions, and enums), so that separate runs of ml-nlffigen do not clash.

Matthias Blume

[2002/02/11]: A quick fix for a problem with GenSML (in the pgraph-util library): Make generation of toplevel "local" optional. (Strictly speaking, signature definitions within "local" are not legal SML.)

Other than that: updates to INSTALL and cm/TODO.

Matthias Blume

Version 110.38.1; 2002/02/08

[2002/02/08]

The installer (config/install.sh) has gotten smarter:

Configuration options are a bit easier to specify now (in config/targets).
Bug in recognizing .tar.bz2 files fixed.
Installer automatically resolves dependencies between configuration options (e.g., if you ask for eXene, you will also get cml — regardless whether you asked for it or not).
Installer can run in "quieter mode" by setting the environment variable INSTALL_QUIETLY to "true". "Quieter" does not mean "completely silent", though.
Build HashCons library as part of smlnj-lib.

A new scheme for assigning persistent identifiers to compilation units (and, by extension, to types etc.) has been put into place. This fixes a long-standing bug where types and even dynamic values can get internally confused, thereby compromising type safety (abstraction) and dynamic correctness. See http://cm.bell-labs.com/cm/cs/who/blume/pid-confusion.tgz for an example of how things could go wrong until now.

The downside of the new scheme is that pids are not quite as persistent as they used to be: CM will generate a fresh pid for every compilation unit that it thinks it sees for the first time. That means that if you compile starting from a clean, fresh source tree at two different times, you end up with different binaries.

Cutoff recompilation, however, has not been compromised because CM keeps pid information in special caches between runs.

Matthias Blume

[2002/02/07]: Compilers that generate assembly code may produce global labels whose value is resolved at link time. The various peephole optimization modules did not take this into account.

TODO. The Labels.addrOf function should really return an option type so that clients are forced to deal with this issue, rather than an exception being raised.
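
What the TODO proposes could look roughly like this. The label representation and address table are hypothetical and are not the MLRISC Labels structure; the sketch only shows how an option result forces the client to handle the unresolved case.

```sml
(* Hypothetical label table; not the MLRISC Labels structure. *)
datatype label = Local of string | Global of string

val addresses = [("L1", 0), ("L2", 16)]

(* Global labels are resolved at link time, so they have no address yet. *)
fun addrOf (Global _) = NONE
  | addrOf (Local name) =
      Option.map (fn (_, a) => a)
                 (List.find (fn (n, _) => n = name) addresses)

(* A peephole-style client must handle the unresolved case explicitly. *)
fun isShortJump l =
  case addrOf l of
      SOME a => a < 128
    | NONE => false   (* unknown address: be conservative *)
```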

Lal George

[2002/02/06]

A bug fix from Allen: A typo causes extra fstp %st(0) instructions to be generated at compensation edges, which might cause stack underflow traps at runtime. This occurs in fft where there are extraneous fstps right before the into trap instruction (in this case they are harmless since none of the integers overflow.)
Pulled out various utility modules that were embedded in the modules of the register allocator. I need these modules for other purposes, but they are not complete enough to put into a library (just yet).

Lal George

[2002/01/31]

Fixed a bug where C-calls on SPARC needlessly allocated a huge chunk (96 bytes) of extra stack space by mistake.
Bug in logic of handling of command-line options in ml-nlffigen fixed.

Matthias Blume

[2002/01/30]: MLRISC bug fixes:

Fixed a bindings computation bug in the 'nowhere' program generator tool.
MachineInt.fromString was negating its value.

Allen Leung

[2002/01/29]

Added somewhat detailed installation instructions (file INSTALL).
Fixed curl-detection bug in config/install.sh.

It is now possible to select the URL getter using the URLGETTER environment variable:

not set / "unknown"      --> automatic detection (script tries wget,
                             curl, and lynx)
"wget" / "curl" / "lynx" --> use the specified program (script "knows"
                             how to properly invoke them)
other                    --> use $URLGETTER directly, it must take
                             precisely two command-line arguments
                             (source URL and destination file name)

Matthias Blume

[2002/01/28]

Fixed problem with calculation of "used" registers in sparc-c-calls.
Make use of the allocParam argument in sparc-c-calls.

Matthias Blume

[2002/01/28]

John Reppy: Changed the c-calls API to accept a client callback for allocating extra stack space.
me: Made the corresponding changes to mlriscGen (using a dummy argument that does not change the current behavior).

Matthias Blume

Version 110.38; 2002/01/28

[2002/01/28]

Retracted earlier 110.38. (The Release_110_38 tag has been replaced with blume-Release_110_38-retracted.)
Fixed a problem with incorrect rounding modes in real64.sml. (Thanks to Andrew Mccreight <andrew.mccreight@yale.edu>.)
A bug in ml-nlffigen related to the handling of unnamed structs, unions, and enums fixed. The naming of corresponding ML identifiers should now be consistent again.

Matthias Blume

[2002/01/27]

Added a target called nowhere in the configuration scripts.
Enabling this will build the MLRISC 'nowhere' tool (for translating
programs with where-clauses into legal SML code) during installation.

Allen Leung

[2002/01/25]: Call it a (working) release! Version is 110.38. Bootfiles are ready.

README will be added later.

!!! NOTE: Re-tagged as blume-Release_110_38-retracted. Original tag (Release_110_38) removed. Reason: Last-minute bug fixes.

Matthias Blume

[2002/01/25]

A large number of tweaks and improvements to ml-nlffi-lib and ml-nlffigen:

ML representation types have been streamlined
getter and setter functions work with concrete values, not abstract ones where possible
ml-nlffigen command line more flexible (see README file there)
some bugs have been fixed (hopefully)

Matthias Blume

[2002/01/24]

There is a dramatic simplification in the interface to the
register allocator for RISC architectures as a result of making
parallel copy instructions explicit.

Lal George

[2002/01/22]: Bug fix for c-calls on x86 (having to do with how char- and short-arguments are being handled).

Matthias Blume

[2002/01/21]

Another day of fiddling with the FFI…

Bug fix/workaround: CKIT does not complain about negative array dimensions, so ml-nlffigen has to guard itself against this possibility. (Otherwise a negative dimension would send it into an infinite loop.)
Some of the abstract types (light objects, light pointers, most "base" types) in structure C are now eqtypes.
Added constructors and test functions for NULL function pointers.

Matthias Blume

[2002/01/18]: Made config/srcarchiveurl point to a new place. (Will provide boot files shortly.)

Maybe we christen this to be 110.38?

Matthias Blume

[2002/01/18]

Today’s FFI fiddling:

Provided a structure CGetSet with "convenient" versions of C.Get.* and C.Set.* that use concrete (MLRep.*) arguments and results instead of abstract ones.
Provided word-style bit operations etc. for "int" representation types in MLRep.S<Foo>Bitops where <Foo> ranges over Char, Int, Short, and Long.

Matthias Blume

[2002/01/18]: Now that x86-fast-fp seems to be working, I turned it back on again by default. (Seems to work fine now, even with the FFI.)

Other than that, I added some documentation about the FFI to src/ml-nlffigen/README and updated the FFI test examples in src/ml-nlffi-lib/Tests/*.

Matthias Blume

[2002/01/17]

Fixed a problem with handling return fp values when x86’s fast fp mode is turned on.
Minor pretty printing fix for cellset. Print %st(0) as %st(0) instead of %f32.
Added a constructor INT32lit to the ast of MLRISC tools.

Allen Leung

[2002/01/16]

More fiddling with the FFI interface:

Make constness 'c instead of rw wherever possible. This eliminates the need for certain explicit coercions. (However, due to ML’s value polymorphism, there will still be many cases where explicit coercions are necessary. Phantom types are not the whole answer to modeling a subtyping relationship in ML.)
ro/rw coercions for pointers added. (Avoids the detour through */&.)
"printf" test example added to src/ml-nlffi-lib/Tests. (Demonstrates clumsy workaround for varargs problem.)

Matthias Blume
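
The phantom-type idea behind the constness parameter can be shown in a small self-contained sketch. All names here are hypothetical and far simpler than the ml-nlffi-lib types: an int ref plays the role of a C object, and the type parameter 'c carries the constness without affecting the representation.

```sml
(* Phantom-type sketch; all names are hypothetical, not the ml-nlffi-lib API. *)
structure Cell :> sig
  type ro
  type rw
  type 'c cell                       (* 'c is a phantom "constness" parameter *)
  val new : int -> rw cell
  val get : 'c cell -> int           (* works for any constness *)
  val set : rw cell * int -> unit    (* requires a writable cell *)
  val ro  : 'c cell -> ro cell       (* explicit coercion to read-only *)
end = struct
  datatype ro = RO
  datatype rw = RW
  type 'c cell = int ref
  fun new i = ref i
  fun get c = !c
  fun set (c, i) = c := i
  fun ro c = c
end

val c = Cell.new 1
val () = Cell.set (c, 5)
val frozen = Cell.ro c
val v = Cell.get frozen
```

Because `get` is polymorphic in 'c, it accepts both `c` and `frozen` without a coercion, whereas `Cell.set (frozen, 0)` is rejected by the type checker.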

[2002/01/15]

Since COPY instructions are no longer native to the architecture, a generic functor can be used to implement the expandCopies function.
Allowed EXPORT and IMPORT pseudo-op declarations to appear inside a TEXT segment.

Lal George

[2002/01/15]

Fix for bug resulting in single-precision float values being returned incorrectly from FFI calls.
Small modifications to C FFI API:
- memory-allocation routines return straight objects (no options) and raise an exception in out-of-memory situations
- unsafe extensions to cast between function pointers and pointers from/to ints
- added structure C_Debug as an alternative to structure C where pointer-dereferencing (|*| and |*!) always checks for null-pointers
- added open_lib' to DynLinkage; open_lib' works like open_lib but also takes a (possibly empty) list of existing library handles that the current library depends on

Matthias Blume

[2002/01/10]

Updates to portable graph code.
Major update to ml-nlffigen and ml-nlffi-lib. Things are much more scalable now so that even huge interfaces such as the one for GTK compile in finite time and space. :-) See src/ml-nlffigen/README for details on what’s new.

Matthias Blume

[2002/01/09]

Removed the native COPY and FCOPY instructions
from all the architectures and replaced them with the
explicit COPY instruction from the previous commit.

It is now possible to simplify many of the optimization
modules that manipulate copies. This has not been
done in this change.

Lal George

[2001/12/06]

Changed the representation of instructions from being fully abstract to being partially concrete. That is to say:

 from
type instruction

 to
type instr				(* machine instruction *)

datatype instruction =
    LIVE of {regs: C.cellset, spilled: C.cellset}
  | KILL of {regs: C.cellset, spilled: C.cellset}
  | COPYXXX of {k: CB.cellkind, dst: CB.cell list, src: CB.cell list}
  | ANNOTATION of {i: instruction, a: Annotations.annotation}
  | INSTR of instr

This makes the handling of certain special instructions that appear on all architectures easier and uniform.

LIVE and KILL say that a list of registers are live or killed at the program point where they appear. No spill code is generated when an element of the 'regs' field is spilled, but the register is moved to the 'spilled' field (which is present more for debugging than anything else).

LIVE replaces the (now deprecated) DEFFREG instruction on the alpha. We used to generate:

DEFFREG f1
f1 := f2 + f3
trapb

but now generate:

f1 := f2 + f3
trapb
LIVE {regs=[f1,f2,f3], spilled=[]}

Furthermore, the DEFFREG (hack) required that all floating point instructions use all registers mentioned in the instruction. Therefore f1 := f2 + f3 defines f1 and uses [f1,f2,f3]! This hack is no longer required, resulting in a cleaner alpha implementation. (Hopefully, Intel will not get rid of this architecture.)

COPYXXX is intended to replace the parallel COPY and FCOPY available on all the architectures. This will result in further simplification of the register allocator that must be aware of them for coalescing purposes, and will also simplify certain aspects of the machine description that provides callbacks related to parallel copies.

ANNOTATION should be obvious, and now INSTR represents the honest to God machine instruction set!

The <arch>/instructions/<arch>Instr.sml files define certain utility functions for making porting easier — essentially converting upper case to lower case. All machine instructions (of type instr) are in upper case, and the lower case form generates an MLRISC instruction. For example on the alpha we have:

datatype instr =
   LDA of {r:cell, b:cell, d:operand}
 | ...

val lda : {r:cell, b:cell, d:operand} -> instruction
  ...

where lda is just (INSTR o LDA), etc.
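
The relationship between the two layers can be sketched as follows. This is a cut-down illustration, not the contents of the alpha description: the cell and operand types, the TRAPB constructor, and the two-case instruction datatype are simplified stand-ins assumed here so that the fragment is self-contained.

```sml
(* Hypothetical stand-ins for the real cell and operand types *)
type cell = int
datatype operand = IMMop of int | REGop of cell

(* Machine instructions: upper case constructors of type instr *)
datatype instr =
    LDA of {r: cell, b: cell, d: operand}
  | TRAPB

(* Architecture-independent wrapper, reduced to two cases *)
datatype instruction =
    LIVE of {regs: cell list, spilled: cell list}
  | INSTR of instr

(* Lower case forms build an MLRISC instruction directly *)
val lda : {r: cell, b: cell, d: operand} -> instruction = INSTR o LDA
val trapb : instruction = INSTR TRAPB
```

Client code can then write lda {r=1, b=2, d=IMMop 8} instead of wrapping each machine instruction in INSTR by hand.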

Lal George

Version 110.37; 2001/11/22

[2001/11/21]: Removed the "Release_110_37" tag because of a serious bug. This will be re-tagged once the bug is fixed.

Matthias Blume

[2001/11/21]: Forgot to add a file. (Just a .tex-file — part of the CM manual source.)

Matthias Blume

[2001/11/21]

Note: I removed the original tag "Release_110_37" from this commit because we found a serious bug in all non-x86 backends. - Matthias

Modifications to the SML/NJ code generator and to the runtime system so that code object name strings are directly inserted into code objects at code generation time. The only business the runtime system has with this is now to read the name strings on occasions. (The encoding of the name string has also changed somewhat.)
CM now implements a simple "set calculus" for specifying export lists. In particular, it is now possible to refer to the export lists of other libraries/groups/sources and form unions as well as differences. See the latest CM manual for details.
A separate notion of "proxy" libraries has again been eliminated from CM’s model. (Proxy libraries are now simply a special case of using the export list calculus.)
Some of the existing libraries now take advantage of the new set calculus. (Notice that not all libraries have been converted because some of the existing .cm-files are supposed to be backward compatible with 110.0.x.)
Some cleanup in stand-alone programs. (Don’t use "exnMessage" — use "General.exnMessage"! The former relies on a certain hook to be initialized, and that often does not happen in the stand-alone case.)

Matthias Blume

[2001/11/21]

Implemented a complete redesign of MLRISC pseudo-ops. Now there
ought to never be any question of incompatibilities with
pseudo-op syntax expected by host assemblers.

For now, only modules supporting GAS syntax are implemented
but more should follow, such as MASM, and vendor assembler
syntax, e.g., IBM as, Sun as, etc.

Lal George

[2001/11/14]

Routed the name of the current source file to mlriscgen where it should be directly emitted into the code object. (This last part is yet to be done.)
Some cleanup of the pgraph code to make it match the proposal that I put out the other day. (The proposal notwithstanding, things are still in flux here.)

Matthias Blume

[2001/11/14]

Fix for a backpatching bug reported by Allen.

Because the boundary between short and long span-dependent
instructions is +/- 128, there are an astounding number of
span-dependent instructions whose size is overestimated.

Allen came up with the idea of letting the size of
span-dependent instructions be non-monotonic for a maxIter
number of times, after which the size must be monotonically
increasing.

This table shows the number of span-dependent instructions
whose size was over-estimated as a function of maxIter, for the
file Parse/parse/ml.grm.sml:

   maxIter    # of instructions
   10         687
   20         438
   30         198
   40           0

In compiling the compiler, there is no significant difference in
compilation speed between maxIter=10 and maxIter=40. Actually,
my measurements showed that maxIter=40 was a tad faster than
maxIter=10! Also, 96% of the files in the compiler reach a
fixpoint within 13 iterations, so fixing maxIter at 40, while high,
is okay.
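
The idea can be illustrated on a toy model. The sketch below is not the MLRISC backpatching code: the insn type, the 2- and 5-byte jump sizes, and the names jumpSize, addresses, pass, and fixSizes are all assumptions made for illustration. It starts from the pessimistic guess that every jump is long, recomputes sizes freely for the first maxIter passes, and afterwards only lets sizes grow, which guarantees termination.

```sml
(* Hypothetical instruction model: a fixed-size instruction, or a jump
   to the instruction at a given index that takes 2 bytes when the
   displacement fits in a signed byte and 5 bytes otherwise. *)
datatype insn = FIXED of int | JMP of int

fun jumpSize disp = if disp >= ~128 andalso disp <= 127 then 2 else 5

(* Start address of every instruction, plus one entry for the end *)
fun addresses sizes =
  let
    fun loop ([], addr, acc) = rev (addr :: acc)
      | loop (s :: rest, addr, acc) = loop (rest, addr + s, addr :: acc)
  in
    Vector.fromList (loop (sizes, 0, []))
  end

(* One pass: recompute each size from the addresses implied by the
   current sizes.  When monotonic is true a size may only grow. *)
fun pass (insns, sizes, monotonic) =
  let
    val addr = addresses sizes
    fun size (_, FIXED n, _) = n
      | size (i, JMP target, old) =
          let
            (* displacement measured from the end of the jump *)
            val disp = Vector.sub (addr, target) - Vector.sub (addr, i + 1)
            val new = jumpSize disp
          in
            if monotonic then Int.max (old, new) else new
          end
    fun loop (_, [], []) = []
      | loop (i, insn :: insns', old :: olds) =
          size (i, insn, old) :: loop (i + 1, insns', olds)
      | loop _ = raise Fail "insns and sizes differ in length"
  in
    loop (0, insns, sizes)
  end

(* Iterate to a fixpoint; passes numbered maxIter and later are monotonic *)
fun fixSizes maxIter insns =
  let
    val init = map (fn FIXED n => n | JMP _ => 5) insns
    fun iterate (n, sizes) =
      let val sizes' = pass (insns, sizes, n >= maxIter)
      in if sizes' = sizes then sizes else iterate (n + 1, sizes') end
  in
    iterate (0, init)
  end
```

For the program [JMP 2, FIXED 4, FIXED 4], fixSizes 0 keeps the jump at its overestimated 5 bytes, while fixSizes 10 lets it shrink to 2 bytes.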

Lal George

[2001/10/31]

CKIT:
Changed the "Function" constructor of type Ast.ctype to carry optional argument identifiers.
Changed the return type of TypeUtil.getFunction accordingly.
Type equality ignores the argument names.
TypeUtil.composite tries to preserve argument names but gives up quickly if there is a mismatch.

installation script:
attempts to use "curl" if available (unless "wget" is available as well)

CM:
has an experimental implementation of "portable graphs" which I will soon propose as an implementation-independent library format
there are also new libraries $/pgraph.cm and $/pgraph-util.cm

NLFFI-LIB:
some cleanup (all cosmetic)

NLFFIGEN:
temporarily disabled the mechanism that suppresses ML output for C definitions whose identifiers start with an underscore character
generate val bindings for enum constants
user can request that only one style (light or heavy) be used; default is to use both (command-line arguments: -heavy and -light)
fixed bug in handling of function types involving incomplete pointers
generate ML entry points that take record arguments (i.e., using named arguments) for C functions that have a prototype with named arguments (see changes to CKIT)

Matthias Blume

[2001/10/27]

Fixed the bug described in blume-20010920-slowfp.

The fix involves
   1. generating FCOPYs in FSTP in ia32-svid
   2. marking a CALL with the appropriate annotation

Allen Leung

[2001/10/16]: Underscore patch from Chris Richards (fixing problem with compiling runtime system under recent NetBSD).

Matthias Blume

[2001/10/12]: X86RA now uses a valid (instead of dummy) PrintFlowgraph module.

Allen Leung

[2001/10/11]: The representation of a program point never expected to see more than 65536 instructions in a basic block!

Lal George

[2001/10/09]: Changed the machine description files to support printing of local and global labels in assembly code, based on host assembler conventions.

Lal George

[2001/09/25]: I provided a non-hook implementation of exnName (at the toplevel) and made the "dummy" implementation of exnMessage (at the toplevel) more useful: if nothing gets "hooked in", then at least you are going to see the exception name and a message indicating why you don’t see more.

[For the time being, programs that need exnMessage and want to use ml-build should either use General.exnMessage (strongly recommended) or refer to structure General at some other point so that CM sees a static dependency.]

[Similar remarks go for "print" and "use": If you want to use their functionality in stand-alone programs generated by ml-build, then use TextIO.output and Backend.Interact.useFile (from $smlnj/compiler.cm).]
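
As a rough illustration of this advice, a stand-alone entry point might look like the sketch below. The names describe and main are hypothetical; the only assumption about ml-build is that it expects an entry point of type string * string list -> OS.Process.status. Everything else is plain Basis Library code that avoids the toplevel print and exnMessage.

```sml
(* describe is a hypothetical helper.  It goes through structure General
   rather than the hook-based toplevel exnMessage. *)
fun describe (e: exn) : string =
  General.exnName e ^ ": " ^ General.exnMessage e

(* Echo the arguments, reporting any exception on stderr.
   TextIO.output is used instead of the toplevel print. *)
fun main (_: string, args: string list) : OS.Process.status =
  (TextIO.output (TextIO.stdOut, String.concatWith " " args ^ "\n");
   OS.Process.success)
  handle e =>
    (TextIO.output (TextIO.stdErr, describe e ^ "\n");
     OS.Process.failure)
```

Because the code mentions General explicitly, CM sees a static dependency on that structure.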

Matthias Blume

[2001/09/20]: Allen says that x86-fast-fp is not safe yet, so I turned it off again…

Matthias Blume

[2001/09/20]

Updated the BOOT file (something that I forgot to do earlier).
Small internal change to CM so that it avoids "/../" in filenames as much as possible (but only where it is safe).
Changed config/_run-sml (resulting in a changed bin/.run-sml) so that arguments that contain delimiters are passed through correctly. This change also means that all "special" arguments of the form @SMLxxx… must come first.
Changed install script to put relative anchor names for tool commands into pathconfig.

Matthias Blume

Version 110.36; 2001/09/18

[2001/09/14]: John committed some changes that Allen made, in particular a (hopefully) correctly working version of the x86-fp module.

I changed the default setting of the Control.MLRISC.getFlag "x86-fast-fp" flag to "true". Everything seems to compile to a fixpoint ok, and "mandelbrot" speeds up by about 15%.

Matthias Blume

[2001/09/13]

Stefan Monnier’s patch to fix a miscompilation problem that was brought to light by John Reppy’s work on Moby.
Implemented a minimal "structure Compiler" that contains just "version" and "architecture". The minimal version will be available when the full version is not. This is for backward compatibility with code that wants to test Compiler.version.

Matthias Blume

[2001/08/28]: Fix for bug 1581, received from Neophytos Michael.

Matthias Blume

Version 110.35; 2001/08/24

[2001/08/24]

Removed clusters from MLRISC completely and replaced them with graphs.

Lal George

[2001/08/23]

some reorganization of the code that implements various kinds of environments in the compiler (static, dynamic, symbolic, combined)
re-implemented the EnvRef module so that evalStream works properly (if the stream contains references to "use", "CM.make", etc.)
cleaned up evalloop.sml and interact.sml (but they need more cleaning)

Matthias Blume

[2001/08/20]: I forgot to commit a few files. Here they are…

Matthias Blume

[2001/08/20]

!!!! NEW BOOTFILES !!!!

This is another round of reorganizing the compiler sources. This time the main goal was to factor out all the "instrumentation" passes (for profiling and backtracing) into their own library. The difficulty was to do it in such a way that it does not depend on elaborate.cm but only on elabdata.cm.

Therefore there have been further changes to both elaborate.cm and elabdata.cm — more "generic" things have been moved from the former to the latter. As a result, I was forced to split the assignment of numbers indicating "primtyc"s into two portions: SML-generic and SML/NJ-specific. Since it would have been awkward to maintain, I bit the bullet and actually changed the mapping between these numbers and primtycs. The bottom line of this is that you need a new set of bin- and bootfiles.

I have built new bootfiles for all architectures, so doing a fresh checkout and config/install.sh should be all you need.

The newly created library’s name is

$smlnj/viscomp/debugprof.cm

and its sources live under

src/compiler/DebugProf

Matthias Blume

[2001/08/15]

This is a first cut at reorganizing the CM libraries that make up the core of the compiler. The idea is to separate out pieces that could be used independently by tools, e.g., the parser, the typechecker, etc.

The current status is a step in this direction, but it is not quite satisfactory yet. Expect more changes in the future.

Here is the current (new) organization…

What used to be $smlnj/viscomp/core.cm is now divided into
six CM libraries:

$smlnj/viscomp/basics.cm
              /parser.cm
              /elabdata.cm
              /elaborate.cm
              /execute.cm
              /core.cm

The CM files for these libraries live under src/system/smlnj/viscomp.
All these libraries are proxy libraries that contain precisely
one CM library component.  Here are the locations of the components
(all within the src/compiler tree):

Basics/basics.cm
Parse/parser.cm
ElabData/elabdata.cm
Elaborator/elaborate.cm
Execution/execute.cm
core.cm

[This organization is the same that has been used already
for a while for the architecture-specific parts of the visible
compiler and for the old version of core.cm.]

As you will notice, many source files have been moved from their
respective original locations to a new home in one of the above
subtrees.

The division of labor between the new libraries is the following:

basics.cm:
   - Simple, basic definitions that pertain to many (or all) of
     the other libraries.
parser.cm:
   - The SML parser, producing output of type Ast.dec.
   - The type family for Ast is also defined and exported here.
elabdata.cm:
   - The datatypes that describe input and output of the elaborator.
     This includes types, absyn, and static environments.
elaborate.cm:
   - The SML/NJ type checker and elaborator.
     This maps an Ast.dec (with a given static environment) to
     an Absyn.dec (with a new static environment).
   - This library implements certain modules that used to be
     structures as functors (to remove dependencies on FLINT).
execute.cm:
   - Everything having to do with executing binary code objects.
   - Dynamic environments.
core.cm:
   - SML/NJ-specific instantiations of the elaborator and MLRISC.
   - Top-level modules.
   - FLINT (this should eventually become its own library)

Notes:

I am not 100% happy with the way I separated the elaborator (and its data structures) from FLINT. Two instances of the same problem:

Data structures contain certain fields that carry FLINT-specific information. I hacked around this using exn and the property list module from smlnj-lib. But the fact that there are middle-end specific fields around at all is a bit annoying.
The elaborator calculates certain FLINT-related information. I tried to make this as abstract as I could using functorization, but, again, the fact that the elaborator has to perform calculations on behalf of the middle-end at all is not nice.
Having to use exn and property lists is unfortunate because it weakens type checking. The other alternative (parameterizing nearly everything) is not appealing, though.
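
For readers unfamiliar with the idiom, here is a minimal sketch of how exn can serve as an extensible "universal" type for such fields. It is not the compiler's actual code: the record type tycon_info, the exception FlintKind, and the helpers setKind and getKind are hypothetical, and the property list module from smlnj-lib is left out to keep the fragment self-contained.

```sml
(* An elaborator-side record with an opaque slot for middle-end data *)
type tycon_info = {name: string, extra: exn option ref}

(* The middle end declares its own exception constructor as a tag *)
exception FlintKind of int

fun setKind (info: tycon_info, k: int) : unit =
  #extra info := SOME (FlintKind k)

fun getKind (info: tycon_info) : int option =
  case !(#extra info) of
      SOME (FlintKind k) => SOME k
    | _ => NONE   (* empty slot, or data stored by some other client *)
```

The weakened type checking is visible in getKind: the slot can hold any exception value, so a mismatch shows up only at run time as NONE.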

I removed the "rebinding =" warning hack because due to the new organization it was awkward to maintain it. As a result, the compiler now issues some of these warnings when compiling init.cmi during bootstrap compilation. On the plus side, you also get a warning when you do, for example: val op = = Int32.+ which was not the case up to now.

I placed "assign" and "deref" into the _Core structure so that the code that deals with the "lazy" keyword can find them there. This removes the need for having access to the primitive environment during elaboration.

Matthias Blume

[2001/08/13]: This fix was sent to us by Zhong Shao. It is supposed to improve the performance of certain loops by avoiding needless closure allocation.

Matthias Blume

[2001/07/31]: There was a bug where call instructions would mysteriously vanish. The call instruction had to be one that returned a floating point value.

Lal George

[2001/07/19]: I have dramatically simplified the interface for CELLS in MLRISC.

In summary, the cells interface is broken up into three parts:

CellsBasis : CELLS_BASIS

CellsBasis is a top-level structure that is common to all architectures. It contains the definitions of basic datatypes and utility functions over these types.

functor Cells() : CELLS

Cells generates an interface for CELLS that incorporates the specific resources on the target architecture, such as the presence of special register classes, their number and size, and various useful substructures.

<ARCH>CELLS

e.g., SparcCells: SPARCCELLS

<ARCH>CELLS usually contains additional bindings for special registers on the architecture, such as:
```
        val r0 : cell          (* register zero *)
        val y : cell           (* Y register *)
        val psr : cell         (* processor status register *)
        ...
```
The structure returned by applying the Cells functor is opened in this interface.

The main implication of all this is that the datatypes for cells are split between CellsBasis and CELLS, which is a fairly simple change for user code.

In the old scheme the CELLS interface had a definitional binding of the form:
```
        signature CELLS =
          sig
            structure CellsBasis = CellsBasis
            ...
          end
```
With all the sharing constraints that go on in MLRISC, this old design quickly leads to errors such as:
```
        structure definition spec inside of sharing ...
```
and appears to require an unacceptable amount of sharing and where constraint hackery.

I think this error message (the interaction of definitional specs and sharing) requires more explanation on our web page.

Lal George

[2001/07/19]

This update puts together a fairly extensive but straightforward change to the way the libraries that implement the interactive system are organized:

The biggest change is the elimination of structure Compiler. As a replacement for this structure, there is now a CM library (known as $smlnj/compiler.cm or $smlnj/compiler/current.cm) that exports all the substructures of the original Compiler structure directly. So instead of saying Compiler.Foo.bar one now simply says Foo.bar. (The CM libraries actually export a collection of structures that is richer than the collection of substructures of structure Compiler.)

To make the transition smooth, there is a separate library called $smlnj/compiler/compiler.cm that puts together and exports the original structure Compiler (or at least something very close to it).

There are five members of the original structure Compiler that are not exported directly but which instead became members of a new structure Backend (described by signature BACKEND). These are:

	structure Profile  : PROFILE
	structure Compile  : COMPILE
	structure Interact : INTERACT
	structure Machine  : MACHINE

	val architecture : string

Structure Compiler.Version has become structure CompilerVersion.

Cross-compilers for alpha32, hppa, ppc, sparc, and x86 are provided by $smlnj/compiler/<arch>.cm where <arch> is alpha32, hppa, ppc, sparc, or x86, respectively. Each of these exports the same frontend structures that $smlnj/compiler.cm exports. But they do not have a structure Backend and instead export some structure <Arch>Backend where <Arch> is Alpha32, Hppa, PPC, Sparc, or X86, respectively.

Library $smlnj/compiler/all.cm exports the union of the exports of $smlnj/compiler/<arch>.cm

There are no structures <Arch>Compiler anymore, use $smlnj/compiler/<arch>.cm instead.

Library host-compiler-0.cm is gone. Instead, the internal library that instantiates CM is now called cm0.cm. Selection of the host compiler (backend) is no longer done here. (Responsibility for it now lies with $smlnj/compiler/current.cm. This seems to be more logical.)

Many individual files have been moved or renamed. Some files have been split into multiple files, and some "dead" files have been deleted.

Aside from these changes to library organization, there are also changes to the way the code itself is organized:

Structure Binfile has been re-implemented in such a way that it no longer needs any knowledge of the compiler. It exclusively deals with the details of binfile layout. It no longer invokes the compiler (for the purpose of creating new prospective binfile content), and it no longer has any knowledge of how to interpret pickles.

Structure Compile has been stripped down to the bare essentials of compilation. It no longer deals with linking/execution. The interface has been cleaned up considerably.

Utility routines for dealing with linking and execution have been moved into their own substructures.

(The ultimate goal of these changes is to provide a light-weight binfile loader/linker (at least for, e.g., stable libraries) that does not require CM or the compiler to be present.)

CM documentation has been updated to reflect the changes to library organization.

Matthias Blume

[2001/07/10]

Minor tweak to 110.34 (re-tagged):

README.html file added to CVS repository
runtime compiles properly under FreeBSD 3.X and 4.X

Matthias Blume

Version 110.34; 2001/07/10

[2001/07/09]: I changed the handling of varargs in ml-nlffigen again: The ellipsis … will now simply be ignored (with an accompanying warning).

The immediate effect is that you can actually call a varargs function from ML — but you can’t actually supply any arguments beyond the ones specified explicitly. (For example, you can call printf with its format string, but you cannot pass additional arguments.)

This behavior is only marginally more useful than the one before, but it has the advantage that a function or, more importantly, a function type never gets dropped on the floor, thus avoiding follow-up problems with other types that refer to the offending one.

Matthias Blume

[2001/07/09]

ckit-lib.cm now exports structure Error
ml-nlffigen reports occurrences of "…" (i.e., varargs function types) with a warning accompanied by a source location. Moreover, it merely skips the offending function or type and proceeds with the rest of its work. As a result, one can safely feed C code containing "…" to ml-nlffigen.
There are some internal improvements to CM, providing slightly more general string substitutions in the tools subsystem.

Matthias Blume

[2001/06/27]: Fixed a small bug in CM’s handling of parallel compilation. (You could observe the bug by Control-C-interrupting an ordinary CMB.make or CM.stabilize and then attaching some compile servers. The result was that all of a sudden the previously interrupted compilation would continue on its own. This was because of an over-optimization: CM did not bother to clean out certain queues when no servers were attached "anyway", so the contents of these queues grabbed control when new servers did get attached.)

There is also another minor update to the CM manual.

Matthias Blume

[2001/06/26]: Minor typo fixed in CM manual (syntax diagram for libraries).

Matthias Blume

[2001/06/25]: Fixed a nasty bug in the X86 assembly code that caused signal handlers to fail (crash) randomly.

Matthias Blume

[2001/06/25]

This update fixes a number of minor bugs in ml-nlffigen as reported by Nick Carter <nbc@andrew.cmu.edu>.

Silly but ok typedefs of the form "typedef void myvoid;" are now accepted.
Default names for generated files are now derived from the name of the C file without its directory. In particular, this causes generated files to be placed locally even if the C file is in some system directory.
Default names for generated signatures and structures are also derived from the C file name without its directory. This avoids silly things like "structure GL/GL". (Other silly names are still possible because ml-nlffigen does not do a thorough check of whether generated names are legal ML identifiers. When in doubt, use command line arguments to force particular names.)

Matthias Blume

[2001/06/21]: eXene now compiles and (sort of) works again.

The library name (for version > 110.33) is $/eXene.cm.

I also added a new example in src/eXene/examples/nbody. See the README file there for details.

Matthias Blume

[2001/06/20]

CML now compiles and works again.

Libraries (for version > 110.33):

$cml/cml.cm            Main CML library.
$cml/basis.cm          CML's version of $/basis.cm.
$cml/cml-internal.cm   Internal helper library.
$cml/core-cml.cm       Internal helper library.
$cml-lib/trace-cml.cm  Tracing facility.
$cml-lib/smlnj-lib.cm  CML's version of $/smlnj-lib.cm

The installer (config/install.sh) has been taught how to properly install this stuff.

Matthias Blume

[2001/06/19]: This un-breaks the fix for bug 1432. (The bug was originally fixed in 110.9 but I broke it again some time after that.)

Matthias Blume

[2001/06/19]: This should (hopefully) fix the long-standing signal handling bug. (The runtime system was constructing a continuation record with an incorrect descriptor which would cause the GC to drop data on the floor…)

Matthias Blume

[2001/06/15]

Here is a short late-hour update related to Sparc c-calls:

-- made handling of double-word arguments a bit smarter

-- instruction selection phase tries to collapse certain clumsily
   constructed ML-Trees; typical example:

ADD(ty,ADD(_,e,LI d1),LI d2)  ->  ADD(ty,e,LI(d1+d2))

This currently has no further impact on SML/NJ since mlriscGen does
not seem to generate such patterns in the first place, and c-calls
(which did generate them in the beginning) has meanwhile been fixed
so as to avoid them as well.
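
The rewrite above can be sketched on a cut-down expression type. The rexp datatype and the collapse function below are hypothetical stand-ins for ML-Tree and the instruction selector, and the sketch uses plain int arithmetic, so it ignores overflow at the width given by ty.

```sml
(* Hypothetical miniature of ML-Tree integer expressions *)
type ty = int
datatype rexp =
    REG of ty * int
  | LI of int
  | ADD of ty * rexp * rexp

(* Bottom-up rewrite:
   ADD(ty,ADD(_,e,LI d1),LI d2)  ->  ADD(ty,e,LI(d1+d2)) *)
fun collapse (ADD (ty, a, b)) =
      (case (collapse a, collapse b) of
           (ADD (_, e, LI d1), LI d2) => ADD (ty, e, LI (d1 + d2))
         | (a', b') => ADD (ty, a', b'))
  | collapse e = e
```

Because the subterms are rewritten first, a chain of several nested constant additions folds into a single ADD.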

Matthias Blume

[2001/06/15]

The purpose of this update is to provide an implementation of NLFFI on Sparc machines.

Here are the changes in detail:

src/MLRISC/sparc/c-calls/sparc-c-calls.sml is a new file containing the Sparc implementation of the c-calls API.
The Sparc backend of SML/NJ has been modified to uniformly use %fp for accessing the ML frame. Thus, we have a real frame pointer and can freely modify %sp without need for an omit-frame-ptr phase. The vfp logic in src/compiler/CodeGen/* has been changed to accommodate this case.
ml-nlffigen has been taught to produce code for different architectures and calling conventions.
In a way similar to what was done in the x86 case, the Sparc backend uses its own specific extension to mltree. (For example, it needs to be able to generate UNIMP instructions which are part of the calling convention.)
ml-nlffi-lib was reorganized to make it more modular (in particular, to make it easier to plug in new machine- and os-dependent parts).

There are some other fairly unrelated bug fixes and cleanups as well:

I further hacked the .cm files for MLRISC tools (like MDLGen) so that they properly share their libraries with existing SML/NJ libraries.
I fixed a minor cosmetic bug in CM, suppressing certain spurious follow-up error messages.
Updates to CM/CMB documentation.

TODO items:

MLRISC should use a different register as its asmTemp on the Sparc. (The current %o2 is a really bad choice because it is part of the calling conventions, so things might interfere in unexpected ways.)

Matthias Blume

[2001/06/07]

A number of internal changes related to C calls and calling conventions:

ML-Tree CALL statements now carry a "pops" field. It indicates the number of bytes popped implicitly (by the callee). In most cases this field is 0 but on x86/win32 it is some non-zero value. This is information provided for the benefit of the "omit-frameptr" pass.
The CALL instruction on the x86 carries a similar "pops" field. The instruction selection phase copies its value from the ML-Tree CALL statement.
On all other architectures, the instruction selection phase checks whether "pops=0" and complains if not.
The c-calls implementation for x86 now accepts two calling conventions: "ccall" and "stdcall". When "ccall" is selected, the caller cleans up after the call and pops is set to 0. For "stdcall", the caller does nothing, leaving the cleanup to the callee; pops is set to the number of bytes that were pushed onto the stack.
The cproto decoder (compiler/Semant/types/cproto.sml) now can distinguish between "ccall" and "stdcall".
The UNIMP instruction has been added to the supported Sparc instruction set. (This is needed for implementing the official C calling convention on this architecture.)
I fixed some of the .cm files under src/MLRISC/Tools to make them work with the latest CM.
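
The rule for the "pops" field can be summarized in a few lines. This is only a sketch of the convention described above, not the c-calls implementation: the conv datatype and the functions convOfString, pops, callerCleanup, and requireNoPops are names assumed here for illustration.

```sml
datatype conv = CCALL | STDCALL

fun convOfString "ccall" = SOME CCALL
  | convOfString "stdcall" = SOME STDCALL
  | convOfString _ = NONE

(* Value of the "pops" field: bytes removed from the stack by the callee *)
fun pops (CCALL, _: int) = 0
  | pops (STDCALL, bytesPushed) = bytesPushed

(* Bytes the caller still has to pop itself after the call returns *)
fun callerCleanup (conv, bytesPushed) = bytesPushed - pops (conv, bytesPushed)

(* The check made by instruction selection on targets other than the x86 *)
fun requireNoPops (p: int) : unit =
  if p = 0 then () else raise Fail "pops must be 0 on this architecture"
```

For a call that pushes 12 bytes of arguments, "ccall" gives pops = 0 with 12 bytes left for the caller to clean up, while "stdcall" gives pops = 12 and nothing left for the caller.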

Matthias Blume

[2001/06/05]

The "lambdasplit" parameter for class "sml" in CM has been documented.
CM can now generate "index files". These are human-readable files that list on a per-.cm-file basis each toplevel symbol defined or imported. The location of the index file for <p>/<d>.cm is <p>/CM/INDEX/<d>.cm. To enable index-file generation, set CM.Control.generate_index to true or export an environment-symbol: export CM_GENERATE_INDEX=true.
The CM manual has been updated accordingly.
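
The index-file location rule can be written down with the Basis Library's OS.Path functions. The helper name indexFile is hypothetical and this is not CM's own code; it only mirrors the mapping from <p>/<d>.cm to <p>/CM/INDEX/<d>.cm stated above.

```sml
(* Map the path of a .cm file to the path of its index file *)
fun indexFile (cmFile: string) : string =
  let
    val {dir, file} = OS.Path.splitDirFile cmFile
    val indexDir = OS.Path.concat (dir, OS.Path.concat ("CM", "INDEX"))
  in
    OS.Path.joinDirFile {dir = indexDir, file = file}
  end
```

On a Unix-style system, indexFile "src/foo/bar.cm" yields "src/foo/CM/INDEX/bar.cm".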

I made some slight modifications to the c-calls API in MLRISC.

a) There is now a callback to support saving/restoring of
   dedicated but caller-save registers around the actual call
   instruction.
b) One can optionally specify a comment-annotation for the
   call instruction.

SML/NJ (mlriscGen.sml) uses this new API for the rawccall primop. (For example, the comment annotation shows the C prototype of the function being called.)

Matthias Blume

[2001/06/01]

This is mostly a cleanup of MLFFI stuff:

some signature files have been put into a more exposed place
the ugly 'f type parameter is gone (simplifies types tremendously!)
ml-nlffigen changed accordingly
tutorial updated

Other changes:

author’s affiliation in CM manual(s) updated
some more recognized keywords added to Allen’s sml.sty

Matthias Blume

[2001/05/25]

put the official 110.33-README (as it appears on the ftp server) under CVS
fixed a small bug related to incomplete pointer types in ml-nlffigen
small cosmetic change to the ml-nlffi-lib’s "arr" type constructor (it does not need the 'f type parameter)

Matthias Blume

Version 110.33; 2001/05/23

[2001/05/22]: Made install.sh use file config/targets.customized if it exists, falling back to config/targets if it doesn’t. This way one can have a customized version of the targets file without touching the "real thing", thus eliminating the constant fear of accidentally checking something bogus back into the CVS repository… (File config/targets.customized must not be added to the repository!)

Matthias Blume

[2001/05/22]

Bug fix in ml-nlffigen; now (hopefully) correctly handling struct returns.
Added src/ml-nlffi-lib/Doc/mini-tutorial.txt. This is some very incomplete, preliminary documentation for NLFFI.

Matthias Blume

[2001/05/14]: Some bugs in install script fixed.

In addition to that I also made a slight change to the NLFFI API: Functors generated by ml-nlffigen now take the dynamic library as a straight functor argument, not as a suspended one. (The original functor code used to force the suspension right away anyway, so there was nothing gained by this complication of the interface.)

Matthias Blume

[2001/05/11]

I finally took the plunge and added my new FFI code to the main repository. For x86-linux it is now ready for prime-time.

There are two new subdirectories of "src":

ml-nlffi-lib: The utility library for programs using the FFI interface. Here is the implementation of $/c.cm and its associated low-level partners $/c-int.cm and $/memory.cm.
ml-nlffigen: A stand-alone program for generating ML glue code from C source code.

Building ml-nlffigen requires $/ckit-lib.cm.

The config/install.sh script has been updated to do the Right Thing (hopefully).

Notice that the source tree for the C-Kit will not be put under "src" but directly under the installation root directory. (This is the structure that currently exists on the CVS server when you check out module "sml".) Fortunately, config/install.sh knows about this oddity.

Bugs: No documentation yet.

Matthias Blume

[2001/05/09]: Fixed a bug in the accounting code in cpsopt/contract.sml. (The wrapper/unwrapper elimination did not decrement usage counts and some dead variables got overlooked by the dead-up logic.)

Matthias Blume

[2001/05/08]: Changes to implement the omit-frame-pointer optimization to support raw C calls. For now, there is only support on the Intel x86, but other architectures will follow as more experience is gained with this.

Lal George

[2001/05/07]: I made into "proxy libraries" all libraries that qualify for such a change. (A qualifying library is a library that has another library or group as its sole member and repeats that member’s export list verbatim. A proxy library avoids this repetition by omitting its export list, effectively inheriting the list that its (only) member exports. See the CM manual for more explanation.) The main effect is that explicit export lists for these libraries do not have to be kept in sync, making maintenance a bit easier.

I also added copyright notices to many .cm-files.

Last but not least, I made a new set of bootfiles.

Matthias Blume

[2001/05/04]

John merged pending changes to $/smlnj-lib.cm
Allen’s previous change accidentally backed out of one of Lal’s earlier changes. I undid this mistake (re-introducing Lal’s change).
I used the new topOrder' function from graph-scc.sml (from $/smlnj-lib.cm) within the compiler where applicable. There is some code simplification because of that.
The "split" phase (in FLINT) is now part of the default list of phases. Compiler.Control.LambdaSplitting.* can be used to globally control the lambda-splitting (cross-module-inlining) engine. In addition to that, it can now also be controlled on a per-source basis: CM has been taught a new tool parameter applicable to ML source files.
- To turn lambda-splitting off completely: local open Compiler.Control.LambdaSplitting in val _ = set Off end
- To make "no lambda-splitting" the global default (but allow per-source overriding); this is the initial setting: local open Compiler.Control.LambdaSplitting in val _ = set (Default NONE) end
- To make "lambda-splitting with aggressiveness a" the global default (and allow per-source overriding): local open Compiler.Control.LambdaSplitting in val _ = set (Default (SOME a)) end
- To turn lambda-splitting off for a given ML source file (say: a.sml) write (in the respective .cm-file): a.sml (lambdasplitting:off)
- To turn lambda-splitting for a.sml on with minimal aggressiveness: a.sml (lambdasplitting:on)
- To turn lambda-splitting for a.sml on with aggressiveness <a> (where <a> is a decimal non-negative integer): a.sml (lambdasplitting:<a>)
- To turn lambda-splitting for a.sml on with maximal aggressiveness: a.sml (lambdasplitting:infinity)
- To use the global default for a.sml: a.sml (lambdasplitting:default) or simply a.sml
Matthias Blume

[2001/05/04]

MLRISC features.

Fix to CMPXCHG instructions.
Changed RA interface to allow annotations in callbacks.
Added a new method to the stream interface to allow annotation updates.

Allen Leung

[2001/05/01]: Changed install.sh to use the current working directory instead of /usr/tmp for a temporary file (pcedittmp). The previous choice of /usr/tmp caused trouble with MacOS X because of file permission problems.

Matthias Blume

[2001/04/20]

added vp_limitPtrMask to vproc-state.h (for use by the raw-C-calls mechanism to implement proper interrupt handling)
made the ML compiler aware of various data-structure offsets so it can generate code for accessing the vp_inML flag and vp_limitPtrMask
tweaked mlriscGen.sml to have it emit interrupt-handling code for raw C-calls

Matthias Blume

[2001/04/20]

Changes to port to Mac OS X; Darwin.
In the process I found that sqrt was broken on the PPC, because the fsqrt instruction is not implemented.

Lal George

[2001/04/18]

fixed two off-by-4 errors in the x86-specific c-calls implementation (this bug prevented structure arguments containing pointers from being passed correctly)
changed the raw-C-call code in mlriscGen.sml in such a way that structure arguments are represented as a pointer to the beginning of the structure (instead of having a series of synthesized arguments, one for each structure member)
made makeml script’s verbosity level configurable via environment variable (MAKEML_VERBOSITY)
eliminated placeholder implementations for f32l, w16s, i16s, and f32s in rawmem-x86.sml; we are now using the real thing

Matthias Blume

[2001/03/22]: Created a new set of bootfiles (for your automatic installation convenience).

Matthias Blume

[2001/03/22]

All "raw memory access" primitives for the new FFI are implemented now (at least on the x86).
Some further cleanup of CM’s parallel make mechanism.

Matthias Blume

[2001/03/19]: Parallel make (using compile servers) now works again.

To this end, CM.stabilize and CMB.make have been modified to work in two passes when compile servers are attached:

1. Compile everything, do not perform stabilization; this pass uses compile servers.
2. Stabilize everything; this pass does not use compile servers.

If there are no compile servers, the two passes are combined into one (as before). Splitting the passes increases the inherent parallelism in the dependency graph because the entire graph including all libraries is available at the same time. This, in turn, improves server utilization. The downside is that the master process will have to do some extra work after compilation is done (because for technical reasons it must re-read all the binfiles during stabilization).

Matthias Blume

[2001/03/16]: Created a new set of bootfiles (for your automatic installation convenience).

Matthias Blume

[2001/03/16]: This is a minor fixup for an (untagged) earlier commit by Allen. (A file was missing).

Matthias Blume

[2001/03/05]

New support for alternative control-flow in MLTREE. Currently we support

FLOW_TO(CALL ...., [k1,...,kn])

This is needed for 'cuts to' in C-- and try/handle-like constructs in Moby.

New assembler flag "asm-show-cutsto" to turn on control-flow debugging.

Changes in interface [from Fermin, John]

Alpha 8-bit SLL support [Fermin]

All architectures

A new module (ClusterExpandCopies) for expanding parallel copies.

Allen Leung

[2001/02/27]

Alpha bug fix for CMOVNE
Handle mltree COND(..,FCMP …,…)
Bug fix in simplifier

Allen Leung

[2001/01/30]: This is just a minor update to sync my devel branch with the main branch. The only visible change is the addition of some README files.

Matthias Blume

[2001/01/12]: Made a new set of bootfiles that goes with the current state of the repository.

Matthias Blume

[2001/01/12]: I am just flushing out some minor changes that had accumulated in my private branch in order to sync with the main tree. (This is mainly because I had CVS trouble when trying to merge into my private branch.)

Most people should be completely unaffected by this.

Matthias Blume

[2001/01/22]

Removed the type LabelExp and replaced it with MLTree.
Rewrote mltree-simplify with the pattern matcher tool.
There were some bugs in alpha code generator which would break 64-bit code generation.
Redo the tools to generate code with the
The CM files in MLRISC (and in src/system/smlnj/MLRISC) are now generated by perl scripts.

Allen Leung

[2001/01/10]: The RCC stuff now seems to work (but only on the x86). This required hacking of the c-calls interface (and -implementation) in MLRISC.

Normal compiler users should be unaffected.

Matthias Blume

[2001/01/09]: This is a fairly big patch, flushing out a large number of pending changes that I made to my development copy over the last couple of days.

Of practical relevance at this moment is a workaround for a pickling bug that Allen ran into the other day. The cause of the bug itself is still unknown and it might be hard to fix it properly, but the workaround has some merits of its own (namely somewhat reducing pickling overhead for certain libraries). Therefore, I think this solution should be satisfactory at this time.

The rest of the changes (i.e., the vast majority) has to do with my ongoing efforts of providing direct support for C function calls from ML. At the moment there is a new primop "RAW_CCALL", typing magic in types/cproto.sml (invoked from FLINT/trans/translate.sml), a new case in the FLINT CPS datatype (RCC), changes to cps/convert.sml to translate uses of RAW_CCALL into RCC, and changes to mlriscGen.sml to handle RCC.

The last part (the changes to mlriscGen.sml) is still known to be wrong on the x86 and not implemented on all other architectures. But the infrastructure is in place. I had to change a few functor signatures in the backend to be able to route the CCalls interface from MLRISC there, and I had to specialize the mltree type (on the x86) to include the necessary extensions. (The extensions themselves were already there and ready to go in MLRISC/x86.)

Everything should be very happy as soon as someone helps me with mlriscGen.sml…

In any case, nothing of this should matter to anyone as long as the new primop is not being used (which is going to be the case unless you find it where I hid it :). The rest of the compiler is completely unaffected.

Matthias Blume

[2001/01/05]: Added some experimental support for work that I am doing right now. These changes mostly concern added primops, but there is also a new experimental C library in the runtime system (but currently not enabled anywhere except on Linux/X86).

In the course of adding primops (and playing with them), I discovered that Zhong’s INL_PRIM hack (no type info for certain primops) was, in fact, badly broken. (Zhong was very right when he labeled this stuff as "major gross hack".) To recover, I made type information in INL_PRIM mandatory and changed prim.sml as well as built-in.sml accordingly. The InLine structure now has complete, correct type information (i.e., no bottom types).

Since all these changes mean that we need new binfiles, I also bumped the version number to 110.32.1.

Matthias Blume

[2000/12/30]: Added proxy libraries for MLRISC and let MLRISC libraries refer to each other using path anchors. (See CM manual for explanation.)

Updated CM documentation.

Fixed some bugs in CM.

Implemented "proxy" libraries (= syntactic sugar for CM).

Added "-quiet" option to makeml and changed runtime system accordingly.

Added cleanup handler for exportML to reset timers and compiler stats.

Matthias Blume

[2000/12/22]

Infinite precision used throughout MLRISC.
see MLRISC/mltree/machine-int.sig

Lal George

[2000/12/22]: Corrected wording and formatting of some CM warning messages which I broke in my previous patch.

Matthias Blume

[2000/12/22]: Fixed CM’s handling of anchor environments in connection with CMB.make.

Matthias Blume

[2000/12/22]: Removed src/cm/ffi which does not (and did not) belong here.

Matthias Blume

[2000/12/21]: Probably most important: CM no longer silently swallows all exceptions in the compiler. Plus: some other minor CM changes. For example, CM now reports some sizes for generated binfiles (code, data, envpickle, lambdapickle).

Matthias Blume

[2000/12/15]

"dir" tool added.
improvements and cleanup to Tools structure
documentation updates

Matthias Blume

[2000/12/14]

In IntInf, added these standard functions, which were missing from our implementation:

andb : int * int -> int
xorb : int * int -> int
orb  : int * int -> int
notb : int -> int
<<   : int * word -> int
~>>  : int * word -> int

Not tested, I hope they are correct.
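
As a rough illustration of what these functions compute, here is a minimal sketch that uses only the IntInf operations listed above. The operand values and the binding names are chosen for this example only; the expected values in the comments follow from the two's-complement semantics that the Basis Library specifies for these operations.

```sml
(* Operands are chosen so that each result is easy to check by hand. *)
val a : IntInf.int = 0xF0   (* binary 1111 0000 *)
val b : IntInf.int = 0x3C   (* binary 0011 1100 *)

val both    = IntInf.andb (a, b)    (* 0x30 *)
val differ  = IntInf.xorb (a, b)    (* 0xCC *)
val either  = IntInf.orb (a, b)     (* 0xFC *)
val flipped = IntInf.notb a         (* ~241, that is ~(a + 1) *)
val left    = IntInf.<< (a, 0w4)    (* 0xF00 *)
val right   = IntInf.~>> (a, 0w4)   (* 0xF *)
```

The shift amounts are ordinary word values, which is why they are written with the 0w prefix.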

Allen Leung

[2000/12/08]

Slight improvements to the 'nowhere' tool to handle OR-patterns, to generate better error messages etc. Plus a brief manual.

Allen Leung

[2000/12/08]

Version 110.31

Lal George

[2000/12/07]

Major MLRISC internal changes. Affect all clients. Summary:

Type CELLS.cell = int is now replaced by a datatype. As a result, the old regmap is now gone. Almost all interfaces in MLRISC change as a consequence.
A brand new version of the machine description tool (v3.0) that generates modules expecting the new interface. The old version is removed.
The RA interface has been further abstracted into two new functors, RISC_RA and X86RA. These functors have much simpler interfaces. [See also directory MLRISC/demo.]
Some other new source→source code generation tools are available:
1. MLRISC/Tools/RewriteGen — generate rewriters from rules.
2. MLRISC/Tools/WhereGen — expands conditional pattern matching rules. I use this tool to generate the peephole optimizers---with the new cell type changes, peephole rules are becoming difficult to write without conditional pattern matching.
More Intmap → IntHashTable changes. Previous changes by Matthias didn’t cover the entire MLRISC source tree so many things broke.
CM files have been moved to the subdirectory MLRISC/cm. They are moved because there are a lot of them and they clutter up the root dir.

Version 110.30; 2000/11/06

[2000/11/04]

Made ml-build faster on startup.
Documentation fixes.

Matthias Blume

[2000/11/02]

Small tweaks to pickler — new BOOTFILES!
Version bumped to 110.29.2.
Added conditional compilation facility to init.cmi (see comment there).

Matthias Blume

[2000/10/23]

Minor RA changes that improve spilling on x86 (affects Moby and C-- only)
Test programs for the graph library updated
Some new MLRISC demo programs added

Allen Leung

[2000/08/31]: More error message grief: Where there used to be no messages, there now were some that had bogus error regions. Fixed.

Matthias Blume

Version 110.29.1; 2000/08/31

[2000/08/31]: I made a version 110.29.1 with new bootfiles.

Changes: Modified pickler/unpickler for faster and leaner unpickling. CM documentation changes and a small bugfix in CM’s error reporting.

Matthias Blume

[2000/09/27]

Changed the type of the nodestatus, so that:

SPILLED(~1)             is now SPILLED
SPILLED(m) where m>=0   is now MEMREG(m)
SPILLED(s) where s<~1   is now SPILL_LOC(~s)
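
The mapping above can be written out as a small conversion function. This is only a sketch: the type names old_status and new_status and the constructor OLD_SPILLED are made up for the example, and the real nodestatus type has other constructors that are not modeled here.

```sml
(* Old encoding (hypothetical name): one constructor, meaning carried by the sign. *)
datatype old_status = OLD_SPILLED of int

(* New encoding: one constructor per case, as listed in the entry above. *)
datatype new_status
  = SPILLED
  | MEMREG of int
  | SPILL_LOC of int

fun convert (OLD_SPILLED ~1) = SPILLED
  | convert (OLD_SPILLED n) =
      if n >= 0 then MEMREG n      (* m >= 0 *)
      else SPILL_LOC (~n)          (* s < ~1 *)
```

For instance, convert (OLD_SPILLED ~5) yields SPILL_LOC 5.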

Lal George

[2000/09/07]: Small tweak to CM to avoid getting ML syntax error messages twice.

Matthias Blume

[2000/08/31]: New URL for boot files (because the 110.29 files on the BL server do now work correctly with my updated install scripts for yacc and lex).

Matthias Blume

[2000/08/08]: Tiny update to CM manual.

Matthias Blume

[2000/08/07]

Moby, C--, SSA, x86, machine descriptions etc. Should only affect C-- and Moby.

x86
1. Fixes to peephole module by John and Dan.
2. Assembly fix to SETcc by Allen.
3. Fix to c-call by John.
4. Fix to spilling by John. (This one deals with the missing FSTPT case)
5. Instruction selection optimization to SETcc as suggested by John.
  For example,
  MV(32, x, COND(32, CMP(32, LT, a, b), LI 1, LI 0))
  should generate:
  MOVL a, x
  SUBL b, x
  SHRL 31, x
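
To see why the three-instruction sequence in item 5 yields the result of the comparison, the following sketch models each instruction with Word32 arithmetic. The function name ltViaShift is invented for this illustration and is not part of MLRISC. The model assumes that a - b does not overflow the signed 32-bit range; when it does, the sign bit no longer reflects the comparison.

```sml
(* Model of:  MOVL a, x ; SUBL b, x ; SHRL 31, x *)
fun ltViaShift (a : int, b : int) : Word32.word =
  let
    val x = Word32.fromInt a                  (* MOVL a, x *)
    val x = Word32.- (x, Word32.fromInt b)    (* SUBL b, x *)
  in
    Word32.>> (x, 0w31)                       (* SHRL 31, x: keep only the sign bit *)
  end
```

Under that assumption, ltViaShift (3, 5) gives 0w1 and ltViaShift (5, 3) gives 0w0, matching LI 1 and LI 0 in the COND expression.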

IR stuff

A bunch of new DJ-graph related algorithms added.  These
speed up SSA construction.

SSA + Scheduling

Added code for SSA and scheduling to the repository

Allen Leung

[2000/07/27]: Made changes to support Linux PPC. P.S. I have confirmation that the 110.29 boot files work fine.

Lal George

[2000/07/27]

!!!! WARNING !!!! You must recompile the runtime system! !!!! WARNING !!!!

This is basically another round of script-enhancements:

sml, ml-build, and ml-makedepend accept options -D and -U to define and undefine CM preprocessor symbols.

ml-build avoids generating a new heap image if it finds that the existing one is still ok. (The condition is that no ML file had to be recompiled and all ML files are found to be older than the heap file.)

To make this work smoothly, I also hacked the runtime system as well as SMLofNJ.SysInfo to get access to the heap image suffix (.sparc-solaris, ...) that is currently being used.

Moreover, the signature of CM.mk_standalone has changed. See the CM manual.

ml-makedepend accepts additional options -n, -a, and -o. (See the CM manual for details.)
More CM manual updates:
- all of the above has been documented.
- there is now a section describing the (CM-related) command line arguments that are accepted by the "sml" command
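
The condition under which ml-build keeps the existing heap image can be stated as a small predicate. This is a sketch of the rule as described in this entry, not the script's actual code; the function name heapIsFresh and the record field names are assumptions made for the example.

```sml
(* True when the existing heap image can be reused: nothing was recompiled
 * and every ML source is strictly older than the heap file. *)
fun heapIsFresh {recompiled : bool,
                 heapTime : Time.time,
                 sourceTimes : Time.time list} : bool =
  not recompiled
  andalso List.all (fn t => Time.< (t, heapTime)) sourceTimes
```

In practice the timestamps could be obtained with OS.FileSys.modTime; they are passed in explicitly here so that the rule itself stays easy to check.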
  
Matthias Blume

[2000/07/25]

Added a script called ml-makedepend. This can be used in makefiles for Unix' make in a way very similar to the "makedepend" command for C.

The script internally uses function CM.sources.

Synopsis:

ml-makedepend [-f makefile] cmfile targetname

The default for the makefile is "makefile" (or "Makefile" should "makefile" not exist).

ml-makedepend adds a cmfile/targetname-specific section to this makefile (after removing the previous version of this section). The section contains a single dependency specification with targetname on the LHS (targetname is an arbitrary name), and a list of files derived from the cmfile on the RHS. Some of the files on the RHS are ARCH/OPSYS-specific. Therefore, ml-makedepend inserts references to "make" variables $(ARCH) and $(OPSYS) in place of the corresponding path names. The makefile writer is responsible for making sure that these variables have correct values at the time "make" is invoked.

Matthias Blume

[2000/07/22]

Changed BOOT and config/srcarchiveurl to point to BL server:

ftp://ftp.research.bell-labs.com/dist/smlnj/working/110.29/

Matthias Blume

Version 110.29; 2000/07/18

[2000/07/18]

Updated src/compiler/TopLevel/main/version.sml to version 110.29
Updated config/version to 110.29
Updated config/srcarchiveurl
New boot files! ftp://ftp.cs.princeton.edu/pub/people/blume/sml/110.29-autofetch

Matthias Blume

[2000/07/11]: Fixed a few typos in CM manual.

Matthias Blume

[2000/06/15]

x86 peephole improvement sp += k; sp -= k ⇒ nop [from John]
fix to x86 RET bug [found by Dan Grossman]
sparc assembly bug fix for ticc instructions [found by Fermin]

Affects c-- and moby only.

Allen Leung

[2000/07/04]

Improvements to CM manual.
SMLofNJ.Internals.BTrace.trigger reinstated as an alternative way of getting a back-trace. The function, when called, raises an internal exception which explicitly carries the full back-trace history, so it is unaffected by any intervening handle-raise pairs ("trivial" or not). The interactive loop will print that history once it arrives at top level. Short of having all exceptions implicitly carry the full history, the recommended way of using this facility is:
- compile your program with instrumentation "on"
- run it, when it raises an exception, look at the history
- if the history is "cut off" because of some handler, go and modify your program so that it explicitly calls BTrace.trigger
- recompile (still instrumented), and rerun; look at the full history
  
Matthias Blume

[2000/07/03]: Small corrections and updates to CM manual.

Matthias Blume

[2000/06/29]

Changes:

Class "mlyacc" now takes separate arguments to pass options to generated .sml- and .sig-files independently.
Corresponding CM manual updates.
BTrace module now also reports call sites. (However, for loop clusters it only shows from where the cluster was entered.) There are associated modifications to core.sml, internals.{sig,sml}, btrace.sml, and btimp.sml.

Matthias Blume

[2000/06/27]

Changes:

Implemented "subdir" and "witness" options for noweb tool. This caused some slight internal changes in CM’s tool implementation.
Fixed bug in "tool plugin" mechanism. This is essentially cleaning some remaining issues from earlier path anchor changes.
Updated CM manual accordingly.
Changed implementation of back-tracing so that I now consider it ready for prime-time.
In particular, you don't have to explicitly trigger the back-trace anymore. Instead, if you are running BTrace-instrumented code and there is an uncaught exception (regardless of whether or not it was raised in instrumented code), the top-level evalloop will print the back-trace.

Features:

- Instrumented and uninstrumented code work together seamlessly. (Of course, uninstrumented code is never mentioned in actual back-traces.)
- Asymptotic time- and space-complexity of instrumented code is equal to that of uninstrumented code. (This means that tail-recursion is preserved by the instrumentation phase.)
- Modules whose code has been instrumented in different sessions work together without problem.
- There is no penalty whatsoever on uninstrumented code.
- There is no penalty on "raise" expressions, even in instrumented code.

A potential bug (or perhaps it is a feature, too):

A back-trace reaches no further than the outermost instrumented non-trivial "raise". Here, a "trivial" raise is one that is the sole RHS of a "handle" rule. Thus, back-traces reach through

  <exp> handle e => raise e

and even

  <exp> handle Foo => raise Bar

and, of course, through

  <exp> handle Foo => ...

if the exception was not Foo.

Back-traces always reach right through any un-instrumented code including any of its "handle" expressions, trivial or not.

To try this out, do the following:

- Erase all existing binfiles for your program. (You may keep binfiles for those modules where you think you definitely don’t need back-tracing.)
- Turn on back-trace instrumentation: SMLofNJ.Internals.BTrace.mode (SOME true);
- Recompile your program. (I.e., run "CM.make" or "use".)
- You may now turn instrumentation off again (if you want): SMLofNJ.Internals.BTrace.mode (SOME false);
- Run your program as usual. If it raises an exception that reaches the interactive toplevel, then a back-trace will automatically be printed. After that, the toplevel loop will print the exception history as usual.
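
The distinction between "trivial" and non-trivial raises can be made concrete with a few handlers. The sketch below is plain SML with invented names (Foo, Bar, risky, rethrow, translate, logged); it only shows the shapes of the handle rules and does not itself depend on BTrace.

```sml
exception Foo
exception Bar

fun risky () : int = raise Foo

(* Trivial: the raise is the entire right-hand side of the handle rule. *)
fun rethrow () = risky () handle e => raise e
fun translate () = risky () handle Foo => raise Bar

(* Non-trivial: the rule does other work before raising. *)
fun logged () = risky () handle e => (print "risky failed\n"; raise e)
```

Going by the description above, a back-trace through instrumented code would be expected to reach past the handlers in rethrow and translate, but no further than the raise in logged.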

Matthias Blume

[2000/06/26]

CM:
- setup-parameter to "sml" added; this can be used to run arbitrary ML code before and after compiling a file (e.g., to set compiler flags)

Compiler:
- improved btrace API (in core.sml, internals.{sig,sml})
- associated changes to btrace.sml (BTrace instrumentation pass)
- cleaner implementation of btimp.sml (BTrace tracing and report module)

CM manual:
- new path encoding documented
- description of setup-parameter to "sml" added

The biggest user-visible change to back-tracing is that it is no longer necessary to compile all traced modules within the same session. (This was a real limitation.)

Matthias Blume

[2000/06/24]: Fixes startup slowdown problem. (I was calling SrcPath.sync a tad bit too often — to put it mildly. :)

Matthias Blume

[2000/06/23]

This update adds a backtrace facility to aid programmers in debugging their programs. This involves the following changes:

Module system/smlnj/init/core.sml (structure _Core) now has hooks for keeping track of the current call stack. When programs are compiled in a special mode, the compiler will insert calls to these hooks into the user program. "Hook" means that it is possible for different implementations of back-tracing to register themselves (at different times).
compiler/MiscUtil/profile/btrace.sml implements the annotation phase as an Absyn.dec→Absyn.dec rewrite. Normally this phase is turned off. It can be turned on using this call:

  SMLofNJ.Internals.BTrace.mode (SOME true);

Turning it off again:

  SMLofNJ.Internals.BTrace.mode (SOME false);

Querying the current status:

  SMLofNJ.Internals.BTrace.mode NONE;

Annotated programs are about twice as big as normal ones, and they run a factor of 2 to 4 slower with a dummy back-trace plugin (one where all hooks do nothing). The slowdown with a plugin that is actually useful (such as the one supplied by default) is even greater, but in the case of the default plugin it is still only a constant factor (amortized).
system/Basis/Implementation/NJ/internals.{sig,sml} have been augmented with a sub-structure BTrace for controlling back-tracing. In particular, the above-mentioned function "mode" controls whether the annotation phase is invoked by the compiler. Another important function is "trigger": when called it aborts the current execution and causes the top-level loop to print a full back-trace.

compiler/MiscUtil/profile/btimp.sml is the current default plugin for back-tracing. It keeps track of the dynamic call stack and in addition to that it keeps a partial history at each "level" of that stack. For example, if a tail-calls b, b tail-calls c, and c tail-calls d and b (at separate times, dynamically), then the report will show:

   GOTO   d
         /c
   GOTO  \b
   CALL   a

This shows that there was an initial non-tail call of a, then a
tail-call to b or c, looping behavior in a cluster of functions that
consist of b and c, and then a goto from that cluster (i.e., either from
b or from c) to d.

Note that (depending on the user program) the amount of information
that the back-trace module has to keep track of at each level is bounded
by a constant.  Thus, the whole implementation has the same asymptotical
complexity as the original program (both in space and in time).

compiler/TopLevel/interact/evalloop.sml has been modified to handle the special exception SMLofNJ.Internals.BTrace.BTrace which is raised by the "trigger" function mentioned above.

Notes on usage:

Annotated code works well together with unannotated code: Unannotated calls simply do not show up at all in the backtrace.
It is not a good idea to let modules that were annotated during different sessions run at the same time. This is because the compiler chooses small integers to identify individual functions, and there will be clashes if different modules were compiled in separate sessions. (Nothing will crash, and you will even be told about the clashes, but back-trace information will in general not be useful.)
Back-tracing can be confused by callcc and capture.
The only way of getting a back-trace right now is to explicitly invoke the "trigger" function from your user program. Eventually, we should make every exception carry back-trace information (if available). But since this creates more overhead at "raise"-time (similar to the current exnHistory overhead), I have not yet implemented this. (The implementation will be rather easy.) With exceptions carrying back-trace information, this facility will be even more useful because users don’t need to modify their programs…

While it is possible to compile the compiler with back-trace annotations turned on (I did it to get some confidence in correctness), you must make absolutely sure that core.sml and btimp.sml are compiled WITHOUT annotation! (core.sml cannot actually be compiled with annotation because there is no core access yet, but if you compile btimp.sml with annotation, then the system will go into an infinite recursion and crash.) Since CM currently does not know about BTrace, the only way to turn annotations on and off for different modules of the compiler is to interrupt CMB.make, change the settings, and re-invoke it. Of course, this is awkward and clumsy.

Sample sessions:

Standard ML of New Jersey v110.28.1 [FLINT v1.5], June 5, 2000
- SMLofNJ.Internals.BTrace.mode (SOME true);
[autoloading]
[autoloading done]
val it = false : bool
- structure X = struct
-     fun main n = let
-         fun a (x, 0) = d x
-           | a (x, n) = b (x, n - 1)
-         and b (x, n) = c (x, n)
-         and c (x, n) = a (x, n)
-         and d x = e (x, 3)
-         and e (x, 0) = f x
-           | e (x, n) = e (x, n - 1)
-         and f 0 = SMLofNJ.Internals.BTrace.trigger ()
-           | f n = n * g (n - 1)
-         and g n = a (n, 3)
-     in
-         f n
-     end
- end;
structure X : sig val main : int -> int end
- X.main 3;
*** BACK-TRACE ***
GOTO   stdIn:4.2-13.20: X.main[2].f
GOTO-( stdIn:4.2-13.20: X.main[2].e
GOTO   stdIn:4.2-13.20: X.main[2].d
     / stdIn:4.2-13.20: X.main[2].a
     | stdIn:4.2-13.20: X.main[2].b
GOTO-\ stdIn:4.2-13.20: X.main[2].c
CALL   stdIn:4.2-13.20: X.main[2].g
GOTO   stdIn:4.2-13.20: X.main[2].f
GOTO-( stdIn:4.2-13.20: X.main[2].e
GOTO   stdIn:4.2-13.20: X.main[2].d
     / stdIn:4.2-13.20: X.main[2].a
     | stdIn:4.2-13.20: X.main[2].b
GOTO-\ stdIn:4.2-13.20: X.main[2].c
CALL   stdIn:4.2-13.20: X.main[2].g
GOTO   stdIn:4.2-13.20: X.main[2].f
GOTO-( stdIn:4.2-13.20: X.main[2].e
GOTO   stdIn:4.2-13.20: X.main[2].d
     / stdIn:4.2-13.20: X.main[2].a
     | stdIn:4.2-13.20: X.main[2].b
GOTO-\ stdIn:4.2-13.20: X.main[2].c
CALL   stdIn:4.2-13.20: X.main[2].g
GOTO   stdIn:4.2-13.20: X.main[2].f
CALL   stdIn:2.15-17.4: X.main[2]
-

(Note that because of a FLINT bug the above code currently does not compile without BTrace turned on.)

Here is another example, using my modified Tiger compiler:

Standard ML of New Jersey v110.28.1 [FLINT v1.5], June 5, 2000
- SMLofNJ.Internals.BTrace.mode (SOME true);
[autoloading]
[autoloading done]
val it = false : bool
- CM.make "sources.cm";
[autoloading]
...
[autoloading done]
[scanning sources.cm]
[parsing (sources.cm):parse.sml]
[creating directory CM/SKEL ...]
[parsing (sources.cm):tiger.lex.sml]
...
[wrote CM/sparc-unix/semant.sml]
[compiling (sources.cm):main.sml]
[wrote CM/sparc-unix/main.sml]
[New bindings added.]
val it = true : bool
- Main.compile ("../testcases/merge.tig", "foo.out");
*** BACK-TRACE ***
CALL   lib/semant.sml:99.2-396.21: SemantFun[2].transExp.trvar
CALL   lib/semant.sml:99.2-396.21: SemantFun[2].transExp.trexp
CALL   lib/semant.sml:289.3-295.22: SemantFun[2].transExp.trexp.check[2]
GOTO   lib/semant.sml:289.3-295.22: SemantFun[2].transExp.trexp.check[2]
CALL   lib/semant.sml:99.2-396.21: SemantFun[2].transExp.trexp
CALL   lib/semant.sml:99.2-396.21: SemantFun[2].transExp.trexp
CALL   lib/semant.sml:488.3-505.6: SemantFun[2].transDec.trdec[2].transBody[2]
     / lib/semant.sml:411.65-543.8: SemantFun[2].transDec
CALL-\ lib/semant.sml:413.2-540.9: SemantFun[2].transDec.trdec[2]
CALL   lib/semant.sml:99.2-396.21: SemantFun[2].transExp.trexp
CALL   lib/semant.sml:8.52-558.4: SemantFun[2].transProg[2]
CALL   main.sml:1.18-118.4: Main.compile[2]
-

Matthias Blume

[2000/06/21]: CM manual update: Path environments documented.

Matthias Blume

[2000/06/19]: CM manual and system/README update. This only covers the fact that there are no more implicit anchors. (Path environments and the "bind" option to "cm" have yet to be documented.)

Matthias Blume

[2000/06/19]: Fixed a bug in new SrcPath module that sometimes led to a bad chDir call.

Matthias Blume

[2000/06/18]: I updated the previous HISTORY entry where I forgot to mention that implicit anchors are no longer with us.

The current update also gets rid of the (now useless) controller CM.Control.implicit_anchors.

Matthias Blume

[2000/06/16]

This patch implements the long anticipated (just kidding :) "anchor environment" mechanism. In the course of doing this, I also re-implemented CM’s internal "SrcPath" module from scratch. The new one should be more robust in certain boundary cases. In any case, it is a lot cleaner than its predecessor (IMHO).

This time, although there is yet another boot file format change, I kept the unpickler backward-compatible. As a result, no new bootfiles are necessary and bootstrapping is straightforward. (You cannot read new bootfiles into an old system, but the other way around is no problem.)

Visible changes:

Implicit path anchors (without the leading $-symbol) are no longer recognized at all. This means that such path names are not illegal either. For example, the name basis.cm simply refers to a local file called "basis.cm" (i.e., the name is an ordinary path relative to the .cm-file’s directory). Or, to put it differently, only names that start with $ are anchored paths.

The $<singlearc> abbreviation for $/<singlearc> has finally vanished.

John (Reppy) had criticized this as soon as I originally proposed and
implemented it, but at that time I did not really deeply believe
him. :) Now I came full-circle because I need the $<singlearc> syntax
in another place where it cannot be seen as an abbreviation for
$/<singlearc>.  To avoid the confusion, $<singlearc> now means what it
seems to mean (i.e., it "expands" into the corresponding anchor
value).

However, when paths are used as members in CM description files, it
continues to be true that there must be at least another arc after the
anchor.  This is now enforced separately during semantic analysis
(i.e., from a lexical/syntactical point of view, the notation is ok).

The "cm" class now accepts an option "bind". The option’s value is a sub-option list of precisely two items — one labeled "anchor" and the other one labeled "value". As you might expect, "anchor" is used to specify an anchor name to be bound, and "value" specifies what the anchor is being bound to.

The value must be a directory name and can be given in either standard
syntax (including the possibility that it is itself an anchored path)
or native syntax.

Examples:

foo.cm (bind:(anchor:bar value:$mystuff/bar))
lib.cm (bind:(anchor:a value:"H:\\x\\y\\z"))  (* only works under windows *)

and so on.

The meaning of this is that the .cm-file will be processed with an
augmented anchor environment where the given anchor(s) is/are bound to
the given value(s).

The rationale for having this feature is this: Suppose you are trying
to use two different (already stable) libraries a.cm and b.cm (that
you perhaps didn't write yourself).  Further, suppose each of these
two libraries internally uses its own auxiliary library $lib/aux.cm.
Normally you would now have a problem because the anchor "lib" can not
be bound to more than one value globally.  Therefore, the project that
uses both a.cm and b.cm must locally redirect the anchor to some other
place:

a.cm (bind:(anchor:lib value:/usr/lib/smlnj/a-stuff))
b.cm (bind:(anchor:lib value:/usr/lib/smlnj/b-stuff))

This hard-wires $lib/aux.cm to /usr/lib/smlnj/a-stuff/aux.cm or
/usr/lib/smlnj/b-stuff/aux.cm, respectively.

Hard-wiring path names is a bit inflexible (and CM will verbosely warn
you when you do so at the time of CM.stabilize).  Therefore, you can
also use an anchored path as the value:

a.cm (bind:(anchor:lib value:$a-lib))
b.cm (bind:(anchor:lib value:$b-lib))

Now you can globally configure (using the usual CM.Anchor.anchor or
pathconfig machinery) bindings for "a-lib" and "b-lib".  Since "lib"
itself is always locally bound, setting it globally is no longer
meaningful or necessary (but it does not hurt either).  In fact, "lib"
can still be used as a global anchor for separate purposes.  As a
matter of fact, one can locally define "lib" in terms of a global
"lib":

a.cm (bind:(anchor:lib value:$lib/a))
b.cm (bind:(anchor:lib value:$lib/b))

The encoding of path names has changed. This affects the way path names are shown in CM’s progress report and also the internal protocol encoding used for parallel make.

The encoding now uses one or more ':'-separated segments.  Each
segment corresponds to a file that has been specified relative to the
file given by its preceding segment.  The first segment is either
relative to the CWD, absolute, or anchored.  Each segment itself is
basically a Unix pathname; all segments but the first are relative.

Example:

$foo/bar/baz.cm:a/b/c.sml

This path denotes the file bar/a/b/c.sml relative to the directory
denoted by anchor "foo".  Notice that the encoding also includes
baz.cm, which is the .cm-file that listed a/b/c.sml.  As usual, such
paths are resolved relative to the .cm-file's directory, so baz.cm must
be ignored to get the "real" pathname.

To make this fact more obvious, CM puts the names of such "virtual
arcs" into parentheses when they appear in progress reports. (No
parentheses will appear in the internal protocol encoding.)  Thus,
what you really see is:

$foo/bar/(baz.cm):a/b/c.sml

I find this notation to be much more informative than before.
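
To make the segment rule concrete, here is a minimal Python sketch (not CM's code). The function names real_path and progress_form are made up for this example, and the sketch ignores the escaping of special characters. It assumes that every segment except the last names a description file whose own name is a "virtual arc".

```python
import posixpath


def real_path(encoded):
    """Collapse a ':'-separated path into a single pathname.

    Each later segment is relative to the directory of the file named by
    the segments before it, so that file's own name is dropped.
    """
    segments = encoded.split(":")
    current = segments[0]
    for seg in segments[1:]:
        current = posixpath.join(posixpath.dirname(current), seg)
    return current


def progress_form(encoded):
    """Parenthesize the virtual arcs, as in the progress-report notation."""
    segments = encoded.split(":")
    shown = []
    for seg in segments[:-1]:
        head, tail = posixpath.split(seg)
        shown.append(posixpath.join(head, f"({tail})"))
    shown.append(segments[-1])
    return ":".join(shown)


print(real_path("$foo/bar/baz.cm:a/b/c.sml"))      # $foo/bar/a/b/c.sml
print(progress_form("$foo/bar/baz.cm:a/b/c.sml"))  # $foo/bar/(baz.cm):a/b/c.sml
```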

Another new feature of the encoding is that special characters
including parentheses, colons, (back)slashes, and white space are
written as \ddd (where ddd is the decimal encoding of the character).
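
As an illustration of the \ddd convention, the following Python sketch escapes a single path arc. It is only a guess at the details: the exact set of special characters and the zero-padding to three digits are assumptions, and escape_arc is a name invented for this example.

```python
# Characters treated as special in this sketch: parentheses, colon,
# slash, and backslash.  White space is handled separately below.
SPECIAL = set("():/\\")


def escape_arc(arc):
    """Write special characters of one path arc as a backslash plus
    the three-digit decimal character code."""
    out = []
    for ch in arc:
        if ch in SPECIAL or ch.isspace():
            out.append("\\%03d" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


print(escape_arc("my file(1).sml"))  # my\032file\0401\041.sml
print(escape_arc("a:b"))             # a\058b
```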

NOTE: The CM manual still needs to be updated.

Matthias Blume

[2000/06/15]: x86 Peephole fix by Fermin. Affects c-- and moby only.

Allen Leung

[2000/06/12]: More cleanup after changing the file naming scheme: This time I repaired the parallel make mechanism for CMB.make which I broke earlier.

Matthias Blume

[2000/06/09]

None of these things should affect normal SML/NJ operations

Peephole improvements provided by Fermin (c--)
New annotation DEFUSE for adding extra dependence (moby)
New X86 LOCK instructions (moby)
New machine description language for reservation tables (scheduling)
Fixes to various optimization/analysis modules (branch chaining, dominator trees etc.)
I’ve changed the CM files so that they can work with versions 110.0.6, 110.25 and 110.28

Allen Leung

[2000/06/09]

Removed all(?) remaining RCS Log entries from sources.
Fixed bug in ml-yacc and ml-lex sources (use explicit anchors for anchored paths).

Matthias Blume

[2000/06/07]

This update changes the default setting for CM.Control.implicit_anchors from true to false. This means that implicit anchors are no longer permitted by default. I also tried to make sure that nothing else still relies on implicit anchors. (This is the next step on the schedule towards a CM that does not even have the notion of implicit anchors anymore.)
More CM manual updates.
I managed to track down and fix the pickling bug I mentioned last time. Because of the previously existing workaround, this entails no immediate practical changes.

Matthias Blume

Version 110.28.1; 2000/06/06

[2000/06/06]

The main purpose of this update is to make library pickles lazier in order to reduce the initial space penalty for autoloading a library. As a result, it is now possible to have $smlnj/compiler.cm pre-registered. This should take care of the many complaints or inquiries about missing structure Compiler. This required changes to CM’s internal data structures and small tweaks to some algorithms.

As a neat additional effect, it is no longer necessary (for the sake of lean heap image files) to distinguish between a "minimal" CM and a "full" CM. Now, there is only one CM (i.e., the "full" version: $smlnj/cm.cm aka $smlnj/cm/full.cm), and it is always available at the interactive top level. ($smlnj/cm/minimal.cm is gone.)

To make the life of compiler-hackers easier, "makeml" now also pre-registers $smlnj/cmb.cm (aka $smlnj/cmb/current.cm). In other words, after you bootstrap a new sml for the first time, you will not have to autoload $smlnj/cmb.cm again afterwards. (The first time around you will still have to do it, though.)
A second change consists of major updates to the CM manual. There are now several appendices with summary information and also a full specification of the CM description file syntax.
In directory src/system I added the script "allcross". This script invokes sml and cross-compiles the compiler for all supported architectures. (Useful when providing a new set of boot files.)
There seems to be a latent bug in my "lazy pickles" mechanism. I added a small tweak to pickle-util.sml to work around this problem, but it is not a proper fix yet. I will investigate further. (The effect of the bug was an inflation of library pickle size.)
Version number increased to 110.28.1 (to avoid compatibility problems).

Matthias Blume

[2000/05/25]

Fixed a bug in the freezing phase of the register allocator.

Allen Leung

[2000/05/15]

Alpha

Slight cleanup.  Removed the instruction SGNXL

X86

Added the following instructions to the instruction set:

ROLx, RORx,
BTx, BTSx, BTLx, BTRx,
XCHGx, and variants with the LOCK prefix

The module ra-rewrite-with-renaming has been improved.

These have no effect on SML/NJ.

Allen Leung

[2000/05/15]

I added an alternative to "-rebuild" to "makeml". The difference is that prior to calling CMB.make' the CM-variable "LIGHT" will be defined. In effect, the command will not build any cross-compiler backends and therefore finish more quickly.
The "fixpt" script also takes a "-light" switch to be able to use
this new facility while compiling for a fixpoint.
I replaced all mentions of anchored paths in group owner specifications with simple relative paths (usually starting with ".."). The rationale is that a library’s internal workings should not be compromised by the lack of some anchor. (An anchor is necessary for someone who wants to refer to the library by an anchored path, but it should not be necessary to build the same library in the first place.)
I changed the way CM’s tool mechanism determines the shell command string used for things like ml-yacc etc. so that it does not break when CM.Control.implicit_anchors is turned off.

Matthias Blume

[2000/05/12]: Fixed a bug in config/_ml-build that prevented ml-yacc and ml-lex from getting installed properly (by config/install.sh).

Matthias Blume

[2000/05/12]

!!! NEW BOOT FILES !!!

This change is in preparation of fading out support for "implicitly anchored path names". I went through all sources and used the explicit (and relatively new) $-notation. See system/README and the CM manual for more info on this.

I also modified the anchoring scheme for some things such as "smlnj", "MLRISC", "cm", etc. to take advantage of the fact that explicit anchors are more expressive: anchor name and first arc do not have to coincide. This entails the following user-visible change:

You have to write $smlnj/foo/bar instead of smlnj/foo/bar. This applies in particular when you fire up sml with a command-line argument, e.g.:

sml '$smlnj/cmb.cm'

At the ML toplevel prompt:

CM.autoload "$smlnj/cmb.cm";

There is also a new controller in CM.Control that can be used to turn off all remaining support for implicit anchors by saying:

#set CM.Control.implicit_anchors false;

This causes CM to reject implicitly anchored paths. This is (for the time being) less permissive than the "final" version where there will be no more such implicit anchors and relative paths will be just that: relative.

The next step (version after next version?) will be to make the default for CM.Control.implicit_anchors false. After the dust has settled, I can then produce the "final" version of this…

Note: Since bootstrapping is a bit tricky, I provided new boot files.

Matthias Blume

[2000/05/11]

The main change is that I added function CM.sources as a generalized version of the earlier CM.makedepend. This entails the following additional changes:

CM.makedepend has been dropped.
CM manual has been updated.
TOOLS signature and API have been changed.

Matthias Blume

[2000/05/10]

  Various bug fixes and new features for C--, Moby and MLRISC optimizations.
None of these affect SML/NJ.

Register Allocation
1. A new ra spilling module (ra/ra-spill-with-renaming) is implemented. This module tries to remove local (i.e. basic block level) redundancies during spilling.
2. A new framework for performing region based register allocation. Not yet entirely functional.
X86
1. DefUse for POP was missing the stack pointer [found by Lal]
2. Reload for CALL was incorrect in X86Spill [found by John]
3. Various fixes in X86Spill so that it can be used correctly for the new spilling module.
SSA/IR
1. New module ir/dj-dataflow.sml implements elimination based data flow analysis.
MLRiscGen
1. Fix for gc type annotation

MDGen

Various fixes for machine description -> ml code translation.  For ssa
only.

Allen Leung

[2000/05/08]

Fermin has found a few assembly problems with constant expressions
generated in LabelExp.  Mostly, the problems involve extra parentheses,
which dumb assemblers choke on.  This is his fix.

Allen Leung

Version 110.28; 2000/04/09

[2000/04/09]

Updated src/compiler/TopLevel/main/version.sml to version 110.28
Updated config/version to 110.28
Updated config/srcarchiveurl
New boot files! ftp://ftp.research.bell-labs.com/dist/smlnj/working/110.28/

Dave MacQueen

[2000/05/01]

A new noweb tool has been added. The existing system is entirely unaffected by this, but some CM users have asked for renewed noweb support. Everything is documented in the CM manual.

New (plugin) libraries:

noweb-tool.cm
nw-ext.cm

Matthias Blume

[2000/04/30]

Fix for bug 1498:
  smlnj/src/system/Basis/Implementation/Unsafe/object.sig
  smlnj/src/system/Basis/Implementation/Unsafe/object.sml
    added toRealArray function
  smlnj/src/compiler/MiscUtil/print/ppobj.sml
    added check for tag Obj.RealArray to array printing case in ppObj
Fix for bug 1510:
  smlnj/src/compiler/Semant/types/typesutil.sml
    fixed definition of dummyargs (used by equalTycon) so that dummy args are distinct types

Dave MacQueen

[2000/04/30]

CM version numbering added. This is an implementation of Lal’s proposal for adding version numbers and version checking to .cm files. Lal said that his proposal was just that — a proposal. For the time being I went ahead and implemented it so that people can comment on it. Everything is completely backward-compatible (except for the stable library format, i.e., new bootfiles!).
As usual, see the CM manual for details.
An alternative syntax for anchored paths has been implemented. Dave has recently voiced the same concerns that I had when I did this, so there should be some support. My take is that eventually I will let support for the current syntax (where anchors are "implicit") fade out in favor of the new, explicit syntax. In order to be backward-compatible, both old and new syntax are currently supported.
Again, see the CM manual for details.
Parallel make is trying to be slightly smarter: When the master process finds a "bottleneck", i.e., when there is only one compilation unit that can be compiled and everybody else is waiting on it, then it will simply compile it directly instead of clumsily telling one of the slaves to do it.
Support for "unsharing" added. This is necessary in order to be able to have two different versions of the same library running at the same time (e.g., for trying out a new MLRISC while still having the old MLRISC linked into the current compiler, etc.) See the CM manual.
Simple "makedepend" functionality added for generating Makefile dependency information. (This is rather crude at the moment. Expect some changes here in the future.)
".fun" added as a recognized suffix for ML files. Also documented explicitly in the manual that the fallback behavior (unknown suffix → ML file) is not an official feature!
Small changes to the pickler for stable libraries.
Several internal changes to CM (for cleanup/improvement).

!!!! NEW BINFILES !!!!

Matthias Blume

[2000/04/28]

I changed config/install.sh to remove duplicate entries from the lib/pathconfig file at the end. Moreover, the final version of lib/pathconfig is sorted alphabetically. The same (sorting) is done in src/system/installml.
The config/install.sh script now consistently uses relative pathnames in lib/pathconfig whenever the anchor is in the lib directory. (So far this was true for the libraries that come pre-compiled and bundled as part of the bootfiles but not for libraries that are compiled by the script itself.)
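
The clean-up step amounts to what "sort -u" does. As a rough Python sketch (the function name tidy_pathconfig and the sample entries are hypothetical, and the real script is a shell script, not Python):

```python
def tidy_pathconfig(text):
    """Drop duplicate entries and sort the remaining lines alphabetically."""
    entries = {line.strip() for line in text.splitlines() if line.strip()}
    return "\n".join(sorted(entries)) + "\n"


# Hypothetical entries of the form "<anchor> <directory>".
sample = "beta dir-b\nalpha dir-a\nbeta dir-b\n"
print(tidy_pathconfig(sample), end="")
```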

Matthias Blume

[2000/04/26]: Added ".fun" as a recognized file name suffix (for ML code).

Matthias Blume

[2000/04/25]

Alpha

    PSEUDOARITH was missing in AlphaRewrite.  This causes an endless loop
in C--.

Added a flag "ra-dump-size" to print out the size of the flowgraph
and the interference graph.

Allen Leung

[2000/04/25]

Updated mlyacc.tex sections 5 and 7 for SML '97 and CM.
Updated all three examples in src/ml-yacc/examples to run
under 110.* using CM.make.

Dave MacQueen

[2000/04/20]

  This update synchronizes my repository with Yale's.  Most of these
changes, however, do not affect SML/NJ at all (the RA is an exception).

Register Allocator
1. An improvement in the interference graph construction: Given a copy
  s <- t
  no interference edge between s and t is added for this definition of s.
2. I’ve added two new spill heuristic modules that Fermin and I developed (in the new library RA.cm). These are unused in SML/NJ but may be useful for others (Moby?)
X86
1. Various fixes in the backend provided by Fermin [C--] and Lal.
Alpha
1. Added the BSR instruction and code generation that goes with it [C--]
2. Other fixes too numerous to recount provided by Fermin [C--]
Regmaps
1. The regmaps are not initialized with the identity physical bindings at creation time. This is unneeded.
MLRISC Optimizations
1. The DJ-Graph module can now compute the iterated dominance frontiers intersected with liveness incrementally in linear time! Woohoo! This is now used in my new SSA construction algorithm.
2. The branch reorganization module is now smarter about linear chains of basic blocks.
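
The copy rule in item 1 under "Register Allocator" above is a standard refinement of interference graph construction. The following Python sketch shows the idea for a single basic block; it is a textbook-style illustration, not the MLRISC code, and build_interference and the tuple layout of instructions are invented for this example.

```python
def build_interference(block, live_out):
    """Build interference edges for one basic block.

    `block` is a list of (dst, srcs, is_copy) tuples in program order and
    `live_out` is the set of registers live at the end of the block.
    """
    edges = set()
    live = set(live_out)
    for dst, srcs, is_copy in reversed(block):
        for other in live:
            if other == dst:
                continue
            if is_copy and other in srcs:
                continue          # copy s <- t: no edge s--t for this definition
            edges.add(frozenset((dst, other)))
        live.discard(dst)         # dst is dead above its definition
        live.update(srcs)         # sources are live above their use
    return edges


# t is defined, copied into s, and both are used afterwards.
block = [
    ("t", [], False),             # t <- ...
    ("s", ["t"], True),           # s <- t   (copy)
    ("u", ["s", "t"], False),     # u <- s + t
]
edges = build_interference(block, live_out={"u"})
print(frozenset(("s", "t")) in edges)   # False: s and t may be coalesced
```

Without the special case for copies, the definition of s would add an edge between s and t (since t is still live), and the two could not share a register.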
  
  Allen Leung

[2000/04/12]: Changed install.sh script to handle archive files without version number and to use "boot.<arch>-<os>" instead of "sml.boot.<arch>-<os>" for the name of the boot file archive.

Matthias Blume

Version 110.27; 2000/04/09

[2000/04/09]

Updated src/compiler/TopLevel/main/version.sml to version 110.27
Updated src/config/version to 110.27
New boot files!

Dave MacQueen

[2000/04/09]

Yet another fix for x86 assembly for idivl, imull, mull and friends.
Miscellaneous improvements to MLRISC (unused in sml/nj)

Allen Leung

[2000/04/07]: Improved handling of branches (mostly those generated from polymorphic equality), removed switchoff and changed the default optimization settings (more cpsopt and less flintopt).

Stefan

[2000/04/06]

Forgot a few files.

Allen Leung

[2000/04/06]

New Peephole code
Minor improvement to X86 instruction selection
Various fixes to SSA and machine description → code translator

Allen Leung

[2000/04/05]: This update just merges three minor cosmetic updates to CM’s sources to get ready for the 110.27 code freeze on Friday. No functionality has changed.

Matthias Blume

[2000/04/04]

Fixed a problem in X86 assembly.

Things like

jmp %eax
jmp (%eax)

should be output as

jmp *%eax
jmp *(%eax)
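
In AT&T syntax, an indirect jump target is marked with a leading '*', while a direct jump to a label is not. A minimal Python sketch of that rule follows; emit_jmp is a hypothetical helper, not the MLRISC assembly emitter, and it only recognizes register and parenthesized memory operands.

```python
def emit_jmp(operand):
    """Format a jmp for an AT&T-syntax assembler.

    Register and memory operands denote indirect jumps and need a '*'
    prefix; a plain label is a direct jump and is left alone.
    """
    is_register = operand.startswith("%")
    is_memory = "(" in operand
    if is_register or is_memory:
        return f"jmp *{operand}"
    return f"jmp {operand}"


print(emit_jmp("%eax"))     # jmp *%eax
print(emit_jmp("(%eax)"))   # jmp *(%eax)
print(emit_jmp("L42"))      # jmp L42
```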

Assembly output

Added a new flag

"asm-indent-copies" (defaults to false)

When this flag is on, parallel copies will be indented an extra level.

Allen Leung

[2000/04/04]

All of these fixes are related to C--, Moby, and my own optimization
stuff; so they shouldn't affect SML/NJ.

X86

Various fixes related to floating point, and extensions.

Alpha

Some extra patterns related to loads with signed/zero extension
provided by Fermin.

Assembly

When generating assembly, resolve the value of client defined constants,
instead of generating symbolic values.  This is controlled by the
new flag "asm-resolve-constants", which defaults to true.

Machine Descriptions
1. The precedence parser was slightly broken when parsing infixr symbols.
2. The type generalizing code had the bound variables reversed, resulting in a problem during arity raising.
3. Various fixes in machine descriptions.

Allen Leung

[2000/04/03]: I eliminated coreEnv from compInfo. Access to the "Core" structure is now done via the ordinary static environment that is context to each compilation unit.

To this end, I arranged that "structure _Core" is bound in the pervasive environment instead of "structure Core". Core access is done via _Core (which can never be accidentally rebound because _Core is not a legal surface-syntax symbol).

The current solution is much cleaner because the core environment is now simply part of the pervasive environment which is part of every compilation unit’s context anyway. In particular, this eliminates all special-case handling that was necessary until now in order to deal with dynamic and symbolic parts of the core environment.

Remaining hackery (to bind the "magic" symbol _Core) is localized in the compilation manager’s bootstrap compiler (actually: in the "init group" handling). See the comments in src/system/smlnj/init/init.cmi for more details.

I also tried to track down all mentions of "Core" (as string argument to Symbol.strSymbol) in the compiler and replaced them with a reference to the new CoreSym.coreSym. Seems cleaner since the actual name appears in one place only.

Binfile and bootfile format have not changed, but the switchover from the old "init.cmi" to the new one is a bit tricky, so I supplied new bootfiles anyway.

Matthias Blume

[2000/04/02]

Renamed the constructor CALL in MLTREE by popular demand.
Added a bunch of files from my repository. These are currently used by other non-SMLNJ backends.

Allen Leung

[2000/03/31]

This update contains a rewritten (and hopefully more correct) module for extracting aliasing information from CPS.

To turn on this feature:

Compiler.Control.CG.memDisambiguate := true

To pretty print the region information with assembly

Compiler.Control.MLRISC.getFlag "asm-show-region" := true;

To control how many levels of aliasing information are printed, use:

Compiler.Control.MLRISC.getInt "points-to-show-level" := n

The default of n is 3.

Allen Leung

[2000/03/31]

This update contains:

runtime/c-lib/c-libraries.c includes added in revision 1.2 caused compilation errors on hppa-hpux
fix for bug 1556 system/Basis/Implementation/NJ/internal-signals.sml

Dave MacQueen

[2000/03/31]

This update contains:

A small change to CM’s handling of stable libraries: CM now maintains one "global" modmap that is used for all stable libraries. The use of such a global modmap maximizes sharing and minimizes the need for re-traversing parts of environments during modmap construction. (However, this has minor impact since modmap construction seems to account for just one percent or less of total compile time.)
I added a "genmap" phase to the statistics. This is where I got the "one percent" number (see above).
CM’s new tool parameter mechanism just became even better. :)
- The parser understands named parameters and recursive options.
- The "make" and "shell" tools use these new features. (This makes it a lot easier to cascade these tools.)
- There is a small syntax change: named parameters use a
  <name> : ( <option> ... ) or <name> : <string>
  syntax. Previously, named parameters were implemented in an ad-hoc fashion by each tool individually (by parsing strings) and had the form
  <name>=<string>
  See the CM manual for a full description of these issues.

Matthias Blume

Version 110.26.2; 2000/03/30

[2000/03/30]

!!!!! WARNING !!!!!!
!!  New binfiles  !!
!!!!!!!!!!!!!!!!!!!!

This update contains:

Moderate changes to CM:

Changes to CM’s tools mechanism. In particular, it is now possible to have tools that accept additional "command line" parameters (specified in the .cm file at each instance where the tool’s class is used).

This was done to accommodate the new "make" and "shell" tools which
facilitate fairly seamless hookup to portions of code managed using
Makefiles or Shell scripts.

There are no classes "shared" or "private" anymore.  Instead, the
sharing annotation is now a parameter to the "sml" class.

There is a bit of generic machinery for implementing one's own
tools that accept command-line parameters.  However, I am not yet fully
satisfied with that part, so expect changes here in the future.

All existing tools are described in the CM manual.

Slightly better error handling. (CM now suppresses many followup error messages that tended to be more annoying than helpful.)

Major changes to the compiler’s static environment data structures.
- no CMStaticEnv anymore.
- no CMEnv, no "BareEnvironment" (actually, only BareEnvironment, but it is called Environment), no conversions between different kinds of static environments
- There is still a notion of a "modmap", but such modmaps are generated on demand at the time when they are needed. This sounds slow, but I sped up the code that generates modmaps enough for this not to lead to a slowdown of the compiler (at least I didn’t detect any).
- To facilitate rapid modmap generation, static environments now contain an (optional) "modtree" structure. Modtree annotations are constructed by the unpickler during unpickling. (This means that the elaborator does not have to worry about modtrees at all.) Modtrees have the advantage that they are compositional in the same way as the environment data structure itself is compositional. As a result, modtrees never hang on to parts of an environment that has already been rendered "stale" by filtering or rebinding.
- I went through many, many trials and errors before arriving at the current solution. (The initial idea of "linkpaths" did not work.) But the result of all this is that I have touched a lot of files that depend on the "modules" and "types" data structures (most of the elaborator). There were a lot of changes during my "linkpath" trials that could have been reverted to their original state but weren’t. Please, don’t be too harsh on me for messing with this code a bit more than what was strictly necessary… (I did resist the temptation of doing any "global reformatting" to avoid an untimely death at Dave’s hands. :)
- One positive aspect of the previous point: At least I made sure that all files that I touched now compile without warnings (other than "polyEqual").
- compiler now tends to run "leaner" (i.e., ties up less memory in redundant modmaps)

Matthias Blume

[2000/03/29]

Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz

   This update contains *MAJOR* changes to the way code is generated from CPS
in the module mlriscGen, and in various backend modules.

CHANGES

MLRiscGen: forward propagation fix.

There was a bug in forward propagation, introduced at about the same time
as the MLRISC x86 backend, which prevented coalescing from being
performed effectively in loops.

Effect: speed up of loops in RISC architectures.
        By itself, this actually slowed down certain benchmarks on the x86.

MLRiscGen: forward propagating addresses from consing.

I've changed the way consing code is generated.  Basically I separated
out the initialization part:

store tag,   offset(allocptr)
store elem1, offset+4(allocptr)
store elem2, offset+8(allocptr)
...
store elemn, offset+4n(allocptr)

and the address computation part:

celladdr <- offset+4+allocptr

and move the address computation part

Effect:  register pressure is generally lower as a result.  This
         makes compilation of certain expressions much faster, such as
         long lists with non-trivial elements.

[(0,0), (0,0), .... (0,0)]

MLRiscGen: base pointer elimination.

As part of the linkage mechanism, we generate the sequence:

L:  ...  <- start of the code fragment

L1:
    base pointer <- linkreg - L1 + L

  The base pointer was then used for computing relocatable addresses
in the code fragment.  Frequently (such as in lots of continuations)
this is not needed.  We now eliminate this sequence whenever possible.

  For compile time efficiency, I'm using a very stupid local heuristic.
But in general, this should be done as a control flow analysis.

Effect:  Smaller code size.  Speed up of most programs.
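
The arithmetic behind the eliminated sequence can be checked with made-up numbers. In the Python sketch below, all addresses and offsets are hypothetical, and it is an assumption of the sketch that linkreg holds the runtime address of L1 when the sequence executes.

```python
# Hypothetical numbers: the code fragment is loaded at some runtime address,
# and L and L1 are label offsets known at assembly time.
load_address = 0x40000   # where the fragment happens to be loaded
L = 0x000                # offset of the start of the code fragment
L1 = 0x018               # offset of the linkage point

linkreg = load_address + L1          # assumption: linkreg = runtime address of L1
base_pointer = linkreg - L1 + L      # the sequence from the text

# The base pointer is the runtime address of L, so any label X in the
# fragment can be addressed as base_pointer + (X - L).
X = 0x0a4
print(hex(base_pointer))             # 0x40000
print(hex(base_pointer + (X - L)))   # 0x400a4
```

When a fragment never needs such a relocatable address, the subtraction and addition are dead code, which is what makes the elimination worthwhile.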

Hppa back end

   Long jumps in span dependence resolution used to depend on the existence
of the base pointer.

A jump to a long label L was expanded into the following sequence:

LDIL %hi(L-8192), %r29
LDO  %lo(L-8192)(%r29), %r29
ADD  %r29, baseptr, %r29
BV,n %r0(%r29)

  In the presence of change (3) above (base pointer elimination), this
will not work.  I've changed it so that the following sequence of
instructions is generated, which doesn't mention the base pointer at all:

     BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
     BV,n  %r0(%r29)          /* Jump */

Alpha back end

   New alpha instructions LDB/LDW have been added, as per Fermin's
suggestions.   This is unrelated to all other changes.

X86 back end

I've changed andl to testl in the floating point test sequence
whenever appropriate.  The Intel optimization guide states that
testl is preferable to andl.

RA (x86 only)

  I've improved the spill propagation algorithm, using an approximation
of maximal weighted independent sets.   This seems to be necessary to
alleviate the slowdown caused by (1), the forward propagation fix.

I'll write down the algorithm one of these days.

MLRiscGen: frequencies

  I've added an annotation that states that all call gc blocks have zero
execution frequencies.  This improves register allocation on the x86.

BENCHMARKS

I've only performed the comparison against 110.25.

The platforms are:

HPPA  A four processor HP machine (E9000) with 5G of memory.
X86   A 300MHz Pentium II with 128M of memory, and
SPARC An Ultra sparc 2 with 512M of memory.

I used the following parameters for the SML benchmarks:

        @SMLalloc
HPPA    256k
SPARC   512k
X86     256k

COMPILATION TIME

Here are the numbers comparing the compilation times of the compilers.
I've only compared 110.25 compiling the new sources versus
a fixpoint version of the new compiler compiling the same.

            110.25                                  New
      Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
HPPA   627s    116s        2684+3584     599s    95s       1003+1879
SPARC  892s    173s        2891+3870     708s    116s      1004+1880
X86    999s    315s       94006+130691   987s    296s    108877+141957

          110.25         New
       Code Size      Code Size
HPPA   8596736         8561421
SPARC  8974299         8785143
X86    9029180         8716783

So in summary, things are at least as good as before.   A dramatic
reduction in compilation time is obtained on the Sparc; I can't explain
it, but it is reproducible.  Perhaps someone should try to reproduce this
on their own machines.

SML BENCHMARKS

On the average, all benchmarks perform at least as well as before.

HPPA         Compilation Time     Spill+Reload      Run Time
           110.25  New            110.25    New   110.25  New

    barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
        boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
 count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
          fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
  knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
       lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
         life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
        logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
   mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
       mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
      nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
          ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
       simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
          tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
         vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
--------------------------------------------------------------------------
 Average                     6.12%                                   4.09%

SPARC        Compilation Time     Spill+Reload      Run Time
           110.25  New            110.25    New   110.25  New

    barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
        boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
 count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
          fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
  knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
       lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
         life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
        logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
   mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
       mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
      nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
          ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
       simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
          tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
         vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
----------------------------------------------------------------------------
 Average                     6.94%                                    2.64%

X86          Compilation Time     Spill+Reload      Run Time
           110.25  New            110.25    New   110.25  New

    barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
        boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
 count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
          fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
  knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
       lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
         life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
        logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
   mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
       mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
      nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
          ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
       simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
          tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
         vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
----------------------------------------------------------------------------
 Average                     1.22%                                     3.36%
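
The tables do not state how the percentage columns were computed. The printed figures are consistent, to within rounding of the printed times, with old/new - 1 (so positive numbers mean the new compiler is better). The following Python check uses three compilation-time rows from the HPPA table; the function name improvement is introduced here and is not part of the benchmark scripts.

```python
def improvement(old, new):
    """Percentage by which `old` exceeds `new`; positive means `new` is faster."""
    return (old / new - 1.0) * 100.0


# (benchmark, 110.25 compile time, new compile time, percentage printed in the table)
rows = [
    ("barnesHut", 3.158, 3.015, 4.75),
    ("lexgen", 6.190, 5.290, 17.01),
    ("ray", 1.283, 1.290, -0.52),
]

for name, old, new, printed in rows:
    print(f"{name:10s} computed {improvement(old, new):6.2f}%  printed {printed:6.2f}%")
```

The small differences between computed and printed values suggest that the percentages were derived from times with more digits than are shown.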

Allen Leung

[2000/03/23]

X86 fixes/changes
1. The old code generated for SETcc was completely wrong. The Intel optimization guide is VERY misleading.
ALPHA fixes/changes
1. Added the instructions LDBU, LDWU, STB, STW as per Fermin’s suggestion.
2. Added a new mode byteWordLoadStores to the functor parameter to Alpha()
3. Added reassociation code for address computation.

Allen Leung

[2000/03/22]

X86 fixes/changes
1. x86Rewrite bug with MUL3 (found by Lal)
2. Added the instructions FSTS, FSTL
PA-RISC fixes/changes
1. B label should not be a delay slot candidate! Why did this work?
2. ADDT(32, REG(32, r), LI n) now generates one instruction instead of two, as it should.
3. The assembly syntax for fstds and fstdd was wrong.
4. Added the composite instruction COMICLR/LDO, which is the immediate operand variant of COMCLR/LDO.
Generic MLRISC
1. shuffle.sml rewritten to be slightly more efficient
2. DIV bug in mltree-simplify fixed (found by Fermin)
Register Allocator
1. I now release the interference graph earlier during spilling. May improve memory usage.
Allen Leung

Version 110.26.1; 2000/03/14

[2000/03/14]

Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an additional argument called "template" which is an optional string that specifies the layout of the tool command line. See the CM manual for explanation.
A special-purpose tool can be "registered" by simply dropping the corresponding <…>-tool.cm (and/or <…>-ext.cm) into the same directory where the .cm file lives that uses this tool. (The behavior/misfeature until now was to look for the tool description files in the current working directory.) As before, tool description files could also be anchored — in which case they can live anywhere they like. Following the recent e-mail discussion, this change should make it easier to have special-purpose tools that are shipped together with the sources of the program that uses them.

Matthias Blume

[2000/03/10]: I added a re-written version of Dave’s fixpt script to src/system. Changes relative to the original version:
- sh-ified (not everybody has ksh)
- automatically figures out which architecture it runs on
- uses ./makeml a bit more cleverly
- never invokes ./installml (and, thus, does not clobber your good and working installation of sml in case something goes wrong)
- accepts max iteration count using option "-iter <n>"
- accepts a "base" name using option "-base <base>"

It does not build any extraneous heap images but directly rebuilds bin- and boot-hierarchies using makeml’s "-rebuild" switch. Finally, it can incorporate existing bin- and boot- hierarchies. For example, suppose the base is set to "sml" (which is the default). Then it successively builds

sml.bin.<arch>-unix and sml.boot.<arch>-unix
then sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
then sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
…
then sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix

and so on. If any of these already exist, it will just use what’s there. In particular, many people will have the initial set of bin and boot files around, so this saves time for at least one full rebuild. Having sets of the form <base><k>.{bin,boot}.<arch>-unix for <k>=1,2,… is normally not a good idea when invoking fixpt. However, they might be the result of an earlier partial run of fixpt (which perhaps got accidentally killed). In this case, fixpt will quickly move through what exists before continuing where it left off earlier, and, thus, saves a lot of time.
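
The naming sequence and the resume behavior described here can be sketched in plain sh. This is only an illustration, not the actual fixpt code: the helper names iter_stem and first_missing are hypothetical, and the sketch assumes that each bin- and boot-hierarchy is a directory and that an iteration counts as done only when both exist.

```sh
#!/bin/sh
# Hypothetical helper: print the name stem for iteration $2 of base $1.
# Iteration 0 is the base itself ("sml"); later ones append the count ("sml1", ...).
iter_stem() {
    if [ "$2" -eq 0 ]; then
        printf '%s\n' "$1"
    else
        printf '%s%s\n' "$1" "$2"
    fi
}

# Hypothetical helper: print the first iteration in 0..max whose bin or boot
# directory is missing under directory $4 (prints max+1 if all of them exist).
# Note: POSIX sh has no "local", so these variables are global.
first_missing() {
    base=$1 arch=$2 max=$3 dir=$4
    k=0
    while [ "$k" -le "$max" ]; do
        stem=$(iter_stem "$base" "$k")
        if [ ! -d "$dir/$stem.bin.$arch-unix" ] ||
           [ ! -d "$dir/$stem.boot.$arch-unix" ]; then
            break
        fi
        k=$((k + 1))
    done
    printf '%s\n' "$k"
}
```

A driver in this style would call first_missing once, skip straight to that iteration, and rebuild from there, which is why leftover sets from an interrupted run save time.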

Matthias Blume

[2000/03/10]: More assembly output problems involving the indexed addressing mode on the x86 have been found and corrected. Thanks to Fermin Reig for the fix.

The interface and implementation of the register allocator have been changed slightly to make it possible to skip the register allocation phases completely and go directly to memory allocation. This is needed for C-- use.

Allen Leung

[2000/03/09]

Complete re-organization of library names. Many libraries have been consolidated so that they share the same path anchor. For example, all MLRISC-related libraries are anchored at MLRISC, and most libraries that are SML/NJ-specific are under "smlnj". Notice that names like host-cmb.cm or host-compiler.cm no longer exist. See system/README for a complete description of the new naming scheme. Quick reference:
```
host-cmb.cm        -> smlnj/cmb.cm
host-compiler.cm   -> smlnj/compiler.cm
full-cm.cm         -> smlnj/cm.cm
<arch>-<os>.cm     -> smlnj/cmb/<arch>-<os>.cm
<arch>-compiler.cm -> smlnj/compiler/<arch>.cm
```
Bug fixes in CM.
- exceptions in user code are being passed through (i.e., reach top level)
- more bugs in paranoia mode fixed
- bug related to checking group owners fixed
New install.sh script that automagically fetches archive files: The new file config/srcarchiveurl must contain the URL of the (remote) directory that contains bin files (or other source archives). If install.sh does not find the archive locally, it tries to get it from that remote directory. This should simplify installation further: For machines that have access to the internet, just fetch <version>-config.tgz, unpack it, edit config/targets, and go (run config/install.sh). The script will fetch everything else that it might need all by itself.

For CVS users, this mechanism is not relevant for source archives, but it is convenient for getting new sets of binfiles.

Archives should be tar files compressed with either gzip, compress, or bzip2. The script recognizes .tgz, .tar, .tar.gz, .tz, .tar.Z, and .tar.bz2.
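
Suffix recognition of this kind is typically a single case statement. The sketch below is an assumption about how such a dispatch could look, not a copy of install.sh: the function name unpack_cmd is hypothetical, and so is the mapping of .tz to compress.

```sh
#!/bin/sh
# Hypothetical helper: print a command that decompresses archive $1 to stdout
# (suitable for piping into "tar -xf -"); return 1 for an unknown suffix.
# Assumption: .tz is treated like .tar.Z (compress); the log does not say.
unpack_cmd() {
    case "$1" in
        *.tgz | *.tar.gz) echo "gzip -dc" ;;
        *.tz | *.tar.Z)   echo "uncompress -c" ;;
        *.tar.bz2)        echo "bzip2 -dc" ;;
        *.tar)            echo "cat" ;;
        *)                return 1 ;;
    esac
}
```

A caller could then run something like `$(unpack_cmd "$f") "$f" | tar -xf -` once it has the file locally, and fall back to fetching from the URL in config/srcarchiveurl when no local candidate exists.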

Matthias Blume

[2000/03/07]

size info in BOOTLIST
- no fixed upper limits for number of bootfiles or length of bootfile names in runtime
- falling back to old behavior if no BOOTLIST size info found
allocation size heuristics in .run-sml
- tries to read cache size from /proc/cpuinfo (this is important for small-cache Celeron systems!)
install.sh robustified
CM manual updates
paranoid mode
- no more CMB.deliver() (i.e., all done by CMB.make())
- can re-use existing sml.boot.* files
- init.cmi now treated as library
- library stamps for consistency checks
sml.boot.<arch>-<os>/PIDMAP file
- This file is read by the CM startup code. This is used to minimize the amount of dynamic state that needs to be stowed away for the purpose of sharing between interactive system and user code.
CM.Anchor.anchor instead of CM.Anchor.{set,cancel}
- Upon request by Elsa. Anchors now controlled by get-set-pair like most other CM state variables.
Compiler.CMSA eliminated
- No longer supported by CM anyway.
fixed bugs in pickler that kept biting Stefan
- past refs to past refs (was caused by the possibility that ad-hoc sharing is more discriminating than hash-cons sharing)
- integer overflow on LargeInt.minInt
ml-{lex,yacc} build scripts now use new mechanism for building standalone programs
fixed several gcc -Wall warnings that were caused by missing header files, missing initializations, etc., in runtime (not all warnings eliminated, though)
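
The /proc/cpuinfo probe mentioned above for .run-sml can be sketched as follows. This is an assumption about the general approach, not the script's actual code: the function name cache_kb is hypothetical, and the sketch assumes the Linux line format "cache size : <n> KB".

```sh
#!/bin/sh
# Hypothetical helper: print the first "cache size" value (in KB) found in
# the cpuinfo-style file $1 (default: /proc/cpuinfo); print nothing if absent.
cache_kb() {
    sed -n 's/^cache size[[:space:]]*:[[:space:]]*\([0-9][0-9]*\) KB.*$/\1/p' \
        "${1:-/proc/cpuinfo}" 2>/dev/null | head -n 1
}
```

A startup script could use the result to pick a smaller allocation-arena size on small-cache machines and keep its default when the function prints nothing.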

Matthias Blume