Overview


Charm is a high-level programming language and set of development tools that is suitable for beginners, yet powerful enough for substantial projects.  It is particularly tailored towards the Raspberry Pi and RISC PC range of computers and emulators running the RISC OS operating system, and is geared towards rapid production of small, fast modules and applications.

Background


At the heart of all computers is the central processing unit (CPU).  This chip understands a machine language comprising a set of instructions which it fetches from the computer's memory (RAM).  The CPU contains a small number of registers which can hold data or addresses.  A reduced instruction set (RISC) CPU like the ARM chip generally has just a few instructions for loading registers from memory and storing them back.  All other operations (e.g. adding two numbers) are carried out in the registers for speed.
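
To make the load/store idea concrete, here is a minimal Python sketch of a toy register machine, assuming a made-up three-instruction set (LDR, ADD, STR) and a dictionary standing in for RAM. Only LDR and STR touch memory; the arithmetic happens entirely in the registers. The names execute, registers and memory are hypothetical and are not part of Charm or of any real ARM simulator.

```python
# Hypothetical toy load/store machine: an illustration, not a real ARM simulator.
def execute(program, registers, memory):
    """Run a list of (opcode, operands...) tuples against registers and memory."""
    for opcode, *operands in program:
        if opcode == "LDR":            # load: register <- memory[address]
            rd, address = operands
            registers[rd] = memory[address]
        elif opcode == "STR":          # store: memory[address] <- register
            rs, address = operands
            memory[address] = registers[rs]
        elif opcode == "ADD":          # rd <- rn + immediate, registers only
            rd, rn, immediate = operands
            registers[rd] = (registers[rn] + immediate) & 0xFFFFFFFF  # wrap to 32 bits
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


registers = [0] * 16        # R0..R15
memory = {0x100: 41}        # one word of "RAM" at a made-up address
execute([("LDR", 0, 0x100), ("ADD", 0, 0, 1), ("STR", 0, 0x100)], registers, memory)
print(memory[0x100])        # prints 42
```

Note that incrementing a value in memory takes three instructions here: load, add, store.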

Computer programs stored in memory appear as machine code.  If viewed as data, each group of numbers in memory encodes a particular instruction.  Although it is possible to patch code straight into a file, or into memory, this is generally a very time-consuming, error-prone and tedious occupation, to be undertaken only as a last resort.

To help programmers, programs called assemblers were developed in the early days of computing.  These work by translating textual mnemonics for instructions, data and addresses into machine code, e.g.

MOV   R0,R1

might mean "move the contents of register R1 into register R0".  On the RISC PC this instruction is stored in memory as the hexadecimal (base 16) number E1A00001, which is rather less meaningful.  The MOV instruction would no doubt result in an entirely different encoded number if assembled for a different CPU, assuming the instruction set even contained that instruction.
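
The translation an assembler performs can be sketched for this single instruction form. The Python below is a minimal sketch, assuming only the unconditional register-to-register form of MOV (condition AL, no shift, flags not set); encode_mov_reg and assemble are hypothetical names, and a real ARM assembler handles far more cases. It reproduces the E1A00001 word quoted above from the ARM data-processing bit fields: condition in bits 31-28, opcode in bits 24-21, Rd in bits 15-12 and Rm in bits 3-0.

```python
import re

# Assumption: only the unconditional, register-to-register MOV is handled:
# cond = AL (0xE), I = 0 (register operand), opcode = MOV (0b1101), S = 0, no shift.
COND_AL = 0xE
OPCODE_MOV = 0b1101


def encode_mov_reg(rd: int, rm: int) -> int:
    """Encode 'MOV Rd,Rm' as a 32-bit ARM instruction word."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register numbers must be in the range 0..15")
    return (COND_AL << 28) | (OPCODE_MOV << 21) | (rd << 12) | rm


def assemble(line: str) -> int:
    """Translate one 'MOV Rd,Rm' mnemonic line into its machine code word."""
    match = re.fullmatch(r"\s*MOV\s+R(\d+)\s*,\s*R(\d+)\s*", line, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported instruction: {line!r}")
    return encode_mov_reg(int(match.group(1)), int(match.group(2)))


print(f"{assemble('MOV   R0,R1'):08X}")  # prints E1A00001
```

Changing the registers only changes the Rd and Rm fields: MOV R2,R3, for instance, encodes as E1A02003.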

Whilst CPUs have no problem executing machine code, assembly language is, for many reasons, not an ideal language for us humans to write in.  Some of these reasons are :-

  • each instruction is very simple, and therefore many instructions are generally necessary to accomplish a given task.
  • assembly language programs cannot easily be moved from one machine to another, particularly if the machines use different CPUs.
  • the programmer must take great care in allocating, saving and tracking the contents of the CPU registers.

To get around these problems, a class of programs known as interpreters was developed, BASIC being the prime example.  An interpreter loads the source text of a program written in an interpreted language and runs it by executing the equivalent machine code needed to implement each instruction in that language.

Interpreters have a number of advantages over assembly language, but one major disadvantage.  They are comparatively slow.  The reason for this is that the CPU must spend a good deal of its time decoding the text of the interpreted language and looking up the relevant machine code to execute.
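
The cost described above shows up even in a minimal sketch of an interpreter for a hypothetical one-statement language of the form x := x + 1; (borrowing Charm's assignment syntax purely for illustration). Every time round the loop the statement text is split and parsed again before any arithmetic is done; that repeated decoding is the overhead a compiler pays only once, at compile time. The function name interpret is made up and is not part of any real BASIC or Charm tool.

```python
def interpret(program: list[str], variables: dict[str, int], repeat: int) -> dict[str, int]:
    """Run each 'target := a + b;' statement 'repeat' times, re-parsing it every time."""
    def value_of(token: str) -> int:
        # A token is either a known variable name or a decimal literal.
        return variables[token] if token in variables else int(token)

    for _ in range(repeat):
        for line in program:
            # Decode the source text afresh on every execution (the slow part).
            target, expression = (part.strip() for part in line.strip().rstrip(";").split(":="))
            left, right = (part.strip() for part in expression.split("+"))
            variables[target] = value_of(left) + value_of(right)
    return variables


print(interpret(["x := x + 1;"], {"x": 0}, 1000))  # prints {'x': 1000}
```

Only one line of the inner loop does useful work; the other two spend their time decoding text.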

Fortunately, compilers can offer the best of both worlds, namely a high level language which is powerful and easy to understand, together with (almost) the speed of native machine code.  This is achieved by translating the source text of the language into machine code at compile time (when the compiler is run).  Hence the input to the compiler is a file containing high level instructions in the source language, and the output from the compiler is a file containing the equivalent machine code which is executed later at run time.

Most compilers do not compile directly to machine language, but instead produce assembly language source text (though Charm can compile directly to machine code by importing the necessary assembler modules).  A further assembly step is then required to arrive at machine or object code.  This object code must normally then be linked with other compiled modules (the object code from other source files) and a run time library (RTL), a run time support package containing, for example, I/O (input / output) routines.  The output from the linker is an executable program (application or module), and optionally a map file defining the addresses of modules, procedures and data.
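
The layout part of the linker's job, and the map file it can produce, can be sketched as below. This is a minimal Python sketch, assuming each object module simply reports its size and the offsets of the symbols it defines; it does not patch references between modules (relocation), which a real linker must also do. ObjectModule, link and the module names are hypothetical, and the 0x8000 base address is an assumption chosen because it is where RISC OS application space starts.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Hypothetical object module: its size and the symbols it defines."""
    name: str
    size: int                                              # bytes of code and data
    symbols: dict[str, int] = field(default_factory=dict)  # symbol -> offset in module


def link(modules: list[ObjectModule], base: int = 0x8000) -> dict[str, int]:
    """Place modules one after another and return an address map (the 'map file')."""
    address_map: dict[str, int] = {}
    address = base
    for module in modules:
        address_map[module.name] = address
        for symbol, offset in module.symbols.items():
            address_map[f"{module.name}.{symbol}"] = address + offset
        address += (module.size + 3) & ~3   # round up so the next module is word aligned
    return address_map


modules = [
    ObjectModule("Example", 0x14, {"increment": 0x0}),
    ObjectModule("RTL", 0x22, {"print": 0x10}),
]
for name, addr in link(modules).items():
    print(f"{addr:08X}  {name}")
```

Each module's symbols end up at the module's load address plus their offset, which is the information a map file records.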

Compilers take care of all the nitty-gritty detail that the assembly language programmer must attend to, such as managing registers.  Compilers also provide machine independence (the same source language can be run on any number of different hardware platforms, provided the appropriate compiler is chosen and available).

The following example shows a simple Charm module and the assembly code and object code it produces once it has been run through the compiler and assembler respectively :-

module Example
{
    int x;
    proc increment ()
    {
        x := x + 1;
    }
}

                  Assembler listing of ram:$.arm.example
   1:0000:4578616D            string      "Example"
   2:                         align
   3:                   x
   4:0008:00000000            equd        0
   5:                         align
   6:                   _increment
   7:0000:E92D4000            stmfd       sp!,{rp}
   8:0004:E59CE000            ldr         r14,x
   9:0008:E28EE001            add         r14,r14,#1
  10:000C:E58CE000            str         r14,x
  11:0010:E8BD8000            ldmfd       sp!,{pc}
  12:                         end


The detail is not important, just the understanding that the compiler has translated the high level Charm source text into assembly language, and the assembler has translated the assembly source text into machine code (32 bit hexadecimal code words).
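
For a feel of the compiler's half of that translation, here is a minimal Python sketch of a template-based code generator for just one statement shape, var := var + constant;, emitting text in the same form as the listing above. It is an assumption-laden toy, not Charm's actual code generator: compile_increment is a hypothetical name, r14 is used as the scratch register only because the listing does so, {rp} is copied verbatim from the listing, and only unrotated 8-bit immediates are accepted.

```python
import re


def compile_increment(proc_name: str, statement: str) -> list[str]:
    """Translate 'v := v + n;' inside a procedure into ARM assembly text lines."""
    # The backreference \1 insists that the same variable appears on both sides.
    match = re.fullmatch(r"\s*(\w+)\s*:=\s*\1\s*\+\s*(\d+)\s*;?\s*", statement)
    if match is None:
        raise ValueError(f"only 'v := v + n;' is supported, got {statement!r}")
    var, amount = match.group(1), int(match.group(2))
    if amount > 255:
        raise ValueError("this sketch only accepts unrotated 8-bit immediates (0..255)")
    return [
        f"_{proc_name}",
        "stmfd       sp!,{rp}",             # save the return address (as in the listing)
        f"ldr         r14,{var}",           # load the variable into a register
        f"add         r14,r14,#{amount}",   # do the arithmetic in the register
        f"str         r14,{var}",           # store the result back to memory
        "ldmfd       sp!,{pc}",             # return by popping into pc
    ]


print("\n".join(compile_increment("increment", "x := x + 1;")))
```

A real compiler parses arbitrary expressions and allocates registers; the fixed template here only shows how one source statement maps onto the load, add and store sequence in the listing.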

History


The history of Charm dates back to its origins on the 68000 CPU at the start of the 1980s.  I had been interested in computers since before my university days, and after receiving my degree in the sciences, decided to launch myself into a career as a programmer.  Having no formal training, however, I decided to pick up as much as possible about the subject from my colleagues, from books, and best of all from practical experience.  To this end, I bought myself a hobbyist S-100 10-slot motherboard-based system from the USA to replace my aging Apple ][.  This now archaic beast came with twin 8" floppy drives and ran CP/M-68K.

For those of you who remember back that far, CP/M is a very primitive operating system, and a forerunner of the now ubiquitous DOS / Windows of the PC world.  In fact I was so underwhelmed with its speed and capabilities that I decided to write my own pre-emptive multitasking operating system.  This made extensive use of interrupts and supported asynchronous I/O, stream read-ahead buffering and the like.  I became so enthusiastic that I even rewrote the boot EPROM to include a simple monitor / debugger.  By the time I had finished the project, I was able to claim that every line of code running on the machine had been written by myself!

My main bugbear was the lack of a decent compiler, which meant that all of my work had to be written in 68000 assembly language.  I had always wondered how compilers worked, so here was a golden opportunity to learn more about them by writing one, and to end up with a useful development tool.

First I needed some good books on the subject.  Two I can recommend to those who are interested are :-

  •  Principles of Compiler Design by Aho and Ullman
  •  Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman

Second, I needed a language to compile.  For this, I took parts of languages I knew and liked, plus one or two new ideas of my own.  I had just finished writing software for a large Motorway Control System project written in RTL/2, and this language had considerable influence on my language specification.  Readers may also notice similarities with Modula-2 and C.

Third and last I needed an implementation language to write the compiler in.  Since all I then possessed was an assembler, I wrote the first simplified version of the compiler directly in 68000 assembly language.

Bootstrapping

The original assembly language version of the compiler was only capable of handling simple constructs and generated copious quantities of rather inefficient code.  What was needed was a re-write in a language more suitable for further development and enhancement.  The obvious choice was Charm itself.  It was relatively easy to translate the assembly language code a module at a time into Charm source code, and this also proved to be an excellent way of testing the compiler.

Once all of the software was translated, Charm became self-compiling.  This is a major milestone in the development of a compiler, and the means by which it is achieved (akin to solving the "chicken and egg" problem) is known as "bootstrapping".

Once I had a basic version of Charm running, I started to make additions and enhancements to the language.  As a result of these improvements, the Charm source code itself could be simplified and made more elegant.  This is very much an iterative and continual process of refinement, and it still continues to a lesser extent today.

Other additions to the compiler source code were made to produce better optimised machine code.  These often had the somewhat bizarre effect of generating an executable image smaller than the previous version, despite the additional source code required.

Porting


Gradually my clunky old S-100 system became obsolescent.  One of my friends invited me to see his latest acquisition, intended to replace his old BBC Micro.  It was one of the earliest Archimedes A310s.  I was hooked immediately, and had my own within the month.  Not wishing to lose all of the work I had already done on Charm, I resolved to port the compiler to the new machine.  Unfortunately my 8" discs didn't fit in the 3 1/2" floppy drive on the Archimedes, so I made up a serial cable and wrote some simple file transfer software.

I soon managed to transfer the source code across.  Getting a version of the compiler to run on the Archimedes was a little more involved.  I had to rewrite the back-end to emit ARM code rather than 68000 code.  Because the ARM chip is a RISC chip, code generation was actually easier than for the 68000.

Once I had a "cross-compiler" (i.e. a compiler made up of 68000 code, but generating ARM code) running on the S-100 system, I was able to recompile the Charm source code and produce a RISC OS version, which I then transferred to the Archimedes over the serial link.

The final step was to recompile the Charm source code on the Archimedes to make it "self-compiling" on the new platform.  This process is known as "porting" a compiler, and is important if the compiler is to be made available on a number of different hardware platforms.

Performance


The Charm compiler consists of around 15000 lines of source code spread over 39 modules.  A complete recompilation, assembly and link of the source code takes around 40 seconds on a RISC PC with an ARM3 and a hard disc.  On a reasonably powerful Windows PC running RPCEmu, compilation and assembly take around 4 seconds and linking around 1 second.  This equates to an average build speed (including assembly and linking) of around 1500 lines per second, making Charm one of the fastest sets of tools around.

Speed is achieved by a number of techniques, including :-

  • single pass compilation
  • use of hashing for searching symbol tables
  • coding of run time library support routines in assembly language
  • use of caching on file operations
  • analysis and optimisation of the most frequently called routines using a profiler
  • coding of some small key routines directly in assembly language
  • non-mandatory intermediate assembly language file
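
As an illustration of the second technique, here is a minimal Python sketch of a chained hash table used as a compiler symbol table. The source does not say which hash function or table size Charm uses; FNV-1a and 256 buckets are assumptions chosen for the example, and SymbolTable, define and lookup are hypothetical names. Python's built-in dict is itself a hash table; the buckets are spelt out here only to show the technique.

```python
class SymbolTable:
    """Hypothetical chained hash table mapping identifiers to symbol information."""

    def __init__(self, buckets: int = 256):
        self._buckets: list[list[tuple[str, object]]] = [[] for _ in range(buckets)]

    @staticmethod
    def _hash(name: str) -> int:
        """32-bit FNV-1a hash of the identifier's UTF-8 bytes."""
        h = 0x811C9DC5                          # FNV-1a 32-bit offset basis
        for byte in name.encode("utf-8"):
            h ^= byte
            h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, truncated to 32 bits
        return h

    def _bucket(self, name: str) -> list[tuple[str, object]]:
        return self._buckets[self._hash(name) % len(self._buckets)]

    def define(self, name: str, info: object) -> None:
        """Add a symbol, or replace its information if it is already present."""
        bucket = self._bucket(name)
        for i, (key, _) in enumerate(bucket):
            if key == name:
                bucket[i] = (name, info)
                return
        bucket.append((name, info))

    def lookup(self, name: str):
        """Return the symbol's information, or None if it is not defined."""
        for key, info in self._bucket(name):   # scan only one short chain
            if key == name:
                return info
        return None


table = SymbolTable()
table.define("x", "int")
table.define("increment", "proc")
print(table.lookup("x"), table.lookup("y"))   # prints int None
```

Lookup cost stays close to constant as the number of identifiers grows, because only the short chain in one bucket is scanned rather than the whole table.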

Note that all of the Charm utilities supplied in the download are now written in Charm, including the compiler, assembler, linker, editor and desktop shell.
