Internship Proposal

Supervisor: Éric Tanter

Level: L3

Keywords: Virtual Machines, Just-in-Time Compilation, Analysis

Requirements: Excellent programming skills. Knowledge of Python, compilers, interpreters, and virtual machines.

Abstract:

When designing a new programming language, the question of how much effort to put into its implementation comes up again and again. Defining the language with an interpreter is nicely descriptive, but usually very slow. Writing a full compiler can yield very efficient code, but is complex and time-consuming. Another approach is to build a virtual machine: design an intermediate (bytecode) language, compile the source to this intermediate code, and have the virtual machine run the bytecode, often with just-in-time compilation. Virtual machines have many advantages, and most modern languages take this route. However, building a virtual machine from scratch is not an easy task either. Reusing an existing one (such as the Java VM) is preferable, but only if the VM matches the semantics of the intended language; strong semantic mismatches can be fatal.
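The bytecode pipeline described above (design an intermediate language, compile source to it, run it in a dispatch loop) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any existing VM: the nested-tuple expression language and the opcodes `PUSH`, `ADD` and `MUL` are hypothetical, and there is no JIT involved.

```python
# Hypothetical opcodes for a toy stack machine; the numbering is arbitrary.
PUSH, ADD, MUL = 0, 1, 2
BINARY_OPS = {"+": ADD, "*": MUL}


def compile_expr(expr):
    """Compile a nested-tuple expression into a flat list of ints (bytecode).

    An expression is either an int literal or a tuple (op, left, right)
    with op in BINARY_OPS, e.g. ("+", 1, ("*", 2, 3)).
    """
    code = []

    def emit(e):
        if isinstance(e, int):
            code.append(PUSH)
            code.append(e)               # operand stored inline after the opcode
        else:
            op, left, right = e
            emit(left)                   # post-order: operands first...
            emit(right)
            code.append(BINARY_OPS[op])  # ...then the operator

    emit(expr)
    return code


def run(code):
    """Execute bytecode with an explicit operand stack and return the result."""
    stack = []
    pc = 0                               # index of the next instruction
    while pc < len(code):
        opcode = code[pc]
        if opcode == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif opcode in (ADD, MUL):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == ADD else left * right)
            pc += 1
        else:
            raise ValueError("unknown opcode %r at %d" % (opcode, pc))
    return stack.pop()


bytecode = compile_expr(("+", 1, ("*", 2, 3)))
print(bytecode)       # [0, 1, 0, 2, 0, 3, 2, 1]
print(run(bytecode))  # 7
```

Keeping `compile_expr` separate from `run` means the VM only has to honor the bytecode format, not the surface syntax, which is what makes it possible to extend or replace the front end independently.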

Recently, the PyPy project has developed a new language called RPython, a statically typed, restricted subset of Python that is well suited to writing VMs. RPython allows meaningful whole-program static analyses to be performed, provides a good garbage collector, and offers useful datatypes. It also integrates well with C code. Most importantly, it can generate a custom just-in-time compiler for the language, (almost) for free. By supplying annotations, the VM developer can help RPython generate a tracing JIT with significant performance benefits.
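To make the annotation idea concrete, here is a minimal sketch, assuming the `JitDriver` hint class from `rpython.rlib.jit`; when RPython is not installed it falls back to a do-nothing stub (an assumption of this sketch) so the code also runs on plain CPython. The toy opcodes `ADD`, `DEC`, `JNZ`, `HALT` and the `interpret` function are hypothetical. Greens are the variables that identify a position in the user program; reds hold the rest of the interpreter's live state.

```python
try:
    # RPython's JIT hint class; importable when the RPython toolchain is on the path.
    from rpython.rlib.jit import JitDriver
except ImportError:
    # Fallback stub (assumption of this sketch) so the code runs on plain
    # CPython, where the hints have no effect anyway.
    class JitDriver(object):
        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            pass

        def can_enter_jit(self, **live_vars):
            pass


# Hypothetical opcodes; each instruction is an (opcode, argument) pair.
ADD, DEC, JNZ, HALT = 0, 1, 2, 3

# Greens identify a position in the user program (constant within a trace);
# reds are the remaining live interpreter state.
driver = JitDriver(greens=['pc', 'program'], reds=['acc', 'counter'])


def interpret(program, counter):
    """Run `program`; `counter` is assumed positive so that loops terminate."""
    pc = 0
    acc = 0
    while True:
        # Marks the head of the interpreter's dispatch loop for the JIT.
        driver.jit_merge_point(pc=pc, program=program, acc=acc, counter=counter)
        opcode, arg = program[pc]
        if opcode == ADD:
            acc += arg
            pc += 1
        elif opcode == DEC:
            counter -= 1
            pc += 1
        elif opcode == JNZ:
            if counter != 0:
                pc = arg
                # Taken jump (backward in the loops this sketch assumes): this
                # is where a hot user-level loop can be detected and compiled.
                driver.can_enter_jit(pc=pc, program=program, acc=acc, counter=counter)
            else:
                pc += 1
        elif opcode == HALT:
            return acc
        else:
            raise ValueError("unknown opcode")
```

For example, `interpret([(ADD, 2), (DEC, 0), (JNZ, 0), (HALT, 0)], 5)` adds 2 five times and returns 10. Under plain Python the hints do nothing; only after translation with RPython's JIT enabled would the loop be traced and compiled, and good results would likely need further hints (for instance `@elidable` on pure lookups), which this sketch leaves out.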

So far, RPython has been used in a few experiments: building VMs for Python itself, for Prolog, and for Converge. The results are impressive, and many are starting to believe that this approach will revolutionize language design and implementation.

The goal of this internship is to study the RPython approach to building and optimizing VMs, starting with a series of textbook-style programming languages of increasing complexity. Each time the language is extended, the cost of updating the VM while preserving its optimizations will be assessed. The expected outcome is a set of small-to-medium virtual machines built with RPython, together with their evaluation in terms of both performance and programming complexity. The resulting material should be of great help to anyone who wants to adopt this approach to experiment with new languages and build their own VMs.

References