
LLVM (Low Level Virtual Machine) Compiler Infrastructure

The Low Level Virtual Machine (LLVM) is a compiler and toolchain infrastructure, written in C++, designed for compile-time, link-time, run-time, and “idle-time” optimization of programs written in arbitrary programming languages. Originally implemented for C/C++, LLVM is now used with a variety of programming languages such as Python, Ruby and many others. Code in the LLVM project is licensed under the “UIUC” BSD-style license.

LLVM can be used to replace and/or supplement GNU tools such as gcc, g++ and gdb.

LLVM now consists of a number of different sub-projects including:

  1. The LLVM Core libraries provide a source- and target-independent optimizer, along with code generation support for many popular CPUs. These libraries are built around a well specified code representation known as the LLVM intermediate representation (“LLVM IR”). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) to use LLVM as an optimizer and code generator.
  2. Clang is an “LLVM native” C/C++/Objective-C/Objective-C++ compiler. It aims to deliver very fast compiles (e.g. up to 3x faster than GCC when compiling Objective-C code in a debug configuration) and extremely useful error and warning messages, and to provide a platform for building source-level tools such as the Clang Static Analyzer, which automatically finds bugs in your code.
  3. dragonegg and llvm-gcc 4.2 integrate the LLVM optimizers and code generator with the GCC 4.5 and GCC 4.2 parsers, respectively. This allows LLVM to compile Ada, Fortran, and other languages supported by the GCC compiler frontends, and provides high-fidelity drop-in compatibility with the corresponding versions of GCC.
  4. The LLDB project builds on libraries provided by LLVM and Clang to provide a native debugger. It uses the Clang ASTs and expression parser, LLVM JIT, LLVM disassembler, etc. It is also faster and much more memory efficient than GDB at loading symbols.
  5. The libc++ and libc++abi projects provide a standard-compliant and high-performance implementation of the C++ Standard Library, including full support for C++0x.
  6. The compiler-rt project provides highly tuned implementations of the low-level code generator support routines like “__fixunsdfdi” and other calls generated when a target doesn’t have a short sequence of native instructions to implement a core IR operation.
  7. The vmkit project is an implementation of the Java and .NET Virtual Machines that is built on LLVM technologies.
  8. The klee project implements a “symbolic virtual machine” which uses a theorem prover to try to evaluate all dynamic paths through a program in an effort to find bugs and to prove properties of functions. A major feature of klee is that it can produce a test case when it detects a bug.
  9. The SAFECode project is a memory safety compiler for C/C++ programs. It instruments code with run-time checks to detect memory safety errors (e.g., buffer overflows) at run-time. It can be used to protect software from security attacks and can also be used as a memory safety error debugging tool like Valgrind.
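To give an idea of what the compiler-rt routines in item 6 do, here is a minimal sketch of a software conversion from double to unsigned 64-bit integer, the job of `__fixunsdfdi`. It is not compiler-rt's actual code: the function name `soft_fixunsdfdi` is hypothetical, and the handling of out-of-range inputs (0 for negative values, saturation for values that are too large) is a choice made for this sketch, since such conversions are undefined in C. The point is that it only uses integer shifts and masks on the IEEE-754 bit pattern, which is what a target without a native instruction for this conversion has to fall back on.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch in the spirit of compiler-rt's __fixunsdfdi:
   convert a double to uint64_t using only integer operations.
   Out-of-range policy (an assumption of this sketch): negative values
   and values below 1.0 give 0; values of 2^64 or more saturate. */
uint64_t soft_fixunsdfdi(double a)
{
    uint64_t bits;
    memcpy(&bits, &a, sizeof bits);                      /* reinterpret the double's bits */

    int sign = (int)(bits >> 63);
    int exponent = (int)((bits >> 52) & 0x7FF) - 1023;   /* remove the exponent bias */
    uint64_t significand = (bits & 0xFFFFFFFFFFFFFull)   /* 52 stored fraction bits */
                           | (1ull << 52);                /* plus the implicit leading 1 */

    if (sign || exponent < 0)
        return 0;                                         /* negative, zero, or below 1.0 */
    if (exponent >= 64)
        return UINT64_MAX;                                /* too large, +Inf, or positive NaN */

    /* value = significand * 2^(exponent - 52) */
    if (exponent < 52)
        return significand >> (52 - exponent);            /* drop fractional bits (truncate) */
    return significand << (exponent - 52);
}
```

For example, `soft_fixunsdfdi(3.75)` truncates to 3, and `soft_fixunsdfdi(4294967296.0)` returns 4294967296, the same results as a plain `(uint64_t)` cast for in-range values.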
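The run-time checks mentioned for SAFECode in item 9 can be pictured with a hand-written C example. This is only an illustration of the idea: SAFECode inserts its checks automatically at compile time, and its real checks are not shown here. The helper names `checked_load` and `sum_first` are hypothetical. Before each array read the index is compared against the object's known length, and an out-of-bounds access stops the program instead of silently reading neighbouring memory.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hand-written bounds check, standing in for the kind of
   check a memory-safety compiler adds automatically before a load. */
static int checked_load(const int *base, size_t length, size_t index)
{
    if (index >= length) {
        fprintf(stderr, "memory safety error: index %zu out of bounds (length %zu)\n",
                index, length);
        abort();                               /* stop instead of reading past the array */
    }
    return base[index];
}

/* Sums the first `count` elements; in unchecked code the loop body
   would simply be `total += values[i];`. */
int sum_first(const int *values, size_t length, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += checked_load(values, length, i);
    return total;
}
```

Calling `sum_first(values, 3, 4)` on a three-element array would abort on the fourth read rather than return a sum that includes whatever lies after the array.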

LLVM is used by several companies and open source projects such as:

  • Adobe: Optimizer and JIT codegen for the Hydra Language and ActionScript 3 Compiler.
  • Apple: LLVM is used in Mac OS X for OpenCL and OpenGL compilation
  • Electronic Arts: Experimental backend for custom language implementation
  • Nvidia: OpenCL runtime compiler (Clang + LLVM)
  • PyPy Project: Python interpreter written in Python. Targets LLVM and C.
  • iPhone tool chain: llvm-gcc Compiler for iPhone Dev Wiki toolchain.
  • Mono: Mono Project has an option to use LLVM for JIT compilation

Linaro will also investigate LLVM and decide whether to support an LLVM toolchain for ARM. The decision is expected in November, and if Linaro goes ahead, implementation will start later in 2012.

If you want to try it, you can download the source code or a binary for your operating system, or use the online demo, which compiles a C or C++ program and generates x86 binaries, LLVM C++ API code or LLVM assembly. On Ubuntu, LLVM may already be installed. If it is not installed on your Ubuntu or other Debian-based distribution, you can run:

sudo apt-get install llvm clang

Clang generates more explicit and detailed error and warning messages than gcc. For example:

 $ gcc-4.2 -fsyntax-only -Wformat format-strings.c
  format-strings.c:91: warning: too few arguments for format
 $ clang -fsyntax-only format-strings.c
  format-strings.c:91:13: warning: '.*' specified field precision is missing a matching 'int' argument
   printf("%.*d");
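The clang warning explains the problem precisely: in `"%.*d"` the `.*` takes the precision from an extra `int` argument, and `%d` then needs the value to print, but the call passes neither. A corrected version passes both, as in this small sketch (the function name `format_with_precision` is just for illustration):

```c
#include <stdio.h>

/* Formats `value` with at least `precision` digits (zero-padded).
   The ".*" consumes an int argument for the precision before the int
   consumed by %d, so two arguments follow the format string. */
int format_with_precision(char *buf, size_t size, int precision, int value)
{
    return snprintf(buf, size, "%.*d", precision, value);
}
```

The same rule applies to `printf`: `printf("%.*d\n", 5, 42);` prints `00042`.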

Here’s what the LLVM C++ API code for a “Hello World” C program looks like (beginning only):

// Generated by llvm2cpp - DO NOT MODIFY!

#include <llvm/LLVMContext.h>
#include <llvm/Module.h>
#include <llvm/DerivedTypes.h>
#include <llvm/Constants.h>
#include <llvm/GlobalVariable.h>
#include <llvm/Function.h>
#include <llvm/CallingConv.h>
#include <llvm/BasicBlock.h>
#include <llvm/Instructions.h>
#include <llvm/InlineAsm.h>
#include <llvm/Support/FormattedStream.h>
#include <llvm/Support/MathExtras.h>
#include <llvm/Pass.h>
#include <llvm/PassManager.h>
#include <llvm/ADT/SmallVector.h>
#include <llvm/Analysis/Verifier.h>
#include <llvm/Assembly/PrintModulePass.h>
#include <algorithm>
using namespace llvm;

Module* makeLLVMModule();

int main(int argc, char**argv) {
  Module* Mod = makeLLVMModule();
  verifyModule(*Mod, PrintMessageAction);
  PassManager PM;
  PM.add(createPrintModulePass(&outs()));
  PM.run(*Mod);
  return 0;
}

Module* makeLLVMModule() {
 // Module Construction
 Module* mod = new Module("/tmp/webcompile/_23517_0.bc", getGlobalContext());
 mod->setDataLayout("e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64");
 mod->setTargetTriple("x86_64-unknown-linux-gnu");
...

And here is the LLVM assembly (full listing):

; ModuleID = '/tmp/webcompile/_23857_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@str = internal constant [12 x i8] c"Hello World\00"

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([12 x i8]* @str, i64 0, i64 0))
  ret i32 0
}

declare i32 @puts(i8* nocapture) nounwind

Finally, here’s a small benchmark compiling dropbear-0.53.1 with gcc 4.4.3 and clang 1.1 (llvm 2.7):

GCC 4.4.3:

./configure
time make
real 1m17.518s
user 1m5.012s
sys 0m10.789s

Clang 1.1:

make clean
export CC=clang
./configure
time make
real 1m9.178s
user 0m56.668s
sys 0m13.309s

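In this run, clang built dropbear in about 11% less wall-clock time than gcc (1m9.2s versus 1m17.5s) and about 13% less user time, although it spent more time in the kernel (sys).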
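The percentages can be checked with a couple of small helpers. They are hypothetical and exist only to make the arithmetic explicit: `to_seconds()` turns a `time` reading such as 1m17.518s into seconds, and `percent_reduction()` expresses how much less time the second build took, as a percentage of the first.

```c
/* Converts a `time` reading given as minutes and seconds into seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Time saved going from `before` to `after`, as a percentage of `before`.
   A negative result means the second build took longer. */
double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the figures from the benchmark, `percent_reduction(to_seconds(1, 17.518), to_seconds(1, 9.178))` gives about 10.76 for real time, the user times (1m5.012s versus 0m56.668s) give about 12.8, and the sys times (10.789s versus 13.309s) give a negative value of about -23.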
