
Apple's other open secret: the LLVM Compiler

SproutCore, profiled earlier this week, isn't the only big news to spill out of the top secret WWDC conference thanks to Apple's embrace of open source sharing. Another future technology featured by the Mac maker last week was LLVM, the Low Level Virtual Machine compiler infrastructure project.

Like SproutCore, LLVM is neither new nor secret, but both have been hiding from attention due to a thick layer of complexity that has obscured their future potential.

Looking for LLVM at WWDC

Again, the trail of breadcrumbs for LLVM starts in the public WWDC schedule. On Tuesday, the session "New Compiler Technology and Future Directions" carried the following synopsis:

"Xcode 3.1 introduces two new compilers for Mac OS X: GCC 4.2 and LLVM-GCC. Learn how the new security and performance improvements in GCC 4.2 can help you produce better applications. Understand the innovations in LLVM-GCC, and find out how you can use it in your own testing and development. Finally, get a preview of future compiler developments."

There are a lot of unpronounceable words in all capital letters in that paragraph, LOLBBQ. Let's pull a few out and define them until the synopsis starts making sense.

Introducing GCC

The first acronym in our alphabet soup is GCC, originally the GNU C Compiler. This project was started in the mid 80s by Richard Stallman of the Free Software Foundation. Stallman's radical idea was to develop software that would be shared rather than sold, with the intent of delivering code that anyone could use provided that anything they contribute to it would be passed along in a form others could also use.

Stallman was working to develop a free version of AT&T's Unix, which had already become the standard operating system in academia. He started at the core: in order to develop anything in the C language, one would need a C compiler to convert that high level, portable C source code into machine language object code suited to run on a particular processor architecture.

GCC has progressed through a series of advancements over the years to become the standard compiler for GNU/Linux, BSD Unix, Mac OS X, and a variety of embedded operating systems. GCC supports a wide variety of processor architecture targets and high level language sources.

Apple uses specialized versions of GCC 4.0 and 4.2 in Leopard's Xcode 3.1 that support compiling Objective-C/C/C++ code to both PowerPC and Intel targets on the desktop, and uses GCC 4.0 to target ARM development on the iPhone.

The Compiler

A compiler is the portion of the development toolchain that sits between writing source code and debugging and deploying the finished program. The first phase of compiling is the Front End Parser, which performs initial language-specific syntax and semantic analysis on source code to create an internal representation of the program.

Code is then passed through an Optimizer phase which improves it by doing things like deleting any code redundancies or dead code that doesn't need to exist in the final version.
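
To make dead code removal a little more concrete, here is a minimal sketch in Python. It assumes a made-up intermediate representation, where each instruction is a (destination, operation, operands) tuple; the format and the function name eliminate_dead_code are invented for illustration and are not taken from GCC or LLVM.

```python
# Toy IR: each instruction is (destination, operation, operands).
# The format and all names here are hypothetical, for illustration only.

def eliminate_dead_code(instructions, live_outputs):
    """Drop instructions whose results are never used.

    Walks the program backwards, keeping an instruction only if its
    destination is needed by a later instruction or is a program output.
    """
    needed = set(live_outputs)
    kept = []
    for dest, op, operands in reversed(instructions):
        if dest in needed:
            kept.append((dest, op, operands))
            needed.discard(dest)
            # Operands that are variable names become needed in turn.
            needed.update(o for o in operands if isinstance(o, str))
    kept.reverse()
    return kept


program = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("unused", "mul", ("a", "a")),  # result is never read: dead code
    ("c", "add", ("a", "b")),
]

optimized = eliminate_dead_code(program, live_outputs={"c"})
```

Only the three instructions that feed the output "c" survive; the multiplication stored in "unused" is dropped. A production optimizer has to reason about branches, loops and side effects as well, which this straight-line sketch ignores.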

The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code that is specific to that processor and far harder for a human to read.

The Assembler phase converts assembly language code into object code that can be executed by a hardware processor or a software virtual machine.

The final phase is the Linker, which combines object code with any necessary library code to create the final executable.
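
The phases above can be sketched as a toy pipeline for integer arithmetic expressions. This is only an illustration under stated assumptions: it borrows Python's standard ast module as a stand-in front end, targets an imaginary stack machine, and skips the Linker entirely. Every function name and mnemonic below is hypothetical and has nothing to do with how GCC or LLVM are built.

```python
import ast
import operator

# Hypothetical mnemonics for an imaginary stack machine (integers only).
MNEMONICS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
ARITHMETIC = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def front_end(source):
    """Front end: parse source text into a tree, the internal representation."""
    return ast.parse(source, mode="eval").body


def optimize(node):
    """Optimizer: fold operations on two constants at compile time."""
    if not isinstance(node, ast.BinOp):
        return node
    left, right = optimize(node.left), optimize(node.right)
    mnemonic = MNEMONICS.get(type(node.op))
    if mnemonic and isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
        return ast.Constant(ARITHMETIC[mnemonic](left.value, right.value))
    return ast.BinOp(left, node.op, right)


def generate(node):
    """Code generator: emit stack machine assembly as lines of text."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp) and type(node.op) in MNEMONICS:
        return generate(node.left) + generate(node.right) + [MNEMONICS[type(node.op)]]
    raise ValueError("unsupported syntax")


def assemble(lines):
    """Assembler: turn each text line into an (operation, operand) pair."""
    code = []
    for line in lines:
        mnemonic, _, argument = line.partition(" ")
        operand = int(argument) if mnemonic == "PUSH" else (argument or None)
        code.append((mnemonic, operand))
    return code


def run(code, variables):
    """Stand-in for the processor: execute the assembled code on a stack."""
    stack = []
    for mnemonic, operand in code:
        if mnemonic == "PUSH":
            stack.append(operand)
        elif mnemonic == "LOAD":
            stack.append(variables[operand])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(ARITHMETIC[mnemonic](left, right))
    return stack.pop()


assembly = generate(optimize(front_end("x * (2 + 3) - 1")))
result = run(assemble(assembly), {"x": 4})
```

Here the optimizer folds (2 + 3) into a single constant before any code is generated, so the emitted assembly pushes 5 directly rather than adding 2 and 3 at run time.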

Introducing LLVM

GCC currently handles all those phases for compiling code within Xcode, Apple's Mac OS X IDE (Integrated Development Environment). However, there are some drawbacks to using GCC.

One is that it is delivered under the GPL, which means Apple can't integrate it directly into Xcode without making its IDE GPL as well. Apple prefers BSD/MIT style open source licenses, where there is no limitation upon extending open projects as part of larger proprietary products.

Another is that portions of GCC are getting long in the tooth. LLVM is a modern project that has aspired to rethink how compiler parts should work, with emphasis on Just In Time compilation, cross-file optimization (which can link together code from different languages and optimize across file boundaries), and a modular compiler architecture for creating components that have few dependencies on each other while integrating well with existing compiler tools.

LLVM got its start at the University of Illinois in 2000 as a research project of Chris Lattner. It was released as version 1.0 in 2003. Lattner caught the attention of Apple after posting questions about Objective-C to the company's objc-language mailing list. Apple in turn began contributing to the LLVM project in 2005 and later hired Lattner to fund his work.

Clang and LLVM-GCC

Last year the project released Clang as an Apple led, standalone implementation of the LLVM compiler tools that aims to provide fast compiling with low memory use, expressive diagnostics, a modular library-based architecture, and tight integration within an IDE such as Xcode, all offered under a BSD-style open source license.

In addition to the pure LLVM Clang project, which uses an early, developmental front end code parser for Objective-C/C/C++, Apple also started work on integrating components of LLVM into the existing GCC based on Lattner's LLVM/GCC Integration Proposal. That has resulted in a hybrid system that leverages the mature components of GCC, such as its front end parser, while adding the most valuable components of LLVM, including its modern code optimizers.

That project, known as LLVM-GCC, inserts the optimizer and code generator from LLVM into GCC, providing modern methods for "aggressive loop, standard scalar, and interprocedural optimizations and interprocedural analyses" missing in the standard GCC components.

LLVM-GCC is designed to be highly compatible with GCC so that developers can move to the new compiler and benefit from its code optimizations without making substantial changes to their workflow. Sources report that LLVM-GCC "compiles code that consistently runs 33% faster" than code output from GCC.

Apple also uses LLVM in the OpenGL stack in Leopard, leveraging its virtual machine concept of a common IR to emulate OpenGL hardware features on Macs that lack the actual silicon to interpret that code. Code is instead interpreted or JIT compiled on the CPU.
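
The difference between interpreting a common IR and JIT compiling it can be sketched in a few lines of Python. This is a rough illustration only: the three-operation IR below is invented, it is not LLVM's real IR, and generating a Python function stands in for what a real JIT does when it emits native machine code. The names interpret and jit_compile are hypothetical.

```python
# Hypothetical IR for a tiny per-value operation; not LLVM's actual IR.
IR = [("mul", 0.5), ("add", 0.25), ("clamp", 0.0, 1.0)]


def interpret(ir, value):
    """Interpreter: decode and apply every instruction on every call."""
    for instr in ir:
        op = instr[0]
        if op == "mul":
            value = value * instr[1]
        elif op == "add":
            value = value + instr[1]
        elif op == "clamp":
            value = max(instr[1], min(instr[2], value))
        else:
            raise ValueError(f"unknown op {op!r}")
    return value


def jit_compile(ir):
    """JIT stand-in: translate the IR once into a host function, then reuse it."""
    lines = ["def compiled(value):"]
    for instr in ir:
        op = instr[0]
        if op == "mul":
            lines.append(f"    value = value * {instr[1]!r}")
        elif op == "add":
            lines.append(f"    value = value + {instr[1]!r}")
        elif op == "clamp":
            lines.append(f"    value = max({instr[1]!r}, min({instr[2]!r}, value))")
        else:
            raise ValueError(f"unknown op {op!r}")
    lines.append("    return value")
    namespace = {}
    exec("\n".join(lines), namespace)  # build the function from generated source
    return namespace["compiled"]


shade = jit_compile(IR)
same_answer = shade(0.5) == interpret(IR, 0.5)
```

Both paths give the same answer. The interpreter pays the cost of decoding each instruction on every call, while the compiled function pays a translation cost once up front and then runs without any decoding.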

Apple is also using LLVM in iPhone development, as the project's modular architecture makes it easier to add support for other architectures such as ARM, now supported in LLVM 2.0 thanks to work done by Nokia's INdT.


LLVM and Apple's Multicore Future

LLVM plays into Apple's ongoing strategies for multicore and multiprocessor parallelism. CPUs are now reaching physical limits that are preventing chips from getting faster simply by driving up the gigahertz. Intel's roadmaps indicate that the company now plans to drive future performance by adding multiple cores. Apple already ships 8-core Macs on the high end, and Intel has plans to boost the number of cores per processor into the double digits.

Taking advantage of those cores is not straightforward. While the classic Mac OS' and Windows' legacy spaghetti code was made faster through a decade of CPUs that rapidly increased their raw clock speeds, future advances will come from producing highly efficient code that can take full advantage of multiple cores.

Existing methods of thread scheduling are tricky to keep in sync across multiple cores, resulting in inefficient use of modern hardware. With features like OpenCL and Grand Central Dispatch, Snow Leopard will be better equipped to manage parallelism across processors and push optimized code to the GPU's cores, as described in WWDC 2008: New in Mac OS X Snow Leopard. However, in order for the OS to efficiently schedule parallel tasks, the code needs to be explicitly optimized for parallelism by the compiler.
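
As a loose illustration of what code shaped for parallel scheduling looks like, the sketch below breaks one job into independent chunks and hands them to a pool sized to the number of cores. It uses Python's standard thread pool as a stand-in and is not Grand Central Dispatch or OpenCL; the helper names split and parallel_sum_of_squares are made up. Standard Python threads generally won't speed up pure number crunching either, so treat this as a picture of the structure rather than a benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Divide items into at most `parts` contiguous, independent chunks."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """One independent task: it touches only its own chunk."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(items, workers=None):
    """Submit one task per chunk and combine the partial results."""
    workers = workers or os.cpu_count() or 1
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


numbers = list(range(1000))
total = parallel_sum_of_squares(numbers)
```

The point of the design is that no chunk depends on any other, so a scheduler is free to run them on whichever cores happen to be idle and combine the partial sums at the end.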

Open for Improvement

LLVM will be a key tool in prepping code for high performance scheduling. As the largest contributor to the LLVM project, Apple is working to push compiler technology ahead along with researchers in academia and industry partners, including supercomputer maker Cray. Apple is also making contributions to GCC to improve its performance and add features.

Because both projects are open source, it's easy to find hints of what the company is up to next. Enhancements to code debugging, compiler speed, the speed of output code, security features related to stopping malicious buffer overflows, and processor specific optimizations will all work together to create better quality code.

That means applications will continue to get faster and developers will have an easier time focusing on the value they can add rather than having their time consumed by outdated compiler technology.

For Apple, investing in its own advanced compiler expertise also means that it can hand tune the software that will be running while it also optimizes the specialized processors that will be running it, such as the mobile SoCs Apple will be building with its acquisition of PA Semi, as noted in How Apple’s PA Semi Acquisition Fits Into Its Chip History.

There's more information on The LLVM Compiler Infrastructure Project. Lattner also published a PDF of his presentation of The LLVM Compiler System at the 2007 Bossa Conference.