SourceXR

C/C++ Cross-Reference Tool

Getting Started with Clang

This article describes how to get started with the clang analyzer to extract information about C and C++ source code. Clang belongs to the LLVM suite of compilers and tools and comes with an API that lets you interact with each stage of the compilation.

Prerequisites

First of all, you need to download and install clang. You can either use the pre-built binaries or download the LLVM and Clang source tarballs.

The process is described on the Clang installation page and is summarized below (we use LLVM version 3.3):

  1. Download clang and llvm here.
  2. Extract the llvm tarball somewhere.
  3. Extract the clang tarball into the tools subdirectory of llvm and rename the extracted cfe directory to clang.
  4. Make a build directory, e.g. build, and step into this directory.
  5. Call the configure script from this directory: ../llvm-3.3.src/configure --prefix=/usr/local/llvm
  6. make && make install

Now we are ready to implement a clang sample tool whose purpose is to enumerate virtual member functions of a given source file.

Several interfaces are available to implement clang tools. Depending on your requirements you may choose one or another (the C interface, the tool interface, etc.).

For this sample, we are using the libTooling interface of clang: the tool will run outside of any build mechanism, i.e. the tool will be given a source file to analyze.

Implementation

Compilation

LLVM is installed with a helper script, llvm-config, that displays the command-line options needed to compile and link the client binary. To compile, the following options can be used:

-fno-rtti $(/usr/local/llvm/bin/llvm-config --cflags)

To link, the required clang libraries are:

-lclangTooling -lclangFrontend -lclangEdit -lclangParse -lclangSema \
-lclangAnalysis

The complete link command is therefore:

$(/usr/local/llvm/bin/llvm-config --ldflags) \
-lclangTooling -lclangFrontend -lclangEdit -lclangParse -lclangSema \
-lclangAnalysis -lclangAST -lclangLex -lclangBasic -lclangDriver \
-lclangSerialization -lclangEdit \
$(/usr/local/llvm/bin/llvm-config --libs)

Source Code

To parse source files, several steps have to be performed: clang builds an AST of the compilation unit and calls an AST consumer while going through this tree. The AST consumer can then perform any action based on the contents of the tree.

From the bottom-up, we first need a compilation database (clang::tooling::FixedCompilationDatabase) which initializes clang from the command line parameters (such as specific compiler options, additional include directories, C++ language customization, etc.). Clang looks for arguments after the -- separator.

    clang::tooling::FixedCompilationDatabase *comp =
        clang::tooling::FixedCompilationDatabase::loadFromCommandLine
        (argc, const_cast<const char **> (argv));
    if (!comp) {
        std::cerr << "Failed to load compilation database\n";
        return 1;
    }
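To make the role of the -- separator more concrete, here is a minimal sketch in plain C++ of the kind of splitting that happens on the command line. It does not use clang at all: splitCompilerArgs is a hypothetical helper written for this illustration, and it only approximates what loadFromCommandLine appears to do (collect everything after the separator and shrink argc so that the tool's own arguments stop before it).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of clang): returns the arguments that follow
// the first "--" in argv and shrinks argc so that it stops at the separator.
// Returns an empty vector and leaves argc untouched when there is no "--".
std::vector<std::string> splitCompilerArgs(int &argc, const char *const *argv) {
    std::vector<std::string> compilerArgs;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--") == 0) {
            // Everything after the separator is meant for the compiler.
            compilerArgs.assign(argv + i + 1, argv + argc);
            // The tool's own arguments end just before the separator.
            argc = i;
            break;
        }
    }
    return compilerArgs;
}
```

With a command line such as `tool a.cpp -- -I/inc -DX=1`, this sketch would hand `-I/inc` and `-DX=1` to the compiler side and leave `tool a.cpp` for the tool itself. Note that the real function returns a null pointer when the separator is missing, which is why the snippet above checks comp before using it.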

Then we need a FrontendActionFactory, which binds a ClangTool to the AST consumer.

class FrontendActionFactory : public clang::tooling::FrontendActionFactory {

public:
    virtual clang::FrontendAction *create () {
        return new FrontendAction;
    }
};

This factory creates a frontend action which calls the semantic parser with our AST consumer:

class FrontendAction : public clang::FrontendAction {

    clang::CompilerInstance *_ci;
    ASTConsumer *_consumer;

public:
    virtual bool usesPreprocessorOnly () const {
        return false;
    }

    clang::ASTConsumer *CreateASTConsumer (clang::CompilerInstance &ci,
                                           clang::StringRef) {
        _ci = &ci;
        _consumer = new ASTConsumer;
        return _consumer;
    }

    virtual void ExecuteAction () {
        clang::Preprocessor &pp (_ci->getPreprocessor ());
        clang::Sema sema (pp, _ci->getASTContext (),
                          *_consumer,
                          clang::TU_Complete, NULL);
        clang::ParseAST (pp, _consumer, _ci->getASTContext ());
    }
};

Finally, the AST consumer initializes a visitor, which is called with all nodes of the tree. This visitor inherits from clang's RecursiveASTVisitor class.

class ASTConsumer : public clang::ASTConsumer {

    Visitor _visitor;

private:
    virtual bool HandleTopLevelDecl (clang::DeclGroupRef D) {
        clang::DeclGroupRef::iterator it (D.begin ());
        const clang::DeclGroupRef::iterator itEnd (D.end ());
        while (it != itEnd) {
            clang::Decl *decl (*it);
            if (!_visitor.TraverseDecl (decl)) {
                return false;
            }
            ++it;
        }
        return true;
    }
};

In the VisitCXXRecordDecl member function, which is called for every class and struct found in the translation unit, we add the application logic, which displays the virtual member functions:

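#include <iostream>
#include <string>

#include "clang/AST/DeclCXX.h"
#include "clang/AST/RecursiveASTVisitor.h"

class Visitor : public clang::RecursiveASTVisitor<Visitor>
{

public:
    // Called for every class or struct declaration met during the traversal.
    bool VisitCXXRecordDecl (clang::CXXRecordDecl *decl) {

        const std::string name (decl->getNameAsString ());

        // Iterate over the member functions declared in this record.
        clang::CXXRecordDecl::method_iterator it
            (decl->method_begin ());
        const clang::CXXRecordDecl::method_iterator it_end
            (decl->method_end ());

        while (it != it_end) {
            if (it->isVirtual ()) {
                std::cout << name << ": virtual member: "
                          << it->getNameAsString () << "\n";
            }
            ++it;
        }
        // Returning true lets the traversal continue with the next node.
        return true;
    }

};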
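To try the visitor, an input file along the following lines could be used. This is only an illustrative sketch: the file name and the Shape and Square classes are made up for this example and do not come from the original sources. The exact lines printed by the tool are deliberately not shown, since they may depend on how the clang version in use reports destructors and overriding functions.

```cpp
// sample.cpp: hypothetical input for the tool (all names are illustrative).
struct Shape {
    virtual ~Shape() {}                            // virtual destructor
    virtual double area() const = 0;               // pure virtual member function
    const char *kind() const { return "shape"; }   // non-virtual, should be skipped
};

struct Square : Shape {
    explicit Square(double side) : side_(side) {}
    virtual double area() const { return side_ * side_; }  // overrides Shape::area
private:
    double side_;
};
```

Assuming the tool was built as described above, it would be run on this file with the source file first, followed by the -- separator and any compiler options.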

There are many more Visit* calls, which are used for all objects found in source files: class declarations, function declarations, member declarations, variables, control flow, casts, literals, etc. Not all calls have to be implemented; which ones you need depends on your application. Returning false from a Visit* call aborts the tree traversal.
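The way a visitor only implements the Visit* calls it needs, and stops the traversal by returning false, can be illustrated with a toy version of the pattern. The sketch below is not clang's implementation: Node, ToyVisitor and RecordCollector are hypothetical names, and the real RecursiveASTVisitor is considerably more elaborate. It only shows the idea of a base class template that dispatches to the derived class without virtual functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy node standing in for an AST declaration (hypothetical, not clang's).
struct Node {
    std::string name;
    bool isRecord;
    std::vector<Node> children;
};

// Simplified stand-in for clang::RecursiveASTVisitor<Derived>.
template <typename Derived>
class ToyVisitor {
public:
    // Walks the tree depth-first and stops as soon as a Visit* call
    // returns false.
    bool Traverse(const Node &node) {
        Derived &self = static_cast<Derived &>(*this);
        if (node.isRecord && !self.VisitRecord(node)) return false;
        if (!node.isRecord && !self.VisitOther(node)) return false;
        for (std::size_t i = 0; i < node.children.size(); ++i) {
            if (!Traverse(node.children[i])) return false;
        }
        return true;
    }
    // Defaults do nothing and continue; a derived class redefines only
    // the calls it is interested in.
    bool VisitRecord(const Node &) { return true; }
    bool VisitOther(const Node &) { return true; }
};

// Collects record names and aborts once it meets the record named stopAt.
class RecordCollector : public ToyVisitor<RecordCollector> {
public:
    explicit RecordCollector(const std::string &stopAt) : stopAt_(stopAt) {}
    bool VisitRecord(const Node &node) {
        seen.push_back(node.name);
        return node.name != stopAt_;
    }
    std::vector<std::string> seen;
private:
    std::string stopAt_;
};
```

In this sketch RecordCollector never defines VisitOther, so the default from the base class is used, and Traverse returns false as soon as VisitRecord does. Dispatching through static_cast rather than virtual functions is the reason the visitor class passes itself as the template argument.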

If you are interested in preprocessor directives, the virtual members of the clang::PPCallbacks class have to be overridden, for example to be notified of a file inclusion or a macro definition.

Finally, the program entry point takes the input file as a parameter and builds all the needed objects:

int main (int argc, char **argv)
{
    if (argc < 2) {
        // Missing arguments are an error: print the usage and fail.
        std::cerr << "Usage: " << argv[0] << " source_file -- [clang_args]\n";
        return 1;
    }

    const std::string file (argv[1]);

    // CompDB
    clang::tooling::FixedCompilationDatabase *comp =
        clang::tooling::FixedCompilationDatabase::loadFromCommandLine
        (argc, const_cast<const char **> (argv));
    if (!comp) {
        std::cerr << "Failed to load compilation database\n";
        return 1;
    }
    // CompDB

    std::vector<std::string> sources;
    sources.push_back (file);

    clang::tooling::ClangTool tool (*comp, sources);

    FrontendActionFactory act;
    if (tool.run (&act) != 0) {
        return 1;
    }
    return 0;
}

Source File

The source file of the implementation can be found here.
