SourceXR

C/C++ Cross-Reference Tool

List Redundant Include Files with Clang

In this article, we show how to implement a small clang tool to list redundant include files of a translation unit.

This sample is based on our getting started article but in this sample, we will use the preprocessor only: we do not build the AST of the compilation unit.

If need be, follow instructions in the previous article to install clang and compile the sample.

Algorithm Steps

To detect and list redundant include files, we first use clang preprocessor to parse the source file. The #include preprocessor directive is used to build a graph of includes files. Next, we go through the graph and build its transitive reduction: removed edges from the original graph give us the list of redundant include files.

Preprocessor Callbacks

To get notified of preprocessor events i.e. whenever a preprocessor directive is encountered by clang, the clang::PPCallbacks interface is to be implemented. This class is documented here.

In this sample, we are only interested in the #include directive, therefore we only implement the InclusionDirective member function. There exists members for every preprocessor directive: #if, #pragma, etc.

Preprocessing A Source File

To preprocess a source file, we build a frontend action in which we merely call the preprocessor available through the compiler instance (_ci is given in parameter of the frontend action). We add our implementation of the clang::PPCallbacks class. We process all tokens until end of file is reached.

The following excerpt displays this step:

    virtual void ExecuteAction () {
        clang::SourceManager &sm (_ci->getSourceManager ());
        PPCallbacks *ppc = new PPCallbacks (sm, _inc);
        clang::Preprocessor &pp (_ci->getPreprocessor ());
        pp.addPPCallbacks (ppc);

        clang::Token token;
        pp.EnterMainSourceFile ();

        do {
            pp.Lex (token);
        }
        while (token.isNot (clang::tok::eof));
    }

InclusionDirective Implementation

This call allows us to know which file is to be included, even if include guards prevent the inclusion. Thus we are able to build a graph of all files included by a compilation unit.

We convert all file paths to their absolute form and store the relationship between the included file and the source file of the include location.

We store this relation in an adjacency list.

    virtual void InclusionDirective (clang::SourceLocation HashLoc,
                                     const clang::Token &,
                                     clang::StringRef,
                                     bool,
                                     clang::CharSourceRange,
                                     const clang::FileEntry *File,
                                     clang::StringRef,
                                     clang::StringRef,
                                     const clang::Module *) {
        // Convert to absolute path
        const char *path = realpath (_sm.getBufferName (HashLoc), NULL);
        if (path == NULL) {
            std::cerr << "null path " << _sm.getBufferName (HashLoc)
                      << ": " << strerror (errno)
                      << "\n";
        }
        if (File != NULL) {
            const char *fpath = realpath (File->getName (), NULL);
            if (fpath == NULL) {
                std::cerr << "null fpath " << File->getName ()
                          << ": " << strerror (errno)
                          << "\n";
            }

            if ((fpath != NULL)
                && (path != NULL)) {
                // Store the relationship
                _inc.addInclude (fpath, path);
            }
        }
    }

Ajdacency List Holder

To ease further processing, we use an index for each included file. The relationship is then simply stored in a map of indices whose value is the list of files (indices) included by the file.

Transitive Reduction

Once the file is parsed, we now have a graph of include files. This graph is possibly cyclic.

To detect files already included, we build its transitive reduction.

The following picture shows what a transitive reduction is. Vertices are source and header files and edges represent the inclusion directive.

Sample inclusion graph:

/images/transitive_reduction.svg

After transitive reduction:

/images/transitive_reduction_2.svg

We use the algorithm described in Harry Hsu. "An algorithm for finding a minimal equivalent graph of a digraph.", Journal of the ACM, 22(1):11-16, January 1975.

This algorithm works on a path matrix representation of a graph. Therefore, we first build it.

Path Matrix

The path matrix of a graph is a matrix whose values are boolean. The elements of the matrix are vertices of the graph. A given element (i, j) is true if there exists a path from vertex i to vertex j (with possibly more than one edge).

To build the path matrix of the graph, we go through the adjacency list recursively.

Transitive Reduction

The algorithm is the following (N is the matrix size, and pm the path matrix):

for (int j = 0; j < N; ++j) {
    for (int i = 0; i < N; ++i) {
         if (pm[i][j]) {
             for (int k = 0; k < N; ++k) {
                 if (pm[i][k]) {
                     // file (i) included by file (k) is redundant
                 }
             }
         }
    }
}

Source File

The source file of the implementation can be found here.

Comments !