123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563 |
- ===============================================================
- Tutorial for building tools using LibTooling and LibASTMatchers
- ===============================================================
- NOTE: this document applies to the original Clang project, not the DirectX
- Compiler. It's made available for informational purposes only.
- This document is intended to show how to build a useful source-to-source
- translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
- explicitly aimed at people who are new to Clang, so all you should need
- is a working knowledge of C++ and the command line.
- In order to work on the compiler, you need some basic knowledge of the
- abstract syntax tree (AST). To this end, the reader is incouraged to
- skim the :doc:`Introduction to the Clang
- AST <IntroductionToTheClangAST>`
- Step 0: Obtaining Clang
- =======================
- As Clang is part of the LLVM project, you'll need to download LLVM's
- source code first. Both Clang and LLVM are maintained as Subversion
- repositories, but we'll be accessing them through the git mirror. For
- further information, see the `getting started
- guide <http://llvm.org/docs/GettingStarted.html>`_.
- .. code-block:: console
- mkdir ~/clang-llvm && cd ~/clang-llvm
- git clone http://llvm.org/git/llvm.git
- cd llvm/tools
- git clone http://llvm.org/git/clang.git
- cd clang/tools
- git clone http://llvm.org/git/clang-tools-extra.git extra
- Next you need to obtain the CMake build system and Ninja build tool. You
- may already have CMake installed, but current binary versions of CMake
- aren't built with Ninja support.
- .. code-block:: console
- cd ~/clang-llvm
- git clone https://github.com/martine/ninja.git
- cd ninja
- git checkout release
- ./bootstrap.py
- sudo cp ninja /usr/bin/
- cd ~/clang-llvm
- git clone git://cmake.org/stage/cmake.git
- cd cmake
- git checkout next
- ./bootstrap
- make
- sudo make install
- Okay. Now we'll build Clang!
- .. code-block:: console
- cd ~/clang-llvm
- mkdir build && cd build
- cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off.
- ninja
- ninja check # Test LLVM only.
- ninja clang-test # Test Clang only.
- ninja install
- And we're live.
- All of the tests should pass, though there is a (very) small chance that
- you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
- in both the llvm and clang directories should fix any problems.
- Finally, we want to set Clang as its own compiler.
- .. code-block:: console
- cd ~/clang-llvm/build
- ccmake ../llvm
- The second command will bring up a GUI for configuring Clang. You need
- to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
- advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
- ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
- configure, then ``'g'`` to generate CMake's files.
- Finally, run ninja one last time, and you're done.
- Step 1: Create a ClangTool
- ==========================
- Now that we have enough background knowledge, it's time to create the
- simplest productive ClangTool in existence: a syntax checker. While this
- already exists as ``clang-check``, it's important to understand what's
- going on.
- First, we'll need to create a new directory for our tool and tell CMake
- that it exists. As this is not going to be a core clang tool, it will
- live in the ``tools/extra`` repository.
- .. code-block:: console
- cd ~/clang-llvm/llvm/tools/clang
- mkdir tools/extra/loop-convert
- echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
- vim tools/extra/loop-convert/CMakeLists.txt
- CMakeLists.txt should have the following contents:
- ::
- set(LLVM_LINK_COMPONENTS support)
- set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
- add_clang_executable(loop-convert
- LoopConvert.cpp
- )
- target_link_libraries(loop-convert
- clangTooling
- clangBasic
- clangASTMatchers
- )
- With that done, Ninja will be able to compile our tool. Let's give it
- something to compile! Put the following into
- ``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
- why the different parts are needed can be found in the `LibTooling
- documentation <LibTooling.html>`_.
- .. code-block:: c++
- // Declares clang::SyntaxOnlyAction.
- #include "clang/Frontend/FrontendActions.h"
- #include "clang/Tooling/CommonOptionsParser.h"
- #include "clang/Tooling/Tooling.h"
- // Declares llvm::cl::extrahelp.
- #include "llvm/Support/CommandLine.h"
- using namespace clang::tooling;
- using namespace llvm;
- // Apply a custom category to all command-line options so that they are the
- // only ones displayed.
- static llvm::cl::OptionCategory MyToolCategory("my-tool options");
- // CommonOptionsParser declares HelpMessage with a description of the common
- // command-line options related to the compilation database and input files.
- // It's nice to have this help message in all tools.
- static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
- // A help message for this specific tool can be added afterwards.
- static cl::extrahelp MoreHelp("\nMore help text...");
- int main(int argc, const char **argv) {
- CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
- ClangTool Tool(OptionsParser.getCompilations(),
- OptionsParser.getSourcePathList());
- return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
- }
- And that's it! You can compile our new tool by running ninja from the
- ``build`` directory.
- .. code-block:: console
- cd ~/clang-llvm/build
- ninja
- You should now be able to run the syntax checker, which is located in
- ``~/clang-llvm/build/bin``, on any source file. Try it!
- .. code-block:: console
- echo "int main() { return 0; }" > test.cpp
- bin/loop-convert test.cpp --
- Note the two dashes after we specify the source file. The additional
- options for the compiler are passed after the dashes rather than loading
- them from a compilation database - there just aren't any options needed
- right now.
- Intermezzo: Learn AST matcher basics
- ====================================
- Clang recently introduced the :doc:`ASTMatcher
- library <LibASTMatchers>` to provide a simple, powerful, and
- concise way to describe specific patterns in the AST. Implemented as a
- DSL powered by macros and templates (see
- `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
- curious), matchers offer the feel of algebraic data types common to
- functional programming languages.
- For example, suppose you wanted to examine only binary operators. There
- is a matcher to do exactly that, conveniently named ``binaryOperator``.
- I'll give you one guess what this matcher does:
- .. code-block:: c++
- binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
- Shockingly, it will match against addition expressions whose left hand
- side is exactly the literal 0. It will not match against other forms of
- 0, such as ``'\0'`` or ``NULL``, but it will match against macros that
- expand to 0. The matcher will also not match against calls to the
- overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
- matcher to handle overloaded operators.
- There are AST matchers to match all the different nodes of the AST,
- narrowing matchers to only match AST nodes fulfilling specific criteria,
- and traversal matchers to get from one kind of AST node to another. For
- a complete list of AST matchers, take a look at the `AST Matcher
- References <LibASTMatchersReference.html>`_
- All matcher that are nouns describe entities in the AST and can be
- bound, so that they can be referred to whenever a match is found. To do
- so, simply call the method ``bind`` on these matchers, e.g.:
- .. code-block:: c++
- variable(hasType(isInteger())).bind("intvar")
- Step 2: Using AST matchers
- ==========================
- Okay, on to using matchers for real. Let's start by defining a matcher
- which will capture all ``for`` statements that define a new variable
- initialized to zero. Let's start with matching all ``for`` loops:
- .. code-block:: c++
- forStmt()
- Next, we want to specify that a single variable is declared in the first
- portion of the loop, so we can extend the matcher to
- .. code-block:: c++
- forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
- Finally, we can add the condition that the variable is initialized to
- zero.
- .. code-block:: c++
- forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
- hasInitializer(integerLiteral(equals(0))))))))
- It is fairly easy to read and understand the matcher definition ("match
- loops whose init portion declares a single variable which is initialized
- to the integer literal 0"), but deciding that every piece is necessary
- is more difficult. Note that this matcher will not match loops whose
- variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
- zero besides the integer 0.
- The last step is giving the matcher a name and binding the ``ForStmt``
- as we will want to do something with it:
- .. code-block:: c++
- StatementMatcher LoopMatcher =
- forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
- hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
- Once you have defined your matchers, you will need to add a little more
- scaffolding in order to run them. Matchers are paired with a
- ``MatchCallback`` and registered with a ``MatchFinder`` object, then run
- from a ``ClangTool``. More code!
- Add the following to ``LoopConvert.cpp``:
- .. code-block:: c++
- #include "clang/ASTMatchers/ASTMatchers.h"
- #include "clang/ASTMatchers/ASTMatchFinder.h"
- using namespace clang;
- using namespace clang::ast_matchers;
- StatementMatcher LoopMatcher =
- forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
- hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
- class LoopPrinter : public MatchFinder::MatchCallback {
- public :
- virtual void run(const MatchFinder::MatchResult &Result) {
- if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
- FS->dump();
- }
- };
- And change ``main()`` to:
- .. code-block:: c++
- int main(int argc, const char **argv) {
- CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
- ClangTool Tool(OptionsParser.getCompilations(),
- OptionsParser.getSourcePathList());
- LoopPrinter Printer;
- MatchFinder Finder;
- Finder.addMatcher(LoopMatcher, &Printer);
- return Tool.run(newFrontendActionFactory(&Finder).get());
- }
- Now, you should be able to recompile and run the code to discover for
- loops. Create a new file with a few examples, and test out our new
- handiwork:
- .. code-block:: console
- cd ~/clang-llvm/llvm/llvm_build/
- ninja loop-convert
- vim ~/test-files/simple-loops.cc
- bin/loop-convert ~/test-files/simple-loops.cc
- Step 3.5: More Complicated Matchers
- ===================================
- Our simple matcher is capable of discovering for loops, but we would
- still need to filter out many more ourselves. We can do a good portion
- of the remaining work with some cleverly chosen matchers, but first we
- need to decide exactly which properties we want to allow.
- How can we characterize for loops over arrays which would be eligible
- for translation to range-based syntax? Range based loops over arrays of
- size ``N`` that:
- - start at index ``0``
- - iterate consecutively
- - end at index ``N-1``
- We already check for (1), so all we need to add is a check to the loop's
- condition to ensure that the loop's index variable is compared against
- ``N`` and another check to ensure that the increment step just
- increments this same variable. The matcher for (2) is straightforward:
- require a pre- or post-increment of the same variable declared in the
- init portion.
- Unfortunately, such a matcher is impossible to write. Matchers contain
- no logic for comparing two arbitrary AST nodes and determining whether
- or not they are equal, so the best we can do is matching more than we
- would like to allow, and punting extra comparisons to the callback.
- In any case, we can start building this sub-matcher. We can require that
- the increment step be a unary increment like this:
- .. code-block:: c++
- hasIncrement(unaryOperator(hasOperatorName("++")))
- Specifying what is incremented introduces another quirk of Clang's AST:
- Usages of variables are represented as ``DeclRefExpr``'s ("declaration
- reference expressions") because they are expressions which refer to
- variable declarations. To find a ``unaryOperator`` that refers to a
- specific declaration, we can simply add a second condition to it:
- .. code-block:: c++
- hasIncrement(unaryOperator(
- hasOperatorName("++"),
- hasUnaryOperand(declRefExpr())))
- Furthermore, we can restrict our matcher to only match if the
- incremented variable is an integer:
- .. code-block:: c++
- hasIncrement(unaryOperator(
- hasOperatorName("++"),
- hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
- And the last step will be to attach an identifier to this variable, so
- that we can retrieve it in the callback:
- .. code-block:: c++
- hasIncrement(unaryOperator(
- hasOperatorName("++"),
- hasUnaryOperand(declRefExpr(to(
- varDecl(hasType(isInteger())).bind("incrementVariable"))))))
- We can add this code to the definition of ``LoopMatcher`` and make sure
- that our program, outfitted with the new matcher, only prints out loops
- that declare a single variable initialized to zero and have an increment
- step consisting of a unary increment of some variable.
- Now, we just need to add a matcher to check if the condition part of the
- ``for`` loop compares a variable against the size of the array. There is
- only one problem - we don't know which array we're iterating over
- without looking at the body of the loop! We are again restricted to
- approximating the result we want with matchers, filling in the details
- in the callback. So we start with:
- .. code-block:: c++
- hasCondition(binaryOperator(hasOperatorName("<"))
- It makes sense to ensure that the left-hand side is a reference to a
- variable, and that the right-hand side has integer type.
- .. code-block:: c++
- hasCondition(binaryOperator(
- hasOperatorName("<"),
- hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
- hasRHS(expr(hasType(isInteger())))))
- Why? Because it doesn't work. Of the three loops provided in
- ``test-files/simple.cpp``, zero of them have a matching condition. A
- quick look at the AST dump of the first for loop, produced by the
- previous iteration of loop-convert, shows us the answer:
- ::
- (ForStmt 0x173b240
- (DeclStmt 0x173afc8
- 0x173af50 "int i =
- (IntegerLiteral 0x173afa8 'int' 0)")
- <<>>
- (BinaryOperator 0x173b060 '_Bool' '<'
- (ImplicitCastExpr 0x173b030 'int'
- (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
- (ImplicitCastExpr 0x173b048 'int'
- (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
- (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
- (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
- (CompoundStatement ...
- We already know that the declaration and increments both match, or this
- loop wouldn't have been dumped. The culprit lies in the implicit cast
- applied to the first operand (i.e. the LHS) of the less-than operator,
- an L-value to R-value conversion applied to the expression referencing
- ``i``. Thankfully, the matcher library offers a solution to this problem
- in the form of ``ignoringParenImpCasts``, which instructs the matcher to
- ignore implicit casts and parentheses before continuing to match.
- Adjusting the condition operator will restore the desired match.
- .. code-block:: c++
- hasCondition(binaryOperator(
- hasOperatorName("<"),
- hasLHS(ignoringParenImpCasts(declRefExpr(
- to(varDecl(hasType(isInteger())))))),
- hasRHS(expr(hasType(isInteger())))))
- After adding binds to the expressions we wished to capture and
- extracting the identifier strings into variables, we have array-step-2
- completed.
- Step 4: Retrieving Matched Nodes
- ================================
- So far, the matcher callback isn't very interesting: it just dumps the
- loop's AST. At some point, we will need to make changes to the input
- source code. Next, we'll work on using the nodes we bound in the
- previous step.
- The ``MatchFinder::run()`` callback takes a
- ``MatchFinder::MatchResult&`` as its parameter. We're most interested in
- its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
- class to represent contextual information about the AST, as the name
- implies, though the most functionally important detail is that several
- operations require an ``ASTContext*`` parameter. More immediately useful
- is the set of matched nodes, and how we retrieve them.
- Since we bind three variables (identified by ConditionVarName,
- InitVarName, and IncrementVarName), we can obtain the matched nodes by
- using the ``getNodeAs()`` member function.
- In ``LoopConvert.cpp`` add
- .. code-block:: c++
- #include "clang/AST/ASTContext.h"
- Change ``LoopMatcher`` to
- .. code-block:: c++
- StatementMatcher LoopMatcher =
- forStmt(hasLoopInit(declStmt(
- hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
- .bind("initVarName")))),
- hasIncrement(unaryOperator(
- hasOperatorName("++"),
- hasUnaryOperand(declRefExpr(
- to(varDecl(hasType(isInteger())).bind("incVarName")))))),
- hasCondition(binaryOperator(
- hasOperatorName("<"),
- hasLHS(ignoringParenImpCasts(declRefExpr(
- to(varDecl(hasType(isInteger())).bind("condVarName"))))),
- hasRHS(expr(hasType(isInteger())))))).bind("forLoop");
- And change ``LoopPrinter::run`` to
- .. code-block:: c++
- void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
- ASTContext *Context = Result.Context;
- const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop");
- // We do not want to convert header files!
- if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
- return;
- const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
- const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
- const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");
- if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
- return;
- llvm::outs() << "Potential array-based loop discovered.\n";
- }
- Clang associates a ``VarDecl`` with each variable to represent the variable's
- declaration. Since the "canonical" form of each declaration is unique by
- address, all we need to do is make sure neither ``ValueDecl`` (base class of
- ``VarDecl``) is ``NULL`` and compare the canonical Decls.
- .. code-block:: c++
- static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
- return First && Second &&
- First->getCanonicalDecl() == Second->getCanonicalDecl();
- }
- If execution reaches the end of ``LoopPrinter::run()``, we know that the
- loop shell that looks like
- .. code-block:: c++
- for (int i= 0; i < expr(); ++i) { ... }
- For now, we will just print a message explaining that we found a loop.
- The next section will deal with recursively traversing the AST to
- discover all changes needed.
- As a side note, it's not as trivial to test if two expressions are the same,
- though Clang has already done the hard work for us by providing a way to
- canonicalize expressions:
- .. code-block:: c++
- static bool areSameExpr(ASTContext *Context, const Expr *First,
- const Expr *Second) {
- if (!First || !Second)
- return false;
- llvm::FoldingSetNodeID FirstID, SecondID;
- First->Profile(FirstID, *Context, true);
- Second->Profile(SecondID, *Context, true);
- return FirstID == SecondID;
- }
- This code relies on the comparison between two
- ``llvm::FoldingSetNodeIDs``. As the documentation for
- ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
- a description of a node in the AST, based on its properties, along with
- those of its children. ``FoldingSetNodeID`` then serves as a hash we can
- use to compare expressions. We will need ``areSameExpr`` later. Before
- you run the new code on the additional loops added to
- test-files/simple.cpp, try to figure out which ones will be considered
- potentially convertible.
|