Tutorial
Here you will learn how to use XOGastan
How to install XOGastan
This tutorial assumes that the variable $XOGASTANROOT refers to the base path of the XOGastan package. You must set this variable in your environment.
To install the XOGastan API, first unpack the tarball. Then go into the directory $XOGASTANROOT/src and edit the file Makefile.inc by hand. This file contains the "make" variables that XOGastan uses during compilation.
The most important variables to modify are:
XERCES-C Variables
- XERCESC_INCLUDES Path to the Xerces-C include files.
- XERCESC_DEFINES Special defines for Xerces-C. Normally left blank.
- XERCESC_FLAGS Special flags for Xerces-C. Normally left blank.
- XERCESC_LIB_PATH Path to the Xerces-C library files.
- XERCESC_LIB_NAME Name of the Xerces-C library on your system (without the "lib" prefix).
XOGastan Variables
- XOGASTAN_INCLUDES Path to the XOGastan include files. Normally it is $(XOGASTANROOT)/include.
- XOGASTAN_DEFINES Special defines for XOGastan.
- XOGASTAN_FLAGS Special flags for XOGastan. Normally left blank, but you can use it to add the debug flag (-g).
- XOGASTAN_LIB_PATH Path to the XOGastan library files.
OTHER Variables
- PLATFORM The operating system of your machine. This version supports only the LINUX operating system.
- PREFIX_LIB Where you want to install the dynamic library files. The default value is /usr/local/lib.
- PREFIX_INCLUDE Where you want to install the include files. The default value is /usr/local/include.
Now, to install the XOGastan API, type the following commands:
- [lucas75it@beren]$ cd $XOGASTANROOT/src
- [lucas75it@beren]$ make lib
- [lucas75it@beren]$ make install
If you want to uninstall the API, type:
- [lucas75it@beren]$ cd $XOGASTANROOT/src
- [lucas75it@beren]$ make uninstall
If you want to remove only the object files (*.o) without uninstalling the API, type:
- [lucas75it@beren]$ cd $XOGASTANROOT/src
- [lucas75it@beren]$ make clean
If you have doxygen and graphviz installed on your system, you can build the online documentation by typing:
- [lucas75it@beren]$ cd $XOGASTANROOT/src
- [lucas75it@beren]$ make doc
Before doing so, you can edit the file $XOGASTANROOT/doc/doxygen.conf by hand.
How to translate a GCC Dumped AST
XOGastan is aimed at developers rather than general computer users. More precisely, it is useful to C programmers.
Let us suppose, for example, that Aragorn, a friend of ours, is a C programmer. He writes a big application, hello.c, and compiles it.
The application runs, but it is very complex: it has lots of functions, typedefs, structs, and so on.
Aragorn can't remember it all by heart, and he will have problems when, after some time, he wants to make changes to his code.
Aragorn has not written any documentation for hello.c. What can he do? He can use XOGastan!
He must follow these steps:
1. Compile hello.c with the gcc option -fdump-translation-unit:
- [lucas75it@home]# gcc -fdump-translation-unit hello.c
2. Translate the output file using the Perl tool gcc2gxl:
- [lucas75it@home]# gcc2gxl -ifile hello.c.tu
3. Write his own analyzer (using XOGastan) and run it:
- [lucas75it@home]# my-XOGastan-analyzer hello.c.tu.gxl
4. Open a beer and read the program's output!
Understanding XOGastan internal AST
XOGastan implements an AST that is similar to the tree of the GNU Compiler Collection (GCC).
The GCC tree is documented in the "GCC Internals" manual; read it if you want to know more about the relationships between nodes.
XOGastan is written in C++. We defined a hierarchy of classes to represent an AST, and we called it NAST (New Abstract Syntax Tree).
In other words, we created an object-oriented AST.
This AST is not the same as the gcc AST. In the following, AST refers to the gcc AST and NAST refers to the XOGastan AST. Some parts of the AST have a similar counterpart in the NAST, but other parts of the AST are not present in the NAST.
Let us take a look! NAST is a three-level hierarchy of nodes:
- first level - the father, the class TreeNode.
- second level - the different sets of concepts: declarations, types, constants, expressions, statements, and other concepts.
- third level - the elements of a program: variable declaration, integer type, real constant, call expression, for statement, identifier, and so on.
The figure below describes the differences between the three levels of the hierarchy.
The first level implements the basic structure of a node: identifier, code, edges. It also implements the methods for managing this structure. The second and third levels add more specialized information.
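The three levels can be pictured with a small, self-contained sketch. The classes below are simplified stand-ins and not the real XOGastan headers: the members addEdge, edgeCount, setSourceLine and sourceLine are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the three NAST levels. The real XOGastan classes
// are richer; every member below is an assumption made for illustration.

// First level: identifier, code and outgoing edges shared by all nodes.
class TreeNode {
public:
    TreeNode(long id, std::string code) : id_(id), code_(std::move(code)) {}
    virtual ~TreeNode() = default;

    long getId() const { return id_; }
    const std::string& getCode() const { return code_; }

    // An edge is stored as a (label, target node) pair.
    void addEdge(const std::string& label, TreeNode* target) {
        assert(target != nullptr);
        edges_.emplace_back(label, target);
    }
    std::size_t edgeCount() const { return edges_.size(); }

private:
    long id_;
    std::string code_;
    std::vector<std::pair<std::string, TreeNode*>> edges_;
};

// Second level: one class per family of concepts (here, declarations).
class TreeDecl : public TreeNode {
public:
    using TreeNode::TreeNode;
    // Hypothetical attribute shared by all declarations.
    void setSourceLine(int line) { line_ = line; }
    int sourceLine() const { return line_; }

private:
    int line_ = 0;
};

// Third level: a concrete program element (a variable declaration).
class var_TreeDecl : public TreeDecl {
public:
    explicit var_TreeDecl(long id) : TreeDecl(id, "var_decl") {}
};
```

A var_TreeDecl can be handled through a TreeNode reference for the generic operations (getId, getCode, edges), while the declaration-specific members are reachable only through the more specialized types.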
The full list of NAST classes is:
- Declarations:
- Types:
- Constants:
- Expressions:
- Statements:
- Others:
Click here to see the NAST hierarchy.
When XOGastan reads a node from the gxl file, it parses the code of the node and it creates an object of a leaf class. For example:
- if XOGastan reads a var_decl node then it creates an object var_TreeDecl
- if XOGastan reads an integer_type node then it creates an object integer_TreeType
- if XOGastan reads a real_cst node then it creates an object real_TreeConst
- if XOGastan reads a call_expr node then it creates an object othr_TreeExpr
- if XOGastan reads a for_stmt node then it creates an object for_TreeStmt
- if XOGastan reads an identifier_node node then it creates an object TreeIdentifier
You should note three important points. The code of a gcc node and the name of the corresponding NAST class differ only in the final part of the word (compare var_decl and var_TreeDecl).
The correspondence between the AST nodes and the NAST classes is not always one-to-one. A var_decl AST node corresponds only to the var_TreeDecl NAST class, but the AST nodes plus_expr, mult_expr, and le_expr, which are all binary expressions, correspond to the single NAST class binary_TreeExpr.
When XOGastan reads an AST node that does not belong to any of the defined classes, it creates an object of one of the special classes: othr_TreeDecl, othr_TreeType, othr_TreeConst, othr_TreeStmt, othr_TreeExpr, othr_TreeOthr.
This mechanism is powerful and extensible. In the future we may add some classes for node decoration.
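The mapping from gcc node codes to NAST classes, including the fallback to the othr_* classes, can be sketched as a lookup function. This is only an illustration: nastClassFor and endsWith are hypothetical helpers, the table holds only pairs quoted in this tutorial, and choosing the fallback class from the suffix of the gcc code is an assumption about how XOGastan decides.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns true when `text` ends with `suffix`.
static bool endsWith(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: maps a gcc node code to the name of the NAST class
// that would represent it. The suffix-based fallback is an assumption.
std::string nastClassFor(const std::string& gccCode) {
    assert(!gccCode.empty());

    // Pairs taken from the examples in this tutorial.
    static const std::map<std::string, std::string> known = {
        {"var_decl", "var_TreeDecl"},
        {"integer_type", "integer_TreeType"},
        {"real_cst", "real_TreeConst"},
        {"for_stmt", "for_TreeStmt"},
        {"identifier_node", "TreeIdentifier"},
    };
    auto it = known.find(gccCode);
    if (it != known.end()) return it->second;

    // No dedicated class: fall back to the special class of the same family.
    if (endsWith(gccCode, "_decl")) return "othr_TreeDecl";
    if (endsWith(gccCode, "_type")) return "othr_TreeType";
    if (endsWith(gccCode, "_cst"))  return "othr_TreeConst";
    if (endsWith(gccCode, "_stmt")) return "othr_TreeStmt";
    if (endsWith(gccCode, "_expr")) return "othr_TreeExpr";
    return "othr_TreeOthr";
}
```

With this table, nastClassFor("call_expr") yields othr_TreeExpr, which matches the call_expr example given earlier in this section.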
My first program with XOGastan API
To read a GXL AST file you have to: initialize the XOGastan environment, use an XAstReader object, and terminate the XOGastan environment.
The XAstReader class provides some basic methods for reading a GXL AST file:
TreeNode *XAstReader::read(const InputSource& source); Reads the GXL AST specified as an InputSource
TreeNode *XAstReader::read(const XMLCh* const systemId); Reads the GXL AST specified as an XMLCh *
TreeNode *XAstReader::read(const char* const systemId); Reads the GXL AST specified as a char *
The types InputSource and XMLCh are defined in the Xerces-C API.
The return value of the XAstReader::read method is a pointer to the root node of the AST.
The following example shows how to read a simple GXL AST file.
#include "XOGastan/System/XOGastanSystem.hpp"
#include "XOGastan/XAstParser/XAstReader.hpp"
#include <cstdlib>
#include <iostream>

int main (int argC, char *argV[])
{
    // Initialize the XOGastan system
    try {
        XOGastanSystem::Initialize ();
    }
    catch (...) {
        std::exit (EXIT_FAILURE);
    }
    std::cout << "XOGastanSystem initialized...\n";

    //
    // Create an XAstReader object, and define a pointer to
    // AST root node
    // Read Gxl file and build the AST
    //
    XAstReader *reader = new XAstReader;
    TreeNode *root;
    try {
        std::cout << "Start parsing ...\n";
        root = reader->read ("helloworld.c.tu.gxl");
        std::cout << "... end parsing\n";
    }
    catch (XOGastanError &toCatch)
    {
        // Release the reader on this error path too
        delete reader;
        XOGastanSystem::Terminate ();
        toCatch.printMessages ();
        std::exit (EXIT_FAILURE);
    }
    catch (...) {
        delete reader;
        XOGastanSystem::Terminate ();
        std::cerr << "Error:: Unknown event force the program termination.\n";
        std::exit (EXIT_FAILURE);
    }
    delete reader;

    // And call the termination method
    XOGastanSystem::Terminate ();
    std::cout << "XOGastanSystem finalized...\n";
    return EXIT_SUCCESS;
}
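The program above has to call the termination method on every exit path. One possible way to make this harder to forget is a small RAII guard. This is a sketch of a general C++ idiom, not something the XOGastan API is known to provide: SystemGuard, FakeSystem and parseWithGuard are hypothetical names, and FakeSystem only stands in for XOGastanSystem so that the example compiles on its own.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for XOGastanSystem so that the sketch compiles on its own.
// In real code you would call XOGastanSystem::Initialize() and
// XOGastanSystem::Terminate() instead.
struct FakeSystem {
    static int active;   // 1 while initialized, 0 otherwise
    static void Initialize() { ++active; }
    static void Terminate() { --active; }
};
int FakeSystem::active = 0;

// Hypothetical RAII guard: initializes in the constructor and terminates
// in the destructor, so leaving the scope always cleans up.
class SystemGuard {
public:
    SystemGuard() { FakeSystem::Initialize(); }
    ~SystemGuard() { FakeSystem::Terminate(); }
    SystemGuard(const SystemGuard&) = delete;
    SystemGuard& operator=(const SystemGuard&) = delete;
};

// Simulates a parse: throws when `fail` is true, returns 0 otherwise.
int parseWithGuard(bool fail) {
    SystemGuard guard;
    assert(FakeSystem::active == 1);
    if (fail) throw std::runtime_error("parse error");
    return 0;
}
```

Note that the guard only helps when the scope is left by return or by an exception. A call to exit() does not run the destructors of local objects, so a program using this idiom should return from main instead.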
Basic methods for node management
Each node has its own identifier and its own code. You can retrieve them using the methods:
long TreeNode::getId(); Get the identifier (long) of the TreeNode
NodeCode TreeNode::getCode(); Get the code (NodeCode) of the TreeNode
Here NodeCode is an enumeration of the legal node codes. The full list of NodeCode values is in the file $XOGASTANROOT/include/XOGastan/Tree/NodeCode.def
The TreeNode class defines the abstraction of edges.
An edge is a pair of values: the edge's code and the link to the node that the edge points to.
The enumeration EdgeCode lists the legal edge codes.
The full list of EdgeCode values is in the file $XOGASTANROOT/include/XOGastan/Tree/EdgeCode.def
You can browse the edges of a TreeNode (and of its derived classes) using two patterns:
- direct pattern: using the code of the edge
- list pattern: using a simple list interface
For the direct pattern you can use the method:
TreeNode *TreeNode::edgeName(EdgeCode ec);
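A self-contained sketch of the direct pattern follows. The TreeNode class here is a stand-in written for this example, the edge codes EC_TYPE, EC_NAME and EC_BODY are hypothetical (the real ones are listed in EdgeCode.def), and returning a null pointer for a missing edge is an assumption about the behavior of edgeName.

```cpp
#include <cassert>
#include <map>

// Stand-ins so that the sketch compiles without XOGastan. The edge codes
// below are hypothetical; the real ones are listed in EdgeCode.def.
enum EdgeCode { EC_TYPE, EC_NAME, EC_BODY };

class TreeNode {
public:
    explicit TreeNode(long id) : id_(id) {}
    long getId() const { return id_; }

    void addEdge(EdgeCode ec, TreeNode* target) {
        assert(target != nullptr);
        edges_[ec] = target;
    }

    // Direct pattern: look the edge up by its code.
    // Returning a null pointer for a missing edge is an assumption.
    TreeNode* edgeName(EdgeCode ec) const {
        auto it = edges_.find(ec);
        return it == edges_.end() ? nullptr : it->second;
    }

private:
    long id_;
    std::map<EdgeCode, TreeNode*> edges_;
};

// Returns the identifier of the node reached through `ec`, or -1 if absent.
long targetId(const TreeNode& node, EdgeCode ec) {
    TreeNode* target = node.edgeName(ec);
    return target ? target->getId() : -1;
}
```

The direct pattern is convenient when you already know which edge you need, for example the type of a declaration; checking the returned pointer before using it keeps the code safe when the edge is absent.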
For the list pattern you can use the methods:
bool TreeNode::hasEdges(void); Checks whether the node has one or more edges
TreeNode *TreeNode::firstEdge(void); Returns the TreeNode * pointed by the first edge
bool TreeNode::hasNextEdge(); Checks whether there is an edge after the last one visited
TreeNode *TreeNode::nextEdge(); Returns the TreeNode * pointed by the next edge after the last visited
EdgeCode TreeNode::currentEdgeCode(); Returns the code of the last edge visited
If you want to visit all edges of a node you can write code like this:
TreeNode node;
if (node.hasEdges()) {
    TreeNode *toNode;
    toNode = node.firstEdge();
    makeSomethingWith(toNode);
    while (node.hasNextEdge()) {
        toNode = node.nextEdge();
        makeSomethingWith(toNode);
    }
}
You can use a static class called a2b that implements some useful functions for converting between enumerations and strings. The basic conversion methods are:
string a2b::NodeCode2string(NodeCode nc); Conversion from Node Code to string
string a2b::EdgeCode2string(EdgeCode ec); Conversion from Edge Code to string
NodeCode a2b::string2NodeCode(string snc); Conversion from string to Node Code
EdgeCode a2b::string2EdgeCode(string sec); Conversion from string to Edge Code
Using these methods you can write code for dumping information:
TreeNode node;
if (node.hasEdges()) {
    TreeNode *toNode;
    cout << "Node " << a2b::NodeCode2string(node.getCode()) << " has:\n";
    toNode = node.firstEdge();
    // currentEdgeCode() returns the code of the edge just visited
    cout << "\tan edge labeled \"" <<
        a2b::EdgeCode2string(node.currentEdgeCode()) <<
        "\" to node " << a2b::NodeCode2string(toNode->getCode()) << "\n";
    while (node.hasNextEdge()) {
        toNode = node.nextEdge();
        cout << "\tan edge labeled \"" <<
            a2b::EdgeCode2string(node.currentEdgeCode()) <<
            "\" to node " << a2b::NodeCode2string(toNode->getCode()) << "\n";
    }
}
In the last example, note that toNode is a pointer to TreeNode, so you must use the -> operator.
Moreover, note that you always use the type TreeNode and never need a cast.
The methods used here are defined in the class TreeNode and are accessible from the derived classes.
You must use a dynamic cast if you want to use the specific methods of a derived class.
If you want to revisit the edges of a node, you must repeat the last pattern from the beginning.
The method TreeNode::firstEdge() resets the count of the edges visited.
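The remark about dynamic casts can be illustrated with a minimal sketch. The classes below are stand-ins that reuse NAST names, and the value() accessor and the describe function are hypothetical; they are not taken from the XOGastan headers.

```cpp
#include <cassert>
#include <string>

// Stand-ins for NAST classes; the value() accessor is hypothetical.
class TreeNode {
public:
    virtual ~TreeNode() = default;   // dynamic_cast needs a polymorphic base
};

class integer_TreeConst : public TreeNode {
public:
    explicit integer_TreeConst(long v) : value_(v) {}
    long value() const { return value_; }

private:
    long value_;
};

class real_TreeConst : public TreeNode {};

// Describes a node, calling a derived-class method only when the cast succeeds.
std::string describe(TreeNode* node) {
    assert(node != nullptr);
    if (auto* ic = dynamic_cast<integer_TreeConst*>(node)) {
        return "integer constant " + std::to_string(ic->value());
    }
    return "some other node";
}
```

When applied to a pointer, dynamic_cast returns a null pointer if the node is not of the requested class, so the result should always be tested before use.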
A simple GXL dumper
Now you are ready for a more complex example.
The following example reads a GXL AST file and then dumps a new GXL file.
The dumped file contains all the nodes read and, for each node, the ordered list of its outgoing edges.
Nodes are ordered by identifier, and for each node only its type is dumped.
The edges are sorted in ascending alphabetical order of their names.
Click here to get the reference code
The example above introduces some new features.
It uses a new reader method:
list<TreeNode *> *XAstReader::readlist(const char * source) Reads the GXL AST specified as a char *, and returns the full list of the TreeNode objects built.
This method returns the full list of nodes read from the GXL file. The list contains a TreeNode * for each node read. You can also use the overloaded versions:
list<TreeNode *> *XAstReader::readlist(const InputSource& source) Reads the GXL AST specified as an InputSource, and returns the full list of the TreeNode objects built.
list<TreeNode *> *XAstReader::readlist(const XMLCh * const systemId) Reads the GXL AST specified as an XMLCh *, and returns the full list of the TreeNode objects built.
This example also uses a new class: TreeNodeCmp. This class compares two TreeNode objects, TN1 and TN2, by overloading the () operator. Given the two TreeNode objects TN1 and TN2 you have:
- TN1 < TN2 if and only if TN1::id < TN2::id
- TN1 == TN2 if and only if TN1::id == TN2::id
- TN1 > TN2 if and only if TN1::id > TN2::id
The methods:
bool operator()(TreeNode& TN1, TreeNode& TN2); Compares two TreeNode objects passed by reference.
bool operator()(TreeNode* TN1, TreeNode* TN2); Compares two TreeNode objects passed by pointer.
return true if TN1::id < TN2::id, false otherwise. Remember that the type of the attribute TreeNode::id is long.
The instruction:
TreeNodeList->sort(TreeNodeCmp());
sorts the list of TreeNode pointers in ascending order of identifier, from the smallest to the largest.
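The sorting step can be tried in isolation with the following sketch. TreeNode is reduced to a stand-in that only carries an identifier, the comparator mirrors the TreeNodeCmp described above (with const added to the parameters), and sortedIds is a hypothetical helper introduced for this example.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Stand-in for TreeNode: only the identifier matters for sorting.
class TreeNode {
public:
    explicit TreeNode(long id) : id_(id) {}
    long getId() const { return id_; }

private:
    long id_;
};

// Mirrors the TreeNodeCmp described above: orders nodes by identifier.
struct TreeNodeCmp {
    bool operator()(const TreeNode& a, const TreeNode& b) const {
        return a.getId() < b.getId();
    }
    bool operator()(const TreeNode* a, const TreeNode* b) const {
        assert(a != nullptr && b != nullptr);
        return a->getId() < b->getId();
    }
};

// Hypothetical helper: sorts the list in place and returns the
// identifiers in their new order.
std::vector<long> sortedIds(std::list<TreeNode*>& nodes) {
    nodes.sort(TreeNodeCmp());   // std::list::sort with a custom comparator
    std::vector<long> ids;
    for (const TreeNode* n : nodes) ids.push_back(n->getId());
    return ids;
}
```

Because the list holds pointers, the pointer overload of operator() is the one that std::list::sort calls; the nodes themselves are not copied or moved.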
Using XOGastan to analyze source code
The final aim of the XOGastan API is code browsing and code analysis: you can write code that analyzes code!
The following example shows how to write code for a statistical analysis of the NAST.
It counts the number of declarations, types, statements, constants, expressions.
Click here to get the reference code
In this example you use a new feature: the TreeUtils static class.
This class defines some useful functions.
The functions used above test whether a node code belongs to a class of concepts. They are:
bool TreeUtils::isStmt(NodeCode nc); Returns true if the code belongs to Statements class
bool TreeUtils::isDecl(NodeCode nc); Returns true if the code belongs to Declarations class
bool TreeUtils::isType(NodeCode nc); Returns true if the code belongs to Types class
bool TreeUtils::isConst(NodeCode nc); Returns true if the code belongs to Constants class
bool TreeUtils::isExpr(NodeCode nc); Returns true if the code belongs to Expressions class
bool TreeUtils::isOthr(NodeCode nc); Returns true if the code belongs to Others class
bool TreeUtils::isUnaryExpr(NodeCode nc); Returns true if the code belongs to Unary Expressions class
bool TreeUtils::isBinaryExpr(NodeCode nc); Returns true if the code belongs to Binary Expressions class
bool TreeUtils::isTernaryExpr(NodeCode nc); Returns true if the code belongs to Ternary Expressions class
bool TreeUtils::isRefExpr(NodeCode nc); Returns true if the code belongs to Reference Expressions class
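The counting logic behind such a statistical analysis can be sketched without XOGastan. Everything below is a stand-in: the NodeCode values are a hypothetical subset (the real list is in NodeCode.def), the TreeUtils struct only imitates the predicates listed above, and NodeStats and countByClass are names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of node codes; the real list is in NodeCode.def.
enum NodeCode {
    VAR_DECL, FUNCTION_DECL, INTEGER_TYPE, REAL_CST,
    CALL_EXPR, FOR_STMT, IDENTIFIER_NODE
};

// Stand-ins for the TreeUtils predicates used in this sketch.
struct TreeUtils {
    static bool isDecl(NodeCode nc)  { return nc == VAR_DECL || nc == FUNCTION_DECL; }
    static bool isType(NodeCode nc)  { return nc == INTEGER_TYPE; }
    static bool isConst(NodeCode nc) { return nc == REAL_CST; }
    static bool isExpr(NodeCode nc)  { return nc == CALL_EXPR; }
    static bool isStmt(NodeCode nc)  { return nc == FOR_STMT; }
};

// One counter per class of concepts.
struct NodeStats {
    int decls = 0;
    int types = 0;
    int consts = 0;
    int exprs = 0;
    int stmts = 0;
    int others = 0;
};

// Classifies every code exactly once and returns the totals.
NodeStats countByClass(const std::vector<NodeCode>& codes) {
    NodeStats s;
    for (NodeCode nc : codes) {
        if (TreeUtils::isDecl(nc))       ++s.decls;
        else if (TreeUtils::isType(nc))  ++s.types;
        else if (TreeUtils::isConst(nc)) ++s.consts;
        else if (TreeUtils::isExpr(nc))  ++s.exprs;
        else if (TreeUtils::isStmt(nc))  ++s.stmts;
        else                             ++s.others;
    }
    // Every node must land in exactly one bucket.
    assert(s.decls + s.types + s.consts + s.exprs + s.stmts + s.others ==
           static_cast<int>(codes.size()));
    return s;
}
```

In a real analyzer the codes would come from calling getCode() on each TreeNode of the list returned by the reader.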
If you have read the "GCC Internals" manual, you can write programs that navigate the NAST.
You can retrieve the list of arguments of a function, build the control flow graph of a program, and so on. XOGastan can help you, but you must know what you are doing.
The following example shows how to dump a Graphviz file for drawing the statement blocks of functions.
Click here to get the reference code
In the above example you navigate the AST using the edge codes.
The edge codes are similar to those used in GCC. If you know the edges of a node, you can traverse them.
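The Graphviz side of such a dumper can be sketched on its own. The StmtEdge structure and the toDot function are hypothetical and unrelated to the reference code: they assume that the statements of one function have already been collected, by following the relevant edges, into a flat list of links between statements.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flattened view of a link between two statements.
struct StmtEdge {
    long from;               // identifier of the source statement node
    std::string fromLabel;   // node code of the source, used as label
    long to;                 // identifier of the target statement node
    std::string toLabel;     // node code of the target, used as label
};

// Builds a Graphviz "digraph" description for one function.
std::string toDot(const std::string& functionName,
                  const std::vector<StmtEdge>& edges) {
    assert(!functionName.empty());
    std::ostringstream out;
    out << "digraph \"" << functionName << "\" {\n";
    for (const StmtEdge& e : edges) {
        // Repeating a node statement is harmless in the DOT language.
        out << "  n" << e.from << " [label=\"" << e.fromLabel << "\"];\n";
        out << "  n" << e.to << " [label=\"" << e.toLabel << "\"];\n";
        out << "  n" << e.from << " -> n" << e.to << ";\n";
    }
    out << "}\n";
    return out.str();
}
```

The returned text can be written to a file and rendered with the dot tool from Graphviz.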
An advanced topic: the visitor pattern
To understand the visitor pattern, you can study the source code of XOGastan.
The following is an extract of the classes used for constants.
Click here to get the reference code
Each leaf class (third-level class) has a public overridden method: virtual void accept (visitor &);
This method takes a reference to a visitor object (or to an object of a class derived from visitor).
The body of the accept method follows the same pattern in every leaf class. It looks like this:
void
integer_TreeConst::accept (visitor & v)
{
    v.visit_integer_TreeConst (this);
}
The accept() method of the integer_TreeConst class in turn calls a specific method of the visitor.
This visitor method takes a pointer to the calling object (an instance of integer_TreeConst) and performs some action.
An extract of the visitor class follows:
class visitor {
public:
    // ...

    //! The initialization operation
    virtual void initialize(void);
    //! The finalization operation
    virtual void terminate(void);

    // ...

    //************************************
    // Used to visit constant nodes
    //************************************
    virtual void visit_integer_TreeConst(integer_TreeConst *);
    virtual void visit_real_TreeConst(real_TreeConst *);
    virtual void visit_string_TreeConst(string_TreeConst *);
    virtual void visit_complex_TreeConst(complex_TreeConst *);
    virtual void visit_ptrmem_TreeConst(ptrmem_TreeConst *);
    virtual void visit_othr_TreeConst(othr_TreeConst *);

    // ...
};
Using the visitor pattern you can write your own visitors to analyze the NAST.
For example, you can write visitors to analyze function signatures, instruction flow, typedef aliases, and so on.
The following example shows how to write a visitor that gathers statistics about constant nodes.
Click here to get the reference code
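Independently of the reference code, the idea of a statistics visitor can be sketched in a few self-contained lines. The classes below are reduced stand-ins that reuse NAST names. ConstCounter and visitAll are hypothetical, and giving the visitor methods empty default bodies is an assumption made to keep the sketch short.

```cpp
#include <cassert>
#include <vector>

class integer_TreeConst;
class real_TreeConst;

// Reduced visitor interface; the empty default bodies are an assumption.
class visitor {
public:
    virtual ~visitor() = default;
    virtual void visit_integer_TreeConst(integer_TreeConst*) {}
    virtual void visit_real_TreeConst(real_TreeConst*) {}
};

// Stand-in for the NAST root class.
class TreeNode {
public:
    virtual ~TreeNode() = default;
    virtual void accept(visitor& v) = 0;
};

// Each leaf class forwards to the visitor method that matches its type.
class integer_TreeConst : public TreeNode {
public:
    void accept(visitor& v) override { v.visit_integer_TreeConst(this); }
};

class real_TreeConst : public TreeNode {
public:
    void accept(visitor& v) override { v.visit_real_TreeConst(this); }
};

// Hypothetical visitor that counts constants by kind.
class ConstCounter : public visitor {
public:
    int integers = 0;
    int reals = 0;
    void visit_integer_TreeConst(integer_TreeConst*) override { ++integers; }
    void visit_real_TreeConst(real_TreeConst*) override { ++reals; }
};

// Applies a visitor to every node of a list.
void visitAll(const std::vector<TreeNode*>& nodes, visitor& v) {
    for (TreeNode* n : nodes) {
        assert(n != nullptr);
        n->accept(v);
    }
}
```

The caller never tests the type of a node: the virtual accept method selects the right visit method, which is what makes new analyses possible without touching the node classes.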
How to use the included files
The rules for including files are the following.
- #include "XOGastan/System/XOGastanSystem.hpp"
You must include it to initialize and terminate the XOGastan API system.
- #include "XOGastan/XAstParser/XAstReader.hpp"
You must include it to use the reader methods.
- #include "XOGastan/Utils/a2b.hpp"
You must include it if you want to use the conversion methods.
- #include "XOGastan/Tree/TreeUtils.hpp"
You must include it if you want to use the methods for checking the class of a node.
- #include "XOGastan/Visitor/Visitor.hpp"
You must include it if you want to write your own visitor.