cprover
ansi-c/README.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup ansi-c ansi-c
3 # Folder ansi-c
4 
5 \author Kareem Khazem, Martin Brain
6 
7 \section overview Overview
8 
9 Contains the front-end for ANSI C, plus a variety of common extensions.
10 This parses the file, performs some basic sanity checks (this is one
11 area in which the UI could be improved; patches most welcome) and then
12 produces a goto-program (see below). The parser is a traditional Flex /
13 Bison system.
14 
15 `internal_addition.c` contains the implementation of various ‘magic’
16 functions that are that allow control of the analysis from the source
17 code level. These include assertions, assumptions, atomic blocks, memory
18 fences and rounding modes.
19 
20 The `library/` subdirectory contains versions of some of the C standard
21 header files that make use of the CPROVER built-in functions. This
22 allows CPROVER programs to be ‘aware’ of the functionality and model it
23 correctly. Examples include `stdio.c`, `string.c`, `setjmp.c` and
24 various threading interfaces.
25 
26 \section preprocessing Preprocessing & Parsing
27 
28 In the \ref ansi-c directory
29 
30 **Key classes:**
31 * \ref languaget and its subclasses
32 * ansi_c_parse_treet
33 
34 \dot
35 digraph G {
36  node [shape=box];
37  rankdir="LR";
38  1 [shape=none, label=""];
39  2 [label="preprocessing & parsing"];
40  3 [shape=none, label=""];
41  1 -> 2 [label="Command line options, file names"];
42  2 -> 3 [label="Parse tree"];
43 }
44 \enddot
45 
46 
47 \section type-checking Type-checking
48 
49 In the \ref ansi-c directory
50 
51 **Key classes:**
52 * \ref languaget and its subclasses
53 * \ref irept
54 * \ref irep_idt
55 * \ref symbolt
56 * symbol_tablet
57 
58 \dot
59 digraph G {
60  node [shape=box];
61  rankdir="LR";
62  1 [shape=none, label=""];
63  2 [label="type checking"];
64  3 [shape=none, label=""];
65  1 -> 2 [label="Parse tree"];
66  2 -> 3 [label="Symbol table"];
67 }
68 \enddot
69 
70 This stage generates a symbol table, mapping identifiers to symbols;
71 \ref symbolt "symbols" are tuples of (value, type, location, flags).
72 
73 This is a good point to introduce the \ref irept ("internal
74 representation") class---the base type of many of CBMC's hierarchical
75 data structures. In particular, \ref exprt "expressions",
76 \ref typet "types" and \ref codet "statements" are all subtypes of
77 \ref irept.
78 An irep is a tree of ireps. A subtlety is that an irep is actually the
79 root of _three_ (possibly empty) trees, i.e. it has three disjoint sets
80 of children: \ref irept::get_sub() returns a list of children, and
81 \ref irept::get_named_sub() and \ref irept::get_comments() each return an
82 association from names to children. **Most clients never use these
83 functions directly**, as subtypes of irept generally provide more
84 descriptive functions. For example, the operands of an
85 \ref exprt "expression" (\ref exprt::op0() "op0", op1 etc) are
86 really that expression's children; the
87 \ref code_assignt::lhs() "left-hand" and right-hand side of an
88 \ref code_assignt "assignment" are the children of that assignment.
89 The \ref irept::pretty() function provides a descriptive string
90 representation of an irep.
91 
92 \ref irep_idt "irep_idts" ("identifiers") are strings that use sharing
93 to improve memory consumption. A common pattern is a map from irep_idts
94 to ireps. A goto-program contains a single symbol table (with a single
95 scope), meaning that the names of identifiers in the target program are
96 lightly mangled in order to make them globally unique. If there is an
97 identifier `foo` in the target program, the `name` field of `foo`'s
98 \ref symbolt "symbol" in the goto-program will be
99 * `foo` if it is global;
100 * <code>bar\::foo</code> if it is a parameter to a function `bar()`;
101 * <code>bar\::3\::foo</code> if it is a local variable in a function
102  `bar()`, where `3` is a counter that is incremented every time a
103  newly-scoped `foo` is encountered in that function.
104 
105 The use of *sharing* to save memory is a pervasive design decision in
106 the implementation of ireps and identifiers. Sharing makes equality
107 comparisons fast (as there is no need to traverse entire trees), and
108 this is especially important given the large number of map lookups
109 throughout the codebase. More importantly, the use of sharing saves vast
110 amounts of memory, as there is plenty of duplication within the
111 goto-program data structures. For example, every statement, and every
112 sub-expression of a statement, contains a \ref source_locationt
113 that indicates the source file and location that it came from. Every
114 symbol in every expression has a field indicating its type and location;
115 etc. Although each of these are constructed as separate objects, the
116 values that they eventually point to are shared throughout the codebase,
117 decreasing memory consumption dramatically.
118 
119 The Type Checking stage turns a parse tree into a
120 \ref symbol_tablet "symbol table". In this context, the 'symbols'
121 consist of code statements as well as what might more traditionally be
122 called symbols. Thus, for example:
123 * The statement `int foo = 11;` is converted into a symbol whose type is
124  integer_typet and value is the \ref constant_exprt
125  "constant expression" `11`; that symbol is stored in the symbol table
126  using the mangled name of `foo` as the key;
127 * The function definition `void foo(){ int x = 11; bar(); }` is
128  converted into a symbol whose type is \ref code_typet (not to be
129  confused with \ref typet or \ref codet!); the code_typet contains the
130  parameter and return types of the function. The value of the symbol is
131  the function's body (a \ref codet), and the symbol is stored in the
132  symbol table with `foo` as the key.
133 
134 
135 \section performance Parsing performance considerations
136 
137 * Measured on trunk/regression/ansi-c/windows_h_VS_2012/main.i
138 
139 * 13%: Copying into i_preprocessed
140 
141 * 5%: ansi_c_parser.read()
142 
143 * 53%: yyansi_clex()
144 
145 * 29%: parser (without typechecking)
146 
147 \section references Compiler References
148 
149 CodeWarrior C Compilers Reference 3.2:
150 
151 http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/CCOMPILERRM.pdf
152 
153 http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/ASMX86RM.pdf
154 
155 ARM 4.1 Compiler Reference:
156 
157 http://infocenter.arm.com/help/topic/com.arm.doc.dui0491c/DUI0491C_arm_compiler_reference.pdf