2 \defgroup ansi-c ansi-c
5 \author Kareem Khazem, Martin Brain
7 \section overview Overview
9 Contains the front-end for ANSI C, plus a variety of common extensions.
10 This parses the file, performs some basic sanity checks (this is one
11 area in which the UI could be improved; patches most welcome) and then
12 produces a goto-program (see below). The parser is a traditional Flex /
15 `internal_addition.c` contains the implementation of various ‘magic’
16 functions that are that allow control of the analysis from the source
17 code level. These include assertions, assumptions, atomic blocks, memory
18 fences and rounding modes.
20 The `library/` subdirectory contains versions of some of the C standard
21 header files that make use of the CPROVER built-in functions. This
22 allows CPROVER programs to be ‘aware’ of the functionality and model it
23 correctly. Examples include `stdio.c`, `string.c`, `setjmp.c` and
24 various threading interfaces.
26 \section preprocessing Preprocessing & Parsing
28 In the \ref ansi-c directory
31 * \ref languaget and its subclasses
38 1 [shape=none, label=""];
39 2 [label="preprocessing & parsing"];
40 3 [shape=none, label=""];
41 1 -> 2 [label="Command line options, file names"];
42 2 -> 3 [label="Parse tree"];
47 \section type-checking Type-checking
49 In the \ref ansi-c directory
52 * \ref languaget and its subclasses
62 1 [shape=none, label=""];
63 2 [label="type checking"];
64 3 [shape=none, label=""];
65 1 -> 2 [label="Parse tree"];
66 2 -> 3 [label="Symbol table"];
70 This stage generates a symbol table, mapping identifiers to symbols;
71 \ref symbolt "symbols" are tuples of (value, type, location, flags).
73 This is a good point to introduce the \ref irept ("internal
74 representation") class---the base type of many of CBMC's hierarchical
75 data structures. In particular, \ref exprt "expressions",
76 \ref typet "types" and \ref codet "statements" are all subtypes of
78 An irep is a tree of ireps. A subtlety is that an irep is actually the
79 root of _three_ (possibly empty) trees, i.e. it has three disjoint sets
80 of children: \ref irept::get_sub() returns a list of children, and
81 \ref irept::get_named_sub() and \ref irept::get_comments() each return an
82 association from names to children. **Most clients never use these
83 functions directly**, as subtypes of irept generally provide more
84 descriptive functions. For example, the operands of an
85 \ref exprt "expression" (\ref exprt::op0() "op0", op1 etc) are
86 really that expression's children; the
87 \ref code_assignt::lhs() "left-hand" and right-hand side of an
88 \ref code_assignt "assignment" are the children of that assignment.
89 The \ref irept::pretty() function provides a descriptive string
90 representation of an irep.
92 \ref irep_idt "irep_idts" ("identifiers") are strings that use sharing
93 to improve memory consumption. A common pattern is a map from irep_idts
94 to ireps. A goto-program contains a single symbol table (with a single
95 scope), meaning that the names of identifiers in the target program are
96 lightly mangled in order to make them globally unique. If there is an
97 identifier `foo` in the target program, the `name` field of `foo`'s
98 \ref symbolt "symbol" in the goto-program will be
99 * `foo` if it is global;
100 * <code>bar\::foo</code> if it is a parameter to a function `bar()`;
101 * <code>bar\::3\::foo</code> if it is a local variable in a function
102 `bar()`, where `3` is a counter that is incremented every time a
103 newly-scoped `foo` is encountered in that function.
105 The use of *sharing* to save memory is a pervasive design decision in
106 the implementation of ireps and identifiers. Sharing makes equality
107 comparisons fast (as there is no need to traverse entire trees), and
108 this is especially important given the large number of map lookups
109 throughout the codebase. More importantly, the use of sharing saves vast
110 amounts of memory, as there is plenty of duplication within the
111 goto-program data structures. For example, every statement, and every
112 sub-expression of a statement, contains a \ref source_locationt
113 that indicates the source file and location that it came from. Every
114 symbol in every expression has a field indicating its type and location;
115 etc. Although each of these are constructed as separate objects, the
116 values that they eventually point to are shared throughout the codebase,
117 decreasing memory consumption dramatically.
119 The Type Checking stage turns a parse tree into a
120 \ref symbol_tablet "symbol table". In this context, the 'symbols'
121 consist of code statements as well as what might more traditionally be
122 called symbols. Thus, for example:
123 * The statement `int foo = 11;` is converted into a symbol whose type is
124 integer_typet and value is the \ref constant_exprt
125 "constant expression" `11`; that symbol is stored in the symbol table
126 using the mangled name of `foo` as the key;
127 * The function definition `void foo(){ int x = 11; bar(); }` is
128 converted into a symbol whose type is \ref code_typet (not to be
129 confused with \ref typet or \ref codet!); the code_typet contains the
130 parameter and return types of the function. The value of the symbol is
131 the function's body (a \ref codet), and the symbol is stored in the
132 symbol table with `foo` as the key.
135 \section performance Parsing performance considerations
137 * Measured on trunk/regression/ansi-c/windows_h_VS_2012/main.i
139 * 13%: Copying into i_preprocessed
141 * 5%: ansi_c_parser.read()
145 * 29%: parser (without typechecking)
147 \section references Compiler References
149 CodeWarrior C Compilers Reference 3.2:
151 http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/CCOMPILERRM.pdf
153 http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/ASMX86RM.pdf
155 ARM 4.1 Compiler Reference:
157 http://infocenter.arm.com/help/topic/com.arm.doc.dui0491c/DUI0491C_arm_compiler_reference.pdf