cprover
goto-symex/README.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup goto-symex goto-symex
3 
4 # Folder goto-symex
5 
6 \author Kareem Khazem, Martin Brain
7 
8 This directory contains a symbolic evaluation system for goto-programs.
9 This takes a goto-program and translates it to an equation system by
10 traversing the program, branching and merging and unwinding loops as
11 needed. Each reverse goto has a separate counter (the actual counting is
12 handled by `cbmc`, see the `–unwind` and `–unwind-set` options). When a
13 counter limit is reach, an assertion can be added to explicitly show
14 when analysis is incomplete. The symbolic execution includes constant
15 folding so loops that have a constant number of iterations will be
16 handled completely (assuming the unwinding limit is sufficient).
17 
18 The output of the symbolic execution is a system of equations; an object
19 containing a list of `symex_target_elements`, each of which are
20 equalities between `expr` expressions. See `symex_target_equation.h`.
21 The output is in static, single assignment (SSA) form, which is *not*
22 the case for goto-programs.
23 
24 \section symbolic-execution Symbolic Execution
25 
26 In the \ref goto-symex directory.
27 
28 **Key classes:**
29 * goto_symex_statet
30 * goto_symext
31 
32 \dot
33 digraph G {
34  node [shape=box];
35  rankdir="LR";
36  1 [shape=none, label=""];
37  2 [label="goto conversion"];
38  3 [shape=none, label=""];
39  1 -> 2 [label="goto-programs, goto-functions, symbol table"];
40  2 -> 3 [label="equations"];
41 }
42 \enddot
43 
44 \subsection symex-overview Overview
45 
46 The \ref bmct class gets a reference to an \ref symex_target_equationt
47 "equation" (initially, an empty list of \ref symex_target_equationt::SSA_stept
48 "single-static assignment steps") and a goto-program from the frontend.
49 The \ref bmct creates a goto_symext to symbolically execute the
50 goto-program, thereby filling the equation, which can then be passed
51 along to the SAT solver.
52 
53 The symbolic execution state is maintained by goto_symex_statet. This is
54 a memory-heavy data structure, so goto_symext creates it on-demand and
55 lets it go out-of-scope as soon as possible. However, the process of
56 symbolic execution fills out an additional \ref symbol_tablet
57 "symbol table" of dynamically-created objects; this symbol table is
58 needed when solving the equation. This symbol table must thus be
59 exported out of the state before it is torn down; this is done through
60 the parameter "`new_symbol_table`" passed as a reference to the various
61 functions in goto_symext.
62 
63 The main symbolic execution loop code is goto_symext::symex_step(). This
64 function case-switches over the type of the instruction that we're
65 currently executing, and calls various other functions appropriate to
66 the instruction type, i.e. goto_symext::symex_function_call() if the
67 current instruction is a function call, goto_symext::symex_goto() if the
68 current instruction is a goto, etc.
69 
70 \subsection symex-path Path exploration
71 
72 By default, CBMC symbolically executes the entire program, building up
73 an equation representing all instructions, which it then passes to the
74 solver. Notably, it _merges_ paths that diverge due to a goto
75 instruction, forming a constraint that represents having taken either of
76 the two paths; the code for doing this is goto_symext::merge_goto().
77 
78 CBMC can operate in a different mode when the `--paths` flag is passed
79 on the command line. This disables path merging; instead, CBMC executes
80 a single path at a time, calling the solver with the equation
81 representing that path, then continues to execute other paths.
82 The 'other paths' that would have been taken when the program branches
83 are saved onto a workqueue so that CBMC can continue to execute the
84 current path, and then later retrieve saved paths from the workqueue.
85 Implementation-wise, \ref bmct passes a worklist to goto_symext in
86 bmct::do_language_agnostic_bmc(). If path exploration is enabled,
87 goto_symext will fill up the worklist whenever it encounters a branch,
88 instead of merging the paths on the branch. After the initial symbolic
89 execution (i.e. the first call to bmct::run() in
90 bmct::do_language_agnostic_bmc()), \ref bmct continues popping the
91 worklist and executing untaken paths until the worklist empties. Note
92 that this means that the default model-checking behaviour is a special
93 case of path exploration: when model-checking, the initial symbolic
94 execution run does not add any paths to the workqueue but rather merges
95 all the paths together, so the additional path-exploration loop is
96 skipped over.
97 
98 \subsection ssa-renaming SSA renaming levels
99 
100 In goto-programs, variable names get a prefix to indicate their scope
101 (like `main::1::%foo` or whatever). At symbolic execution level, variables
102 also get a _suffix_ because we’re doing single-static assignment. There
103 are three “levels” of renaming. At Level 2, variables are renamed every
104 time they are encountered in a function. At Level 1, variables are
105 renamed every time the functions that contain those variables are
106 called. At Level 0, variables are renamed every time a new thread
107 containing those variables are spawned. We can inspect the renamed
108 variable names with the –show-vcc flag, which prints a string with the
109 following format: `%%s!%%d@%%d#%%d`. The string field is the variable name,
110 and the numbers after the !, @, and # are the L0, L1, and L2 suffixes
111 respectively. The following examples illustrate Level 1 and 2 renaming:
112 
113  $ cat l1.c
114  int main() {
115  int x=7;
116  x=8;
117  assert(0);
118  }
119  $ cbmc --show-vcc l1.c
120  ...
121  {-12} x!0@1#2 == 7
122  {-13} x!0@1#3 == 8
123 
124 That is, the L0 names for both xs are 0; the L1 names for both xs are 1;
125 but each occurrence of x within main() gets a fresh L2 suffix (2 and 3,
126 respectively).
127 
128  $ cat l2.c
129  void foo(){
130  int x=7;
131  x=8;
132  x=9;
133  }
134  int main(){
135  foo();
136  foo();
137  assert(0);
138  }
139  $ cbmc --show-vcc l2.c
140  ...
141  {-12} x!0@1#2 == 7
142  {-13} x!0@1#3 == 8
143  {-14} x!0@1#4 == 9
144  {-15} x!0@2#2 == 7
145  {-16} x!0@2#3 == 8
146  {-17} x!0@2#4 == 9
147 
148 That is, every time we enter function foo, the L1 counter of x is
149 incremented (from 1 to 2) and the L0 counter is reset (back to 2, after
150 having been incremented up to 4). The L0 counter then increases every
151 time we access x as we walk through the function.
152 
153 ---
154 \section counter-example-production Counter Example Production
155 
156 In the \ref goto-symex directory.
157 
158 **Key classes:**
159 * symex_target_equationt
160 * prop_convt
161 * \ref bmct
162 * fault_localizationt
163 * counterexample_beautificationt
164 
165 \dot
166 digraph G {
167  node [shape=box];
168  rankdir="LR";
169  1 [shape=none, label=""];
170  2 [label="goto conversion"];
171  3 [shape=none, label=""];
172  1 -> 2 [label="solutions"];
173  2 -> 3 [label="counter-examples"];
174 }
175 \enddot