cprover
|
Goto programs are the intermediate representation of the CPROVER tool chain. They are language independent and similar to many of the compiler intermediate languages. Section goto-programs describes the goto_programt
and goto_functionst
data structures in detail. However it useful to understand some of the basic concepts. Each function is a list of instructions, each of which has a type (one of 18 kinds of instruction), a code expression, a guard expression and potentially some targets for the next instruction. They are not natively in static single-assign (SSA) form. Transitions are nondeterministic (although in practise the guards on the transitions normally cover form a disjoint cover of all possibilities). Local variables have non-deterministic values if they are not initialised. Variables and data within the program is commonly one of three types (parameterised by width): unsignedbv_typet
, signedbv_typet
and floatbv_typet
, see util/std_types.h
for more information. Goto programs can be serialised in a binary (wrapped in ELF headers) format or in XML (see the various _serialization
files).
The cbmc
option –show-goto-programs
is often a good starting point as it outputs goto-programs in a human readable form. However there are a few things to be aware of. Functions have an internal name (for example c::f00
) and a ‘pretty name’ (for example f00
) and which is used depends on whether it is internal or being presented to the user. The main
method is the ‘logical’ main which is not necessarily the main method from the code. In the output NONDET
is use to represent a nondeterministic assignment to a variable. Likewise IF
as a beautified GOTO
instruction where the guard expression is used as the condition. RETURN
instructions may be dropped if they precede an END_FUNCTION
instruction. The comment lines are generated from the locationt
field of the instructiont
structure.
goto-programs/
is one of the few places in the CPROVER codebase that templates are used. The intention is to allow the general architecture of program and functions to be used for other formalisms. At the moment most of the templates have a single instantiation; for example goto_functionst
and goto_function_templatet
and goto_programt
and goto_program_templatet
.
FIXME: This text is partially outdated.
The common starting point for working with goto-programs is the read_goto_binary
function which populates an object of goto_functionst
type. This is defined in goto_functions.h
and is an instantiation of the template goto_functions_templatet
which is contained in goto_functions_template.h
. They are wrappers around a map from strings to goto_programt
’s and iteration macros are provided. Note that goto_function_templatet
(no s
) is defined in the same header as goto_functions_templatet
and is gives the C type for the function and Boolean which indicates whether the body is available (before linking this might not always be true). Also note the slightly counter-intuitive naming; goto_functionst
instances are the top level structure representing the program and contain goto_programt
instances which represent the individual functions. At the time of writing goto_functionst
is the only instantiation of the template goto_functions_templatet
but other could be produced if a different data-structures / kinds of models were needed for functions.
goto_programt
is also an instantiation of a template. In a similar fashion it is goto_program_templatet
and allows the types of the guard and expression used in instructions to be parameterised. Again, this is currently the only use of the template. As such there are only really helper functions in goto_program.h
and thus goto_program_template.h
is probably the key file that describes the representation of (C) functions in the goto-program format. It is reasonably stable and reasonably documented and thus is a good place to start looking at the code.
An instance of goto_program_templatet
is effectively a list of instructions (and inner template called instructiont
). It is important to use the copy and insertion functions that are provided as iterators are used to link instructions to their predecessors and targets and careless manipulation of the list could break these. Likewise there are helper macros for iterating over the instructions in an instance of goto_program_templatet
and the use of these is good style and strongly encouraged.
Individual instructions are instances of type instructiont
. They represent one step in the function. Each has a type, an instance of goto_program_instruction_typet
which denotes what kind of instruction it is. They can be computational (such as ASSIGN
or FUNCTION_CALL
), logical (such as ASSUME
and ASSERT
) or informational (such as LOCATION
and DEAD
). At the time of writing there are 18 possible values for goto_program_instruction_typet
/ kinds of instruction. Instructions also have a guard field (the condition under which it is executed) and a code field (what the instruction does). These may be empty depending on the kind of instruction. In the default instantiations these are of type exprt
and codet
respectively and thus covered by the previous discussion of irept
and its descendents. The next instructions (remembering that transitions are guarded by non-deterministic) are given by the list targets
(with the corresponding list of labels labels
) and the corresponding set of previous instructions is get by incoming_edges
. Finally instructiont
have informational function
and location
fields that indicate where they are in the code.
In the goto-programs directory.
Key classes:
At this stage, CBMC constructs a goto-program from a symbol table. It does not use the parse tree or the source file at all for this step. This may seem surprising, because the symbols are stored in a hash table and therefore have no intrinsic order; nevertheless, every symbolt is associated with a source_locationt, allowing CBMC to figure out the lexical order.
The structure of what is informally called a goto-program follows. The entire target program is converted to a single goto_functionst object. The goto functions contains a set of goto_programt objects; each of these correspond to a "function" or "method" in the target program. Each goto_program contains a list of instructions; each instruction contains an associated statement—these are subtypes of codet. Each instruction also contains a "target", which will be empty for now.
This is not the final form of the goto-functions, since the lists of instructions will be 'normalized' in the next step (Instrumentation), which removes some instructions and adds targets to others.
Note that goto_programt and goto_functionst are each template instantiations; they are currently the only specialization of goto_program_templatet and goto_functions_templatet, respectively. This means that the generated Doxygen documentation can be somewhat obtuse about the actual types of things, and is unable to generate links to the correct classes. Note that the code member of a goto_programt's instruction has type codet (its type in the goto_program_templatet documentation is given as "codeT", as this is the name of the template's type parameter); similarly, the type of a guard of an instruction is guardt.
In the goto-programs directory.
Key classes:
This stage applies several transformations to the goto-programs from the previous stage:
code_ifthenelset is turned into GOTOs. In particular, the bodies of the conditionals are flattened into lists of instructions, inline with the rest of the instruction in the goto_programt. The guard of the conditional is placed onto the guard member of an instruction whose code member is of type code_gotot. The targets member of that instruction points to the appropriate branch of the conditional. (Note that although instructions have a list of targets, in practice an instruction should only ever have at most one target; you should check this invariant with an assertion if you rely on it).
The order of instructions in a list of instructions—as well as the targets of GOTOs—are both displayed as arrows when viewing a goto-program as a Graphviz DOT file with goto-instrument --dot
. The semantics of a goto-program is: the next instruction is the next instruction in the list, unless the current instruction has a target; in that case, check the guard of the instruction, and jump to the target if the guard evaluates to true.
This stage concludes the analysis-independent program transformations.