@@ -30,6 +30,15 @@ used depends on whether it is internal or being presented to the user.
3030The comment lines are generated from the ` source_location ` field
3131of the ` instructiont ` structure.
3232
33+ GOTO programs are commonly produced using the ` initialize_goto_model ` function,
34+ which combines the complete process from command-line arguments specifying
35+ source code files, through parsing and generating a symbol table, to finally
36+ producing GOTO functions.
37+
38+ Alternatively ` lazy_goto_modelt ` can be used, which facilitates producing GOTO
39+ functions on demand. See \ref section-lazy-conversion for more details on this
40+ method.
41+
3342\section goto_data_structures Data Structures
3443
3544A \ref goto_functionst object contains a set of GOTO programs. Note the
@@ -86,23 +95,19 @@ digraph G {
8695At this stage, CBMC constructs a goto-program from a symbol table.
8796Each symbol in the symbol table with function type (that is, the symbol's ` type `
8897field contains a \ref code_typet) will be converted to a corresponding GOTO
89- program.
90-
91- It does not use the parse tree or the source file at all for this step. This
92- may seem surprising, because the symbols are stored in a hash table and
93- therefore have no intrinsic order; nevertheless, every \ref symbolt is
94- associated with a \ref source_locationt, allowing CBMC to figure out the
95- lexical order.
96-
97- The structure of what is informally called a goto-program follows. The
98- entire target program is converted to a single \ref goto_functionst
99- object. The goto functions contains a set of \ref goto_programt objects;
100- each of these correspond to a "function" or "method" in the target
101- program. Each goto_program contains a list of
102- \ref goto_programt::instructiont "instructions"; each
103- instruction contains an associated statement---these are subtypes of
104- \ref codet. Each instruction also contains a "target", which will be
105- empty for now.
98+ program. The parse tree and source file are not used at all for this step.
99+
100+ The conversion happens in two phases:
101+
102+ 1 . Function ` goto_convertt::convert ` turns each ` codet ` in the symbol table
103+ into corresponding GOTO instructions.
104+ 2 . ` goto_convertt::finish_gotos ` and others populate the ` GOTO ` and ` CATCH `
105+ instructions' ` targets ` members, pointing to their possible successors.
106+ ` DEAD ` instructions are also added when GOTO instructions branch out of one
107+ or more lexical blocks.
108+
109+ Here is an example of a simple C program being converted into GOTO code (note
110+ the ` targets ` member of each instruction isn't populated yet):
106111
107112\dot
108113digraph G{
@@ -241,39 +246,10 @@ void foo(){}"];
241246}
242247\enddot
243248
244- This is not the final form of the goto-functions, since the lists of
245- instructions will be 'normalized' in the next step (Instrumentation),
246- which removes some instructions and adds targets to others.
247-
248- ---
249- \section instrumentation Instrumentation
250-
251- In the \ref goto-programs directory.
252-
253- ** Key classes:**
254- * goto_programt
255- * goto_functionst
256- * \ref goto_programt::instructiont
257-
258- \dot
259- digraph G {
260- node [ shape=box] ;
261- rankdir="LR";
262- 1 [ shape=none, label=""] ;
263- 2 [ label="goto conversion"] ;
264- 3 [ shape=none, label=""] ;
265- 1 -> 2 [ label="goto-programs, goto-functions, symbol table"] ;
266- 2 -> 3 [ label="transformed goto-programs"] ;
267- }
268- \enddot
269-
270- This stage applies several transformations to the goto-programs from the
271- previous stage:
272-
273- * The diagram in the previous stage showed a goto_programt with four
274- instructions, but real programs usually yield instruction lists that
275- are littered with \ref code_skipt "skip" statements. The
276- instrumentation stage removes the majority of these.
249+ The ` codet ` representation of method bodies that is stored in the symbol table
250+ resembles C code quite strongly: it uses lexical variable scoping and has
251+ complex control-flow constructs such as ` code_whilet ` and ` code_ifthenelset ` .
252+ In converting to a GOTO program this structure is changed in several ways:
277253
278254* Variable lifespan implied by \ref code_declt instructions and lexical scopes
279255 described by \ref code_blockt nodes is replaced by ` DECL ` and corresponding
@@ -322,14 +298,63 @@ previous stage:
322298 instruction. Each goto_programt should have precisely one such
323299 instruction.
324300
325- This stage concludes the * analysis-independent* program transformations.
326-
327- \subsection goto-program-example-section Example:
328-
329- \subsubsection goto-program-example-1-section Unsigned mult (unsigned a, unsigned b) { int acc, i; for (i = 0; i < b; i++) acc += a; return acc; }
330-
331- To be documented.
332-
301+ \section section-goto-transforms Subsequent Transformation
302+
303+ It is normal for a program that calls ` goto_convert ` to immediately pass the
304+ GOTO functions through one or more transformations. For example, the ubiquitous
305+ ` remove_returns ` transformation turns the method return sequence generated by
306+ ` goto_convert ` (something like ` RETURN x; GOTO end; ` ) into
307+ ` function_name#return_value = x; GOTO end; ` , and a callsite
308+ ` y = function_name(); ` into ` function_name(); y = function_name#return_value; ` .
309+
310+ Some subsequent parts of the CBMC pipeline assume that one or more of these
311+ transforms has been applied, making them effectively mandatory in that case.
312+
313+ Common transformations include:
314+
315+ * Removing superfluous SKIP instructions (` remove_skip ` )
316+ * Converting indirect function calls into dispatch tables over direct calls
317+ (` remove_virtual_functions ` and ` remove_function_pointers ` )
318+ * Adding checks for null-pointer dereferences, arithmetic overflow and other
319+ undefined behaviour (` goto_check ` )
320+ * Adding code coverage assertions (` instrument_cover_goals ` ).
321+
322+ By convention these post-convert transformations are applied by a driver program
323+ in a function named ` process_goto_program ` or ` process_goto_functions ` ; examples
324+ of these can be found in ` cbmc_parse_optionst ` , ` goto_analyzer_parse_optionst `
325+ and ` goto_instrument_parse_optionst ` among others.
326+
327+ \section section-lazy-conversion Lazy GOTO Conversion
328+
329+ The conventional source-code-to-GOTO-program pipeline is horizontally
330+ structured: all source-file information is converted into a symbol table, then
331+ each symbol-table method is converted into a GOTO program, then each transform
332+ is run over the whole GOTO-model in turn. This is fine when the GOTO model
333+ consumer will use most or all of the results of conversion, but if not then
334+ lazy conversion may be beneficial.
335+
336+ Lazy conversion is coordinated by ` lazy_goto_modelt ` . Early source code parsing
337+ proceeds as usual, but while the same symbol table symbols are created as using
338+ ` initialize_goto_model ` , method bodies in the symbol table are not populated.
339+ ` lazy_goto_modelt::get_goto_function ` then allows each function to be converted
340+ on demand, passing individually through symbol-table ` codet ` production, GOTO
341+ conversion and driver-program transformation.
342+
343+ ` goto_symext ` is able to use such a lazy GOTO model in place of the usual
344+ ` goto_modelt ` , enabling it to create GOTO programs on demand, and thus avoid
345+ spending time and memory generating GOTO programs that symex determines cannot
346+ in fact be reached. This is particularly useful when the source program contains
347+ many indirect calls (virtual function calls, or calls via function pointer), as
348+ the ` remove_virtual_functions ` and ` remove_function_pointers ` passes treat these
349+ very conservatively, producing large dispatch tables of possible callees which
350+ symex often finds to be largely unreachable.
351+
352+ JBMC exhibits typical usage of ` lazy_goto_modelt ` when it is called with
353+ ` --symex-driven-lazy-loading ` . It defines a function
354+ ` jbmc_parse_optionst::process_goto_function ` which performs driver-program-
355+ specific post-GOTO-convert transformation. The lazy GOTO model is then passed to
356+ ` goto_symext ` , which will trigger symbol-table population, GOTO conversion
357+ and post-processing as needed.
333358
334359\section section-goto-binary Binary Representation
335360
0 commit comments