-
Notifications
You must be signed in to change notification settings - Fork 81
Description
The current move/init analysis is elegant and illustrates the expressive power of datalog. However, this elegance comes at the cost of efficiency, as it requires instantiating at least one fact (maybe_uninitialized and/or maybe_initialized) for every move path at every point in the CFG. This becomes intractable for large bodies like the one in #181 which requires several billion initialization facts.
Using a dense relation store (like Souffle's binary radix trie) will alleviate some of the pain by reducing the memory required per fact, and I'm kind of doubtful that rustc will ever use a datalog engine to compute move errors. However, I'm still interested in expressing this part of the compiler efficiently in datalog, if only to see how far we can take the "everything is horn clauses" mantra.
The idea is to take advantage of the sparsity of the path_accessed_at relation and compute initializedness inside a basic block only when necessary. All intra-block CFG nodes have exactly one predecessor (the previous statement in the block), so their initializedness is purely a function of their immediate predecessor and any operations within that node. If we were to compute the cumulative effect of all nodes inside a basic block and do our fixpoint algorithm on blocks instead of individual nodes, we could use that information to compute the state for intrablock nodes on-demand.
This is not groundbreaking; in fact it's pretty dull. This is the first optimization discussed in any introduction to dataflow analysis. However, implementing it in datalog poses some new challenges that require (AFAIK) some non-standard extensions to overcome.
First, let's assume that we have some mapping from CFG nodes to their corresponding basic block and statement index (a Location), as well as the control-flow edges for each basic block and the typical move/init input data in terms of those locations:
.type Location = [ block: Block, stmt: unsigned ]
.decl node_loc(node: Node, loc: Location)
.input node_loc
.decl bb_edge(from: Block, to: Block)
.input bb_edge
.decl path_assigned_at(path: Path, loc: Location)
.decl path_moved_at(path: Path, loc: Location)
.decl path_accessed_at(path: Path, loc: Location)
.input path_accessed_at
.input path_assigned_at
.input path_moved_atFrom there, we can compute the effect of each basic block on the initializedness of each move path in the program by looking for an assignment to (or move from) a path that comes after any other moves from (or assignments to) that path. Unfortunately , doing this without computing the full initializedness at each statement or negating the path_assigned_at and path_accessed_at relations (which are very sparse) requires either universal quantification within rule bodies or unbound variables within negated clauses, both of which are outside the realm of traditional datalog. Luckily, Soufflé has an extension called aggregates, which allow more complex functions on certain relations.
.decl path_becomes_uninit_in_block(path: Path, bb: Block)
path_becomes_uninit_in_block(path, bb) :-
path_moved_at(path, [bb, uninit_at]),
count : { path_assigned_at(path, [bb, init_at]), init_at > uninit_at } = 0. // ∀i s.t. i > uninit_at (!path_assigned_at(path, [bb, i]))
.decl path_becomes_init_in_block(path: Path, bb: Block)
path_becomes_init_in_block(path, bb) :-
path_assigned_at(path, [bb, init_at]),
count : { path_moved_at(path, [bb, uninit_at]), uninit_at > init_at } = 0.Here, we use the count aggregate to count the number of path_assigned_at relations for this move path in the same basic block that come after a path_moved_from relation. If none exist (count = 0), we know that the path will be moved from at the exit of the basic block.
Now that we have our block transfer functions, it's time to use them to compute the entry state for each block. A path is initialized at block exit if it is initialized at block entry and not uninitialized inside the block, or if it is initialized inside the block. This is just your typical fixpoint algorithm.
.decl path_maybe_init_at_block_exit(path: Path, bb: Block) inline
path_maybe_init_at_block_exit(path, bb) :-
path_becomes_init_in_block(path, bb).
path_maybe_init_at_block_exit(path, bb) :-
path_maybe_init_at_block_entry(path, bb),
!path_becomes_uninit_in_block(path, bb).
.decl path_maybe_uninit_at_block_exit(path: Path, bb: Block) inline
path_maybe_uninit_at_block_exit(path, bb) :-
path_becomes_uninit_in_block(path, bb).
path_maybe_uninit_at_block_exit(path, bb) :-
path_maybe_uninit_at_block_entry(path, bb),
!path_becomes_uninit_in_block(path, bb).
.decl path_maybe_init_at_block_entry(path: Path, bb: Block)
path_maybe_init_at_block_entry(path, bb) :-
path_maybe_init_at_block_exit(path, pred),
bb_edge(pred, bb).
.decl path_maybe_uninit_at_block_entry(path: Path, bb: Block)
path_maybe_uninit_at_block_entry(path, bb) :-
path_maybe_uninit_at_block_exit(path, pred),
bb_edge(pred, bb).Finally, we can use this data to compute the intrablock initialization data (using the same count trick) and move errors. Note the inline on the path_maybe_uninit_at_location relation. That means that we don't compute the relation at every point, only the ones deemed significant by move_error (the predecessors of path_accessed_at points). In fact, that relation must be inlined into a rule that constrains stmt, since it is not grounded otherwise.
.decl path_maybe_uninit_at_location(path: Path, loc: Location) inline
path_maybe_uninit_at_location(path, [bb, stmt]) :-
path_maybe_uninit_at_block_entry(path, bb),
count : { path_assigned_at(path, [bb, init_at]), init_at <= stmt } = 0.
path_maybe_uninit_at_location(path, [bb, stmt]) :-
path_moved_at(path, [bb, uninit_at]),
uninit_at <= stmt,
count : { path_assigned_at(path, [bb, init_at]), init_at > uninit_at, init_at <= stmt} = 0.
.decl move_error(path: Path, node: Node)
.output move_error
move_error(path, node) :-
path_accessed_at(path, node),
cfg_edge(pred, node),
node_loc(pred, pred_loc),
path_maybe_uninit_at_location(path, pred_loc).Unfortunately, this doesn't compile in Soufflé today due to shortcomings around aggregates and how they interact with inlining and record types. However, it can be made to work by manually inlining path_maybe_uninit_at_location into move_error using a disjunction and splitting Location into its subfields in all the input relations.
Phew. With that, we've emulated a simple dataflow optimization in datalog, and avoided storing initializedness data at all points in the CFG. However, I've not tested to see whether this method is actually faster than the naive one; I don't know how optimized the count queries will be. It will be fast in datafrog, since each count is just a range query on a static Relation.