Getting Ready for Inference Collection V3.0 Update 1
One of the major changes for version 3.0 of the inference collection is a rule compiler that generates code for (portions of) the match/join process. This will be done at expansion (compile) time by the rule language macros themselves. The rule language macro include define-ruleset, define-facts, and define-rule. The rule compilation itself takes place in the define-rule macro.
So far, I have the rule parsing, rule validation, and rule normalization phases pretty much complete. The rule normalization phase converts Boolean expressions within the pattern clauses on a standard (and <pattern-clause> ...) | (or (and <pattern-clause> ...) ...). The first form is a simple rule while the second is a sequence of simple rules. For example:
(define-rule (some-rule some-ruleset)
(p1 ?a ?b ?c)
(p2 ?a ?b ?c)
(or (and (p3 ?a)
(p4 ?b))
(p3 ?c))
==>
(printf "a = ~a, b = ~a, c = ~a~n" ?a ?b ?c))
would expand into the equivalent of the two rules:
(define-rule (some-rule-a some-ruleset)
(p1 ?a ?b ?c)
(p2 ?a ?b ?c)
(p3 ?a)
(p4 ?b)
==>
(printf "a = ~a, b = ~a, c = ~a~n" ?a ?b ?c))
(define-rule (some-rule-b some-ruleset)
(p1 ?a ?b ?c)
(p2 ?a ?b ?c)
(p3 ?c))
==>
(printf "a = ~a, b = ~a, c = ~a~n" ?a ?b ?c))
This can simplify writing complex rulesets.
I am now coding the actual rule compiler. This will generate procedures for matching pattern elements and joining matches - applying join constraints. For example, the pattern
(p1 ?a 12 ?b : (> ?b 10) (c . ?c))
would compile into:
(p1
(#:variable #f ?a #f (lambda (?a bindings)
(vector-immutable
(unsafe-vector-ref bindings 0))))
(#:literal #f #f 12 (lambda (x bindings)
(if (= x 12) bindings #f)))
(#:variable #f ?b (> ?b 10) (lambda (?b bindings)
(let ((?a (unsafe-vector-ref bindings 1)))
(if (> ?b 10)
(vector-immutable
(unsafe-vector-ref bindings 0) ?a ?b)
#f))))
(#:variable c ?c #f (lambda (?c bindings)
(let ((?a (unsafe-vector-ref bindings 1))
(?b (unsafe-vector-ref bindings 2)))
(vector-immutable
(unsafe-vector-ref bindings 0) ?a ?b ?c))))))
where each pattern element is a five element list (<type> <key> <variable> <constraint> <matcher>), where <type> is the pattern element type (e.g., #:variable or #:literal), <key> is the key for association list matching, <variable> is a variable name, <constraint> is a constraint expression, and <matcher> is a procedure to match the pattern element. These are used at execution time to build the rule network.
The structure of the pattern is maintained in the compiled form and is used to structure the match nodes and the data flow between them. The <type>, <key>, <variable>, and <constraint> fields are used to determine when nodes can be shared among rules. Finally, the procedure in the <matcher> field does the actual matching and variable binding.
Data Structures for Matching and Joining
Immutable vectors are used to represent matches. For example, the pattern
(p1 ?a 12 ?b : (> ?b 10) (c . ?c))
matched against the asserted fact
(p1 10 12 14 (c . 16) (d . 18))
would produce a match of
#(<assertion> 10 14 16)
where <assertion> is the assertion object for the specified fact and 10, 14, and 16 are the values matching the variables ?a, ?b, and ?c, respectively.
Immutable vectors are also used to represent joined matches. For example, the patterns
(p1 ?a ?b ?c)
?p2 <- (p2 ?a ?b ?c)
(no (p3 ?a))
matched against the asserted facts
(p1 1 2 3)
(p2 1 2 3)
with no asserted (p3 1)) fact, would produce a match of
#(#(<assertion> 1 2 3)
#(<assertion> 1 2 3)
#(#t))
A major bottleneck in the current inference engine is the amount of work done in propagating (or unpropagating) assertions and deletions through the join (and rule) nodes of the rule network. For version 3.0, I'm planning on two (eq?) hash tables for each join node to index left and right matches and a double doubly-linked list of joined matches. This will allow efficient insertion and deletion of both left and right matches during join processing (for propagating and unpropagating).
More to come.
Labels: inference