Archive for June, 2011

Tagged-code macros, bug-fixes for examples, and other minor Clojush updates

Friday, June 24th, 2011

There have been a couple of Clojush updates since my last post here, with the most significant being tagged-code macros for facilitating the manipulation of code within the tag space (taking arguments from the tag space and then tagging results). Details are in the update comment below.

Available as usual from: https://github.com/lspector/Clojush

20110618: - Switched to Kyle Harrington's version of overlap; it's more clear,
            possibly faster, and may fix a hard-to-trace bug that cropped up
            in a long evolutionary run (?).
20110624: - Replaced lawnmower and dsoar examples with bugfixed versions
            (thanks to Kyle Harrington).
          - Added namespace and made miscellaneous other changes to 
            clojush-tests.clj.
          - Added support for tagged-code macros. Tagged-code macro calls
            have the effect of code instructions, but they take their
            code arguments from the tag space and they tag their code return
            values. They are implemented as macros to leverage the existing
            code instruction set; note that this means that a single call
            will contribute more than one iteration step toward the 
            evalpush-limit. Tagged-code macros appear in programs as hash maps
            that map :tagged_code_macro to true, :instruction to a code
            instruction, :argument_tags to a sequence of tags to be used
            to acquire arguments, and :result_tags to a sequence of tags
            to be used for tagging results. Execution of a macro expands
            the needed code onto the exec stack to grab arguments from the tag 
            space and push them on the code stack, execute the code instruction, 
            and tag results. Note that results will also be left on the code
            stack if global-pop-when-tagging is false. Conceptually, tag values
            are "baked in" to these macros in much the same way that tag values
            are "baked in" to the instruction names for stackless tag
            instructions; we use hash maps simply because there is more
            information to bake in and this prevents us from having to parse
            the names (which would get messy and also waste time). Because
            the maps make it difficult to read programs, however, a utility
            function called abbreviate-tagged-code-macros is provided to
            produce copies of programs with more human-readable (but not 
            currently executable) representations of tagged-macro calls.
            A tagged-code-macro-erc is provided to generate random tagged-code
            macros in pushgp runs. A new example, codesize20, provides 
            a simple demonstration of the use of tagged-code macros.

Overlap

Tuesday, June 7th, 2011

Overlap is a new measure of the similarity of possibly nested data structures.

Footnote: I think it’s new; let me know if you think otherwise! Also let me know if you think this should have a different name; I had a half dozen candidates and “overlap” was my favorite of those, but there may well be a better one.

Overlap can be applied to many kinds of data including Push code and symbolic expressions of the kind used in Koza-style genetic programming systems. Unlike many of the code similarity/difference measures described in the genetic programming literature overlap is not directly tied to any specific syntax or semantics or to any specific notion of “editing steps.” It deals with changes to list lengths, nesting level, and ordering in ways that reflect both what changes and what does not change in such transformations. That said, no similarity measure will be perfect for all applications, and I make no such claims about overlap. But overlap does have the following nice features:

  • Broadly applicable.
  • Handles arbitrary transformations in reasonable ways.
  • Easy to describe and to compute.
  • Ranges from 0 (for entirely distinct inputs) to 1 (for identical inputs)

Overlap is defined in terms of the collections of the items contained in each of the arguments, including nested items at all levels. For example, if x is (a (b c)) then Items(x) will contain the following five items: a, b, c, (b c) and (a (b c)). Within this context we define:

Overlap(x, y) = the maximal number of simultaneous, disjoint pairings by identity of items in Items(x) with items in Items(y), divided by the size of the larger of Items(x) and Items(y).

Following are some examples, produced by the new overlap-demo function in Clojush (see below):

a , () --- float overlap: 0.0 
a , a --- float overlap: 1.0 
a , b --- float overlap: 0.0 
a , (a b) --- float overlap: 0.33333334 
a , (a a) --- float overlap: 0.33333334 
(a) , (a b) --- float overlap: 0.33333334 
(a b) , (a c) --- float overlap: 0.33333334 
(a b) , (a b c) --- float overlap: 0.5 
(a b c) , (a b d) --- float overlap: 0.5 
(a b c d) , (a b c e) --- float overlap: 0.6 
(a b c d) , (d c b a) --- float overlap: 0.8 
(a b c d e f g) , (a b c d e f h) --- float overlap: 0.75 
(a b) , (a b c d e f) --- float overlap: 0.2857143 
(a (b (c))) , (a (b (c))) --- float overlap: 1.0 
(a (b (c))) , (a (b (x))) --- float overlap: 0.33333334 
(a (b (c))) , (a (x (c))) --- float overlap: 0.5 
(a (b (c))) , (x (b (c))) --- float overlap: 0.6666667 
(a (b c) (d (e f))) , ((d (e f)) a) --- float overlap: 0.6 
(a (b c) (d (e f))) , (a a (b c) (d (e f))) --- float overlap: 0.8181818 

I developed overlap for comparisons of Push programs (for various purposes), but I could imagine it having other uses as well. Might it be useful for analysis or diversity maintenance in Koza-style genetic programming? I don’t know, but if anyone wants to explore that I’d love to hear about it. It may be that the syntax restrictions of Koza-style symbolic expressions make the flexibility of overlap unnecessary, and maybe overlap reduces to some other measure in such circumstances… but I don’t know.

Push users should also note that overlap differs in several ways from the “discrepancy” measure that has long been included with Push implementations (e.g. in the code_discrepancy instruction). Aside from the difference in direction (higher overlap means more similar, while higher discrepancy means less similar) a major difference is that overlap is normalized to [0, 1] while discrepancy ranges from 0 up to arbitrarily large integers. Analysis of other differences is left as an exercise to the reader.

An overlap utility function, an overlap-demo function, and a code_overlap Push instruction have been added to the latest commit of Clojush; details below.

Clojush is available as usual from: https://github.com/lspector/Clojush

Here is the latest commit message:

20110607: - Added overlap utility function, overlap-demo (which just prints
            some examples to show how overlap works), and code_overlap
            instruction. Overlap can be applied to any two (possibly
            nested) things and it returns a number between 0 (meaning
            no overlap) and 1 (meaning exact match). The overlap utility
            function returns a ratio, but the code_overlap instruction
            pushes a float.
          - Removed complex number support from 20110505. There were previous
            reports of problems and I've noticed problems from the fact that
            (apply + ()) => zero (as opposed to 0) in the clojush namespace
            defined by the code as revised for complex number support. If
            someone knows how to re-introduce complex number support without
            such problems then please let me know.

Removed Clojush complex number support

Tuesday, June 7th, 2011

There were reports of other problems, and I just experienced a problem that I traced to the fact that (apply + ()) => zero (as opposed to 0) in the clojush namespace defined by the code as it was revised for complex number support. If someone knows how to re-introduce complex number support without such problems then please let me know.

Clojush is available as usual from: 
https://github.com/lspector/Clojush