Archive for October, 2010

Stackless tagging

Thursday, October 28th, 2010

Following up the recent post and comments on Lexicographic names, along with some off-blog discussion, I think I’m ready to implement something called “stackless tagging.”

First the tagging part: This is really just what we’ve been discussing as lexicographic names, but (as we discussed) the name “name” isn’t quite right, and we might want this new thing to co-exist with Push3 names some day. So we were casting about for other names/metaphors, ranging from the vacuous (“index”) to the wacky (something based on The Price is Right), and I subsequently scoured a thesaurus and even Charles Sanders Peirce’s 19th century classification of signifiers…. leading to a couple of candidates I didn’t love (I think the leader was “lexeme” — blah)…. before I realized what should have been obvious to me: “tags”. This comes from Holland via Riolo and company in their model of the evolution of altruism, on which Jon Klein and I later did some work (ALife paper, GPTP06 chapter)

The key idea is that a tag is like a name except that you don’t have to match it exactly. In the Riolo-style models tags were floating point numbers and similarity was numeric difference (which was thresholded), and we may implement something similar (probably more like integers, using differences but not thresholding), but Holland’s notion was more general (see his book Hidden Order). In any event, at a conceptual level all that we care about is that there’s a way to address things with inexact references (via similarity), and the tag concept covers this nicely. (BTW in my quick look back at Hidden Order I don’t see explicit mention of non-identical similarity, but I think it’s implicit.)

The stackless part is Tom’s suggestion that, for parsimony, we just build the tags directly into tagging and tag-lookup instruction names. So for example we may have things like integer.tag-4123 which pops the top integer and tags it with 4123, along with things like at-tag-3745 which pushes whatever is tagged with 3745 (OR, if nothing is tagged with 3745, whatever has the closet tag) onto the exec stack.

We could have typed at-tag instructions and separate tag stores by type, but for now, for simplicity if nothing else, I plan to have only one store and have at-tag-<num> instuctions push whatever has the closest tag onto the exec stack, from where it may end up on another stack via execution. I will have type-based tagging instructions, since otherwise it would be hard to tag different types of things, but not type-based tag lookup/reference instructions.

This kind of stackless mechanism would be a awkward with Push3 names, at least if you have a potentially large or unbounded set of possible names, because in that situation you really want a name stack so that you can correlate definitions with references, etc. But with similarity-based (e.g. lexicographic) lookup this seems much more reasonable. And since it does provide a highly parsimonious way to do this “nameish” thing, and is also simple to implement (although I think that I will treat these new “instructions” in a non-standard way in clojush…), I think it’s a good thing to try first.

Lexicographic names?

Thursday, October 21st, 2010

The “name” data type in Push3 (and associated instructions/conventions for defining names and getting their references) provides significant capabilities in theory, but I have yet to see them utilized in any non-trivial way in programs that were produced by evolution. This is particularly disappointing because names were redesigned for the Push2/Push3 transition to address this issue. I don’t think that there is a complete or definitive diagnosis of why Push2/3 names don’t actually get used in evolved results, but one contributing factor is probably that most names don’t end up having references when they are used, and coordination of assignment and reference to the same names is difficult to produce through incremental genetic modifications.

Several alternative naming mechanisms have been discussed over the intervening years — enough that I’m not sure what they all are or who originated various pieces of them (although I know that Maarten Keijzer, Bill Tozier, and Kyle Harrington originated several). A recent lab discussion (with Nathan Whitehouse, Daniel Gerow, Tom Helmuth, Brian Martin, and Kyle Harrington) ranged over several more or less radical permutations of these ideas — including things like allowing anything (numbers, code, etc.) to serve as a name — in combination with various other more or less radical proposed language revisions — including Maarten’s remarkable push-forth and various kinds of associative memory — but also produced some ideas for more incremental changes, e.g. involving the use of a character string type/stack and string-based names.

After mulling this conversation I’m thinking about making a change that is in many ways more modest and incremental than the others that have been discussed (in fact, in one respect it goes backwards to Push2), but that does add one new concept that might initially seem very strange: a lexicographic ordering on names, along with “closest lexicographic match” reference. (It’s possible that even some aspects of this thing I’m calling new have previously been proposed/implemented… by Bill?)

I will first explain how this does and doesn’t differ from names in Push2 and Push3, and then say something briefly about why I think it’s an interesting idea.

In this scheme names are still symbols (not strings or numbers or other kinds of data), but the sequences of characters that name them will matter (which is weird). The interpreter will treat any symbol that is not a known instruction as a name and push it onto the name stack. This in accordance with Push2; in Push3 only unbound names are pushed, while bound names are treated as references and replaced with their bound values (unless quoted). As in Push2, we therefore need a separate instruction to get and push the value that has been associated with a name. (Push2 had “set” and “get” instructions, while Push3 replaced “set” with “define” and treated bound instances as references. We should probably call the new ones something entirely new — I suggest “associate” and “refer” but maybe there’s something better.)

Note that even with the reversion to a “getting” instruction this can still be more parsimonious than Push2 because the exec stack still helps in some situations. (For details see the discussion of changes to names in the Push2/3 transition, in Push3; the savings due to the exec stack will still be available, but the savings due to eliminating “get” instructions will not.)

But here’s the new and weird part: When doing a “refer,” if the name has not been associated with a value then it refers to the closest match. “Closest match” could mean a variety of things, but for ease of implementation I would suggest “first item reached, when searching through the bound names in lexicographical (alphabetical) order, that is greater than or equal to the bound name.”

One other change that should go along with this: random names (which would be produced by an ephemeral random constant in the instruction set) should be randomly distributed in the lexicographic space. A  simple way to do this is to use names like name-1942 where the second part is a randomly selected positive integer within a specified range.

Why do I think this is a good idea? The short story is that:

  • It seems like it may solve the “most names don’t have references” problem in a fairly direct way.
  • It may allow an evolving lineage of programs to ramp up name-based storage/reference gradually over evolutionary time.
  • It does little else in terms of changing the Push language or PushGP evolutionary dynamics.

Comments would be most welcome!

Psh in Processing demo

Thursday, October 14th, 2010

Tom Helmuth has put together a demo of Psh being used in the Processing environment on a simple symbolic regression problem: http://hampshire.edu/lspector/psh/demo/regression/PshApplet

Clojush enhancements: artificial ant, krypto, tg8, decimation

Thursday, October 14th, 2010

A bunch of new stuff in Clojush, which you can find at github: http://github.com/lspector/Clojush

20101014: - [Artificial ant, krypto, tg8, decimation]
          - Added articial ant example from Koza (via Luke).
          - Added "tg8" integer symbolic regression problem.
          - Added krypto number puzzle example.
          - Added pushgp "decimation" feature, in which elimination
            tournaments, conducted after fitness testing, reduce the
            size of the population to a specified fraction of its original
            size (specified in a decimation-ratio argument to pushgp;
            the tournament sized is specified via decimation-tournament-size).
            The ordinary tournament-size parameter is still used for subsequent
            selection from the decimated population. Any specified trivial
            geography applies both to decimation and to subsequent selection.