Lexicographic names?
Thursday, October 21st, 2010The “name” data type in Push3 (and associated instructions/conventions for defining names and getting their references) provides significant capabilities in theory, but I have yet to see them utilized in any non-trivial way in programs that were produced by evolution. This is particularly disappointing because names were redesigned for the Push2/Push3 transition to address this issue. I don’t think that there is a complete or definitive diagnosis of why Push2/3 names don’t actually get used in evolved results, but one contributing factor is probably that most names don’t end up having references when they are used, and coordination of assignment and reference to the same names is difficult to produce through incremental genetic modifications.
Several alternative naming mechanisms have been discussed over the intervening years — enough that I’m not sure what they all are or who originated various pieces of them (although I know that Maarten Keijzer, Bill Tozier, and Kyle Harrington originated several). A recent lab discussion (with Nathan Whitehouse, Daniel Gerow, Tom Helmuth, Brian Martin, and Kyle Harrington) ranged over several more or less radical permutations of these ideas — including things like allowing anything (numbers, code, etc.) to serve as a name — in combination with various other more or less radical proposed language revisions — including Maarten’s remarkable push-forth and various kinds of associative memory — but also produced some ideas for more incremental changes, e.g. involving the use of a character string type/stack and string-based names.
After mulling this conversation I’m thinking about making a change that is in many ways more modest and incremental than the others that have been discussed (in fact, in one respect it goes backwards to Push2), but that does add one new concept that might initially seem very strange: a lexicographic ordering on names, along with “closest lexicographic match” reference. (It’s possible that even some aspects of this thing I’m calling new have previously been proposed/implemented… by Bill?)
I will first explain how this does and doesn’t differ from names in Push2 and Push3, and then say something briefly about why I think it’s an interesting idea.
In this scheme names are still symbols (not strings or numbers or other kinds of data), but the sequences of characters that name them will matter (which is weird). The interpreter will treat any symbol that is not a known instruction as a name and push it onto the name stack. This in accordance with Push2; in Push3 only unbound names are pushed, while bound names are treated as references and replaced with their bound values (unless quoted). As in Push2, we therefore need a separate instruction to get and push the value that has been associated with a name. (Push2 had “set” and “get” instructions, while Push3 replaced “set” with “define” and treated bound instances as references. We should probably call the new ones something entirely new — I suggest “associate” and “refer” but maybe there’s something better.)
Note that even with the reversion to a “getting” instruction this can still be more parsimonious than Push2 because the exec stack still helps in some situations. (For details see the discussion of changes to names in the Push2/3 transition, in Push3; the savings due to the exec stack will still be available, but the savings due to eliminating “get” instructions will not.)
But here’s the new and weird part: When doing a “refer,” if the name has not been associated with a value then it refers to the closest match. “Closest match” could mean a variety of things, but for ease of implementation I would suggest “first item reached, when searching through the bound names in lexicographical (alphabetical) order, that is greater than or equal to the bound name.”
One other change that should go along with this: random names (which would be produced by an ephemeral random constant in the instruction set) should be randomly distributed in the lexicographic space. A simple way to do this is to use names like name-1942 where the second part is a randomly selected positive integer within a specified range.
Why do I think this is a good idea? The short story is that:
- It seems like it may solve the “most names don’t have references” problem in a fairly direct way.
- It may allow an evolving lineage of programs to ramp up name-based storage/reference gradually over evolutionary time.
- It does little else in terms of changing the Push language or PushGP evolutionary dynamics.
Comments would be most welcome!