Lexicographic names?

October 21, 2010
by Lee Spector (lspector)

The “name” data type in Push3 (and associated instructions/conventions for defining names and getting their references) provides significant capabilities in theory, but I have yet to see them utilized in any non-trivial way in programs that were produced by evolution. This is particularly disappointing because names were redesigned for the Push2/Push3 transition to address this issue. I don’t think that there is a complete or definitive diagnosis of why Push2/3 names don’t actually get used in evolved results, but one contributing factor is probably that most names don’t end up having references when they are used, and coordination of assignment and reference to the same names is difficult to produce through incremental genetic modifications.

Several alternative naming mechanisms have been discussed over the intervening years — enough that I’m not sure what they all are or who originated various pieces of them (although I know that Maarten Keijzer, Bill Tozier, and Kyle Harrington originated several). A recent lab discussion (with Nathan Whitehouse, Daniel Gerow, Tom Helmuth, Brian Martin, and Kyle Harrington) ranged over several more or less radical permutations of these ideas — including things like allowing anything (numbers, code, etc.) to serve as a name — in combination with various other more or less radical proposed language revisions — including Maarten’s remarkable push-forth and various kinds of associative memory — but also produced some ideas for more incremental changes, e.g. involving the use of a character string type/stack and string-based names.

After mulling this conversation I’m thinking about making a change that is in many ways more modest and incremental than the others that have been discussed (in fact, in one respect it goes backwards to Push2), but that does add one new concept that might initially seem very strange: a lexicographic ordering on names, along with “closest lexicographic match” reference. (It’s possible that even some aspects of this thing I’m calling new have previously been proposed/implemented… by Bill?)

I will first explain how this does and doesn’t differ from names in Push2 and Push3, and then say something briefly about why I think it’s an interesting idea.

In this scheme names are still symbols (not strings or numbers or other kinds of data), but the sequences of characters that name them will matter (which is weird). The interpreter will treat any symbol that is not a known instruction as a name and push it onto the name stack. This in accordance with Push2; in Push3 only unbound names are pushed, while bound names are treated as references and replaced with their bound values (unless quoted). As in Push2, we therefore need a separate instruction to get and push the value that has been associated with a name. (Push2 had “set” and “get” instructions, while Push3 replaced “set” with “define” and treated bound instances as references. We should probably call the new ones something entirely new — I suggest “associate” and “refer” but maybe there’s something better.)

Note that even with the reversion to a “getting” instruction this can still be more parsimonious than Push2 because the exec stack still helps in some situations. (For details see the discussion of changes to names in the Push2/3 transition, in Push3; the savings due to the exec stack will still be available, but the savings due to eliminating “get” instructions will not.)

But here’s the new and weird part: When doing a “refer,” if the name has not been associated with a value then it refers to the closest match. “Closest match” could mean a variety of things, but for ease of implementation I would suggest “first item reached, when searching through the bound names in lexicographical (alphabetical) order, that is greater than or equal to the bound name.”

One other change that should go along with this: random names (which would be produced by an ephemeral random constant in the instruction set) should be randomly distributed in the lexicographic space. A  simple way to do this is to use names like name-1942 where the second part is a randomly selected positive integer within a specified range.

Why do I think this is a good idea? The short story is that:

  • It seems like it may solve the “most names don’t have references” problem in a fairly direct way.
  • It may allow an evolving lineage of programs to ramp up name-based storage/reference gradually over evolutionary time.
  • It does little else in terms of changing the Push language or PushGP evolutionary dynamics.

Comments would be most welcome!



21 Responses to “Lexicographic names?”

  1.   tom Says:

    I think this sounds like a great idea. I think this is what I was trying to describe in lab when I said that it would be easier to just use strings to refer to the variables, when I guess I really meant symbols.

    One interesting (and likely good, evolutionarily) effect will be the changing of a named variable’s value from one individual to its children by creating a name that is lexicographically closer with a different value. Thus, if you have a variable set to a specific value, you don’t necessarily have to set that specific variable to a new value to change the value during future references.

    I think this is my favorite method to named variables that I’ve heard of for Push.

  2.   bill Says:

    Never did anything with closest lexicographic match; names (“variables” in our ontology) are an explicit statement type, not a fallback behavior as in Push3.

    This sounds like an interesting variant of Push.

    So your expectation would be that there would be a reduction in the average proportion of unbound names on the name stack, as in your first bullet?

    What effect do you think that might have on your performance measure of choice? In symbolic regression and other mathematical problems where variables are “close to the surface” we see them being used as one might expect; in “application” problems, we don’t see any default evolutionary pressure to create local variables and use them, though the capacity is built into Nudge/AnswerFactory.

    Our approach is a good deal more “hands on” though: we’ve got some sketches of tools for reaching into an ongoing search and actively increasing the rewards for answers following desired patterns, and “using local variables instead of explicit literals” is one of them we’ve got on the to do list.

    Will be interesting to see how the two approaches—your indirect one, and the more direct conditioning one we’ve been taking—compare for problem-solving and generalization.

  3.   maarten Says:

    I like this idea a lot. It initially reminded me of the ‘address by template’ mechanism from Tom Ray’s Tierra system. There a JMP instruction was labelled with a bitstring, and it would start to look for that bitstring somewhere else to resume execution. But this is better. It will look for the nearest match of a particular bitstring (in the guise of a name), and resume execution there. This nearest bit is huge (it also reminds me of a discussion with Conor Ryan who wanted to use a ‘bad’ hash function to impose some neighbourhood function of a set of names. I guess lexicographical ordering is a pretty bad hash function)

    The difficulty of course is to get some leverage out of it, and that still might need some work.

    I think that the reason all that name stuff is seldomly used in Push2 or Push3 programs is simply that it doesn’t provide *any* benefit over just directly working with code. In Push it’s very easy to duplicate some code, push it on a stack, and use it later. So if the choice is between binding code to a name and using the name, or to just duplicate the code a couple of times and using that code directly, apparently the evolutionary engine chooses for using the direct way of manipulating code.

    What I’m trying to say is that, effectively, in Push, a name is just a cumbersome shorthand for a reference — useful for people perhaps, but totally irrelevant for computers. Say I have the following code:

    ( 1 (3 2 +) (3 2 +) )

    Well, that looks complicated, right. But. Deep down inside, the computer is probably using something like:

    ( 1 0x33493 0x33493)

    to represent this bit of code, with 0x33493 being the address of the bit of code that represents (3 2 +). Why would an algorithm care that this piece of code is called 0x33493, and not ‘maartens_computation’?? There already is some kind of binding mechanism (pointers or references), and attaching human readable names to it just doesn’t add anything. Effectively all Push-2 did was say: you can refer to code using a string, but you have to use ‘get’ and ‘set’ primitives. Push-3 said: you can refer to addresses using a string, but you have to use a ‘set’ primitive to relocate a string. And because there’s no benefit between calling a piece of code with ‘Maarten’ or with 0x33493, the system just ignores it.

    But with this idea, things change. It suddenly starts to matter if a piece of code is called 0x33493, or alternatively 0x22303, as it will behave differently under mutations. That’s potentially a big deal. Question still is how to make use of it.

    Maybe blending this ‘address by template bit’ with this ‘nearest neighbour search’ together with this ‘reference based lookup of code’ to provide a new way of doing things. Something like:

    a) Create a global namespace of strings, addresses, generally variable length bitstrings that store code, call these names
    b) Have code only consist of a variable length list of atoms: integers, floats, booleans, but also names, but NOT code.
    c) Run code in this namespace, tagging whatever is included in the evaluation, and call that an individual.
    d) Mutate atoms, not code, no recursion

    So here we will have

    (1 0x33493 0x33493) as the individual. Which will, when running code, bind to (3 2 +) because that is located at 0x33660, being the nearest piece of code.

    Given that such a system would not have any ’embedded’ code anymore, and just consist of (variable length) arrays of atoms and references/words, it might truly deserve the name ‘push-forth’

    But of course, just using it as names next to references to code might give you some leverage as well.

  4.   bill Says:

    I did some work with Simon Fraser at SFI with Tierra, and I was reminded of this too. Interesting thing we found was, the idea of an “individual” is rather fragile. It’ll be interesting to see how your reusable code pieces/references self-organize into robust, functioning code.

    That said, I’d still like to push on the motivation a bit, especially after Maarten’s points. Imagine you are running an actual trying-to-solve-a-particular-problem Push population, the way it is right now. OK? Something challenging and open-ended, not a toy, like say evolving curve fitting algorithms–as opposed to fitting curves, right?

    And you’re running this search, which I tried to pick because people tend to use local variables when describing these sorts of things, and you see something happening that males you say, “I bet it would do better, this search, if only it ‘knew’ to use names more.”

    Regardless of how you go about making that thing happen–making the system start using names differently–what is the thing you saw?

  5.   bill Says:

    [sorry for unclosed italic markup; typing on iPgibw]

  6.   lspector Says:

    Stimulating comments!

    === Tom:

    I agree that symbol mutations that move small distances in lexicographic space might make sense with this, just as small-delta mutations to numeric constants often can (but FWIW aren’t provided out of the box in any PushGP as far as I know).

    === Bill:

    1) Yes, there would be *no* unbound references once a single name is bound, since that one thing would be the closest match for any reference (until another is bound, and then references closer to the second one would pick that one out, etc.)

    2) Very interesting about the experience with Nudge/AnswerFactory. I wonder how this relates to Maarten’s (subsequent) description/diagnosis… I also think it will be interesting to see if anything notable happens with explicit encouragement of name use, and in fact this may be another way to do something I’m beginning to work on from a rather different angle, on explicit encouragement of modularity.

    === Maarten:

    I will have to mull the new push-forth further!

    But on the earlier part of your message:

    1) Absolutely I was think of Tierra too. Great to hear that Connor was thinking along these lines too. I think that “bad hash function” is an excellent name for a band.

    2) I think your description/diagnosis of the issues with Push2/3 names captures a lot that is right but misses a couple of things. One is that the duplicated expression in ( 1 (3 2 +) (3 2 +) ) is really best thought of as a copy (whether it shares memory or not, since Push code/exec instructions will copy when necessary to insure functional specifications). This means that a name offers “protection” — the two uses cannot diverge. Of course this protection cuts both ways, with duplication and divergence sometimes being important contributors to evolutionary adaptation. Probably “protection” is sometimes good and sometimes bad… and if it’s more often bad then I think this may be an argument not only for names being unhelpful but also for them being positively harmful. Another difference between names and code duplication is in parsimony, although this too is complicated because code and exec instructions can be used to duplicate code dynamically, just as names can.

    3) On getting leverage out of it: I think that we might get some right away. Consider that once you’ve bound a single name (in the lexicographic scheme) you can refer to its content again by coming up with ANY other name. It doesn’t have to be the same name OR another copy of the same content — any name will do, and that should make reference easier to come by. Of course, once a second name is bound some of the new-found names will refer to the wrong thing, but some won’t, and lots of other new names will still access the first reference, etc. And maybe the new binding will be better in some cases. I don’t think this is a knock-down argument that we’ll get real leverage, but it certainly seems worth trying.

    === and also

    Over the course of this thread I saw a couple of nice talks by Stephanie Forrest and also had some conversations with her, not about any of this directly but this stuff was in my head, and I think there may be a bigger story to tell here about relations between robustness, adaptation, and evolvability. Or not! But the upshot for me for the moment is that I definitely want to try this basic scheme and see what happens.

  7.   lspector Says:

    Bill: on the motivation question (which came while I was typing my last one): It’s easier to answer, but with the wrong answer (!), when thinking about the motivation in human programming. We all experience the “wait — this should be its own function” moments, and it’s probably all about anticipated reuse… so where do those anticipations come from? Good question for psychology of human programming, and maybe for some kinds of automatic programming (which would be a worthwhile but different discussion to continue…), but natural selection doesn’t anticipate so I don’t think that it’s really the right question to ask what would lead US to anticipate reuse when watching evolution in the scenario that you outline. I think the natural selection way is to say “let lots of naming readily happen,” to select things that work well, and so to end up with names persisting and being used when they happen to facilitate this (for reasons that we probably couldn’t predict or notice in advance). The trouble is that “let lots of naming readily happen” is only useful if you also “let lots of referring readily happen” and then the chances that the naming and referring match up are way too low. But under the lexicographic scheme that should be fixed….

  8.   bill Says:

    Not to get dogmatic, but Tierra was “automatic” programming, and GP is not analogous to natural selection. It’s analogous to artificial selection, and that has a breeder (“customer”) in the system’s loop.

  9.   lspector Says:

    Bill: agreed on Tierra being automatic programming. Half agreed on the rest because customers are natural too (that is, the natural/artificial distinction only goes so far). Still, I get what you mean and agree. Still, and I think we also agree on this, I think it’s a good idea to try an approach based on letting lots of naming/reference occur randomly (engineered to make matching of names and references likely to arise) and letting selection (of some kind) sort out how and when it should be used.

  10.   bill Says:

    [now on an actual computer with keyboard and all, and it might help me not be so abrupt-sounding :)]

    The analogy you’re drawing, which sees “old school” “hands-off” GP as analogous to natural selection qua natural… I’m still going to argue that that’s the Achilles heel of our field. But not here, as you say.

    We’re discussing this new interesting variation in the dynamics of our constructed system. I’m already scheduling time to add it to the language plugins here, since it’s not that big a change from the push-forth I’ve already got notes for. It sounds like simple fun.

    But let me try to frame my salient point another way; it’s not tangential, and it’s also not a criticism. I just want to know that you’ve thought about this. It’s just the other side of the same coin you’re staring at, with the design change you have in mind.

    OK.

    Surely all the recent Push languages have names. They have this capability built in. And yet many of us report they never get used, or retained for long.

    Can one craft a fitness function, or maybe a set of search operators, that promotes the use of names as they exist now? Without “cheating” by, say, adding an objective that maximizes name use outright?

    It seems one must be able to. Or if not, then there is something deep about representations and algorithms, not just Maarten’s point about social conventions and cognition.

    So again: what might that fitness function be?

  11.   bill Says:

    I think it’s a good idea to try an approach based on letting lots of naming/reference occur randomly (engineered to make matching of names and references likely to arise) and letting selection (of some kind) sort out how and when it should be used.

    Definitely.

  12.   lspector Says:

    Actually my most recent versions (schush and clojush) don’t have names at all, and they note this in their READMEs, because 1) when I wrote them I had already made repeated attempts to solve the name-use problem and had failed, so I was unenthusiastic about them, and 2) I saw implementing them as a pain, mainly (I see now in retrospect) because I saw them involving a global map (which I now see as a silly legacy thing — they can be implemented with local maps, just like I now have local stacks) and because there were some kludgy issues involving ephemeral random names (because I wanted reuse, so I would sometimes generate names that were already used and sometimes generate new ones… but how much of each? This issue goes away with lexicographic names :-).

    On your “Can one craft a fitness function…” question: I have not tried your “cheating” option, although I think it might actually be interesting to give it a whirl. While it would not be surprising, of course, if it promoted name use, it might conceivably change evolutionary dynamics or success etc. in other ways and that would be cool.

    What I HAVE done is to pose problems for which I thought names would be useful (I don’t recall exactly what right now), solve them, look at the non-name-using solutions (which is all that I would get), and then cripple the system in various ways (e.g. removing instructions) to prevent those solutions from arising again. I’d simultaneously pump up the number of names and instances of *.DEFINE that would appear in random code and re-run… but never with satisfying results. I went around this circle several times, watching GP find ever more clever ways to use the remaining functionality — but not names! — to solve the problems. I think some others may have independently gone around this circle a couple of times (e.g. maybe Alan Robinson, although that would have been in Push1, I think). As a sanity check I did often run hand-crafted solutions that used names, and they worked, but I couldn’t get them to arise via GP.

    BTW I don’t think I disagree entirely on the Achilles heel thing — but the question is where human hands should and shouldn’t be applied, and I think there are a lot of open questions there…

  13.   bill Says:

    I went around this circle several times, watching GP find ever more clever ways to use the remaining functionality — but not names! — to solve the problems. I think some others may have independently gone around this circle a couple of times (e.g. maybe Alan Robinson, although that would have been in Push1, I think). As a sanity check I did often run hand-crafted solutions that used names, and they worked, but I couldn’t get them to arise via GP.

    Yeah, us too. For six or seven variants of “names” that are rather diverse.

    In support of Maarten’s argument for programs-don’t-need-semiotics, and not tangentially to the discussion of “goddammit, use names!”, we’ve recently been noticing that evolved symbolic regression expressions for polynomials “prefer” to add variables together a bunch of times, rather than multiplying them. There’s no single blanket hypothesis there. Part of it is probably something about robustness to mutation, part is the low probability of finding the right constants to multiply instead of getting most of the way there via addition, part is certainly the odd “blending crossover” search operator we’ve been applying sometimes.

    But as with the names question that started you thinking about this change, our experience explaining (and controlling) addition-vs-multiplication is why I ask those incessant questions about features of search processes that might act as cues (or clues) for us to change what we’re doing to obtain a desired outcome. Even the simplest classical population-based looping algorithms are not modular enough to tease apart these biases’ sources.

  14.   bill Says:

    I’m going to ask a sort of sealed-envelope question, in fact, of the first one of us who writes a Push or push-forth interpreter that allows one to toggle between “traditional” and “approximate” name matching:

    If you turn off selection entirely, and just randomly sample the space if solutions in the two cases, is there any difference in the distribution of fitnesses?

  15.   lspector Says:

    Detail question: One refer instruction or one per type (which would mean an association map per type)?

    I think I’ll implement an associate instruction per type, like integer.associate, exec.associate, etc. The type here refers to the source of the association; integer.associate associates the name on top of the name stack with the integer on top of the integer stack.

    If there’s one refer instruction then it would take the top name, look it up in the single association map, find the closest match, and push the associated value onto the exec stack (from where it may then be pushed elsewhere via execution, e.g. if it’s an integer it will be pushed onto the integer stack in the next time step via execution).

    On the other hand, if there a refer instruction per type then integer.refer would do the lookup only in the integer association map and would push the resulting value (which will have to be an integer) directly onto the integer stack.

    A benefit of the “single refer instruction” idea is that it most aggressively pursues our goal of making most references actually refer. Under the “refer instruction per type” scheme integer.refer won’t refer to anything until after an integer.refer, specifically, while under the “single refer instruction” scheme all references will succeed as soon as something (anything) has been associated.

    But on the other hand, with the “single refer instruction” scheme mutations or additional associations may cause existing references to change types (and not just values).

    I’m not sure which to try first. Any opinions/thoughts?

  16.   tom Says:

    I have a suggestion related to the “one refer instruction or one per type” question. Let’s assume that you go with only one refer instruction. Then, after something has been associated, any refer call will have something to refer to. But, you have to decide what to do with a refer call before something has been associated.

    One obvious option is to have an unassociated refer call act as a no-op. Instead, I would recommend that an unassociated refer call would refer to something useful, for instance, the code of the original program. This way, unassociated names could be useful out of the gate, and would promote more names in the system, which may later be “commandeered” by other variables.

    If you go with one refer per type, this idea could be extended by having a default value to use for unassociated references. The integer and float stacks could default to 0, the code or exec stacks could default to the original program, etc.

    I think I’d slightly prefer the “one refer instruction” option, but could be convinced otherwise.

  17.   tom Says:

    One other thought:

    While names can be thought of as ways to define variables or functions, maybe a better way of thinking of them is a way to define new instructions during program execution. In this way, names should occupy the same namespace as instructions (instead of the namespace of strings, integers, etc.), as it appears we’re currently leaning.

    I think this is just a slightly different way of thinking about things, and may not be new to those who have pondered names for years, but it’s helping me wrap my mind around them better.

  18.   lspector Says:

    I do think of names sometimes as instructions, especially if they’re bound to code. BTW in “Lisp01” languages (like scheme and clojure) naming is the same for code as for anything else (there’s one namespace, at least within a module), while in Lisp-2 languages (like common lisp) a single symbol can have independent function and value bindings.

    Defaults definitely deserve thought, whether reference is type-specific or not. I’m not sure if a default of “the original program” is a good idea… or conversely, maybe it’s a good enough idea that there should be an “original-program” instruction rather than a default… But on the third hand this is global and non-composable, so maybe a bad idea on general principles… In any event, maybe we should have *some* defaults rather than ever acting as a no-op.

    It occurred to me this morning that the answer to my question (about one refer instruction vs. type-specific refer instructions) might just be “both.” Provide a generic refer instruction that pushes (onto the exec stack) the value associated with the closest matching name among ALL bound names), and also type-specific refer instructions that only “see” the associations to things of their types. The only question would be whether you allow multiple (different type) bindings to the same name, and if so what happens when you do a generic refer to that name. What do you think?

  19.   bill Says:

    If you allow type-specific association and lookup in hashes/lists, you’re getting close to letting (potentially) overlapping namespaces develop. At a given moment, a reference to name-1921 might be referring to one of a thousand bound integers, or one of five bound codeblocks, depending on the dynamics of search of course.

    So if you want to have both a generic and type-specific bind and refer instructions, the question becomes whether the names themselves are being bound “permanently” into a particular type’s namespace or not. If you want to use them as vanilla associations that take on the type of the item stored ‘in’ them, then when you reassociate a reference to a different type with a name already bound to something else, it just “moves” to the new namespace, as far as the type-specific reference instructions are concerned.

    When you have tried both of these little design decisions, though, I’m still going to ask: what traits of the actual function of the language made you decide one was better than another? Or, if you pass the buck and leave a toggle or a config option in place, what argues for anybody making the decision.

    Let me rush to add (from experience) that this isn’t me questioning whether you should do this! It’s me pushing you to actually address the usability (and user experience) of GP in general, which is something that led me to select Push in the first place because it’s a bit more surfaced here than in those damned S-expressions. 🙂

  20.   Push » Blog Archive » Stackless tagging Says:

    […] Lexicographic names? […]

  21.   Push » Blog Archive » Stackless tagging (+ fixes/enhancements) in Clojush Says:

    […] Lexicographic names? […]

Leave a Reply

You must be logged in to post a comment.