Archive for the 'Uncategorized' Category

Lexicographic names?

Thursday, October 21st, 2010

The “name” data type in Push3 (and associated instructions/conventions for defining names and getting their references) provides significant capabilities in theory, but I have yet to see them utilized in any non-trivial way in programs that were produced by evolution. This is particularly disappointing because names were redesigned for the Push2/Push3 transition to address this issue. I don’t think that there is a complete or definitive diagnosis of why Push2/3 names don’t actually get used in evolved results, but one contributing factor is probably that most names don’t end up having references when they are used, and coordination of assignment and reference to the same names is difficult to produce through incremental genetic modifications.

Several alternative naming mechanisms have been discussed over the intervening years — enough that I’m not sure what they all are or who originated various pieces of them (although I know that Maarten Keijzer, Bill Tozier, and Kyle Harrington originated several). A recent lab discussion (with Nathan Whitehouse, Daniel Gerow, Tom Helmuth, Brian Martin, and Kyle Harrington) ranged over several more or less radical permutations of these ideas — including things like allowing anything (numbers, code, etc.) to serve as a name — in combination with various other more or less radical proposed language revisions — including Maarten’s remarkable push-forth and various kinds of associative memory — but also produced some ideas for more incremental changes, e.g. involving the use of a character string type/stack and string-based names.

After mulling this conversation I’m thinking about making a change that is in many ways more modest and incremental than the others that have been discussed (in fact, in one respect it goes backwards to Push2), but that does add one new concept that might initially seem very strange: a lexicographic ordering on names, along with “closest lexicographic match” reference. (It’s possible that even some aspects of this thing I’m calling new have previously been proposed/implemented… by Bill?)

I will first explain how this does and doesn’t differ from names in Push2 and Push3, and then say something briefly about why I think it’s an interesting idea.

In this scheme names are still symbols (not strings or numbers or other kinds of data), but the sequences of characters that name them will matter (which is weird). The interpreter will treat any symbol that is not a known instruction as a name and push it onto the name stack. This in accordance with Push2; in Push3 only unbound names are pushed, while bound names are treated as references and replaced with their bound values (unless quoted). As in Push2, we therefore need a separate instruction to get and push the value that has been associated with a name. (Push2 had “set” and “get” instructions, while Push3 replaced “set” with “define” and treated bound instances as references. We should probably call the new ones something entirely new — I suggest “associate” and “refer” but maybe there’s something better.)

Note that even with the reversion to a “getting” instruction this can still be more parsimonious than Push2 because the exec stack still helps in some situations. (For details see the discussion of changes to names in the Push2/3 transition, in Push3; the savings due to the exec stack will still be available, but the savings due to eliminating “get” instructions will not.)

But here’s the new and weird part: When doing a “refer,” if the name has not been associated with a value then it refers to the closest match. “Closest match” could mean a variety of things, but for ease of implementation I would suggest “first item reached, when searching through the bound names in lexicographical (alphabetical) order, that is greater than or equal to the bound name.”

One other change that should go along with this: random names (which would be produced by an ephemeral random constant in the instruction set) should be randomly distributed in the lexicographic space. A  simple way to do this is to use names like name-1942 where the second part is a randomly selected positive integer within a specified range.

Why do I think this is a good idea? The short story is that:

  • It seems like it may solve the “most names don’t have references” problem in a fairly direct way.
  • It may allow an evolving lineage of programs to ramp up name-based storage/reference gradually over evolutionary time.
  • It does little else in terms of changing the Push language or PushGP evolutionary dynamics.

Comments would be most welcome!

Psh in Processing demo

Thursday, October 14th, 2010

Tom Helmuth has put together a demo of Psh being used in the Processing environment on a simple symbolic regression problem: http://hampshire.edu/lspector/psh/demo/regression/PshApplet

Clojush enhancements: artificial ant, krypto, tg8, decimation

Thursday, October 14th, 2010

A bunch of new stuff in Clojush, which you can find at github: http://github.com/lspector/Clojush

20101014: - [Artificial ant, krypto, tg8, decimation]
          - Added articial ant example from Koza (via Luke).
          - Added "tg8" integer symbolic regression problem.
          - Added krypto number puzzle example.
          - Added pushgp "decimation" feature, in which elimination
            tournaments, conducted after fitness testing, reduce the
            size of the population to a specified fraction of its original
            size (specified in a decimation-ratio argument to pushgp;
            the tournament sized is specified via decimation-tournament-size).
            The ordinary tournament-size parameter is still used for subsequent
            selection from the decimated population. Any specified trivial
            geography applies both to decimation and to subsequent selection.

Clojush updated, on github

Thursday, September 23rd, 2010

Clojush has been updated in several ways (see below and note that a couple of changes will require changes to user code) and it has a new home on github. The new URL for the project is:

http://github.com/lspector/Clojush

Recent updates:

20100918: - Created Eclipse project.
          - Deleted re-load/backtrace utils.
          - Removed shuffle, as it is now in clojure.core (in Clojure 1.2).
          - Removed gratuitous def in define-registered.
          - Added atom for instruction-table.
          - Added atom for registered-instructions; NOTE: requires user
            code that refers to registered-instructions to refer to
            @registered-instructions instead. (Example odd.clj changed
            to reflect this.)
          - Added to-do item "Convert structs to records, which should be
			faster. Experiments with Clojure 1.2 show this to be faster
            but there are not good examples yet to serve as the basis for
            changes.
          - Added atoms for global-atom-generators and
            global-max-points-in-program.
          - Changed pushgp to take keyword arguments rather than a parameter
            map. NOTE: this requires calls to pushgp to be written differently.
            Updated examples to reflect this.
10200921: - Removed random-element in favor of rand-nth.
          - Cleaned up indentation, miscellaneous other cosmetic things.
          - Added namespaces for all example files.
          - Updated README to mention requirement for clojure 1.2 and to
            remove mention of ClojureX which has been discontinued.
          - Converted structures to records (a clojure 1.2 feature, should
            be faster).

Psh v1.0

Wednesday, June 9th, 2010

Today we have released Psh v1.0. This version should be stable and mostly bug-free. This version is has an updated license, using the Apache 2.0 License.

In addition, one minor change has been made. PshGP now keeps track of the number of individual evaluations it performs and displays the running total during generational and final reports. This can be used to compare the amount of computation required by different runs of PshGP.

In order to download Psh v1.0, please visit http://github.com/jonklein/Psh/tree/v1.0, and for the most recent changes visit http://github.com/jonklein/Psh.

New Psh License

Sunday, June 6th, 2010

Psh has now been re-released under the Apache 2.0 license. The new license features some changes from the previously used GPL license.

In addition, a few changes have been made as to required parameters in .pushgp files. These changes have been made to make it easier for beginners to get started with genetic programming runs without worrying about the unimportant details.

As always, the source for Psh can be found at http://github.com/jonklein/Psh .

New Psh Instructions

Saturday, May 29th, 2010

Among other minor things, I’ve recently added a few new instructions to Psh, including those for the new Input stack. Here’s a list of the additions:

integer.abs
integer.neg
float.abs
float.neg
float.sin
float.cos
float.tan
boolean.and
boolean.or
boolean.xor
boolean.not
input.inall
input.inallrev
input.index
input.stackdepth
input.makeinputs#

Clojush subst subst

Wednesday, May 26th, 2010

The 20100526 update to clojush incorporates this change:

20100526: - Reimplemented subst to use clojure.walk/postwalk-replace. Also
            fixed comment, which described args backwards. (The behavior
            was correct, emulating Common Lisp subst.)

Clojush random integer fix

Sunday, May 2nd, 2010

The 20100502 update to clojush adds a fix for calls to the random integer generator with bignum arguments, which would previously cause an error. They’re still not handled in the best possible way (which would be to return integers in the full specified range, even when the argument is a bignum), but at least they don’t cause errors. More specifically, in the update but arguments greater than 2^31-1 are treated as if they were 2^31-1 (2147483647).

Several new Clojush updates

Sunday, April 18th, 2010

I’ve just posted a new version of clojush, the version of Push3/PushGP implemented in Clojure, which includes several significant changes, listed below.

BTW I’m also considering moving the distribution to github, but I haven’t done that yet…

20100320: - Added print-ancestors-of-solution parameter and code.
          - Print simplification in report only with non-zero value for
            report-simplifications parameter.
          - Added sextic polynomial regression example. This example also demonstrates
            the use of fitness penalties for abnormally terminating programs.
	  - Added a new argument to problem-specific-report (NOTE: not backward
            compatible).
20100417: - Added thread-local random number generator objects to avoid contention.
          - Print parameters at the start of pushgp.
          - Added readme comments about concurrency, -Xmx2000m, and -XX:+UseParallelGC.
          - Converted time limit code to use System/nanoTime (thanks to Brian Martin).
            This means that time limits must now be expressed in nanoseconds rather
            than milliseconds, and I believe it will eliminate contention for shared
            Date objects (but this should be checked; if there is contention then 
            we should revert to using Date and use thread-local date objects as is
            being done with the random number generator objects).
          - Added print-return utility function for debugging.
          - Added a new Push instruction, code_wrap, which pushes a 1-item list 
            containing the previous top item of the code stack.
          - Added a new Push instruction, code_map, which acts much like Lisp's (or
            Scheme's or Clojure's) "map" functions, using the top item on the exec
            stack as the function to map and the top item on the code stack as the
            list to map it down. The list of results is left on top of the code
            stack. This is implemented as a "macro" instruction that expands into
            a Push code fragment that: 1) for each item in the list on top of the
            code stack (or for the single non-list item that is there) quotes the
            item onto the code stack and then runs the code that's on top of the
            exec stack; 2) uses code_wrap to push a list containing the last result
            onto the code stack; 3) executes as many instances of code_cons as are 
            necessary to add all of the other results onto the list. Note that this
            will act like an ordinary "map" function only when the code on the exec
            stack leaves a single output on the code stack in place of each input on
            the code stack; if it consumes or produces more or less code then the
            effect will be quite different.