Assignment 1

[CODE]

Introduction

This time I was given a much harder problem to solve: a black box that takes a 150-bit binary string and outputs a fitness value. Unlike with our first black box, I did not know the maximum of the function, so I had no way of telling whether the GA had actually reached the optimum.

Implementation

For this problem I slightly modified my existing GA to handle 150-bit strings. In the process I fixed some memory leaks that made the program run out of memory after a very long run. The GA's internals are summarized below:

I ran the provided "black box" function, "eval(...)", on each 150-bit string to find its fitness value. The fitness values were totaled, and each parent gene's share of that total was computed as a percentage. Based on that share, a gene was mated with other high-fitness genes to create a new pool of crossover genes. Each new gene contained part of each parent: a random crossover point between 1 and length - 1 was chosen, and the parents were merged at that point.
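The selection and crossover steps described above can be sketched roughly as follows. This is a minimal illustration, not my original code: the names select_parent and crossover are hypothetical, the fitness values are assumed to be non-negative, and returning two children per mating is an assumption (the scheme described above only requires that a child contain part of each parent).

```python
import random


def select_parent(population, fitnesses, rng):
    """Fitness-proportionate (roulette wheel) pick of one parent.

    Assumes every fitness value is non-negative.
    """
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: fall back to a uniform pick when no gene has fitness.
        return rng.choice(population)
    spin = rng.uniform(0, total)
    running = 0.0
    for gene, fitness in zip(population, fitnesses):
        running += fitness
        if spin < running:
            return gene
    return population[-1]  # guard against floating-point round-off


def crossover(mother, father, rng):
    """Single-point crossover at a random point between 1 and length - 1."""
    point = rng.randint(1, len(mother) - 1)  # randint includes both ends
    first_child = mother[:point] + father[point:]
    second_child = father[:point] + mother[point:]
    return first_child, second_child
```

Because the crossover point is never 0 or the full length, each child always receives at least one bit from each parent.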

Once the genes were mated and a new gene pool was created, every 150-bit string in the mating pool was run through a mutation function, which decided at random, with a fixed probability, whether a random bit in the gene should be flipped. This probability should be low, since mutation is only meant to prevent the loss of genetic material through mating and crossover.
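A mutation function of the kind described above might look like the sketch below. The name mutate and its parameters are hypothetical, and the gene is assumed to be a list of 0/1 integers; the sketch flips at most one bit per gene, which is how I read the description, rather than testing every bit independently.

```python
import random


def mutate(gene, rate, rng):
    """With probability `rate`, flip one randomly chosen bit.

    Returns a new list and leaves the input gene unchanged.
    """
    mutated = list(gene)
    if rng.random() < rate:
        index = rng.randrange(len(mutated))
        mutated[index] ^= 1  # 0 becomes 1, 1 becomes 0
    return mutated
```

With a rate of 0.01, roughly one gene in a hundred has a single bit flipped in each generation.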

This process repeats for a specific number of generations, after which the best string found is printed out.
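Putting the steps together, the overall loop can be sketched as below. This is only an outline under stated assumptions: stand_in_fitness is a hypothetical placeholder for the real black box (it simply counts 1 bits), run_ga and its default parameters are illustrative rather than the settings I used, fitness values are assumed non-negative, and tracking the best string across all generations is one reading of "the best string found".

```python
import random


def stand_in_fitness(gene):
    """Hypothetical placeholder for the black box: counts the 1 bits."""
    return sum(gene)


def run_ga(fitness_fn, length=150, pool_size=50, generations=30,
           mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    pool = [[rng.randint(0, 1) for _ in range(length)]
            for _ in range(pool_size)]
    best_gene, best_fitness = None, float("-inf")

    for generation in range(generations + 1):
        scores = [fitness_fn(gene) for gene in pool]

        # Remember the best string seen in any generation.
        for gene, score in zip(pool, scores):
            if score > best_fitness:
                best_gene, best_fitness = list(gene), score

        if generation == generations:
            break  # the final pool is evaluated but not bred

        # Fitness-proportionate selection; uniform if every score is zero.
        weights = scores if sum(scores) > 0 else None
        next_pool = []
        while len(next_pool) < pool_size:
            mother, father = rng.choices(pool, weights=weights, k=2)
            point = rng.randint(1, length - 1)  # single-point crossover
            child = mother[:point] + father[point:]
            if rng.random() < mutation_rate:
                child[rng.randrange(length)] ^= 1  # flip one random bit
            next_pool.append(child)
        pool = next_pool

    return best_gene, best_fitness


if __name__ == "__main__":
    gene, fitness = run_ga(stand_in_fitness)
    print("".join(str(bit) for bit in gene), fitness)
```

Swapping stand_in_fitness for the real black box function is the only change needed to point this loop at the actual problem.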

Data

First I tried a short data run: 3250 generations with a gene pool of size 2000 and a 0.01 chance of mutation. The best result was a maximum fitness of 64 with an average of 64. [3250.txt] The file shows that the GA was stuck at an average of 32 for many generations, then suddenly began improving until it peaked at 64. I then tried longer data runs, but my results became worse. A run of 20000 generations with a gene pool of 3000 [20000.txt] reached a maximum of only 48, with a poor average of 32.

Conclusion

The GA as I implemented it will not solve this problem. I considered adding a schema to the GA, but other students in the class mentioned that it gave very little improvement in performance. A pattern-based approach will not work on this problem. Either the fitness landscape has many hills with false peaks, or the bits depend on one another. Additional insight into how the black box works would make it much easier to solve.