2010 Convolutional Neural Nets
Convolutional Neural Nets (CNNs) operate on raw sensory data, such as image pixels or audio samples, using some basic numerical techniques that aren’t that hard to understand. It is possible that these same techniques are employed in our brain. The concern, therefore, is this: if CNNs can abstract things the way a human brain does, it may simply be a matter of scale to take a CNN system to the point where it abstracts reality well enough to appear to be a general-purpose learning machine, i.e. a human intelligence.
Part of this makes sense. CNNs work in pairs of layers: in each pair, one layer creates abstractions (a convolutional layer) and the next layer selects among them (a pooling layer). The more layers, the more abstractions. This appears to be how the brain works as well. The brain is an abstraction-building machine: it takes raw sensory data and keeps building higher and higher level abstractions that we use to understand the world around us. We also know the cortex of the brain is composed of layers (six in the human neocortex), so layers seem like a workable hypothesis for how to build a brain.
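In practice, the layer that "creates abstractions" is usually a convolutional layer and the layer that "selects" is usually a max-pooling layer. The sketch below shows one such pair in plain Python, assuming a toy 6x6 image and a hand-picked 3x3 vertical-edge kernel (both hypothetical; in a real CNN the kernel values are learned, not chosen by hand).

```python
# A minimal sketch of one "abstract, then select" pair of CNN layers,
# written in plain Python so it runs without any libraries.

def convolve2d(image, kernel):
    """Slide the same kernel over every position ('valid' mode, no padding).

    Strictly this is cross-correlation, which is what CNN libraries
    actually compute under the name 'convolution'.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(max(0.0, total))  # ReLU: keep only positive evidence
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size block."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(block))
        out.append(row)
    return out


# Toy 6x6 "image" containing a vertical edge (dark left half, bright right half).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hypothetical hand-picked 3x3 vertical-edge detector (not learned).
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

features = convolve2d(image, edge_kernel)   # 4x4 map of edge evidence
selected = max_pool(features, size=2)       # 2x2 map of the strongest responses
print(features)
print(selected)
```

The convolution responds only where the edge is, and pooling shrinks the map while keeping the strongest evidence in each region, which is the "select" step in miniature.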
In a facial recognition algorithm using a CNN, the first 2 layers of the “artificial brain” match raw stimuli and try to find first-level abstractions. At this level the neurons recognize local phenomena: in trained networks the very first layers typically respond to edges and simple textures, which the following layers combine into parts such as an eye, a nose, or an ear of different shapes and perspectives. The more neurons, the more abstractions can be formed and the more possible classifications the CNN can generate. However, many of these abstractions will be meaningless and should be discarded: they don’t work time after time to help us understand, but merely reflect random coincidences in the data we sent to the brain. The next layer of the CNN performs a reduction (pooling), essentially passing on only the best performing abstractions from the previous level. So the first 2 layers give us the ability to abstract (recognize) some basic features of a face. The next 2 layers recognize combinations of these features and then select the ones that produce the most reliable “face” recognition. In the facial recognition example, those next 2 layers may recognize that certain combinations of eyes, lips, ears and other shapes are consistent with a face rather than, say, a tractor. So, by using a 4-layer CNN we can distinguish a face from something else. The next 2 layers may abstract up facial characteristics that are common among groups of people, such as female or male, ethnicity, etc.
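One way to see why higher pairs of layers can combine small local features into larger wholes such as a face is to compute each unit's receptive field: how wide a patch of the original image it can "see". The sketch below assumes a hypothetical architecture in which every pair is a 3x3 convolution (stride 1) followed by 2x2 max pooling (stride 2).

```python
def receptive_fields(layers):
    """Return the receptive-field width (in input pixels) after each layer.

    layers: list of (name, kernel_size, stride) tuples.
    Uses the standard recurrence:
        rf   <- rf + (kernel - 1) * jump
        jump <- jump * stride
    where jump is the spacing, in input pixels, between adjacent units.
    """
    rf, jump = 1, 1
    result = []
    for name, kernel, stride in layers:
        rf = rf + (kernel - 1) * jump
        jump = jump * stride
        result.append((name, rf))
    return result


# Hypothetical network: three conv + pool pairs.
pairs = []
for n in range(1, 4):
    pairs.append((f"conv{n}", 3, 1))
    pairs.append((f"pool{n}", 2, 2))

for name, rf in receptive_fields(pairs):
    print(f"{name}: each unit sees a {rf}x{rf} patch of the input")
```

Under these assumptions a first-layer unit sees only a 3x3 patch (enough for an edge), while a unit after the third pair sees 22x22 pixels, enough to cover several facial parts at once.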
The more layers, the more abstractions we can learn. This seems to be what the human brain does. So, if you could make enough layers, would you have a human brain?
Each neuron in that first layer postulates a “match” by looking at a small patch of raw data in its neighborhood. The neuron tries to find a formula (a set of weights) that best recognizes a feature of the underlying stimuli and that consistently matches well on each new set of data presented to it. The same weights are applied at every position across the entire field of data from the layer below, so a feature is recognized wherever it appears. (A Deep Belief Network, described below, is a related kind of deep network that instead uses layers of restricted Boltzmann machines, a type of Markov random field, to assemble possible candidate abstractions.) As stimuli are presented to the system these postulates evolve, and the second layer “selects” among them, reducing the number of possible patterns to a more workable number of likely abstractions that match better than others. It does this simply by passing on only the strongest response among neighboring neurons in the layer below, a step called max pooling. Each neuron learns by estimating the error of its output against a known correct answer and then adjusting its weights to reduce that error, using gradient descent (with the error propagated back through the layers by backpropagation) to rapidly approach the best answer. Such a mechanism requires what is called a “labeled” dataset, in which the correct answer is known so that the error can be calculated.
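The error-driven learning described above can be sketched with a single neuron. This is a minimal illustration assuming a logistic (sigmoid) neuron trained with gradient descent on a cross-entropy loss, which is one common choice rather than the only one; the 3-pixel patches, labels, learning rate, and epoch count are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, lr=0.5, epochs=2000, seed=0):
    """Train one logistic 'neuron' with gradient descent on labeled data.

    For each example the error is measured against the known correct
    label (0 or 1), and every weight is nudged in the direction that
    reduces the cross-entropy loss.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(samples[0]))]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of cross-entropy w.r.t. the pre-activation
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical labeled dataset: 3-pixel patches labeled 1 when they contain
# a dark-to-bright edge (left pixel off, right pixel on), otherwise 0.
samples = [[0, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
labels = [1, 1, 0, 0, 0, 0]

w, b = train_neuron(samples, labels)
for x in samples:
    print(x, round(predict(w, b, x), 3))
```

In a full CNN the same kind of update is applied to every filter weight at once, with backpropagation supplying each weight's share of the error.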
Neural net algorithms work best when trained with labeled datasets, because the system is given the correct answer and can therefore approach it rapidly. This way of learning is not that much different from other forms of supervised Machine Learning (the branch of AI that neural nets themselves belong to).
Another important advance in neural network design came in 1997 with the Long Short-Term Memory (LSTM) unit, a neuron that can remember data. It can hold information for long periods and, in later versions of the design, forget it when needed. Adding this unit type to recurrent neural networks has allowed us to build much better cursive handwriting recognition and phoneme recognition in voice systems.
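The remember/forget behavior comes from gates that control a cell state. Below is a minimal single-unit LSTM step in plain Python, with hand-set, hypothetical weights chosen so that an input of +1 writes a value, 0 keeps it, and -1 erases it (recurrent weights are set to zero to keep the example readable). Note that the forget gate shown here was added to the original 1997 design a few years later.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with scalar weights.

    p maps each gate name to (w_x, w_h, bias). The forget gate f decides how
    much of the old cell state to keep, the input gate i how much new content
    to write, and the output gate o how much of the cell to expose as h.
    """
    def gate(name, act):
        w_x, w_h, b = p[name]
        return act(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)
    i = gate("input", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c = f * c_prev + i * g          # keep part of the old memory, add new content
    h = o * math.tanh(c)            # expose part of the memory as output
    return h, c


# Hypothetical hand-set weights (w_x, w_h, bias); all recurrent weights w_h are 0.
params = {
    "input":     (10.0, 0.0, -5.0),  # opens only when x = +1 ("write")
    "forget":    (10.0, 0.0,  5.0),  # stays open unless x = -1 ("erase")
    "output":    (0.0,  0.0,  5.0),  # mostly open all the time
    "candidate": (2.0,  0.0,  0.0),  # value to write: tanh(2 * x)
}

h, c = 0.0, 0.0
cells = []
for x in [1, 0, 0, -1, 0]:
    h, c = lstm_step(x, h, c, params)
    cells.append(c)
    print(f"x={x:+d}  cell={c:.3f}")
```

The cell state stays close to its written value through the zero inputs and collapses toward zero once the erase signal arrives; in a trained network these gate weights are learned rather than set by hand.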
Deep Belief Networks
Deep Belief Networks (DBNs), a related family of deep neural networks, use probabilistic techniques to learn to reproduce a set of inputs without labels. A DBN is first trained on unlabeled data to develop abstractions and then tuned in a second phase of training using labeled data. Improvements to this type of network produced some of the best results of any neural network of their time, and did so quickly and without requiring massive labeled datasets, which are expensive to build. A further enhancement manages the learning of each set of layers separately, training the network one layer at a time from the bottom up so that each layer produces the best abstractions it can before the next layer is built on top of it. This is equivalent to learning your arithmetic before learning your algebra.
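DBNs are commonly built by stacking restricted Boltzmann machines (RBMs), each trained without labels to reconstruct its input, typically with contrastive divergence. The sketch below is a tiny binary RBM trained with one-step contrastive divergence (CD-1) in plain Python, followed by a greedy layer-wise step in which the first layer's hidden activations become the second layer's training data. The dataset, layer sizes, learning rate, and epoch count are hypothetical, and the supervised fine-tuning phase is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                  # positive phase: driven by data
        v1 = self.visible_probs(self.sample(h0))    # reconstruction of the input
        h1 = self.hidden_probs(v1)                  # negative phase: driven by model
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, data):
        err = 0.0
        for v in data:
            recon = self.visible_probs(self.hidden_probs(v))
            err += sum((a - b) ** 2 for a, b in zip(v, recon))
        return err / len(data)


# Hypothetical unlabeled data: two loose groups of 6-bit patterns.
data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1]]

layer1 = RBM(n_visible=6, n_hidden=3, seed=1)
before = layer1.reconstruction_error(data)
for _ in range(1000):
    for v in data:
        layer1.cd1_update(v)
after = layer1.reconstruction_error(data)

# Greedy layer-wise step: layer 1's hidden activations train layer 2.
layer1_features = [layer1.hidden_probs(v) for v in data]
layer2 = RBM(n_visible=3, n_hidden=2, seed=2)
for _ in range(1000):
    for v in layer1_features:
        layer2.cd1_update(v)

print(round(before, 3), round(after, 3))
```

No labels are used anywhere in this phase; labeled data would only enter in a later fine-tuning step on top of the stacked layers.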
In humans we do have an analog to the DBN and “labeled” dataset approach to learning: we go to school. In school we are told the correct answers, so our brains can learn which lower-level abstractions, which neurons, to select as producing the better answers. As we go through school we fine-tune our higher-level abstractions. All day long the human brain is confronted with learning tasks in which it learns the correct answer, and it could be using those answers to train its neurons to select the best performing abstractions from the candidates it builds. The need for labeled datasets, by itself, is therefore not what distinguishes human brains from CNNs.
CNNs, DBNs and variations on these neural network patterns are producing significantly better results on recognition problems.