Perceptrons Revisited (Minsky & Papert) 
kortikal 
Apr 10, 2007, 10:35 AM
Post
#1

Although it was known for a decade that simple perceptrons were limited in their ability to classify some patterns, it was not until Minsky and Papert published Perceptrons in 1969 that the extent of these limitations was fully realized. In fact, it was with this publication that the connectionist tide was stemmed[*] (at least for a while). Instead of asking whether neural networks are good, Minsky and Papert asked the question ``what are neural networks good for?'' This is clearly a computational-level question aimed at identifying the limitations of the representational abilities of perceptron-like networks. As Minsky and Papert point out in their prologue to the 1988 edition of Perceptrons, ``No machine can learn to recognize X unless it possesses, at least potentially, some scheme for representing X.'' (p. xiii; their italics).
Hence, their approach to the study of neural networks was based on studying the types of problems that were being proposed at the time, mainly visual pattern recognition [81]. In doing so, they discovered that some pattern recognition problems (e.g., distinguishing triangles from squares) were relatively easy and could be computed by simple networks. Conversely, some problems (e.g., determining whether a figure was connected or not) were extremely difficult and required large networks to solve. The main distinction between these two types of problems was not the size of the pattern space, but the concept of order [74], p. 30. In general, the order of a predicate ψ is the smallest number k for which we can find a set Φ of predicates, each depending on at most k input points, satisfying the equation ψ ∈ L(Φ), where each φ ∈ Φ is a simple predicate and L(Φ) is the set of all predicates that are linear threshold functions over Φ. It should be noted that the order of ψ is a property of ψ alone, and not relative to any particular Φ. Functions that have an order of 1 are called ``linearly separable'' and can be solved by a single-layer perceptron. The types of pattern recognition problems that gave simple perceptrons trouble were those whose order was greater than 1. These types of problems are termed ``linearly inseparable'' and require a layer of processing units between the input and output units. At the time, however, there was no reliable method of training this intermediate layer, and therefore perceptrons were limited to being trained on linearly separable problems only.

Minsky and Papert [74] used a very simple and elegant example to show the practical limitations of perceptrons. The exclusive-or (XOR) problem (see Figure 5) contains four patterns of two inputs each; a pattern is a positive member of the set if either one of the input bits is on, but not both. Thus, changing the input pattern by one bit changes the classification of the pattern. This is the simplest example of a linearly inseparable problem (see Figure 7).
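To make the separable/inseparable contrast concrete, here is a minimal sketch (not from the text itself) of the classic perceptron learning rule applied to OR, which has order 1, and to XOR, which does not. The rule converges on OR, but no setting of the weights can classify all four XOR patterns, since no line separates them:

```python
# A single linear threshold unit trained with the perceptron rule.
# Output is 1 when w.x + b > 0, else 0.

def train_perceptron(patterns, epochs=100, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in patterns:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - out
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def errors(patterns, w, b):
    """Count misclassified patterns under the current weights."""
    return sum(
        (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) != t
        for x, t in patterns
    )

OR_ = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(OR_)
print("OR misclassified:", errors(OR_, w, b))   # 0: order 1, separable

w, b = train_perceptron(XOR)
print("XOR misclassified:", errors(XOR, w, b))  # at least 1: inseparable
```

By the perceptron convergence theorem the OR run reaches zero errors; the XOR run never can, no matter how long it trains.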
A perceptron using linear threshold functions requires a layer of internal units to solve this problem, and since the connections between the input and internal units could not be trained, a perceptron could not learn this classification. And if perceptrons failed on this small pattern set, what hope was there for larger pattern sets that were also linearly inseparable? Furthermore, Minsky and Papert laid out other limitations of such networks. For example, if a network is to solve a problem whose order equals the size of the input space R, then at least one partial predicate must have the whole space R as its support. In other words, at least one internal unit must be connected to each and every input unit. This network configuration violates what is known as the ``limited order'' constraint. Another limitation that Minsky and Papert discuss is the growth of coefficients. For linearly inseparable problems, the coefficients (i.e., weights) can increase much faster than exponentially with the size of R. This leads to both conceptual and practical limitations. Conceptually, although the behaviour of a network may be ``good'' on small problems, this behaviour may become profoundly ``bad'' when the problem is scaled up. Practically, for very large R, the amount of storage space required for the weights would overshadow the space required to simply represent the problem. Although advances in neural network research have produced methods for training multiple layers of units (e.g., [92] [93]), many of Minsky and Papert's concerns remain unanswered. Networks using linear threshold units still violate the limited order constraint when faced with linearly inseparable problems (but see section 4.8). Furthermore, the scaling of weights as the size of the problem space increases remains an issue [31].
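A short sketch of the point about internal units (the weights here are hand-chosen for illustration, not taken from the book): with one layer of internal linear threshold units, XOR becomes computable. The two hidden units compute OR and AND, and the output unit computes OR-and-not-AND, which is XOR. In 1969 the difficulty was precisely that such internal weights could only be set by hand, not learned:

```python
def step(z):
    # Linear threshold activation: fire when net input exceeds zero.
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: OR of the inputs
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: AND of the inputs
    return step(h1 - h2 - 0.5)  # output: h1 AND NOT h2, i.e. XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Note that each hidden unit is connected to every input, which is exactly the ``limited order'' violation described above: on this problem the internal units need the whole input space as their support.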