The connectionist modeler's toolkit: A review of some basic processes over distributed memories
Janet Wiles
Departments of Computer Science and Psychology
University of Queensland
Abstract
The paper takes the view that the central difference between the basic processes of traditional symbolic computation and connectionism lies in the assumptions about the representation of information in memory - unique storage vs distributed, superimposed storage. These assumptions about memory storage have implications for how structure can be represented in memory and how information is accessed. They also have implications for how learning can be modeled, and influence the ways that a theorist tends to think about time in computations - the role of control structures for sequencing instructions, and the representation of temporal aspects in the memory for events in a sequence. The differences between connectionism and traditional symbolic approaches to memory storage are not superficial for cognitive modeling. The method of comparison is to study the primitives - the basic processes that are provided by a formalism as a whole, in contrast to the ones that are used in any one specific model.
In this paper I introduce the idea of primitives; compare traditional symbolic and connectionist frameworks; and present a list of basic connectionist methods for implementing ideas about memory, structure and representations of time. The paper is also intended as a step towards establishing a comprehensive set of primitives for connectionist cognitive modeling.
1. Introduction
The topic explored in this paper is one that is not often discussed in psychology, although in computer science it is fundamental to the design of programming languages. The topic concerns the basic computational processes used in cognitive modeling.
Psychological theories of cognition assume the operation of basic computational processes, such as memory storage and retrieval, comparison, and search. The task set for this session was to compare connectionist and symbolic techniques for modeling such basic cognitive processes. In exploring these issues, I found that some processes are easy to implement in the traditional symbolic framework, but not in a connectionist one. Conversely, some processes, such as learning events in time, are easy to implement in connectionist frameworks, but not in the traditional symbolic approach.
Example: A language processing task
Task: Consider the task of listening to sentences presented one word at a time. The first stage of a model might be to develop a perceptual module which extracts relevant information from the stream of words for other modules to process. For generality, it would be desirable if such a module could also serve as the first stage for models of other cognitive tasks, such as memorizing lists of items.
What sort of things might this module do? A simple function might be to store the sequence. That is, remember every word, exactly as it appeared, in the order in which it appeared. Such a module would provide a permanent record of the sentences, but nothing further in terms of extracting information. A more useful function might be to add some structure to the stream, for example by breaking the word sequence into sentences, or by building a parse tree.
The sub-part of the language task that the modeler selects for this module depends on what is thought to be computationally possible within one module, and in communication with adjacent modules (as well as any information we have from human data).
Consider the idea of a parser: A grammar could be specified a priori, or be provided by another module, or be learned within this module. Within the symbolic paradigm, it is well known (since Chomsky's work in the 1950s) that grammars cannot be learned from a tabula rasa. Hence, for a first approximation in modeling this module, the modeler might decide that the grammar will be provided from another module, and consequently that it is not necessary to consider its origins in order to design this module.
The two paragraphs above reflect initial thoughts towards the first level of a Marr-style analysis. The first level involves an understanding of the task - "What does the module have to do?". The second level would involve specifying the algorithm (how the task is accomplished in computational steps), and the third level would involve specifying how the steps are to be implemented (e.g., in silicon or neurons).
Note that the vague initial description "extract relevant information" does not tightly define the task. A tight description would require a specification of the information in the sequence, and its representation (how it is described); the information to be extracted, and its representation. In modeling language processes, we rarely know what the modules are, what information they process, or how it is represented. Yet these are decisions that the modeler must make, either implicitly or explicitly.
Within the symbolic formalism, it is easy to design a module that would store the words individually, or in sequence. Likewise, grammars are well understood, and parsing is relatively straightforward. By contrast, _learning_ is less well understood, so in a modeling project, we might begin with the assumption that the focus of the model would be memory for the sequence, or parsing, and that learning is outside the scope of this model.
Within a connectionist formalism, the situation is quite different. Connectionist networks are not good at _veridical storage_ (i.e., storing an exact trace), but they are designed to process information. Connectionist networks typically incorporate learning, and multi-layer networks have the capability of creating their own representations based on the patterns they process.
An example now familiar to many people is Elman's (1990) Simple Recurrent Network (SRN) model of word prediction in sentences.
The SRN consists of three layers of units: input, hidden and output. There are modifiable weights from the input to hidden layers, hidden to output layers, and recurrent connections on the hidden layer. These recurrent connections provide a mechanism for the network to use information about past events in a sequence (effectively, providing a representation of the temporal context in which each input word is encountered). The weights in the network are trained using backprop through time for one time step - on each presentation of a word (the input) and its successor word (the output), the weights are changed slightly to cause the word to produce that successor on a subsequent trial. By repeated training over a large corpus of sentences, the SRN extracts information from the stream of items that enables it to perform the prediction paradigm: Using each new item and the temporal context in which it occurs, the network acquires an ability to predict the next item in the sequence, and to refine its representation of the temporal context. The SRN learns relationships between items, and it represents this information as probabilities on the output units. In the process, it learns a simple grammar that underlies the sequence, and represents this information as a state transition function in recurrent weights in the hidden layer. The network induces a similarity structure over the input items which is shown in the similarity between vectors in the hidden layer representations.
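The architecture just described can be sketched in a few lines of code. What follows is a minimal illustration under stated assumptions, not Elman's (1990) simulation: the one-unit-per-item input coding, tanh hidden units, softmax output, learning rate, random initial weights, layer sizes and the toy stream A B A C ... are all choices made for this example, and the names (SimpleRecurrentNetwork, step, and so on) are hypothetical. The previous hidden state is treated as a fixed extra input on each step, so the error is propagated back through one time step only.

```python
import math
import random


def make_weights(rows, cols, rng, scale=0.5):
    """Small random weight matrix (the scale is an arbitrary assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Turn raw output activations into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRecurrentNetwork:
    """Input, hidden and output layers, plus recurrent hidden connections."""

    def __init__(self, n_items, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_items, self.n_hidden = n_items, n_hidden
        self.w_in = make_weights(n_hidden, n_items, rng)    # input -> hidden
        self.w_rec = make_weights(n_hidden, n_hidden, rng)  # previous hidden -> hidden
        self.w_out = make_weights(n_items, n_hidden, rng)   # hidden -> output
        self.context = [0.0] * n_hidden                     # temporal context

    def reset(self):
        self.context = [0.0] * self.n_hidden

    def step(self, item, target=None, rate=0.1):
        """Process one item (an index); if a target is given, learn from it."""
        prev = self.context
        hidden = [
            math.tanh(self.w_in[j][item]
                      + sum(w * c for w, c in zip(self.w_rec[j], prev)))
            for j in range(self.n_hidden)
        ]
        probs = softmax([sum(w * h for w, h in zip(row, hidden))
                         for row in self.w_out])
        if target is not None:
            # Cross-entropy error at the output, propagated back one step only.
            d_out = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
            d_hid = [
                (1.0 - hidden[j] ** 2)
                * sum(d_out[k] * self.w_out[k][j] for k in range(self.n_items))
                for j in range(self.n_hidden)
            ]
            for k in range(self.n_items):
                for j in range(self.n_hidden):
                    self.w_out[k][j] -= rate * d_out[k] * hidden[j]
            for j in range(self.n_hidden):
                self.w_in[j][item] -= rate * d_hid[j]
                for i in range(self.n_hidden):
                    self.w_rec[j][i] -= rate * d_hid[j] * prev[i]
        self.context = hidden  # the new hidden state becomes the next context
        return probs


# Toy stream: 0 = A, 1 = B, 2 = C. What follows A depends on what preceded it.
stream = [0, 1, 0, 2] * 50
net = SimpleRecurrentNetwork(n_items=3, n_hidden=5)
for epoch in range(20):
    net.reset()
    for item, successor in zip(stream, stream[1:]):
        net.step(item, target=successor)
```

Note that nothing in the sketch stores the stream itself. Each call to step only nudges the shared weights and replaces the context vector, so whatever the network has picked up about the sequence is superimposed in those two places.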
From a symbolic perspective, the SRN performs a complicated task - that of grammar induction. But from a connectionist perspective, the SRN is just composed of a series of layers of units, connected by modifiable weights. It can be thought of as three representation spaces (input, intermediate and output), with transformations between them. Perhaps the most interesting aspect of the architecture is that the network learns an appropriate representation for the intermediate space (i.e., the hidden layer) which includes an ability to represent the temporal context of an item. In comparison to many cognitive models, there is very little complexity in the architecture, but the basic components are quite different from the storage and parsing components in a symbolic model.
What basic connectionist components were used in this model?
Let us first consider how the network implements the cognitive processes mentioned at the beginning: memory storage and retrieval, comparison, and search. We can discuss these processes with respect to each transformation (i.e., the processing performed by each layer of weights), and also with respect to the recurrent network as a whole, which performs computations that are more powerful than any one individual transformation.
Memory storage: In Elman's simulation, a word input to the network is the stimulus for the production of the next word in the sentence. For example, in the training corpus, "cats" at the start of a sentence can be followed by any plural verb: "cats chase", "cats eat", "cats run", etc. Under such a prediction paradigm, there is no need for the network to store the information that the current word is "cats". It only needs to produce the probability distribution for possible next words. Thus, the memory of the network is based on the functionality required by the task. The information needed to produce this output is stored at two places in the network - in the weights of the network, and in the recurrent activations in the hidden layer. A combination of both pieces of information is required to produce the correct probability distribution. The information from all learning trials is superimposed over the same set of weights. In this type of "memory", there is no individual trace of each learning trial, only the sum total of the traces of all trials.
Retrieval: The network as a whole transforms a word (the input) into a prediction for the next word in the input sequence (the output). The network function as a whole can be considered cue-based access, and hence a type of retrieval, but instead of retrieving a particular word, it retrieves a probability distribution of possible next words. Alternatively, we can analyse the transformations performed between each layer of the network. Each layer of modifiable weights transforms its own input pattern into an output pattern. Each input-output transformation is affected by information acquired over all trials of learning.
Comparison and search: In the task performed by Elman's network, there is no need for direct comparison. After training, a word will elicit the same output as a previous word in the same temporal context solely on the basis of the similarity in its representation to the previous word. Note that just as there was no storage of individual events during learning trials, so also there can be no retrieval of individual events, and hence, no processes such as sequential search through a set of unique traces.
In summary, the basic components that from a symbolic point of view we might use for a language processing task - memory storage, retrieval, comparison and search - are actually not implemented in any direct way within the recurrent network.
Instead, the processes that are provided by the network are not ones that we usually think of as simple or basic components: These include the ability for the network to generate probability distributions of outputs given an input; create and use a representation of the temporal context of an item in a sequence; and induce a simple grammar from the input sequence. The first of these properties is a property of each layer of weights; the second is a property of the recurrent connections; and the third is a property of the network as a whole. If we were to begin modeling using basic cognitive components from a symbolic framework, we would not find such a simple solution.
Such examples prompt the question, why do theorists think of memory storage and retrieval, comparison and search as "basic" processes? Where do they originate? Such decisions are rarely explicitly justified. Rather, processes are derived from a variety of sources: ones used - or assumed - by other theorists; some from folk psychology; and some from computer science.
The resurgence of connectionist networks in psychology in the past decade has provided additional computational processes for the development of models. The question arises as to what are the similarities and differences between the basic processes provided by traditional and connectionist formalisms. The next sections prepare the ground by summarizing the constituents of a set of basic processes and listing aspects of traditional and connectionist computation.
2. Primitives and combinations
What's in a formalism?
A theory can be expressed in many ways, but every description requires a language in which to be expressed, be it English, or LISP, or a connectionist network. Formal languages are defined in terms of a set of basic processes.
Connectionism and traditional symbol manipulation provide two formalisms in which psychological theories of cognition can be expressed. Connectionism and symbol processing have the same computational power (in the formal sense of the class of functions that they can compute). However, in cognitive modeling the qualitative aspects of computation are usually of primary interest (i.e., aspects that make the processes the modeler wants to model easy to express).
What constitutes an understanding of computation?
There are several levels of understanding 'computation' (cf. Marr's (1982) three levels of "task", "algorithm" and "implementation"). In this paper, the level addressed is the algorithmic level: I take this level to consist of a specification of the finite set of elementary or irreducible processes provided by a formal language (i.e., the primitive processes); and an understanding of the ways in which the primitives can be combined to model a process. Note the distinction between the elementary processes (primitives) provided by a formalism, and the composite processes constructed by combining primitives. Although the set of primitives is finite, the set of potential composite processes can be infinite.
I emphasize the role of primitives as the building blocks of all cognitive models because without a restriction on their number, it is possible for a psychological theory to assume that all processes are primitive. That is, each one could be assumed to be directly implemented, and not composed of other processes. Such assumptions preclude the possibility of cross-paradigm generalization and ignore the ways in which primitives can be combined (e.g., the control structures of memory), and so exclude them from the scope of any particular model.
By what methods does a modeler select a set of primitives?
In a formal language, the designer of the language specifies the primitives. The computer programmer, working from primitives, is like a painter who is given a pure set of colours and builds up an image using successive layers of paint. The colours in the final painting are derived solely from combinations of the original colours. The empirical psychologist, by contrast, is more like a wood carver, who works from the outside to reveal the statue within. The analogy serves to underline the point that the methods of selecting primitives may be unfamiliar to many empirical psychologists. (Footnote. Thanks to Guy Smith for this analogy.)
In cognitive modeling, one way of selecting primitives is to study cognitive tasks, and propose that the processes that solve the tasks are basic ones. A typical flow chart of the human information processing system might start with three general modules - perception (a way of receiving information about the world), central processing, and motor control (a way of performing actions in the world). See, for example, Norman, 1985, Figure 14.1 and Table 14.1. The essential elements of central processing might be further catalogued into phenomena related to topics such as attention, control, representation, categorization, recognition, recall, learning, memory, reasoning and decision making, problem solving, creativity, and language (the list is not necessarily complete - it is intended as a guide to topics - this selection was taken from Glass & Holyoak, 1986).
Over years of modeling research, processes have been developed (explicitly or implicitly) that are assumed to underlie such tasks. For example, in memory research, fundamental processes include comparison, storage, search, retrieval, and selection. Symbolic models seem to have direct (or by now at least well-developed) ways of implementing these processes. In fact, they could be considered basic processes of symbolic artificial intelligence. By contrast, in connectionist networks, there do not seem to be ways of implementing these processes directly. There are indirect ways, but they are not as intuitively obvious as they may be in traditional symbolic formalisms.
It is worthwhile questioning whether these are really the basic tasks of cognition. An alternative is that they have been adopted from symbolic computation through lack of alternative basic processes. After using symbolic processes for a while, one becomes used to them, and is tempted to ask - what else could there be? The sheer difficulty of implementing traditionally accepted processes in connectionist networks contributes to the suspicion that alternative processes may be equally or even more effective for computing the basic tasks for cognitive modeling. We return to the question of selecting primitives in a later section.
We can now refine the original question raised in this paper to whether there are important qualitative differences in the primitives and the combination mechanisms that are provided by connectionism compared to those provided by traditional symbolic approaches to computation.
Such a comparison is not a straightforward task. There is no convenient list of the primitives of each paradigm, nor of the combination processes. In fact, there is likely to be little agreement on any such list: Since both are functionally equivalent, either could mimic the way in which the other is used to model a task. Does this mean the distinctions are irrelevant? I think not, because, as in any programming language, the primitives provided in a formalism make certain processes easy to model, and others harder. This qualitative aspect of what is easy and hard to model using a particular formalism is, in my view, one critical difference between the connectionist and traditional symbolic paradigms.
How can connectionism and traditional approaches be compared?
The example above serves to illustrate how the difficulty of tasks varies between symbolic and connectionist frameworks. Connectionism is a formalism in which learning is inherent. Thus, the slow and incremental process of creating new representations is easy to model in this framework. By contrast, in a traditional symbolic approach, an explicit process is required to implement ideas about learning new representations. This difference in the ease of implementing such ideas may account for the relative lack of interest in traditional approaches to modeling representation construction. In typical Artificial Intelligence models representations are specified prior to modeling.
As a final point, it seems almost contradictory that specification of a set of primitives (which is a critical assumption from a formal language point of view) is beyond the scope of current psychological theorizing. How could one create a model without knowing the primitives from which it is constructed? This paper does not solve the dilemma, but it does explicitly acknowledge the central role of both the primitives and combination mechanisms used in developing and describing models. Both are implicit in theorizing anyway, but an explicit statement makes evaluation and any potential for alternatives clearer.
3. More about memory
3.1 Unique storage (the "pigeonhole" principle)
Footnote: In mathematics, the "pigeonhole principle" refers to a particular proof strategy. Here it is not used as a technical term - it is purely an analogy to illustrate individual storage of items.
In traditional approaches, items in memory are stored in unique locations (analogous to letters being sorted into pigeonholes). This sort of storage allows information to be _accessed_ using a variety of well-understood techniques, such as a) address-based storage - if an address is known, then the information can be found, as it will be in the pigeonhole in which it was stored; b) search processes - look in every pigeonhole; or c) structured storage - use a system to determine which information goes in which pigeonhole.
Pigeonholes may be arranged alphabetically like a phone book, or according to a more complex code, like the Dewey system in a library. The essential feature is that the information stored is located at a specific address, analogous to a library book, which can have only one physical location at any one time, and vice versa, a location can only be occupied by one book at a time.
By allowing structure in the representation, other types of memories can be constructed. For example, to use address-based storage for an associative memory, the address at which information is stored is made a function of the information itself (i.e., a hash table).
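As a minimal sketch of this idea, the address of a pigeonhole can be computed from the key being stored. The class name, the number of slots and the toy hash function (a sum of character codes) are assumptions made for illustration. Note also that each slot holds a short list so that two keys with the same address can coexist, which departs slightly from the strict one-item-per-location picture.

```python
class HashAddressedMemory:
    """Pigeonholes whose address is computed from the stored key itself."""

    def __init__(self, n_slots=8):
        self.slots = [[] for _ in range(n_slots)]

    def address(self, key):
        # Toy hash function (an assumption): sum of character codes.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def store(self, key, value):
        bucket = self.slots[self.address(key)]
        for entry in bucket:
            if entry[0] == key:      # key already present: overwrite its value
                entry[1] = value
                return
        bucket.append([key, value])

    def retrieve(self, key):
        for stored_key, value in self.slots[self.address(key)]:
            if stored_key == key:
                return value
        return None                  # nothing stored under this key


memory = HashAddressedMemory()
memory.store("cats", "plural noun")
memory.store("chase", "plural verb")
print(memory.retrieve("cats"))   # plural noun
print(memory.retrieve("dogs"))   # None
```

Retrieval looks in a single location determined by the cue, so no search over the whole memory is needed.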
Unique storage makes it easy to determine whether a specific item has been stored, as may be required for an old-new decision in list processing. However, a property of several items (such as the prototype or average of a set of values) requires accessing multiple addresses (serially or in parallel) and calculating over the items retrieved. Even languages like LISP, which operate over lists of items, have their lowest level of representation as the unique items.
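The contrast between the two kinds of question can be made concrete with a small sketch. The function names and the feature-list coding of items are assumptions made for illustration. The old-new decision needs only a match against a single stored trace, whereas the prototype requires every location to be visited and the results combined.

```python
def store_item(memory, address, item):
    """Put one item (a list of feature values) in its own location."""
    memory[address] = list(item)


def is_old(memory, probe):
    """Old-new decision: does any single stored trace match the probe?"""
    return list(probe) in memory.values()


def prototype(memory):
    """Average of the stored items: every location must be visited."""
    traces = list(memory.values())
    return [sum(column) / len(traces) for column in zip(*traces)]


memory = {}
store_item(memory, 0, [1, 0, 1, 1])
store_item(memory, 1, [1, 1, 0, 1])
print(is_old(memory, [1, 0, 1, 1]))  # True
print(is_old(memory, [1, 1, 1, 1]))  # False
print(prototype(memory))             # [1.0, 0.5, 0.5, 1.0]
```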
3.2 Distributed, superimposed storage (the "holographic" principle)
In connectionist approaches, there is no unique location for the storage of information - information is typically distributed and superimposed (for which we have no simple analogies, although physical systems such as holograms give some intuition, since they also do not have unique storage of information).
In a connectionist model, an item is represented as a vector. Any one, or all, of the elements may be essential to the information represented by the vector. A set of vectors (like an unordered list of items) is stored by superposition. Superposition is a direct way of implementing the idea of associative memory in that part of a pattern can be used to retrieve the remainder, and noisy elements can be corrected.
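A small sketch may help to make storage by superposition concrete. It assumes items coded as vectors of +1 and -1, an outer-product (Hebbian-style) storage rule, and a simple threshold on recall; these are illustrative choices, and the function names are hypothetical. Both patterns are added into the same weight matrix, yet a noisy or partial cue retrieves the complete pattern.

```python
def store_by_superposition(patterns):
    """Sum the outer product of each pattern with itself into one matrix."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue):
    """One pass through the weights, thresholded back to +1 / -1."""
    sums = [sum(w * c for w, c in zip(row, cue)) for row in weights]
    return [1 if s >= 0 else -1 for s in sums]


first = [1, 1, 1, 1, -1, -1, -1, -1]
second = [1, -1, 1, -1, 1, -1, 1, -1]
weights = store_by_superposition([first, second])

noisy = [1, 1, 1, 1, -1, -1, -1, 1]       # last element of `first` flipped
partial = [1, 1, 1, 1, 0, 0, 0, 0]        # second half of `first` missing
print(recall(weights, noisy) == first)    # True
print(recall(weights, partial) == first)  # True
```

The two patterns here were chosen to be orthogonal, which is why recall is clean. No entry of the weight matrix belongs to one pattern alone.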
With such a distributed memory, prototypes emerge directly. The disadvantage of this property is that it is not possible to store similar items without interference. The problem can be circumvented by using additional information to disambiguate items, and hence it is possible to contrive a way to support unique storage. For example, by associating each item with a unique (orthogonal) key, an address has been effectively created.
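The use of orthogonal keys can be sketched as follows, assuming a simple matrix memory in which each item is associated with its key by an outer product and all associations are summed. The vectors, names and normalisation are assumptions made for illustration. Cueing with an orthogonal key recovers its item exactly, even though the two items are similar, whereas a cue that overlaps both keys retrieves a blend of the two.

```python
def store_pairs(pairs, n_item, n_key):
    """Superimpose the outer products item x key in a single matrix."""
    matrix = [[0] * n_key for _ in range(n_item)]
    for key, item in pairs:
        for i in range(n_item):
            for j in range(n_key):
                matrix[i][j] += item[i] * key[j]
    return matrix


def retrieve(matrix, cue):
    """Multiply the matrix by the cue, scaled by the cue's squared length."""
    norm = sum(c * c for c in cue)
    return [sum(m * c for m, c in zip(row, cue)) / norm for row in matrix]


item_a = [1, 1, 0, 1]            # two similar items
item_b = [1, 1, 1, 0]
key_a, key_b = [1, 1], [1, -1]   # orthogonal keys (their dot product is 0)
matrix = store_pairs([(key_a, item_a), (key_b, item_b)], n_item=4, n_key=2)

print(retrieve(matrix, key_a))   # [1.0, 1.0, 0.0, 1.0]
print(retrieve(matrix, key_b))   # [1.0, 1.0, 1.0, 0.0]
print(retrieve(matrix, [1, 0]))  # [2.0, 2.0, 1.0, 1.0], the sum of both items
```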
The division between unique and distributed memory does not neatly divide connectionist and symbolic systems, as models developed in the symbolic tradition can use the distributed view of memory and vice versa as we have seen. The point of this paper is neither to eulogize nor crucify the connectionist or symbolic position - either approach can be adopted within either framework, as the division is at best an approximate one that describes the majority of models produced in either formalism. The main point is that in connectionism it is relatively easy to construct models based on storage by superposition, whereas in traditional programming languages it is easier to construct models based on unique storage of items.
4. What are the components of computation?
4.1 Symbolic components
The framework of symbolic computation is based on symbols, and symbol-manipulating processes. Indeed, it is questionable whether there is any kind of computation that is not symbolic, in a strictly formal sense. Hence, in this section I am not going to review the debate over symbolic vs sub-symbolic computation, but rather, examine the perspective that the traditional symbolic approach brings to ideas about computation. (For a review of the symbolic/sub-symbolic issues, see Smolensky, BBS; van Gelder, 19xx; and Hinton, 1990).
Below are listed aspects of the symbolic approach that can be considered in some sense "fundamental" and provide a basis for comparison with their counterparts in the connectionist approach.
i. Memory storage:
The basic assumption is unique storage, based on the idea of address-based organization. Information is encoded in representations with explicitly defined formats - and each piece of information is stored separately (i.e., at its own address). Note that unique storage of information is not necessary in traditional memory storage schemes, but it is conventional. A common example in cognitive modeling would be the representation of words in a language task: The standard ideas about a lexicon involve a unique entry for each word.
ii. The organization of memory into data structures:
There are relatively simple data structures such as arrays, lists, queues, and trees that are general purpose representations in that there are many ways to access and use the information in such representations. There are also more specialized data structures with either implicitly or explicitly associated algorithms such as semantic networks, expert systems or frames. Consider a data structure that might be used for a list of words. The data structure must enable information about the sequence of the words to be represented. In human memory models, such information might be encoded as explicit storage of the position of a word in a list, or chaining associations from one word to the next.
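The two encodings of order mentioned for a word list can be sketched with ordinary dictionaries. The example words and function names are assumptions made for illustration, and the sketch assumes that no word is repeated in the list.

```python
words = ["cats", "chase", "small", "mice"]

# Positional coding: each word is stored with its explicit list position.
position_of = {word: index for index, word in enumerate(words)}

# Chaining: each word is associated only with its successor.
next_word = {first: second for first, second in zip(words, words[1:])}


def recall_by_position(position_of):
    """Read the words back in order of their stored positions."""
    return sorted(position_of, key=position_of.get)


def recall_by_chaining(start, next_word):
    """Follow the associations from the first word until the chain ends."""
    sequence = [start]
    while sequence[-1] in next_word:
        sequence.append(next_word[sequence[-1]])
    return sequence


print(recall_by_position(position_of))        # ['cats', 'chase', 'small', 'mice']
print(recall_by_chaining("cats", next_word))  # ['cats', 'chase', 'small', 'mice']
```

Both structures recover the same sequence, but they support different operations: positional coding answers "what was third?" directly, while chaining must be followed from the start.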
iii. Algorithms (or processes) that use the information in data structures:
Processes are required to store and retrieve information from data structures. Consider the list of words again. There are many ways that words could be ordered in the list. If new words were added to the end of the list, then recognition (a decision that a particular word was in the list) would require a search of the entire list. An alternative would be to maintain the list in a sorted order - a new word would be inserted at the appropriate position in the order, thus enabling a much faster retrieval process. These two ways of maintaining the list would require different list storage and accessing processes. Other examples include sorting, parsing, and search algorithms.
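The two ways of maintaining the word list can be compared in a short sketch. It uses Python's standard bisect module for the sorted case; the example words and function names are assumptions made for illustration.

```python
import bisect


def recognise_by_scan(words, probe):
    """Unsorted list: compare the probe with each word in turn."""
    comparisons = 0
    for word in words:
        comparisons += 1
        if word == probe:
            return True, comparisons
    return False, comparisons


def insert_in_order(sorted_words, new_word):
    """Sorted list: place the new word at its position in the order."""
    bisect.insort(sorted_words, new_word)


def recognise_by_bisection(sorted_words, probe):
    """Sorted list: binary search instead of a full scan."""
    index = bisect.bisect_left(sorted_words, probe)
    return index < len(sorted_words) and sorted_words[index] == probe


study_list = ["mice", "cats", "run", "eat", "chase"]
sorted_list = []
for word in study_list:
    insert_in_order(sorted_list, word)

print(recognise_by_scan(study_list, "dogs"))        # (False, 5)
print(sorted_list)                                  # ['cats', 'chase', 'eat', 'mice', 'run']
print(recognise_by_bisection(sorted_list, "eat"))   # True
print(recognise_by_bisection(sorted_list, "dogs"))  # False
```

Rejecting a new word in the unsorted list requires comparing it with every stored word, whereas the sorted list needs only a number of comparisons that grows with the logarithm of the list length. The price is extra work at storage time.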
iv. Formalisms by which programs or algorithms are expressed:
Data structures and algorithms need to be expressed in a formal language. Examples include predicate calculus, grammars, production systems, and computer languages such as LISP, PROLOG, or PASCAL.
v. Machine hardware:
Systems of symbols obey the laws of physics - Newell & Simon (1976) coined the phrase "physical symbol system hypothesis" to emphasize that a computational system must be physically realizable. The machine hardware provides the physical instantiation of a computation in a parallel or sequential machine.
4.2 Discussion of the symbolic components
Much of the accumulated wisdom of the discipline of computer science lies in the vast collection of data structures and algorithms which are available for constructing programs (points ii and iii above). Rarely discussed, but fundamental to the kinds of data structures and algorithms that are possible, are assumptions about the nature of memory itself - the way in which the simplest information is represented in the computer (or any information processing machine). I think that assumptions about memory are rarely spelled out in traditional symbolic approaches because virtually all data structures and algorithms share the same underlying assumption of unique storage.
Points i to v provide a glimpse of the perspective provided by the symbolic approach to computation, and hence traditional computational tools available for understanding human cognition. It is common rhetoric that connectionism provides a new paradigm for cognitive psychology. Basically, I take reference to a new paradigm to mean that there are new data structures and algorithms available for modelers to use.
Theoretical results tell us that the set of basic processes is, in general, not unique. That is, although the current processes may be sufficient, alternative sets will also be sufficient, which is good news if others turn out to be more convenient for cognitive modeling.
A caution in determining psychologically plausible components is that it is not possible to determine an optimal algorithm without reference to the representation used, nor an optimal representation without reference to the available algorithms. Points i to iii above separate the issues of memory storage, data structures and algorithms; however, these are not independent of one another. Unfortunately for the study of human cognition, they cannot be studied independently either. For example, one cannot model the search processes of memory without knowing what the data structures of memory are like - how the information that is manipulated by the algorithm is actually stored. Sequential search has been proposed as one of the simplest ways of accessing information in human memory. The very assumption that sequential search is a possible access mechanism for information in memory relies on a deeper implicit assumption that there is some form of unique storage of items in memory.
4.3 Connectionist Components
In the example and discussions above, we have seen most of the aspects of connectionist computation. Connectionism provides a different framework for looking at computation from the traditional one. Essentially, the framework of connectionism is one of very high dimensional representational spaces, and transformations from one representational space to another.
i. Memory storage:
The basic assumption is that items are represented as vectors, and information about items, relations between items, and control processes is represented by vectors of weights. Memory in a connectionist model is a process for producing an output given a cue, rather than a passive system for veridical storage of items.
ii. Memory organization:
Both individual items and complex data structures over items are represented as vectors. Van Gelder calls this non-concatenative composition, in contrast to traditional approaches, in which data structures such as lists, trees or frames can be expressed as concatenative combinations of their individual components.
iii. Algorithms:
The composition processes required for creating complex structures in memory are embedded in a variety of aspects of networks: methods for combining vectors via the architecture (e.g., associations in matrix memories, and higher-order associations in tensor memories); methods for temporal composition, such as the recurrent connections in Elman's network which create a temporal context vector.
Each specific architecture has associated algorithms for learning and processing information. Examples include the matrix and tensor memories (Anderson, 1971; Humphreys et al., 1989); multi-layer feedforward networks (Rumelhart, Hinton & Williams, 1986); associative networks such as the Brain-state-in-a-box (Anderson, Silverstein, Ritz & Jones, 1977) and Hopfield networks (Hopfield, 1982); self-organizing networks (Kohonen, 1982); and recurrent networks (Jordan, 1986; Elman, 1990).
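To illustrate the higher-order associations mentioned under tensor memories, the following sketch superimposes three-way outer products of a context, a cue and a target. It is a generic illustration under stated assumptions, not the specific model of any of the papers cited: the vectors, the normalisation and the function names are all hypothetical. The same cue retrieves different targets depending on the context vector with which it is combined.

```python
def store_triples(triples, n_context, n_cue, n_target):
    """Superimpose context x cue x target outer products in one tensor."""
    tensor = [[[0] * n_target for _ in range(n_cue)] for _ in range(n_context)]
    for context, cue, target in triples:
        for i in range(n_context):
            for j in range(n_cue):
                for k in range(n_target):
                    tensor[i][j][k] += context[i] * cue[j] * target[k]
    return tensor


def retrieve(tensor, context, cue):
    """Contract the tensor with a context and a cue to recover a target."""
    norm = sum(x * x for x in context) * sum(c * c for c in cue)
    n_target = len(tensor[0][0])
    return [
        sum(tensor[i][j][k] * context[i] * cue[j]
            for i in range(len(context)) for j in range(len(cue))) / norm
        for k in range(n_target)
    ]


list_one, list_two = [1, 1], [1, -1]      # orthogonal context vectors
cue_word = [1, 1]                         # the same cue in both contexts
target_a, target_b = [1, 0, 0], [0, 0, 1]
tensor = store_triples(
    [(list_one, cue_word, target_a), (list_two, cue_word, target_b)],
    n_context=2, n_cue=2, n_target=3,
)
print(retrieve(tensor, list_one, cue_word))   # [1.0, 0.0, 0.0]
print(retrieve(tensor, list_two, cue_word))   # [0.0, 0.0, 1.0]
```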
iv. Formalisms:
The formalisms of connectionism are the languages for describing network architectures and learning algorithms. These formalisms include mathematical descriptions, programs, simulators and natural language.
v. Hardware:
Implementation of connectionist networks for cognitive modeling is typically via simulation on either parallel or sequential machines. There are some specific hardware implementations of connectionist algorithms (e.g. a silicon retina built at Caltech).
5. The connectionist toolkit: a set of candidate processes
5.1 Introduction
In previous sections I have discussed the theoretical possibility of defining a finite set of processes that could be combined to accomplish all cognitive tasks. Newell (1991) has called such a set a "cognitive architecture" and, in the SOAR project, has proposed such a set based on a production system formalism.
As discussed in the first section, connectionism has not yet reached the stage of identifying and agreeing on a set of primitives, and it is too large a task for one paper to review all the possible candidates with any rigor. As a start on such an endeavour, however, I have compiled in the Summary Table a list of connectionist processes that I feel contrast most markedly with traditional symbolic processes: ones that relate to processes over distributed memory, higher-order structure in representations, and the treatment of time. The compilation is my own idiosyncratic set, drawn from a variety of sources and modeling projects. If you are a reader who is familiar with connectionist networks, you may care to compare them with processes that you consider basic to your own modeling tasks. For novices, the table may serve as a guide to the range of processes available.
5.2 Methods for determining basic processes: memory, structure and time
As mentioned in the section above, the Summary Table is intended only as an initial step towards a list of primitives.
By what methods could one arrive at a more complete list?
In section 2.3 it was suggested that a first attempt might be to assume that there is one primitive for each of the basic tasks of cognition. However, later sections queried whether we really know what the basic tasks are.
An alternative method involves converging on a set of primitives by analysing specific experimental paradigms, extracting the psychological tasks involved, and then identifying a set of primitives that can be combined to solve the tasks. The first aim in such a project is to find a set that will suffice for the tasks. These primitives can then be refined as more tasks are studied, and as understanding grows of the behaviour of models constructed using the initial set of primitives. Section 1 of the Summary Table is based on two such projects with colleagues at the University of Queensland that used converging lines of research to find a set of primitives for tasks in human memory (for a first attempt based on memory and analogical reasoning tasks, see Wiles, Halford, Stewart, Humphreys, Bain & Wilson, 1991; for a more complete analysis of the memory tasks, see Humphreys, Wiles & Dennis, in press).
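As an illustration of the kind of primitive such a project yields, the distinction drawn in Section 1 of the Summary Table between matching (which returns a scalar) and retrieval (which returns a vector) can be sketched over a three-way tensor memory of cue, target and context. This is a minimal sketch in plain Python under simplifying assumptions: the vectors are small, orthonormal and invented for illustration, and the function names are hypothetical rather than drawn from the cited memory models.

```python
# Sketch of a three-way tensor memory (cue x target x context).
# All vectors and names below are illustrative assumptions.

def empty_memory(n_cue, n_target, n_context):
    return [[[0 for _ in range(n_context)] for _ in range(n_target)]
            for _ in range(n_cue)]

def store(memory, cue, target, context):
    """Superimpose the outer product cue x target x context on the memory."""
    for i, c in enumerate(cue):
        for j, t in enumerate(target):
            for k, x in enumerate(context):
                memory[i][j][k] += c * t * x

def match(memory, cue, target, context):
    """Matching: a scalar, the dot product of all three cues with the memory."""
    return sum(memory[i][j][k] * c * t * x
               for i, c in enumerate(cue)
               for j, t in enumerate(target)
               for k, x in enumerate(context))

def retrieve(memory, cue, context):
    """Retrieval: a vector in target space, from contracting cue and context."""
    n_target = len(memory[0])
    return [sum(memory[i][j][k] * c * x
                for i, c in enumerate(cue)
                for k, x in enumerate(context))
            for j in range(n_target)]

dog, cat = [1, 0], [0, 1]
bone, fish = [1, 0], [0, 1]
list1, list2 = [1, 0], [0, 1]

memory = empty_memory(2, 2, 2)
store(memory, dog, bone, list1)   # dog-bone studied in list 1
store(memory, cat, fish, list1)   # cat-fish studied in list 1
store(memory, dog, fish, list2)   # dog-fish studied in list 2

print(match(memory, dog, bone, list1))  # 1: recognised as studied in list 1
print(match(memory, dog, fish, list1))  # 0: not studied together in list 1
print(retrieve(memory, dog, list1))     # [1, 0], the vector for "bone"
print(retrieve(memory, dog, list2))     # [0, 1], the vector for "fish"
```

The same cue retrieves a different target in each context, even though all three events are superimposed in one tensor.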
A second method for determining primitives is to directly study the computational requirements of complex tasks. In a widely influential paper, Fodor and Pylyshyn (1988) raised questions concerning the ability of connectionist models to perform structure sensitive processing. The issues relate to the ability of networks to represent and process structure over an entire domain, as well as structure over elements within a domain. For example, given a system that can represent objects, if it can represent "blue", and "triangle", then it needs to be able to represent "blue triangle". That is, it needs to be able to compose representations. The property is called "compositionality". Not all methods of composing representations are useful, however. A further property is required, that of being able to retrieve the component properties from the composite representation. Thus, it should be possible to infer "blue" from the representation "blue triangle". In addition, given a representation of blue triangles and red squares, it should be possible to represent blue squares. This property is called "systematicity".
Fodor and Pylyshyn rejected claims that connectionism provides a new way to model cognition, stating that if it could perform structure sensitive processing, then it provided an implementation of traditional symbolic processing, not an alternative. I do not want to debate the arguments here, just draw attention to their method (which is a common one in computer science). Note that the claims were not based on any specific cognitive task, as in the memory study referred to above, but rather, they were focusing on the intrinsic computational properties of connectionist networks. In subsequent years, many connectionists have addressed issues related to structure sensitive processing - some in direct response to Fodor and Pylyshyn's criticism, others having perceived the need for connectionist models to demonstrate the capabilities of systematicity and compositionality (Hinton, 1990).
From these studies, an array of techniques has been proposed and explored for representing and processing structured objects based on distributed representations. The simplest relationships are associations between two items (one of the primitives proposed in the memory study). More complex representations are required for higher cognitive functions. Section 2 of the Summary Table lists techniques for representing relationships __between items within a domain__ (e.g., blue and green within the domain of colour; or adjacent locations in a spatial map); and __between domains__ (e.g., the combination of colours and shapes in representing a scene; or nouns and verbs in representing a sentence). Some of the studies have been concerned with issues of how such structure-sensitive representations can be learned.
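One of the techniques listed in the Summary Table, tensor (role, value) pairs, gives a concrete sense of how a distributed representation can compose "blue" and "triangle" and still allow the components to be recovered. The following is a minimal sketch in plain Python, not an implementation of any cited model: the role vectors, item vectors and function names are assumptions made for illustration, and the roles are chosen to be orthogonal so that unbinding is exact.

```python
# Sketch of role-filler binding with outer products and superposition.
# Role and item vectors are illustrative assumptions.

def bind(role, filler):
    """Bind a filler to a role with an outer product."""
    return [[r * f for f in filler] for r in role]

def superimpose(a, b):
    """Add two bindings element-wise, so they share the same units."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def unbind(role, structure):
    """Recover the filler bound to a role (exact for orthogonal roles)."""
    norm = sum(r * r for r in role)
    n = len(structure[0])
    return [sum(r * row[j] for r, row in zip(role, structure)) / norm
            for j in range(n)]

# Orthogonal, distributed role vectors: both roles use both rows.
COLOUR, SHAPE = [1, 1], [1, -1]
blue, red = [1, 1, 0, 0], [0, 0, 1, 1]
triangle, square = [1, 0, 1, 0], [0, 1, 0, 1]

def compose(colour, shape):
    return superimpose(bind(COLOUR, colour), bind(SHAPE, shape))

blue_triangle = compose(blue, triangle)
red_square = compose(red, square)
blue_square = compose(blue, square)   # a new combination of known parts

print(blue_triangle)                  # [[2, 1, 1, 0], [0, 1, -1, 0]]
print(unbind(COLOUR, blue_triangle))  # [1.0, 1.0, 0.0, 0.0], i.e. blue
print(unbind(SHAPE, blue_square))     # [0.0, 1.0, 0.0, 1.0], i.e. square
```

The composite is not a concatenation of its parts, since every unit carries a mixture of colour and shape information, yet each component can be retrieved from it. Because the same binding operation applies to any colour and any shape, a system that can represent blue triangles and red squares can also represent blue squares.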
The toolkit of processes would not be complete if we stopped after just the study of memory processes and higher order structure. There is also a need to study the representation of temporal information and the types of combination mechanisms that are available for constructing processes. The area of representing and processing time is one in which connectionist research has perhaps the most promise in providing alternative ideas for basic processes. For example, one interpretation of words in Elman's simulation is that they behave like "operators" that push around the internal state of the simple recurrent network (SRN). Words are dynamic, rather than passive, symbols in such a model (Wiles & Bloesch, 1992).
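The view of words as operators on an internal state can be made concrete with a minimal sketch of a recurrent state update. The sketch is in plain Python and rests on several assumptions: the weights are fixed by hand rather than learned, the network has only two hidden units, tanh is used as the squashing function, and the names `srn_step`, `run`, `W_IN`, `W_REC` and `WORDS` are hypothetical.

```python
import math

# Sketch of a recurrent hidden-layer update with hand-picked weights.
# All values and names are illustrative assumptions; nothing is trained.

def srn_step(state, word, w_in, w_rec):
    """New hidden state from the current input and the previous state."""
    new_state = []
    for j in range(len(state)):
        net = sum(w * x for w, x in zip(w_in[j], word))
        net += sum(w * s for w, s in zip(w_rec[j], state))
        new_state.append(math.tanh(net))
    return new_state

W_IN = [[1.0, -1.0], [0.5, 0.5]]    # input-to-hidden weights
W_REC = [[0.0, 1.0], [-1.0, 0.0]]   # hidden-to-hidden (context) weights
WORDS = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

def run(sequence, state=(0.0, 0.0)):
    """Apply each word in turn as an operator on the hidden state."""
    state = list(state)
    for token in sequence:
        state = srn_step(state, WORDS[token], W_IN, W_REC)
    return state

print(run(["a", "b"]))
print(run(["b", "a"]))
```

The two sequences contain the same words but leave the network in different states, because the effect of each word depends on the state it acts on. In this sense the hidden layer carries a representation of temporal context.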
In section 3 of the Summary Table, I touch on three aspects of time: the representation of events in time; implicit effects of time in developmental sequences; and control processes.
Although there is insufficient space to discuss these aspects in this paper, I think that an adequate assessment of connectionist networks as computational devices, or as formalisms for cognitive modeling, cannot be made without considering the topics covered in sections 2 and 3 of the Table - the structure of representations in memory, and the treatment of time in connectionist formalisms. Analyses of connectionism and psychology (such as Quinlan's (19xx)), while excellent in the research that they do cover, have glossed over the debate on higher-order structure and recurrent networks, and in omitting these architectures from consideration have omitted what I believe to be the most exciting part of what connectionism has to offer psychological modeling. However, understanding of how to apply such ideas to modeling psychological phenomena is only in its infancy.
6. Conclusions
Are the basic mechanisms in connectionist and traditional approaches really any different? Is there anything that connectionism provides, that traditional approaches do not and vice versa?
In this paper I have addressed this question by focusing on the role of primitive components in symbolic and connectionist formalisms. This has necessitated explaining what primitives are, contrasting the primitives in each and comparing what is easy and hard within each formalism.
In summary, the benefits of connectionist networks are that they challenge assumptions about unique storage by providing an alternative method of representing information - that of distributed storage. They allow the integration of learning into processing, and the construction of representations. They provide new ways of thinking about how to represent the structure of temporal information, by allowing recurrent networks to create their own representations. Connectionist networks are currently among the best techniques known for temporal tasks such as time series analysis (Gershenfeld & Weigend, 1993).
The benefits of traditional systems are that they provide well-defined recursive data structures, and there is still nothing that rivals the utility of production systems for modeling higher order cognition. I think that connectionism may provide a novel perspective on control structures for cognitive models - but I am not aware of any models that have done so.
In their purest form, connectionist networks can be viewed as non-linear statistical techniques (though they are mechanisms used to generate data, not just to model it). However, being non-linear, there is comparatively little theory about their application, and hence most uses of connectionist techniques involve at least some research into the mechanisms themselves, as well as their application in a specific model.
==========================================================================
Summary Table - The connectionist toolkit
Basic cognitive         Connectionist processes          Example models
capabilities            and representations              and references
==========================================================================
1. Distributed Memory
a. Rep of items         Vectors                          Rumelhart & McC 1986
   dissimilar items     orthogonal vectors
   quasi-similar items  sparse vectors
   unique items         local vectors
   prototype context    unit vector                      Humphreys et al 1989
b. Memory storage (superposition)
   single items         Vector addition                  Hebb 1949; Anderson 1971
c. Memory for item associations (outer product & superposition)
   cue-target assoc     Matrix product & addition        Anderson 1971; BSB 1977;
                                                         Kohonen 1977;
                        Convolution                      Murdock 1982; Eich 1982;
                                                         Plate 1991
   cue-target-context   Tensor addition                  Humphreys et al 1989
   higher-order         Tensor (role,value) pairs        Smolensky 1990
                        Tensor (n-tuples)                Halford et al 1993
d. Memory access - linear                                Humphreys et al 1989
   Matching (match is dot product of one or more cues with the memory tensor)
      recognition       match (cue, target & context)
      familiarity       match (cue & unit context)
   Retrieval (inner product of one or more cues and memory tensor)
      cued recall       retrieval (cue & context)
      free association  retrieval (cue and unit context)
e. Memory access - nonlinear
   Selection of an      Brain-state-in-a-box;            BSB 1977;
   item from a          Hopfield net                     Hopfield 1984
   composite or
   noisy vector
   (cleanup process)
   Intersection of sets scale vectors before memorizing  Wiles et al 1991 (TR)
   of cues              or use an extra autoassoc memory
--------------------------------------------------------------------------
2. Structure in memory
a. Associations between items - see above (tensors)
b. Structure within a domain (i.e., relationships between different values
   of the same variable)
   Reln between items   similarity between vectors
   in same domain       in hidden unit (HU) space        Ghiselli-C & Munro 1993
   Learning item rep
     - supervised       multi-layer nets & backprop      RHW 1986;
     - self-supervised  auto-associators; encoders
     - unsupervised     self-org nets                    Kohonen 1982
c. Structure between domains (for supervised learning, structure between and
   within domains is represented in the spatial structure in HU space)
   Reln between items   intersecting regions in HU       Wiles & Ollila 1993;
   in diff domains      space                            Wiles 1993
   Learning to separate
   items into domains
     - supervised       multi-layer nets & backprop      Plaut & McC 1993
     - self-supervised  auto-associators                 Brousse & Smol 1989;
                        & Min Description Length         Hinton & Zemel 1993
--------------------------------------------------------------------------
3. Time - representation and processes
a. Representation of information in time                 Mozer 1993
   Tapped delay line    Buffered input units
   State information    Recurrent hidden layer           Elman 1990
   Sequence production  Recurrent connections from       Jordan 1987
                        output layer to input
b. Developmental sequences - stages of change due to:
   Acquisition of info  Standard bp                      Rumelhart & McC 1986
   eg learning          (can also change the statistics  Elman & Ohare 1993
   regularities         of the training set)
   before exceptions
   Architectural change Increasing memory in context     Elman 1991
                        units of SRNs
c. Control processes    Recurrent nets                   Jordan 1987