Humans can generate and comprehend a stunning variety of conceptual messages, ranging from sophisticated types of mental representations, such as ideas, inventions, and propositions, to more primal messages that satisfy demands of the immediate environment, such as salutations and warnings. In order for these messages to be transmitted and received, however, they must be put into a physical form, such as a sound wave or a visual marking. As noted by the Swiss linguist de Saussure (2002), the relationship between mental concepts and physical manifestations of language is almost always arbitrary. The words cat, sat, and mat are quite similar in terms of how they sound, but are very dissimilar in meaning; one would expect otherwise if the relationship between sound and meaning were principled instead of arbitrary. Although the relationship between linguistic form and meaning is arbitrary, it is also highly systematic. For example, changing a PHONEME in a word predictably also changes its meaning (as in the cat, sat, and mat example).
Human language is perhaps unique in the complexity of its linguistic forms (and, by implication, the system underlying these forms). Human language is compositional; that is, every sentence is made up of smaller linguistic units that have been combined in highly constrained ways. A standard view (Chomsky 1965, Pinker 1999) is that units and rules of combination exist at the levels of sound (phonemes and PHONOLOGY), words (MORPHEMES and MORPHOLOGY), and sentences (words and phrases, and SYNTAX). Collectively, these rules comprise a grammar that defines the permissible linguistic forms in the language. These forms are systematically related to, but distinct from, linguistic meaning (SEMANTICS).
Linguistic theories, however, are based on linguistic description and observation and therefore have an uncertain relation to the psychological underpinnings of human language. Researchers interested in describing the psychologically relevant aspects of linguistic form require their own methods and evidence. Furthermore, psychological theories must not only describe the relevant linguistic forms but also the processes that assemble these forms (during language production) and disassemble them (during language comprehension). Such theories should also explain how these forms are associated with a speaker’s (or hearer’s) semantic and contextual knowledge. Here, we review some of what we have learned about the psychology of linguistic form, as it pertains to sounds, words, and sentences.
Sound units. Since the advent of speech research, one of the most intensively pursued topics in speech science has been the search for the fundamental sound units of language. Many researchers have found evidence for phonological units that are abstract (i.e., generalizations across any number of heard utterances, rather than memories of specific utterances) and componential (constituent elements that operate as part of a combinatorial system). However, there is other evidence for less abstract phonological forms that may be stored as whole words. As a result, two competing hypotheses about phonological units have emerged: an abstract componential one vs. a holistic one.
The more widespread view is the componential one. It posits abstract units that typically relate either to abstract versions of the articulatory gestures used to produce the speech sounds (Liberman and Mattingly 1985, Browman and Goldstein 1990), or to ones derived from descriptive units of phonological theory such as the feature (see FEATURE ANALYSIS), an abstract sub-phonemic unit of contrast; the phoneme, an abstract unit of lexical contrast that is either a consonant or a vowel; the phone or allophone, surface variants of the phoneme; the syllable, a timing unit that is made up of a vowel and one or more of its flanking consonants; the prosodic word, the rhythmic structure that relates to patterns of emphasized syllables; or various structures that relate to tone, the lexically contrastive use of the voice’s pitch, and intonation, the pitch-based tune that relates to the meaning of a sentence (for reviews see Frazier 1995; Studdert-Kennedy 1980).
In the holistic view, the word is the basic unit while other smaller units are considered to be epiphenomenal (e.g., Goldinger, Pisoni, and Logan 1991). Instance-specific memory traces of particular spoken words are often referred to as episodes. Proponents of this view point out that while abstract units are convenient for description and relate transparently to segment-based writing systems, such as those based on the alphabet, there is evidence that listeners draw on a variety of highly detailed and instance-specific aspects of a word’s pronunciation in making lexical decisions (for reviews see Goldinger and Azuma 2003; Nygaard and Pisoni 1995).
Some researchers have proposed hybrid models in which there are two layers of representation: the episodic layer in which highly detailed memory traces are stored, and an abstract layer organized into features or phones (Scharenborg, Norris, ten Bosch, and McQueen 2005). The proponents of hybrid models try to capture the instance-specific effects in perception that inspire episodic approaches as well as the highly abstracted lexical contrast effects.
Processes. SPEECH PRODUCTION refers to the process by which the sounds of language are produced. The process necessarily involves both a planning stage, in which the words and other linguistic units that make up an utterance are assembled in some fashion, and an implementation stage, in which the various parts of the vocal tract, for example, the articulators, execute a motor plan to generate the acoustic signal. See Fowler (1995) for a detailed review of the stages involved in speech production. It is worth noting here that even if abstract phonological units such as features are involved in planning an utterance, at some point the linguistic string must be implemented as a motor plan and a set of highly coordinated movements. This has motivated gestural representations that include movement plans rather than static featural ones (Browman and Goldstein 1990; Fowler 1986, 1996; Saltzman and Munhall 1989; Stetson 1951).
SPEECH PERCEPTION is the process by which human listeners identify and interpret the sounds of language. It too necessarily involves at least two stages: 1) the conversion of the acoustic signal into an electrochemical response at the auditory periphery and 2) the extraction of meaning from the neurophysiological response at the cortical levels. Moore (1989) presents a thorough review of the physiological processes and some of the issues involved in speech perception. A fundamental point of interest here is perceptual constancy in the face of a massively variable signal. Restated as a question: how is it that a human listener is able to perceive speech sounds and understand the meaning of an utterance given the massive variability created by physiological idiosyncrasies and contextual variation? The various answers to this question involve positing some sort of perceptual units, be they individual segments, sub-segmental features, coordinated speech gestures, or higher level units like syllables, morphemes, or words.
It is worth noting here that the transmission of linguistic information does not necessarily rely exclusively on the auditory channel; the visible articulators, the lips and to a lesser degree the tongue and jaw, also transmit information; a listener presented with both auditory and visual stimuli will integrate the two signals in the perceptual process (e.g., Massaro 1987). When the information in the visual signal is unambiguous (as when the lips are the main articulators) the visual signal may even dominate the acoustic one (e.g., McGurk and MacDonald 1976). Moreover, writing systems convey linguistic information, albeit in a low-dimensional fashion. Most strikingly, sign languages are fully as powerful as speech-based communication systems and are restricted to the visual domain. Despite the differences between signed and spoken languages in terms of the articulators and their perceptual modalities, they draw on the same sorts of linguistic constituents, at least as far as the higher-level units are concerned: syllable, morpheme, word, sentence, prosodic phrase (e.g., Brentari 1998). Some have also proposed decomposing signed languages into smaller units using manual analogs of phonological features despite the obvious differences in the articulators and the transmission media (for a review see Emmorey 2002). The parallel of signed and spoken language structure despite the differences in transmission modalities is often interpreted as evidence for abstract phonological units at the level of the mental lexicon (Meier, Cormier, and Quinto-Pozos 2002).
The history of the debate: Early phonological units. The current debate about how to characterize speech sounds has its roots in research that dates back over a century. Prior to the advent of instrumental and experimental methods in the late 19th century, it was commonly accepted that the basic units of speech were discrete segments that were alphabetic in nature and serially ordered. While it was recognized that speech sounds varied systematically depending on the phonetic context, the variants themselves were thought to be static allophones of an abstract and lexically contrastive sound unit, that is, a phoneme. Translating into modern terminology, phonological planning involved two stages: 1) determining the contextually determined set of discrete surface variants given a particular lexical string and 2) concatenating the resulting allophones. The physiological implementation of the concatenated string was thought to result in a series of articulatory steady states or postures. The only continuous aspects of sound production were believed to be brief transitional periods created by articulatory transitions from one state to the next. The transitional movements were thought to be wholly predictable and determined by the physiology of a particular speaker’s vocal tract. Translating again into modern terminology, perception (when considered) was thought to be simply the process of translating the allophones back into their underlying phonemes for lexical access. The earliest example of the phoneme-allophone relationship is attributed to Panini (c. 500 BC), whose sophisticated system of phonological rules and relationships influenced structuralist linguists of the early 20th century as well as generative linguists of the late 20th century (for a review see Anderson 1985; Kiparsky 1979).
The predominant view at the end of the 19th century was typified by Bell’s (1867) influential descriptive work on English pronunciation. In it, he presented a set of alphabet-inspired symbols whose shapes and orientations were intended to encode both the articulatory steady states and their resulting steady-state sounds. A fundamental assumption in the endeavor was that all sounds of human language could be encoded as a sequence of universal articulatory posture complexes whose subcomponents were shared by related sounds. For example, all labial consonants (p, b, m, f, v, w, etc.) shared a letter shape and orientation, while all voiced sounds (b, d, g, v, z, etc.) shared an additional mark to distinguish them from their voiceless counterparts (p, t, k, f, s, etc.). Bell’s formalization of a set of universal and invariant articulatory constituents aligned as an alphabetic string influenced other universal transcription systems such as Sweet’s (1881) Romic alphabet, which laid the foundation for the development of the International Phonetic Alphabet (Passy, 1888). It also foreshadowed the use of articulatory features, such as those proposed by Chomsky and Halle (1968) in modern phonology, in that each speech sound, and therefore each symbol, was made up of a set of universal articulatory components. A second way in which Bell’s work presaged modern research was the connection between perception and production. Implicit in his system of writing was the belief that the perception of speech sounds was the process of extracting the articulations that produced them. Later perceptual models would incorporate this relationship in one way or another (Chistovich 1960; Dudley 1940; Fowler 1986, 1996; Joos 1948; Ladefoged and McKinney 1963; Liberman and Mattingly 1985; Stetson 1951).
The history of the debate: Early experimental research. Prior to the introduction of experimental methods into phonetics, the dominant methodologies were introspection about one’s own articulations and careful but subjective observations of others’ speech, and the measurement units were letter-based symbols. Thus, the observer and the observed were inextricably linked while the resolution of the measurement device was coarse. This view was challenged when a handful of phoneticians and psychologists adopted the scientific method and took advantage of newly available instrumentation, such as the kymograph, in the late 1800s. They discovered that there were no segmental boundaries in the speech stream and that the pronunciation of a particular sound varied dramatically from one instance to the next (for a review of early experimental phonetics see Kühnert and Nolan 1999; and Minifie 1999). In the face of the new instrumental evidence, some scholars, like Sievers (1876), Rousselot (1897), and Scripture (1902), proposed that the speech stream, and the articulations that produced it, were continuous, overlapping, and highly variable rather than being discrete, invariant, and linear. For them, the fundamental sound units were the syllable or even the word or morpheme. Rousselot’s research (1897-1901) revealed several articulatory patterns that were confirmed by later work (e.g., Stetson 1951). For example, he observed that when sounds that are transcribed as sequential are generated by independent articulators (such as the lips and tongue tip), they are initiated and produced simultaneously. He also observed that one articulatory gesture may significantly precede the syllable it is contrastive in, thereby presenting an early challenge to the notion of sequential ordering in speech.
Laboratory researchers like Stetson (1905, 1951) proposed that spoken language was a series of motor complexes organized around the syllable. Stetson also first proposed that perception was the process of perceiving the articulatory movements that generate the speech signal. However, outside of the experimental phonetics laboratory, most speech researchers, particularly phonologists like Leonard Bloomfield (1933), continued to use phonological units that remained abstract, invariant, sequential, and letter-like. Three events that occurred in the late 1940s and early 1950s changed this view dramatically. The first of these events was the application to speech research of modern acoustic tools such as the spectrogram (Potter 1945), sophisticated models of vocal tract acoustics (e.g., House and Fairbanks 1953), reliable articulatory instrumentation such as high-speed X-ray cinefluorography (e.g., Delattre and Freeman 1968), and electromyographic studies of muscle activation (Draper, Ladefoged, and Whitteridge 1959). The second was the advent of modern perception research in which researchers discovered complex relationships between speech perception and the acoustic patterns present in the signal (Delattre, Liberman, and Cooper 1955). The third was the development of distinctive feature theory in which phonemes were treated as feature matrices that captured the relationships between sounds (Jakobson 1939; Jakobson, Fant, and Halle 1952).
When researchers began to apply modern acoustic and articulatory tools to the study of speech production, they rediscovered and improved on the earlier observation that the speech signal and the articulations that create it are continuous, dynamic, and overlapping. Stetson (1951) can be seen as responsible for introducing kinematics into research on speech production. His research introduced the notions of coproduction, in which articulatory gestures are initiated simultaneously, and gestural masking, in which the closure of one articulatory gesture hides another, giving rise to the auditory percept of deletion. Stetson’s work provided the foundation for current language models that incorporate articulatory gestures and their movements as the fundamental phonological units (e.g., Browman and Goldstein 1990; Byrd and Saltzman 2003; Saltzman and Munhall 1989).
In the perceptual and acoustic domains, the identification of perceptual cues to consonants and vowels raised a series of questions that remain at the heart of the debate to this day. The coextensive and covarying movements that produce the speech signal result in acoustic information that exhibits a high degree of overlap and covariance with information about adjacent units (e.g., Delattre, Liberman, and Cooper 1955). Any single perceptual cue to a particular speech sound can also be a cue to another speech sound. For example, the onset of a vowel immediately following a consonant provides the listener with cues that identify both the consonant and vowel (Liberman, Delattre, Cooper, and Gerstman 1954). At the same time, multiple cues may identify a single speech sound. For example, the duration of a fricative (e.g., “s”), the fricative’s noise intensity, and the duration of the preceding vowel all give information about whether the fricative is voiced (e.g., “z”) or voiceless (e.g., “s”) (Soli 1982). Finally, the cues to one phone may precede or follow cues to adjacent phones. The many-to-one, the one-to-many, and the non-linear relationships between acoustic cues and their speech sounds pose a serious problem for perceptual models in which features or phones are thought to bear a linear relationship to each other. More recently, researchers studying perceptual learning have discovered that listeners encode speaker-specific details and even utterance-specific details when they are learning new speech sounds (Goldinger and Azuma 2003). This latest set of findings poses a problem for models in which linguistic sounds are stored as abstract units.
In distinctive feature theory, each phoneme is made up of a matrix of binary features that encode both the distinctions and the similarities between one class of sounds and the next in a particular language (Jakobson, Fant, and Halle 1952; Chomsky and Halle 1968). The features are thought to be drawn from a language-universal set and thus allow linguists to observe similarities across languages in the patterning of sounds. Moreover, segmenting the speech signal into units that are hierarchically organized permits a duality of patterning of sound and meaning that is thought to give the language its communicative power. That is, smaller units such as phonemes may be combined according to language-specific phonotactic (sound combination) constraints into morphemes and words, and words may be organized according to grammatical constraints into sentences. This means that with a small set of canonical sound units, together with recursion, the talker may produce and the hearer may decode and parse a virtually unbounded number of utterances in the language.
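The idea of a phoneme as a bundle of binary features can be made concrete with a small sketch. The three features and six consonants below are a deliberately simplified, hypothetical inventory chosen for illustration; they are not the feature set of Jakobson, Fant, and Halle or of Chomsky and Halle, and the function names are assumptions of this example.

```python
# Toy feature matrix: each phoneme is a bundle of binary features.
# The inventory and feature values are simplified for illustration only.
FEATURES = {
    "p": {"voiced": False, "labial": True,  "nasal": False},
    "b": {"voiced": True,  "labial": True,  "nasal": False},
    "m": {"voiced": True,  "labial": True,  "nasal": True},
    "t": {"voiced": False, "labial": False, "nasal": False},
    "d": {"voiced": True,  "labial": False, "nasal": False},
    "n": {"voiced": True,  "labial": False, "nasal": True},
}


def natural_class(**spec):
    """Return the phonemes whose features match every value in spec."""
    return sorted(
        phoneme
        for phoneme, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    )


def contrast(a, b):
    """Return the names of the features on which two phonemes differ."""
    return sorted(
        name for name in FEATURES[a] if FEATURES[a][name] != FEATURES[b][name]
    )


if __name__ == "__main__":
    print(natural_class(labial=True))   # phonemes sharing [+labial]
    print(contrast("p", "b"))           # the single feature separating p and b
```

In this sketch a class of related sounds is simply the set of phonemes that share a feature value, and a lexical contrast such as p versus b reduces to a difference in one feature, which is the sense in which the matrix encodes both similarities and distinctions.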
In this section, we focus on those representations of form that encode meaning and other abstract linguistic content at the most minimally analyzable units of analysis—namely, words and morphemes. As such, we will give a brief overview of the study of lexical morphology, investigations in morphological processing, and theories about the structure of the mental lexicon.
Lexical form. What is the nature of representation at the level of lexical form? We will limit our discussion here largely to phonological codes, but recognize that a great many of the theoretical and processing issues we raise apply to orthographic codes as well. It is virtually impossible for the brain to store exact representations for all possible physical manifestations of linguistic tokens that one might encounter or produce. Instead, representations of lexical form are better thought of as somewhat abstract structured groupings of phonemes (or graphemes) which are stored as designated units in long-term memory, either as whole words or as individual morpheme constituents, and associated with any other sources of conceptual or linguistic content encoded in the lexical entries that these form representations map onto. As structured sequences of phonological segments, then, these hypothesized representational units of lexical form must be able to account for essentially all the same meaning-to-form mapping problems and demands that individual phonological segments themselves encounter during on-line performance, due to idiosyncratic variation among speakers and communicative environments. More specifically, representations of morphemes and words at the level of form must be abstract enough to accommodate significant variation in the actual physical energy profiles produced by the motor systems of individual speakers/writers under various environmental conditions. Likewise, in terms of language production, units of lexical form must be abstract enough to accommodate random variation in the transient shape and status of the mouth of the language producer.
Form and meaning: Independent levels of lexical representation. The description of words and morphemes given above to some degree rests on the assumption that lexical form is represented independently from other forms of cognitive and linguistic information, such as meaning and lexical syntax (e.g., lexical category, nominal class, gender, verbal subcategory, etc.). Many theories of the lexicon have crucially relied on the assumption of separable levels of representation within the lexicon. In some sense, as explained by Allport and Funnell (1981), this assumption follows naturally from the arbitrariness of mapping between meaning and form identified above, and would thus appear to be a relatively non-controversial assumption.
The skeptical scientist, however, is not inclined to simply accept assumptions of this sort at face value without considering alternative possibilities. Imagine, for example, that the various types of lexical information stored in a lexical entry are represented within a single data structure of highly interconnected independent distributed features. This sort of arrangement is easy to imagine within the architecture of a CONNECTIONIST model (McClelland & Rumelhart 1986). Using the lexical entry “cat” as an example, imagine a connectionist system in which all the semantic features associated with “cat,” such as [whiskers], [domestic pet], etc. (which are also shared with all other conceptual lexical entities bearing those features, such as <lion>, <dog>, etc.) are directly associated with the phonological units that comprise its word form /k/, /ae/, /t/ (which are likewise shared with all other word forms containing these phonemes) by means of individual association links that directly tie individual semantic features with individual phonological units (Rueckl et al. 1997). One important consequence of this hypothetical arrangement is that individual word forms do not exist as free-standing representations. Instead, the entire lexical entry is represented as a vector of weighted links connecting individual phonemes to individual lexical-semantic and syntactic features. It logically follows from this model, then, that if all or most of the semantic features of the word “cat,” for example, were destroyed or otherwise made unavailable to the processor, then the set of phonological forms /k/ /ae/ /t/, having nothing to link to, would have no means for mental representation, and would therefore not be available to the language processor. 
We will present here experimental evidence against this model, evidence that instead favors models in which a full phonological word (e.g., /kaet/) is represented in a localist fashion and is accessible to the language processor even when access to its semantic features is partially or entirely disrupted.
Several of the most prominent theories of morphology and lexical structure within formal linguistics make explicit claims about the modularity of meaning and form (Anderson 1992). Jackendoff (1997), for example, presents a theory that has a tripartite structure, in which words have separate identities at three levels of representation (form, syntax, and meaning) and in which these three levels are sufficient to encode the full array of linguistic information each word encodes. Jackendoff’s model further proposes that our ability to store, retrieve, and use words correctly, as well as our ability to correctly compose morphemes into complex words, derives from a memorized inventory of mapping functions that pick out the unique representations or feature sets for a word at each level and associate these elements with one another in a given linguistic structure.
While most psycholinguistic models of language processing have not typically addressed the mapping operations assumed by Jackendoff, they do overlap significantly in terms of addressing the psychological reality of his hypothetical tripartite structure in the mental lexicon. Although most experimental treatments of the multi-level nature of the lexicon have been developed within models of language production, as will be seen below, there is an equally compelling body of evidence for multi-level processing from studies of language comprehension as well.
The most influential lexical processing models over the last two decades make a distinction between at least two levels: the lemma level, where meaning and syntax are stored, and the lexeme level, where phonological and orthographic descriptions are represented. These terms and the functions associated with them were introduced in the context of a computational production model by Kempen and Huijbers (1983) and received further refinement with respect to human psycholinguistic performance in the foundational lexical production models of Bock (1982), Garrett (1975), and Levelt (1989). Much compelling evidence for a basic lemma/lexeme distinction has come from analyses of naturally occurring speech errors generated by neurologically unimpaired subjects, including tip-of-the-tongue phenomena (Meyer and Bock 1992), as well as from systematic analyses of performance errors observed in patients with acquired brain lesions. A more common experimental approach, however, is the picture-word interference naming paradigm, in which it has been shown that lemma and lexeme level information can be selectively disrupted during the course of speech production (Schriefers, Meyer, and Levelt 1990).
In terms of lexical comprehension models, perhaps the most straightforward sources of evidence for a meaning/form distinction have come from analyses of the performance of brain-damaged patients. A particularly compelling case for the independence of meaning and form might be demonstrated if an individual with acquired language pathology were to show an intact ability to access word forms in his/her lexicon, yet remain unable to access meaning from those form representations. This is precisely the pattern observed in patients designated as suffering from word meaning deafness. These patients show a highly selective pattern of marked deficit in comprehending word meanings, but with perfect or near-perfect access to word forms. A good example is patient WBN as described in Allen (2005), who showed an entirely intact ability to access spoken word form representations. In an auditory lexical decision task, WBN scored 175/182 (96%) correct, which shows he can correctly distinguish real words from non-words (e.g., flag vs. flag), presumably relying on preserved knowledge of stored lexemes to do so. However, on tasks that required WBN to access meaning from spoken words, such as picture-to-word matching tasks, he performed with only 40-60% accuracy (at chance in many cases).
Lexical structure: Complex words. A particularly important issue in lexical representation and processing concerns the cognitive structure of complex words, that is, words composed of more than one morpheme. One of the biggest debates surrounding this issue stems from the fact that in virtually all languages with complex word structures, lexical information is encoded both in consistent, rule-like structures and in idiosyncratic, irregular structures. This issue can be put more concretely in terms of the role of morphological decomposition in single-word comprehension theories within psycholinguistics. Consider the written word wanted, for example. A question for lexical recognition theories is whether the semantic/syntactic properties of this word [WANT, Verb, +Past, …] are extracted and computed in a combinatorial fashion each time wanted is encountered, by accessing the content associated with the stem want- [WANT, Verb] and combining it with the content extracted from the affix -ed [+Past], or whether instead a single whole-word form wanted is stored at the lexeme level and associated directly with all its semantic/syntactic content. To understand the plausibility that a lexical system could in principle store whole-word representations such as wanted, one must recognize that in many other cases, such as those involving irregularly inflected words like taught, the system cannot store a stem and affix at the level of form, as there are no clear morpheme boundaries to distinguish these constituents, but must instead obligatorily store the word as a whole at the lexeme level.
Many prominent theories have favored the latter, non-decompositional, hypothesis for all words, including irregular words like taught as well as regular compositional words like wanted (Bybee 1988). Other influential processing models propose that complex words are represented as whole-word units at the lexeme level, but that paradigms of inflectionally related words (want, wants, wanted) map onto a common representation at the lemma level (Fowler et al. 1985). In addition to this, another class of models, which has received perhaps the strongest empirical support, posits full morphological decomposition at the lexeme level whenever possible (Allen and Badecker 1999). According to these fully decompositional models, a complex word like wanted is represented and accessed in terms of its decomposed constituents want- and -ed at the level of form, such that the very same stem want- is used during the recognition of want, wants, and wanted. According to these models, then, the recognition routines that are exploited by morphological decomposition at the level of form resemble those in theoretical approaches to sentence processing, in which meaning is derived compositionally by accessing independent units of representation of form and combining the content that these forms access into larger linguistic units, according to algorithms of composition specified by the grammar.
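The logic of this dissociation can be illustrated with a toy two-level lexicon. Everything in the sketch is an assumption made for illustration: the stored forms, the non-word "blik", the Lemma record, and the function names are hypothetical, and the code is not a model of patient WBN or of any published architecture. It only shows why preserved form access with impaired meaning access is expected if forms and meanings are stored separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    """Lemma-level entry: meaning and syntactic category (toy version)."""
    concept: str
    category: str


# Lexeme level: stored word forms, written as rough phonemic strings.
LEXEMES = {"kaet", "flaeg"}

# Links from each stored form to its lemma-level entry.
LINKS = {
    "kaet": Lemma("CAT", "Noun"),
    "flaeg": Lemma("FLAG", "Noun"),
}


def lexical_decision(form, lexemes=LEXEMES):
    """Word/non-word judgment: needs only the store of forms."""
    return form in lexemes


def access_meaning(form, links):
    """Comprehension: needs an intact link from form to lemma."""
    lemma = links.get(form)
    return lemma.concept if lemma is not None else None


# Hypothetical impairment: forms are intact, form-to-meaning links are lost.
IMPAIRED_LINKS = {}

if __name__ == "__main__":
    print(lexical_decision("kaet"), lexical_decision("blik"))
    print(access_meaning("kaet", LINKS), access_meaning("kaet", IMPAIRED_LINKS))
```

Because the form store and the form-to-meaning links are separate structures, removing the links leaves lexical decision untouched while meaning access fails. A system with no free-standing word forms could not show this pattern in the same way.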
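A minimal sketch may help make the fully decompositional account concrete. It assumes a toy lexicon in which regular forms are recognized by stripping a suffix and looking up the shared stem, while irregular forms such as taught are stored whole. The dictionaries, the "+3sg" label, and the function name are hypothetical choices for this example, not part of any of the cited models.

```python
# Toy form-level lexicon. Stems and affixes are stored separately;
# irregular forms with no clear morpheme boundary are stored whole.
STEMS = {"want": ("WANT", "Verb"), "teach": ("TEACH", "Verb")}
AFFIXES = {"ed": "+Past", "s": "+3sg"}  # "+3sg" is an illustrative label
WHOLE_WORDS = {"taught": ("TEACH", "Verb", "+Past")}


def recognize(word):
    """Return the semantic/syntactic content for a written word, or None.

    Whole-word entries are consulted first, then bare stems, then
    stem + affix decomposition whenever a stored stem can be found.
    """
    if word in WHOLE_WORDS:
        return WHOLE_WORDS[word]
    if word in STEMS:
        return STEMS[word]
    for affix, feature in AFFIXES.items():
        if word.endswith(affix):
            stem = word[: -len(affix)]
            if stem in STEMS:
                # Compose the stem's content with the affix's content.
                return STEMS[stem] + (feature,)
    return None


if __name__ == "__main__":
    for w in ("want", "wants", "wanted", "taught"):
        print(w, recognize(w))
```

In this sketch the same stem entry want- serves the recognition of want, wants, and wanted, whereas taught bypasses decomposition. A non-decompositional model would instead list every inflected form in the whole-word table.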
While there is compelling empirical support for decompositional models of morphological processing, researchers are becoming increasingly aware of important factors that might limit decomposition. These factors are regularity, formal and semantic transparency, and productivity.
Regularity refers to the reliability of a particular word-formation process. For example, the plural noun kids expresses noun plurality in a regular, reliable way, while the plural noun children does not.
Formal transparency refers to the degree to which the morpheme constituents of a complex structure are obvious from its surface form. For example, morpheme boundaries are fairly obvious in the transparently inflected word wanted, compared to those of the opaquely (and irregularly) inflected word taught.
Semantic transparency. Although an irregular form like taught is formally opaque, as defined above, it is nonetheless semantically transparent because its meaning is a straightforward combination of the semantics of the verb teach and the feature [+Past]. In contrast to this, an example of a complex word that is formally transparent yet semantically opaque is the compound word dumbbell, which is composed of two recognizable morphemes, but the content associated with these two surface morphemes does not combine semantically to form the meaning of the whole word.
Productivity describes the extent to which a word-formation process can be used to form new words freely. For example, the suffix -ness is easily used to derive novel nouns from adjectives (e.g., nerdiness, awesomeness, catchiness), while forming novel nouns with the analogous suffix -ity is awkward at best (?nerdity), if not impossible.
Another phenomenon associated with these lexical properties is that they tend to cluster together in classes of morphologically complex word types across a given language, such that there will often exist a set of highly familiar, frequently used forms that are irregular, formally opaque and non-productive, and also a large body of forms that are morphologically regular, formally transparent, and productive. Given the large variety of complex word types found in human languages with respect to these dimensions of combinability, as well as the idiosyncratic nature of the tendency for these dimensions to cluster together from language to language, it would appear that empirical evidence for morphological decomposition must be established on a “case-by-case” basis for each word-formation type within each language. This indeed appears to be the direction that most researchers have taken.
On the surface, a sentence is a linear sequence of words. But in order to extract the intended meaning, the listener or reader must combine the words in just the right way. That much is obvious. What is not obvious is how we do that in real-time, as we read or listen to a sentence. Of particular relevance to this chapter are the following questions: Is there a representational level of syntactic form that is distinct from the meaning of a sentence? And if so, exactly how do we extract the implicit structure in a spoken or written sentence as we process it? One can ask similar questions about the process of sentence production: When planning a sentence, is there a planning stage that encodes specifically syntactic form? And if so, how do these representations relate to the sound and meaning of the intended utterance?
For purely practical reasons, there is far more research on extracting the syntactic form during sentence comprehension (a process known as parsing; see PARSING, HUMAN) than on planning the syntactic form of to-be-spoken sentences. Nonetheless, research in both areas has led to substantive advances in our understanding of the psychology of sentence form.
Syntax and semantics. A fundamental claim of GENERATIVE GRAMMARS is that syntax and semantics are clearly distinct. A fundamental claim of COGNITIVE GRAMMARS is that syntax and semantics are so entwined that they cannot be easily separated. This debate among linguists is mirrored by a similar debate among researchers studying language processing. A standard assumption underlying much psycholinguistic work is that a relatively direct mapping exists between the levels of knowledge posited within generative linguistic theories and the cognitive and neural processes underlying comprehension (Bock and Kroch 1989). Distinct language-specific processes are thought to interpret a sentence at each level of analysis, and distinct representations are thought to result from these computations. But other theorists, most notably those working in the connectionist framework, deny that this mapping exists (Elman et al. 1996). Instead, the meaning of the sentence is claimed to be derived directly, without an intervening level of syntactic structure.
The initial evidence of separable syntactic and semantic processing streams came from studies of brain-damaged patients suffering from APHASIA, in particular, the syndromes known as Broca’s and Wernicke’s aphasia. Broca’s aphasics typically produce slow, labored speech; their speech is generally coherent in meaning but very disordered in terms of sentence structure. Many syntactically important words are omitted (e.g., the, is), as are the inflectional morphemes involved in morphosyntax (e.g., -ing, -ed, -s). Wernicke’s aphasics, by contrast, typically produce fluent, grammatical sentences that tend to be incoherent. Initially, these disorders were assumed to reflect deficits in sensorimotor function; Broca’s aphasia was claimed to result from a motoric deficit, whereas Wernicke’s aphasia was claimed to reflect a sensory deficit. The standard assumptions about aphasia changed in the 1970s when theorists began to stress the ungrammatical aspects of Broca’s aphasics’ speech; the term “agrammatism” became synonymous with Broca’s aphasia. Particularly important in motivating this shift was evidence that some Broca’s aphasics have a language comprehension problem that mirrors their speech production problems. Specifically, some Broca’s aphasics have trouble understanding syntactically complex sentences (e.g., John was finally kissed by Louise) in which the intended meaning is crucially dependent on syntactic cues – in this case, the grammatical words was and by (Caramazza and Zurif 1976). This evidence seemed to rule out a purely motor explanation for the disorder; instead, Broca’s aphasia was viewed as fundamentally a problem constructing syntactic representations, both for production and comprehension. By contrast, Wernicke’s aphasia was assumed to reflect a problem in accessing the meanings of words.
These claims about the nature of the aphasic disorders are still quite influential. Closer consideration, however, raises many questions. “Pure” functional deficits affecting a single linguistically defined function are rare; most patients have a mixture of problems, some of which seem linguistic but others of which seem to involve motor or sensory processing (Alexander 2006). Many of the Broca’s patients who produce agrammatic speech are relatively good at making explicit grammaticality judgments (Linebarger et al. 1983), suggesting that their knowledge of syntax is largely intact. Similarly, it is not uncommon for Broca’s aphasics to speak agrammatically but to have relatively normal comprehension, bringing into question the claim that Broca’s aphasia reflects damage to an abstract “syntax” area used in production and comprehension (Miceli, Mazzuchi, Menn, and Goodglass 1983). Taken together, then, the available evidence from the aphasia literature does not provide compelling evidence for distinct syntactic and semantic processing streams.
Another source of evidence comes from NEUROIMAGING studies of neurologically normal subjects. One useful method involves recording event-related brain potentials (ERPs) from a person’s scalp as they read or listen to sentences. ERPs reflect the summed, simultaneously occurring postsynaptic activity in groups of cortical pyramidal neurons. A particularly fruitful approach has involved the presentation of sentences containing linguistic anomalies. If syntactic and semantic aspects of sentence comprehension are segregated into distinct streams of processing, then syntactic and semantic anomalies might affect the comprehension system in distinct ways. A large body of evidence suggests that syntactic and semantic anomalies do in fact elicit qualitatively distinct ERP effects and that these effects are characterized by distinct and consistent temporal properties. Semantic anomalies (e.g., The cat will bake the food …) elicit a negative wave that peaks at about 400 ms after the anomalous word appears (the N400 effect) (Kutas & Hillyard 1980). By contrast, syntactic anomalies (e.g., The cat will eating the food …) elicit a large positive wave that onsets at about 500 ms after presentation of the anomalous word and persists for at least half a second (the P600 effect) (Osterhout & Holcomb 1992). In some studies, syntactic anomalies have also elicited negativity over anterior regions of the scalp, with onsets ranging from 100 to 300 ms. These results generalize well across types of anomaly, languages, and various methodological factors. The robustness of the effects seems to indicate that the human brain does honor the distinction between the form and the meaning of a sentence.
Sentence comprehension. Assuming that sentence processing involves distinct syntactic and semantic processing streams, the question arises as to how these streams interact during comprehension. A great deal of evidence indicates that sentence processing is incremental, that is, that each successive word in a sentence is integrated into the preceding sentence material almost immediately. Such a strategy, however, introduces a tremendous amount of AMBIGUITY – that is, uncertainty about the intended syntactic and semantic role of a particular word or phrase. Consider, for example, the sentence fragment The cat scratched . . . . There are actually two ways to parse this fragment. One could parse it as a simple active sentence, in which the cat is playing the syntactic role of subject of the verb scratched, and the semantic role of the entity doing the scratching (as in The cat scratched the ratty old sofa). Or one could parse it as a more complex relative clause structure, in which the verb scratched is the start of a second, embedded clause, and the cat is the entity being scratched, rather than the one doing the scratching (as in The cat scratched by the raccoon was taken to the pet hospital). The ambiguity is resolved once the disambiguating information (the ratty sofa or by the raccoon) is encountered downstream, but that provides little help for a parser that assigns roles to words as soon as they are encountered.
How does an incremental sentence processing system handle such ambiguities? An early answer to this question was provided by the garden-path (or modular) parsing models developed in the 1980s. The primary claim was that the initial parse of the sentence is controlled entirely by the syntactic cues in the sentence (Ferreira and Clifton 1986). As words arrive in the linguistic input, they are rapidly organized into a structural analysis by a process that is not influenced by semantic knowledge. The output of this syntactic process then guides semantic interpretation. This model can be contrasted with interactive models, in which a wide variety of information (e.g., semantics and conceptual/world knowledge) influences the earliest stages of sentence parsing. Initial results of numerous studies (mostly involving the measurement of subjects’ eye movements as they read sentences) indicated that readers tend to read straight through syntactically simple sentences such as The cat scratched the ratty old sofa but experience longer eye fixations and more eye regressions when they encounter by the raccoon in the more complex sentences. When confronted with syntactic uncertainty, readers seemed to immediately choose the simplest syntactic representation available (Frazier 1987). When this analysis turned out to be an erroneous choice (that is, when the disambiguating material in the sentence required a more complex structure), longer eye fixations and more regressions occurred as the reader attempted to “reanalyze” the sentence.
A stronger test of the garden-path model, however, requires examining situations in which the semantic cues in the sentence are clearly consistent with a syntactically complex parsing alternative. A truly modular, syntax-driven parser would be unaffected by the semantic cues in the sentence. Consider, for example, the sentence fragment The sofa scratched . . . . Sofas are soft and inanimate and therefore unlikely to scratch anything. Consequently, the semantic cues in the fragment favor the more complex relative clause analysis, in which the sofa is the entity being scratched (as in The sofa scratched by the cat was given to Goodwill). Initial results seemed to suggest that the semantic cues had no effect on the initial parse of the sentence; readers seemed to build the syntactically simplest analysis possible, even when it was inconsistent with the available semantic information. Such evidence led to the hypothesis that the language processor is comprised of a number of autonomously functioning components, each of which corresponds to a level of linguistic analysis (Ferreira and Clifton 1986). The syntactic component was presumed to function independently of the other components.
The modular syntax-first model has been increasingly challenged, most notably by advocates of constraint-satisfaction models (Trueswell and Tanenhaus 1994). These models propose that all sources of relevant information (including statistical, semantic and real-world information) simultaneously and rapidly influence the actions of the parser. Hence, the implausibility of a sofa scratching something is predicted to cause the parser to initially attempt the syntactically more complex relative clause analysis. Consistent with this claim, numerous studies have subsequently demonstrated compelling influences of semantics and world knowledge on the parser’s response to syntactic ambiguity (Trueswell et al. 1994).
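The difference between the two accounts of initial parsing preference can be sketched in a few lines of code. This is a toy illustration under stated assumptions: the numeric bias and plausibility values are invented, and the function names syntax_first and constraint_based are hypothetical labels rather than implementations of any published model.

```python
# Toy comparison of two accounts of the initial parse of fragments such as
# "The cat scratched ..." and "The sofa scratched ...".
# All numeric values below are invented for illustration.

ANALYSES = ("main_clause", "reduced_relative")

# Structural simplicity bias: the main-clause analysis is the simpler one.
SYNTACTIC_BIAS = {"main_clause": 0.8, "reduced_relative": 0.2}

# Hypothetical plausibility of each noun as the do-er of "scratched".
AGENT_PLAUSIBILITY = {"cat": 0.9, "sofa": 0.05}


def syntax_first(noun):
    """Garden-path account: the initial choice ignores semantic cues."""
    return max(ANALYSES, key=SYNTACTIC_BIAS.get)


def constraint_based(noun):
    """Constraint-satisfaction account: all cues are combined at once."""
    p = AGENT_PLAUSIBILITY[noun]
    support = {
        "main_clause": SYNTACTIC_BIAS["main_clause"] * p,
        "reduced_relative": SYNTACTIC_BIAS["reduced_relative"] * (1 - p),
    }
    return max(ANALYSES, key=support.get)


for noun in ("cat", "sofa"):
    print(noun, syntax_first(noun), constraint_based(noun))
```

With these made-up values, both functions choose the main-clause analysis for cat, but only the constraint-based function switches to the reduced-relative analysis for sofa, which is the pattern the constraint-satisfaction models predict.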
There is, however, a fundamental assumption underlying most of the syntactic ambiguity research (regardless of theoretical perspective): that syntax always controls combinatory processing when the syntactic cues are unambiguous. Recently, this assumption has also been challenged. The challenge centers on the nature of THEMATIC ROLES, which help to define the types of arguments licensed by a particular verb (McRae et al. 1997; Trueswell and Tanenhaus 1994). Exactly what is meant by “thematic role” varies widely, especially with respect to how much semantic and conceptual content it is assumed to hold (McRae et al. 1997). For most “syntax-first” proponents, a thematic role is limited to a few syntactically relevant “selectional restrictions”, such as animacy (Chomsky 1965); thematic roles are treated as (largely meaningless) slots to be filled by syntactically appropriate fillers. A second view is that there is a limited number of thematic roles (agent, theme, benefactor, and so on) and that a verb selects a subset of these (Fillmore 1968). Although this approach attributes a richer semantics to thematic roles, the required generalizations across large classes of verbs obscure many subtleties in the meaning and usage of these verbs.
Both of these conceptions of thematic roles exclude knowledge that people possess concerning who tends to do what to whom in particular situations. McRae and others have proposed a third view of thematic roles that dramatically expands their semantic scope: thematic roles are claimed to be rich, verb-specific concepts that reflect a person’s collective experience with particular actions and objects (McRae et al. 1997). These rich representations are claimed to be stored as a set of features that define gradients of typicality (“situation SCHEMAS”) and to comprise a large part of each verb’s meaning. One implication is that this rich knowledge will become immediately available once a verb’s meaning has been retrieved from memory. As a consequence, the plausibility of a particular word combination need not be evaluated by means of a potentially complex inferential process but rather can be evaluated immediately in the context of the verb’s meaning. One might, therefore, predict that semantic and conceptual knowledge of events will have profound and immediate effects on how words are combined during sentence processing. McRae and others have provided evidence consistent with these claims, including semantic influences on syntactic ambiguity resolution.
The most compelling evidence against the absolute primacy of syntax, however, would be evidence that semantic and conceptual knowledge can “take control” of sentence processing even when opposed by contradicting and unambiguous syntactic cues. Recent work by Ferreira (2003) suggests that this might happen on some occasions. Ferreira reported that when plausible sentences (e.g., The mouse ate the cheese) were passivized to form implausible sentences (e.g., The mouse was eaten by the cheese), participants tended to name the wrong entity as “do-er” or “acted-on” as if coercing the sentences to be plausible. However, the processing implications of these results are uncertain, due to the use of post-sentence ruminative responses, which do not indicate whether semantic influences reflect the listeners’ initial responses to the input or some later aspect of processing.
Researchers have also begun to explore the influence of semantic and conceptual knowledge on the on-line processing of syntactically unambiguous sentences. An illustrative example is a recent ERP study by Kim and Osterhout (2005). The stimuli in this study were anomalous sentences that began with an active structure, for example, The mysterious crime was solving . . . . The syntactic cues in the sentence require that the noun crime be the Agent of the verb solving. If syntax drives sentence processing, then the verb solving would be perceived to be semantically anomalous, as crime is a poor Agent for the verb solve, and therefore should elicit an N400 effect. However, although crime is a poor Agent, it is an excellent Theme (as in solved the crime). The Theme role can be accommodated simply by changing the inflectional morpheme at the end of the verb to a passive form (“The mysterious crime was solved . . .”). Therefore, if meaning drives sentence processing in this situation, then the verb solving would be perceived to be in the wrong syntactic form (-ing instead of -ed), and should, therefore, elicit a P600 effect. Kim and Osterhout observed that verbs like solving elicited a P600 effect, showing that a strong “semantic attraction” between a predicate and an argument can determine how words are combined, even when the semantic attraction contradicts unambiguous syntactic cues. Conversely, in anomalous sentences with an identical structure but with no semantic attraction between the subject noun and the verb (e.g., The envelope was devouring . . .), the critical verb elicited an N400 effect rather than a P600 effect. These results demonstrate that semantics, rather than syntax, can “drive” word combinations during sentence comprehension.
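The logic of the two competing predictions described above can be restated as a small decision rule. The sketch is purely illustrative: the Theme-fit scores, the threshold, and the function name predicted_effect are assumptions made here, not quantities measured or reported by Kim and Osterhout.

```python
# Toy decision rule for the ERP effect expected at the critical verb in
# sentences of the form "The <noun> was <verb>ing ...".
# The Theme-fit scores and the threshold are made up for illustration.

THEME_FIT = {("crime", "solve"): 0.9, ("envelope", "devour"): 0.1}


def predicted_effect(noun, verb, threshold=0.5):
    """Return 'P600' if the noun is a good Theme for the verb, else 'N400'."""
    if THEME_FIT[(noun, verb)] >= threshold:
        # Strong semantic attraction: the verb is treated as carrying the
        # wrong inflection (-ing instead of -ed), i.e., a syntactic anomaly.
        return "P600"
    # No attraction: the syntactic analysis stands and the verb is
    # perceived as semantically anomalous.
    return "N400"


print(predicted_effect("crime", "solve"))
print(predicted_effect("envelope", "devour"))
```

Under these assumed scores the rule returns a P600 for the crime and solve pairing and an N400 for the envelope and devour pairing, matching the pattern reported in the study.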
Sentence production. Generating a sentence requires the rapid construction of novel combinations of linguistic units, involves multiple levels of analysis, and is constrained by a variety of rules (about word order, the formation of complex words, word pronunciation, etc.). Errors are a natural consequence of these complexities (Dell 1995). Because they tend to be highly systematic, speech errors have provided much of the data upon which current models of sentence production are based. For example, word exchanges tend to obey a syntactic category rule, in that the exchanged words are from the same syntactic category (for example, two nouns have exchanged in the utterance Stop hitting your brick against a headwall). The systematicity of speech errors suggests that regularities described in theories of linguistic form also play a role in the speech planning process.
The dominant model of sentence production is based on speech error data (Dell 1995; Garrett 1975; Levelt 1989). According to this model, the process of preparing to speak a sentence involves three stages of planning: conceptualization, formulation, and articulation, in that order. During the conceptualization stage, the speaker decides what thought to express, and how to order the relevant concepts sequentially. The formulation stage begins with the selection of a syntactic frame to encode the thought; the frame contains slots that act as place holders for concepts and, eventually, specific words. The phonological string is translated into a string of phonological features, which then drive the motor plan manifested in articulation.
This model, therefore, posits the existence of representations of syntactic structure that are distinct from the representations of meaning and sound. Other evidence in support of this view comes from the phenomenon of syntactic priming: having heard or produced a particular syntactic structure, a person is more likely to produce sentences using the same syntactic structure (Bock 1986). Syntactic priming occurs independently of sentence meaning, suggesting that the syntactic frames are independent forms of representation that are quite distinct from meaning.
Collectively, the evidence reviewed above indicates that psychologically relevant representations of linguistic form exist at all levels of language, from sounds to sentences. At each level, units of linguistic form are combined in systematic ways to form larger units of representation. For the most part, these representations seem to be abstract; that is, they are distinct from the motor movements, sensory experiences, and episodic memories associated with particular utterances. However, it is also clear that more holistic (that is, non-decompositional) representations of linguistic form, some of which are rooted in specific episodic memories, also play a role in language processing.
It also seems to be true that linguistic forms (e.g., the morphological structure of a word or the syntactic structure of a sentence) are dissociable from the meanings they convey. At the same time, semantic and conceptual knowledge can strongly influence the processing of linguistic forms, as exemplified by semantic transparency effects on word decomposition and thematic role effects on sentence parsing.
These conclusions represent substantive progress in our understanding of linguistic form and the role it plays in language processing. Nonetheless, answers to some of the most basic questions remain contentiously debated, such as the precise nature of the “rules” of combination, the relative roles of compositional and holistic representations, and the pervasiveness of interactions between meaning and form.
Alexander, M. P. 2006. “Aphasia I: Clinical and anatomical issues.” In Patient-Based Approaches to Cognitive Neuroscience (2nd ed.), ed. M. J. Farah and T. E. Feinberg, 165-182. Cambridge, MA: MIT Press.
Allen, Mark and William Badecker. 1999. Stem homograph inhibition and stem allomorphy: Representing and processing inflected forms in a multilevel lexical system. Journal of Memory and Language 41:105-123.
Allen, Mark D. 2005. The preservation of verb subcategory knowledge in a spoken language comprehension deficit. Brain and Language 95: 255-264.
Allport, D.A. and E. Funnell. 1981. Components of the mental lexicon. Philosophical Transactions of the Royal Society of London B 295: 397-410.
Anderson, Stephen R. 1985. Phonology in the Twentieth Century: Theories of Rules and Theories of Representations. Chicago, IL: The University of Chicago Press.
Bell, Alexander M. 1867. Visible Speech: The Science of Universal Alphabetics. London: Simpkin, Marshall.
Bloomfield, Leonard. 1933. Language. New York: H. Holt & Co.
Bock, J. K., and Anthony S. Kroch. 1989. “The isolability of syntactic processing.” In Linguistic structure in language processing, ed. G. N. Carlson and M. K. Tanenhaus, 157-196. Boston: Kluwer Academic.
Brentari, Dianne. 1998. A Prosody Model of Sign Language Phonology. Cambridge, MA: MIT Press.
Browman, Catherine P., and Louis Goldstein. 1990. Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics 18: 299-320.
Bybee, Joan. 1988. “Morphology as a lexical organization.” In Theoretical morphology: Approaches in modern linguistics, ed. M. Hammond & M. Noonan, 119-141. San Diego, CA: Academic Press.
Byrd, Dani, and Elliot Saltzman. 2003. The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics 31: 149-180.
Caplan, David. 1995. Issues arising in contemporary studies of disorders of syntactic processing in sentence comprehension in agrammatic patients. Brain and Language 50: 325-338.
Caramazza, Alfonso, and Edgar Zurif. 1976. Dissociations of algorithmic and heuristic processes in language comprehension: Evidence from aphasia. Brain and Language 3: 572-582.
Chistovich, Ludmilla A. 1960. Classification of rapidly repeated speech sounds. Akusticheskii Zhurnal 6: 392-398.
Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Delattre, Pierre, and Donald Freeman. 1968. A dialect study of American R’s by the x-ray motion picture. Linguistics 44: 29-68.
Delattre, Pierre C., Alvin M. Liberman, and Franklin S. Cooper. 1955. Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America 27: 769-773.
Dell, Gary S. 1995. Speaking and misspeaking. In An Invitation to Cognitive Science: Language. Cambridge, MA: MIT Press.
Draper, M., P. Ladefoged, and D. Whitteridge. 1959. Respiratory muscles in speech. Journal of Speech and Hearing Research 2: 16-27.
Dudley, Homer. 1940. The carrier nature of speech. Bell System Technical Journal 14: 495-515.
Elman, Jeffrey L. 1990. “Representation and structure in connectionist models.” In Cognitive models of speech processing, ed. G. T.M. Altmann, 227-260. Cambridge, MA: MIT Press.
Emmorey, Karen. 2002. Language, Cognition, and the Brain: Insights from sign language research. Mahwah, NJ: Lawrence Erlbaum Associates.
Ferreira, Fernanda. 2003. The misinterpretation of noncanonical sentences. Cognitive Psychology 47: 164–203.
Ferreira, Fernanda and Charles Clifton, Jr. 1986. The independence of syntactic processing. Journal of Memory and Language 25: 348-368.
Fillmore, Charles. 1968. “The case for case.” In Universals of linguistic theory, ed. E. Bach, 1-80. New York: Holt, Rinehart, & Winston.
Fowler, Carol A. 1986. An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics 14: 3-28.
Fowler, Carol A. 1996. Listeners do hear sounds, not tongues. Journal of the Acoustical Society of America 99: 1730-1741.
Fowler, Carol A. 1995. “Speech production.” In Speech, language and communication, ed. J. L. Miller and P. D. Eimas, 29-61. New York: Academic Press.
Fowler, Carol, Susan Napps, and Laurie Feldman. 1985. Relations among regular and irregular morphologically related words in the lexicon as revealed by repetition priming. Memory and Cognition 13: 241-255.
Franklin, S., J. Lambon Ralph, J. Morris, and P. Bailey. 1996. A distinctive case of word meaning deafness? Cognitive Neuropsychology 13: 1139-1162.
Frazier, L. 1987. “Sentence processing: A tutorial review.” In Attention and performance XII: The psychology of reading, ed. M. Coltheart, 3-30. Hillsdale, NJ: Erlbaum.
Frazier, Lyn. 1995. “Representation in psycholinguistics.” In Speech, Language, and Communication, ed. J.L. Miller and P. D. Eimas, 1-27. New York: Academic Press.
Garrett, Merrill F. 1975. “The analysis of sentence production.” In The psychology of learning and motivation, ed. G. Bower, 133-177. New York: Academic Press.
Goldinger, Stephen D., and Tamiko Azuma. 2003. Puzzle-solving science: The quixotic quest for units of speech perception. Journal of Phonetics 31: 305-320.
Goldinger, Stephen D., David B. Pisoni, and John S. Logan. 1991. On the nature of talker variability effects on recall of spoken word lists. Journal of Experimental Psychology: Learning, Memory and Cognition 17: 152-162.
Hall, D.A. and M. J. Riddoch. 1997. Word meaning deafness: Spelling words that are not understood. Cognitive Neuropsychology 14: 1131-1164.
Hillis, Argye E. 2000. “The organization of the lexical system.” In What Deficits Reveal about the Human Mind/Brain: Handbook of Cognitive Neuropsychology, ed. B. Rapp, 185-210. Psychology Press.
House, Arthur S., and Grant Fairbanks. 1953. The influence of the consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America 25: 105-113.
Jackendoff, Ray. 1997. The architecture of the language faculty. Cambridge, MA: MIT Press.
Jakobson, Roman. 1939. Observations sur le classement phonologique des consonnes. Proceedings of the 3rd International Conference of Phonetic Sciences 34-41. Ghent.
Jakobson, Roman, Gunnar Fant, and Morris Halle. 1952. Preliminaries to speech analysis. Cambridge: MIT Press.
Joos, Martin. 1948. Acoustic Phonetics. Language Monograph 23, Supplement to Language 24: 1-36.
Kempen, Gerard and Pieter Huijbers. 1983. The lexicalization process in sentence production and naming: Indirect election of words. Cognition 14: 185-209.
Kim, Albert, and Lee Osterhout. 2005. The independence of combinatory semantic processing: Evidence from event-related potentials. Journal of Memory and Language 52: 205-225.
Kiparsky, Paul. 1979. Panini as a Variationist. Cambridge, MA: MIT Press.
Kühnert, Barbara, and Francis Nolan. 1999. “The origin of coarticulation.” In Coarticulation: Theory, Data, and Techniques, ed. B. Rapp, 7-30. Cambridge, UK: Cambridge University Press.
Kutas, Marta and Steven A. Hillyard. 1980. Reading senseless sentences: Brain potentials reflect semantic anomaly. Science 207: 203-205.
Ladefoged, P., and N. McKinney. 1963. Loudness, sound pressure, and subglottal pressure in speech. Journal of the Acoustical Society of America 35: 454-460.
Lambon Ralph, M., K. Sage, and A. Ellis. 1996. Word meaning blindness: A new form of acquired dyslexia. Cognitive Neuropsychology 13: 617-639.
Levelt, Willem. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Liberman, Alvin M., Pierre C. Delattre, Franklin S. Cooper, and Lou J. Gerstman. 1954. The role of consonant-vowel transitions in the perception of the stop and nasal consonants. Journal of Experimental Psychology 52: 127-137.
Liberman, Alvin M., and Ignatius G. Mattingly. 1985. The motor theory of speech perception revised. Cognition 21: 1-36.
Linebarger, Marcia, Myrna Schwartz, and Eleanor Saffran. 1983. Sensitivity to grammatical structure in so-called agrammatic aphasics. Cognition 13: 361-393.
Massaro, Dominic W. 1987. Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Erlbaum.
McClelland, James, and David Rumelhart. 1986. Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1. Cambridge, MA: MIT Press.
McGurk, Harry and John Macdonald. 1976. Hearing lips and seeing voices. Nature 264: 746-748.
McRae, Ken, Todd R. Ferretti, and Liane Amyot. 1997. Thematic roles as verb-specific concepts. Language and Cognitive Processes 12: 137-176.
Meier, Richard P., Kearsey Cormier, and David Quinto-Pozos. 2002. Modularity in Signed and Spoken Languages. Cambridge, UK: Cambridge University Press.
Meyer, Antje, and Kathryn Bock. 1992. Tip-of-the-tongue phenomenon: Blocking or partial activation? Memory and Cognition 20: 715-726.
Miceli, G., A. Mazzuchi, L. Menn, and H. Goodglass. 1983. Contrasting cases of Italian agrammatic aphasia without comprehension disorder. Brain and Language 19: 65-97.
Miller, George A., and Patricia E. Nicely. 1955. An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America 27: 329-335.
Minifie, Fred D. 1999. “The history of physiological phonetics in the United States.” In A Guide to the History of the Phonetic Sciences in the United States: Issued on the Occasion of the 14th International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999, ed. J. Ohala, A.Bronstein, M. Busà, L. Grazio, J. Lewis, and W. Weigel. Berkeley, CA: University of California, Berkeley.
Moore, Brian C.J. 1989. An Introduction to the Psychology of Hearing. Third edition. London: Academic Press.
Nygaard, Lynn C., and David B. Pisoni. 1995. “Speech Perception: New directions in research and theory.” In Speech, Language, and Communication, ed. J. Miller and P. Eimas, 63-96. New York: Academic Press.
Osterhout, Lee and Philip J. Holcomb. 1992. Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language 31: 785-806.
Passy, Paul. 1888. Our revised alphabet. The Phonetic Teacher 57-60.
Pinker, Steven. 1999. Words and Rules: The Ingredients of Language. New York, NY: Basic Books.
Potter, Ralph K. 1945. Visible patterns of sound. Science 102: 463-470.
Rousselot, P.-J. 1897-1901. Principes de Phonétique Expérimentale. Paris: H. Welter.
Rueckl, Jay, Michelle Mikolinski, Michal Raveh, Caroline Miner, and F. Mars. 1997. Morphological priming, fragment completion, and connectionist networks. Journal of Memory and Language 36: 382-405.
Saltzman, Elliot L., and Kevin G. Munhall. 1989. A dynamical approach to gestural patterning in speech production. Ecological Psychology 1: 333-382.
Saussure, Ferdinand de. 2002. Écrits de linguistique générale. Ed. Simon Bouquet and Rudolf Engler. Paris: Gallimard. English translation: Writings in General Linguistics. Oxford: Oxford University Press, 2006.
Scharenborg, O., D. Norris, L. ten Bosch, and J. M. McQueen. 2005. How should a speech recognizer work? Cognitive Science 29: 867-918.
Schriefers, Herbert, Antje Meyer, and Willem Levelt. 1990. Exploring the time course of lexical access in language production: Picture-word interference studies. Journal of Memory and Language 29: 86-102.
Scripture, Edward Wheeler. 1902. The Elements of Experimental Phonetics. New York, NY: Charles Scribner’s Sons.
Sievers, Eduard. 1876. Grundzüge der Lautphysiologie zur Einführung in das Studium der Lautlehre der indogermanischen Sprachen. Leipzig: Breitkopf and Härtel.
Soli, Sig D. 1982. Structure and duration of vowels together specify fricative voicing. Journal of the Acoustical Society of America 72: 366-378.
Stetson, Raymond H. 1905. A motor theory of rhythm and discrete succession II. Psychological Review 12: 293-350.
Stetson, Raymond H. 1951. Motor Phonetics: A Study of Speech Movement in Action. Second edition. Amsterdam: North-Holland Publishing Co.
Studdert-Kennedy, Michael. 1980. Speech perception. Language and Speech 23: 45-65.
Sweet, Henry. 1881. Sound notation. Transactions of the Philological Society 177-235.
Trueswell, John C., and Michael K. Tanenhaus. 1994. “Toward a lexicalist framework of constraint-based syntactic ambiguity resolution.” In Perspectives on Sentence Processing, ed. C. Clifton, L. Frazier, and K. Rayner, 155-180. Hillsdale, NJ: Lawrence Erlbaum Associates.