Corpus Linguistics


The convergence of corpus linguistics, psycholinguistics and functionalist linguistics



As we have seen in Chapter 7, functionalist linguistics in the broad
sense (including cognitive linguistics) is increasingly making use of corpus-based methods, and in turn informing the analyses of corpus linguists. In this
chapter, we will show that this phenomenon extends as well to experimental
psycholinguistics. We will also discuss the implications of the rapprochement
of functionalist linguistics and psycholinguistics with corpus linguistics with
regard to the neo-Firthian school of thought which we surveyed in Chapter 6;
we will argue that in the neo-Firthian school, this rapprochement with functional
linguistics has taken a very different form. As we saw in Chapter 6, one of the
bases of the neo-Firthian or so-called ‘corpus-driven’ approach is a rejection
of non-corpus-derived theoretical frameworks. To explicitly adopt a functionalist
theory as the basis for a corpus-driven study would be distinctly peculiar from the
neo-Firthian perspective. Indeed, some of the stronger forms of the neo-Firthian
position – such as that espoused by Teubert, for instance – explicitly reject
the notion of a convergence of neo-Firthian corpus linguistics and functional
or cognitive linguistics, with Teubert (2005: 2) claiming that corpus linguistics
‘offers a perspective on language that sets it apart from received views or the
views of cognitive linguistics, both relying heavily on categories gained from
introspection rather than from the data itself’. Nevertheless, we wish to argue
that such a convergence is in fact taking place, stemming on the neo-Firthian
side from work by Sinclair and others from the 1990s onwards. Our basis for
making this case is that, when we closely examine the findings of the most
extensively developed neo-Firthian theories – in particular, Pattern Grammar and
Lexical Priming – we will find that many of these conclusions have also been
arrived at by one or more branches of functional linguistics or psycholinguistics.
These congruent conclusions stem from wildly different sets of evidence and
are, of course, expressed using very different descriptive apparatus. But certain
fundamental insights – namely, the inseparability of lexis and grammar, and
the nature of grammar as secondary to, and emergent from, lexis – have been
arrived at by both functional linguists and neo-Firthian corpus linguists, largely
independently of one another.


In this chapter, then, we have two main topics. Firstly, in section 8.2 we will
consider the role of corpora in experimental psycholinguistics, as we considered their role in functionalism in Chapter 7. Psycholinguistics as a discipline
is methodologically rather different to functionalist theoretical linguistics, but it
shows signs of a similar trend with regard to corpus methods – that is, over
recent years there has been more and more use of corpus data within psycholinguistic research, and a convergence or rapprochement between the findings of
psycholinguistic experiments and of corpus investigations.
Secondly, section 8.3 discusses the convergence of findings, regarding in particular the ontological status of grammar, lexis and language itself, between
neo-Firthian corpus linguistics, functional linguistics and psycholinguistics.


8.2 Corpus methods and psycholinguistics

Overlapping cognitive linguistics (which we discussed in the previous
chapter), but in many ways distinct from it, is the field of psycholinguistics –
and in particular that branch of psycholinguistics whose methodology is mainly
experimental. In the latter approach, the primary source of data is various types
of laboratory tests on human subjects (or, as we will see later, computer models). While experimental psycholinguistics is not usually considered a branch
of functional-cognitive linguistics, its fundamental methodological assumption –
that the nature of language in the brain or mind can be investigated in much the
same way that experimental psychology in general looks at other aspects of the
nature of thought – is in accordance with the general tenet of functionalism that
there is no absolute divide between form and function, between language and
non-linguistic cognition.
Psycholinguistics is a very broad field, and there is absolutely no room here
for a full review of it – nor even to treat comprehensively all research which has
linked psycholinguistics with corpus data and methods. We must therefore confine ourselves to an extremely brief and purely indicative survey. To characterise
psycholinguistics in very broad terms, we might say that it is focused on two
primary issues (which are closely interrelated, as Ellis 2002 illustrates): language
learning and language processing. There are other topics of interest of course,
such as the evolution of the language faculty. However, we will limit ourselves
here to looking at how corpora have been used in some psycholinguistic investigations into first language acquisition, second language acquisition and language
processing.

Corpus data in experiments on language processing

Language processing has been investigated experimentally in a number of ways. Two that are reasonably common are self-paced reading experiments
and eye-tracking experiments. Both are means of investigating the speed with
which particular segments of language are processed. In a self-paced reading
experiment, participants work at a computer running a specially designed program. The computer shows one word of a sentence at a time to the participant,
who presses a button to get the next word once they have read the word currently
on screen. The program records the time for each button-press, so that the relative
speed of reading for each word is known. Typically, after each sentence participants have to answer a (very easy) question about the content of the sentence –
this prevents participants from just clicking through sentences without actually
reading for meaning. The results of such an experiment can be used to infer what
elements (morphological, syntactic or semantic) are processed easily, and which
are more difficult and thus require more processing time. This in turn can give
indications about what is actually happening in the brain. Although useful, self-paced reading experiments may potentially be misleading in that fluent readers
do not typically read one word at a time, in sequence, without ever going back
in the text. In fact, it is known that a reader’s journey through a sentence of
printed text can be quite complex, with multiple movements back and forth. This
type of evidence is gathered in eye-tracking experiments (see Rayner 1998 for a
review). Again, participants are given the task of reading sentences presented on
a computer screen, but this time an entire sentence is presented at one time, and
specialised video equipment records the movements of one of the participant’s
eyes as it looks at different positions in the sentence immediately after the sentence appears on screen. The resulting data is much richer, but correspondingly
rather more difficult to interpret, than self-paced reading data.
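The timing logic of the self-paced reading procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in any of the studies cited: the function name reading_times, the use of milliseconds, and the sample sentence and timestamps are all invented for the example. It assumes that the program has logged, for each word, the moment at which the participant pressed the button to move on.

```python
def reading_times(words, press_times_ms, onset_ms=0):
    """Pair each word with the time it stayed on screen, in milliseconds.

    words          -- the words of the sentence, in presentation order
    press_times_ms -- press_times_ms[i] is the (hypothetical) logged time at
                      which the participant pressed the button to leave words[i]
    onset_ms       -- the time at which the first word appeared
    """
    if len(words) != len(press_times_ms):
        raise ValueError("one button-press is needed for every word")

    results = []
    previous = onset_ms
    for word, pressed in zip(words, press_times_ms):
        # The reading time for a word is the gap between its appearance
        # (the previous button-press) and the press that dismissed it.
        results.append((word, pressed - previous))
        previous = pressed
    return results


# Invented data for illustration only.
sentence = ["the", "senator", "planned", "the", "campaign"]
presses = [310, 700, 1150, 1430, 2080]

for word, duration in reading_times(sentence, presses):
    print(f"{word:10s} {duration} ms")
```

Longer durations on a given word would then be read as a sign of greater processing difficulty at that point in the sentence, which is the inference the experimental design relies on.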
These kinds of experiments may seem remote from the concerns of corpus
linguistics. However, there are at least two ways in which corpus data can play
an important role in the design and interpretation of such experiments. Firstly,
corpus data can be used as a check on the naturalness of the language task
that the experiment sets its participants. For instance, Frisson and Pickering
(2001) summarise the results of a series of eye-tracking experiments aimed at
investigating the processing of words which are ambiguous between a literal and
a metaphorical meaning, when the part of the sentence prior to the ambiguous
word does not provide sufficient cues to indicate which meaning is intended. But
Deignan (2005: 114–17), in a review of this study, points out that in fact, such
cases almost never occur in corpora of real usage: in all the examples she looks at,
some aspect of the preceding context – possibly in an earlier sentence – indicates
which meaning is intended. So, for instance, the word campaign literally relates
to warfare and metaphorically relates to politics. In any given real example of
campaign from a corpus, the prior context is overwhelmingly likely to give some
indication whether a military campaign or a political campaign is intended; so by
the time the reader gets up to campaign, it is already effectively disambiguated.
On this basis, Deignan argues that if an experiment presents participants with a
word such as campaign without any indication in the foregoing text as to whether
it is literal or metaphorical, as Frisson and Pickering’s experiment did, then that
experiment is actually ‘forcing participants to tackle problems that are not faced
in normal discourse’ (Deignan 2005: 117). If this is the case, then it may be
argued that while such an experiment may indeed tell us something interesting
about the processing of ambiguously metaphorical words, it cannot tell us about
the normal processing of language in use. We can see, then, that a corpus-derived
awareness of how words (and other linguistic items) are actually used can serve
as a useful anchoring-point for psycholinguistic experimentation. This is not to
say that unnatural language should never be used in an experiment – there are
cases where non-idiomatic language may itself be the object of study, for instance
Millar’s (2011) study of how errors in collocation, of the type made by non-native
speakers of English, can affect processing speed in self-paced reading. What is
undesirable is a situation where experimental tasks include highly unnatural
language without the experimenter being aware that this is the case.
Secondly, corpus data can be used as a source of frequency data in the construction of test sentences in self-paced reading or eye-tracking experiments. Often,
the test sentences used will not be drawn directly from corpus data, because the
analysis of the resulting data may require certain aspects of the sentences to be
controlled across different examples. For instance, if we are primarily interested
in the time taken to process (say) the verb in a sentence, then we might well
wish to control the length and syntactic structure of the preverbal elements (as
well as, potentially, that of the rest of the sentence). We are unlikely to find such
controlled sentences in a corpus! But even when invented example sentences are
used, it is entirely possible for the creation of the sentences to be informed by frequency data of various sorts extracted from a corpus. The study by Millar (2011)
which we cited above uses this approach: Millar’s test sentences are all fabricated,
but each is built around an observed non-idiomatic collocation extracted from a
learner corpus.
A perhaps more straightforward use of frequency data drawn from corpora is
exemplified by the eye-tracking experiments of McDonald and Shillcock (2003a,
2003b). They investigate whether the co-occurrence frequency of a pair of words
(as established in a large corpus, in this case the BNC) can predict the ease of
processing of the second word in that pair. The co-occurrence frequencies are
expressed, in this case, as transition probabilities; that is, given that the first
word in the pair is X, what is the probability that the second word is Y? In
this case, the probability is equal to the number of times the sequence X-then-Y
occurs in the BNC, divided by the total number of instances of word X – this is
fundamentally very similar to a collocation calculation. McDonald and Shillcock
(2003a) look at the processing of verb–object pairs, contrasting pairs where the
object is probable, given the verb – e.g. avoid confusion – and pairs where it is less
probable – e.g. avoid discovery. The frequencies of these bigrams in the BNC are
50 and 2 respectively, relative to 7,823 instances of the wordform avoid in total.
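The transition probability calculation just described can be sketched as follows. This is a minimal illustration, assuming a corpus that has already been tokenised into a simple list of wordforms; the function names and the toy token list are invented for the example, and nothing here is claimed to reproduce McDonald and Shillcock's own procedure. The second function simply applies the same division to the BNC counts quoted above.

```python
from collections import Counter


def transition_probability(tokens, first, second):
    """Estimate P(second | first) from a list of tokens.

    The estimate is the number of times the sequence first-then-second
    occurs, divided by the total number of instances of first.
    """
    word_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    if word_counts[first] == 0:
        return 0.0  # the first word never occurs, so no estimate is possible
    return bigram_counts[(first, second)] / word_counts[first]


def probability_from_counts(bigram_count, first_word_count):
    """The same calculation, starting from counts already taken from a corpus."""
    return bigram_count / first_word_count


# Toy token list, invented for illustration only.
toy_corpus = "we avoid confusion and they avoid discovery but we avoid confusion".split()
print(transition_probability(toy_corpus, "avoid", "confusion"))

# Counts quoted in the text: 50 and 2 bigrams against 7,823 instances of avoid.
print(probability_from_counts(50, 7823))
print(probability_from_counts(2, 7823))
```

On the quoted figures, the transition from avoid to confusion comes out at roughly 0.0064 and the transition from avoid to discovery at roughly 0.00026, so the first is 25 times more probable than the second.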
McDonald and Shillcock’s eye-tracking data showed that participants’ eyes fixed
on the object noun for a shorter time when they were reading a high-probability
transition than when reading a low-probability transition. This suggests that the

