Effect of Repetition of Exposure and Proficiency
Level in L2 Listening Tests
Shinshu University
Nagano, Japan

Second language (L2) listening test developers must take into account
a variety of factors such as the characteristics of the input, the task, and
the test takers (see, e.g., Brindley, 1998; Buck, 2001; Rost, 2002; Thompson,
1995). One such issue to be considered is the number of times a listening
passage should be played, which concerns the characteristics of both the
input and the task in that the number of exposures to a listening passage
is a matter of how the assessment task is administered and at the same
time how it can increase the redundancy of the input in the passage. To
better understand the role of repetition in listening tests, it is important
to examine whether repetition and proficiency levels exhibit any interactional effect because if differential effects of repetition are observed for
various L2 learners, only a portion of the test takers will benefit from the
repeated exposure. This study addresses the issue of the interactional
effect between repetition and proficiency levels.
Although this study focuses primarily on issues related to the testing of listening, repeated presentation of a listening passage is not limited to testing; it is frequently and widely used in listening instruction
(e.g., Harmer, 1998, p. 100). Thus, so far this issue has been addressed
mainly in studies that investigated the effects of repetition on L2 listening comprehension. In general, research conducted to date on the effect
of repeated exposure has shown that repetition may facilitate L2 listening comprehension (e.g., Berne, 1995). However, previous studies that
included listeners’ proficiency levels as an independent variable yielded
mixed results about the interactional effect between repetition and proficiency levels (Cervantes & Gainer, 1992; Chang & Read, 2006; Iimura,
2007; Lund, 1991). On the one hand, both Lund and Chang and Read provided evidence supporting the interactional effect. Lund examined
the effects of repetition and different course levels (i.e., proficiency levels) on listening and reading comprehension in German as an L2. He
found a significant three-way (modality, trial, and course level) interactional effect in the lexical item analysis of the recall protocols. That
is, the improvement of the first-semester and second-semester students
in the listening recall task was about half of the improvement of the
third-semester students in the listening recall task, whereas there was no
difference in the improvement among the students at different proficiency levels in the reading recall task. Therefore, he argued, only third-semester students benefited from the repeated exposure in the listening
task. Chang and Read examined the effects of four different types of
listening support: preview of the questions, repetition of the input, provision of the topic knowledge, and vocabulary instruction. They also
investigated their interactional effects with proficiency levels based on
the results of the listening section of the Test of English for International
Communication (TOEIC). Results showed that the effects of the four listening support types differed according to proficiency level. In the condition of repetition of the input and preview of the questions, the high
listening proficiency group outperformed the low listening proficiency
group; in the other two conditions (provision of topic knowledge and
vocabulary instruction), both groups scored similarly. For the high listening proficiency group, repetition of the input and the provision of
background knowledge were more effective than vocabulary instruction;
for the low listening proficiency group, the provision of topic knowledge
was more effective than vocabulary instruction and preview of the questions. Based on these results, Chang and Read suggested (a) that the
low listening proficiency group benefited less than the high listening
proficiency group from preview of the questions and repetition of the
input, (b) that both groups benefited from the provision of topic knowledge, and (c) that vocabulary instruction was the least effective for both groups.
On the other hand, Cervantes and Gainer (1992) and Iimura (2007)
reported the lack of an interactional effect between repetition and proficiency levels. Cervantes and Gainer examined the effect of input modification (including repetition) on Japanese university students’ listening
comprehension of a lecture in English. Results of the study showed that
both simplification and repetition were more facilitative of comprehension than no modification and that no interactional effect between input
conditions and proficiency levels was observed. Thus, they argued that
repetition augments listening comprehension for both higher and lower
listening proficiency learners. Iimura examined the effect of repeated
exposure, question types, and proficiency levels on the listening comprehension of Japanese senior high school students. The participants were
divided into three listening proficiency groups based on the results of the
listening section of the third-grade level of the Society for Testing English
Proficiency (STEP) test. He found that repetition improved performance
on both question types (local and global questions) irrespective of proficiency levels.
In summary, these four previous studies have produced mixed results
regarding the interactional effect between repetition and proficiency
level. It must be noted that these previous studies used different tasks to
assess listening comprehension: a free written recall task (Lund, 1991), a
partial dictation task (Cervantes & Gainer, 1992), a multiple-choice test
(Chang & Read, 2006), and an open-ended question task (Iimura, 2007).
The choice of tasks seems to be quite important because the effect of repetition may easily be confounded with the effect of the preview of questions. For example, in a research design using multiple-choice tests or
open-ended questions, participants in the repetition condition hear the
passage, read the questions, and answer them; and then the procedure is
repeated. This inevitably forms what Sherman (1997) called the sandwich
version of administering questions. Sherman found that the sandwich version was more effective than the questions–listening–listening condition
or the listening–listening–questions condition. Thus, even if L2 test
takers were exposed to the listening passage the same number of times,
varying the timing of administering questions may lead to different
degrees of listening comprehension.
To avoid the confounding effects of preview of questions, the current
study used free written recall tasks in which, after listening to the
passage(s), test takers were required to write what they understood
(Thompson, 1995, p. 28). Because no intervening elements exist between
the test taker and the text in free written recall tasks (Alderson, 2000,
p. 230), it is possible to isolate the effects of previewing questions from
the effect of repetition. Free written recall tasks have another advantage.
As Alderson put it, “it [the free written recall task] is also claimed to
provide a picture of learner processes” (p. 230). Comparing written protocols with the original text will enable researchers to analyze idiosyncratic recall protocols (i.e., additive information that does not appear in
the original text) and misinterpretations (i.e., incorrect recall protocols)
and obtain useful and detailed information about how learners listen
in the L2.

The following three research questions (RQs) were posited for this study:
1. Does repetition affect learners’ listening recall performance?
2. Is the effect of repetition on listening comprehension the same for
learners at different proficiency levels?
3. How does repetition affect learners’ production of idiosyncratic
recalls and misinterpretations?
As mentioned earlier, previous studies have supported the effect
of repetition on L2 listening comprehension; nevertheless, only one
study (Lund, 1991) used free written recall tasks. In order to accumulate
empirical evidence regarding the effect of repeated exposure, RQ 1 was
posed. The current study used quantitative and qualitative analyses
in order to investigate the effect of repeated exposure for different listening proficiency groups. RQs 2 and 3 were posited for the respective analyses.

The participants in this study were 36 learners of English (6 males and
30 females) from the author’s intact class at the Faculty of Education of a
university in central Japan. All the participants had received formal
instruction in English at junior and senior high schools for 6 years before
entering the university. Of the 36 participants, 16 were second-year students, 16 were third-year students, and 4 were fourth-year students. Two
of the participants reported that they had studied abroad in English-speaking countries for about 1 year.
In order to divide the participants into two listening proficiency
groups, the listening sections (k = 60) of three forms (A, B, and C) of the
Michigan English Placement Test (Corrigan, Dobson, Kellman, Spaan, &
Tyma, 1993) were administered to them 1 month before the experimental task. The reliability coefficient (Cronbach’s alpha) was 0.75. The mean
score of 34 was used as the cutoff point. Those who scored 35 or above
were assigned to the higher listening proficiency group (HG); the others
who scored 34 or below were categorized as the lower listening proficiency group (LG). Thus, the HG consisted of 16 participants (M = 39.94;
SD = 4.06), and the LG had 20 participants (M = 29.90; SD = 5.01). The
difference between the two groups was statistically significant: t(34) =
6.639, p < 0.001.
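As a rough illustration of the grouping step, the following Python sketch recomputes the overall mean from the two reported group means and applies the cutoff of 34. The function names are hypothetical, and the calculation relies only on the rounded summary statistics reported above, not on the raw scores.

```python
# Reported group sizes and mean scores on the listening sections (k = 60).
GROUPS = {"HG": (16, 39.94), "LG": (20, 29.90)}


def overall_mean(groups):
    """Weighted mean of the group means (weights are the group sizes)."""
    total_n = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_n


def assign_group(score, cutoff=34):
    """Scores of 35 or above go to HG; scores of 34 or below go to LG."""
    return "HG" if score > cutoff else "LG"


print(round(overall_mean(GROUPS), 2))      # 34.36
print(assign_group(35), assign_group(34))  # HG LG
```

The recomputed overall mean of about 34.4 is consistent with the cutoff of 34 described in the text.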

The experiment was carried out in a classroom where listening materials were played on a CD player. The participants were given a blank
sheet of paper. They listened to the first passage and were told to write
down in Japanese everything they understood as extensively and accurately as possible on one side of the sheet after listening. While listening
to the passage, they were not allowed to take notes, but were asked to
concentrate on listening. The allotted writing time was 3 minutes. Then a
second passage was played. After listening to the second passage, the participants were told to write under the recall protocols of the first passage
on the same side of the paper. The allotted writing time was 3 minutes.
For the second trial, the participants were asked to turn over the paper
so that they could not access what they had written for the first listening
trial. The same procedure was repeated. The total time was about 15 minutes. The instructions were provided in Japanese.

The passages were taken from past examinations of the presecond grade
of the STEP tests, and the attached CD was used (Obunsha, 2004). The
STEP tests are widely known in Japan as tests of English proficiency. They
come in seven levels: first grade (the highest level), prefirst grade, second


grade, presecond grade, third grade, fourth grade, and fifth grade (the
lowest level). Each grade has its own test aimed at different proficiency
levels. The presecond grade test is targeted at the senior high school level
(for more information about the STEP, see Society for Testing English
Proficiency, n.d.). This grade’s test was chosen because it was considered
not to be so difficult for the participants performing the free written
recall tasks, cognitively demanding tasks in which participants need to listen to the passages, understand the information, and write down the
information that they comprehend.
Both passages were monologue narratives. Monologue narratives
were chosen for this study because they constitute one of the common
text types used on the STEP tests. Because the STEP tests are used
widely in Japan, a large number of test takers encounter similar passages during each test administration. The first passage read by a male
contained 47 words in four sentences, whereas the second read by a
female contained 48 words in four sentences. The recording time for
each passage was 27 seconds and 29 seconds, respectively. Thus, the
reading speeds were 104.4 words per minute and 99.3 words per minute, respectively.
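The reading speeds follow directly from the word counts and recording times. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def words_per_minute(word_count, seconds):
    """Convert a word count and a duration in seconds to words per minute."""
    return word_count / seconds * 60


print(round(words_per_minute(47, 27), 1))  # 104.4 (first passage)
print(round(words_per_minute(48, 29), 1))  # 99.3 (second passage)
```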
To check the difficulty level of the listening passages, the participants
were asked to underline the unknown words in the scripts of the two passages 2 months after the experiment. This interval occurred because the
summer vacation was between the semesters. Responses from 34 participants (19 from LG and 15 from HG) were examined. Six (3 LG learners
and 3 HG learners) reported that they did not know the word secretary in
the first passage; 2 (1 LG learner and 1 HG learner) reported that they
did not know the word poetry in the second passage. Because the number
of participants reporting unknown words was similar across the two groups
and because only two words were reported as unknown, it is suggested that
the passages used for this study were easy for both groups.

The recall protocols written in the participants’ first language (L1)
were analyzed by idea unit analysis. The passages were divided into idea
units in advance, mainly on the basis of Carrell’s (1985) definition of idea units:
Basically, each idea unit consisted of a single clause (main or subordinate,
including adverbial and relative clauses). Each infinitival construction,
gerundive, nominalized verb phrase, and conjunct was also identified as a
separate idea unit. In addition, optional and/or heavy prepositional
phrases were also designated as separate idea units. (p. 737)


In addition, to make idea units shorter for the analysis of recall protocols on the listening tests, adverbials and nonheavy prepositional
phrases functioning as adverbials were counted as separate idea units.
Based on these criteria, the two passages for this study were divided into
16 and 12 idea units respectively (see appendix). Then exact recall of
each idea unit was assigned one point. Thus, the highest possible score
was 28.
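The scoring scheme can be sketched as a simple count of correctly recalled idea units. The sketch below assumes that each protocol has already been coded by hand into a list of idea unit numbers (101 to 116 for the first passage, 201 to 212 for the second, as in the appendix). The function name and the sample coding are hypothetical and are not taken from the data of this study.

```python
# Idea unit numbers as listed in the appendix.
PASSAGE_UNITS = {1: range(101, 117), 2: range(201, 213)}
ALL_UNITS = {unit for units in PASSAGE_UNITS.values() for unit in units}


def score_protocol(recalled_units):
    """One point per distinct, valid idea unit recalled exactly."""
    return len(set(recalled_units) & ALL_UNITS)


print(len(ALL_UNITS))  # 28 (highest possible score)

# Hypothetical coding of one protocol: a repeated unit and an unknown
# unit number (999) earn no additional points.
print(score_protocol([101, 102, 103, 203, 203, 999]))  # 4
```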
All the recall protocols were scored by the author. To check the intrarater reliability, they were scored again after a 1-month interval. The
agreement rate was 98.02% (988 out of 1008 idea units). To assess interrater reliability for the scoring, about 20% of the protocols (14 out of the
72 protocols) were scored by another rater. The interrater reliability
agreement rate was 99.49% (390 out of 392 idea units). The test reliability (Cronbach’s alpha) for the first trial was 0.69; the alpha for the second
trial was 0.76.
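The agreement rates are simple percentages of idea units scored identically on both occasions or by both raters. A minimal sketch, with a hypothetical function name:

```python
def agreement_rate(agreed_units, total_units):
    """Percentage of idea units that received the same score."""
    return agreed_units / total_units * 100


print(round(agreement_rate(988, 1008), 2))  # 98.02 (intrarater)
print(round(agreement_rate(390, 392), 2))   # 99.49 (interrater)
```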

RQs 1 and 2: Main Effect of Repetition and Interactional
Effect of Repetition and Proficiency Levels
Table 1 shows descriptive statistics of recall performance on the first
listening and the second listening for each group. The results indicate
that repetition facilitated listening comprehension for both groups. First, for
both groups, the second effort was better than the first effort. Second,
both groups improved to a similar degree: HG improved by 5.88 points,
and LG improved by 5.00 points. It is important to note that HG outperformed LG on the first listening. This result may support the use of the
Michigan English Placement Test for the division of the participants.
Table 1. Descriptive Statistics of Recall Performance by Group and Time
Groups: HG (n = 16), LG (n = 20)
Note. Dif = 2nd listening − 1st listening; M = mean; SD = standard deviation; SES = standard error
of skewness; SEK = standard error of kurtosis.

A two-way ANOVA was performed with time (first listening and second
listening) being a within-subjects factor and with proficiency level (high
and low) being a between-subjects factor. The main effects of time and
proficiency level were significant: F(1, 34) = 82.40, p < 0.001; F(1, 34) =
6.45, p = 0.016. Effect sizes measured as Pearson’s correlation coefficients
(Field, 2005, pp. 514–516) were also calculated. According to Field
(p. 33), a coefficient of 0.50 or above is considered to show a large effect;
a coefficient of 0.30–0.50, a medium effect; and a coefficient of 0.10–
0.30, a small effect. The effect size of time was large, r = 0.84, explaining 70.6% of the total variance; the effect size of proficiency level was
medium, r = 0.40, explaining 16.0% of the total variance. Thus, the results
showed that HG outperformed LG and that the second effort was better
than the first. On the other hand, the interactional effect of time and
proficiency level was not significant: F(1, 34) = 0.53, p = 0.470, r = 0.12.
Thus, repetition facilitated comprehension for both HG and LG to a similar degree.
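The reported effect sizes can be approximated from the F ratios. The sketch below assumes the common conversion r = sqrt(F / (F + df_error)), which applies when the effect has 1 numerator degree of freedom. The function name is hypothetical, and the variance explained is taken as the square of the rounded r.

```python
import math


def r_from_f(f_value, df_error):
    """Effect size r for an F ratio with 1 numerator degree of freedom."""
    return math.sqrt(f_value / (f_value + df_error))


effects = [("time", 82.40), ("proficiency level", 6.45), ("interaction", 0.53)]
for label, f_value in effects:
    r = round(r_from_f(f_value, 34), 2)
    print(f"{label}: r = {r:.2f}, variance explained = {r ** 2:.1%}")
```

Under this assumption, the sketch gives r values of 0.84, 0.40, and 0.12, in line with the values reported above.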

RQ 3: Idiosyncratic Recall Units, Misinterpretation, and
For this study, idiosyncratic recall units were operationalized as recall
protocols of information which the original passages did not contain. In
other words, idiosyncratic recall units are additional information not
found in the passages. The following are examples of idiosyncratic recall
units from the data of this study:
1. She usually has lunch at a noodle shop every day
2. She wanted to become a singer in the future
3. Nancy wants to become an English teacher
The italicized parts were not contained in the original passages (see
appendix); therefore, they were coded as idiosyncratic recall units. The
results show that although more participants in LG produced idiosyncratic recall units on the first listening than in HG (12 out of 20, 60.0% for
LG versus 6 out of 16, 37.5% for HG), repeated exposure brought about
improvement for 66.67% of those who had produced idiosyncratic recall
units in both groups (8 out of 12 for LG; 4 out of 6 for HG). In addition,
the number of those who produced new idiosyncratic recall units in LG
(6 out of 20) was larger than in HG (1 out of 16). Thus, the results suggest
that less proficient learners produce more idiosyncratic recall units than
more proficient learners at both listening times, that repeated exposure
helps decrease idiosyncratic recall units, and that the effectiveness of
repeated exposure is the same for both groups.
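The proportions behind this comparison can be recomputed from the reported counts. The dictionary names in this sketch are hypothetical labels for the three counts given in the text.

```python
def percent(count, total):
    """Percentage rounded to two decimal places."""
    return round(count / total * 100, 2)


# (count, total) pairs as reported in the text.
produced_on_first = {"LG": (12, 20), "HG": (6, 16)}
improved_on_second = {"LG": (8, 12), "HG": (4, 6)}
new_units = {"LG": (6, 20), "HG": (1, 16)}

for group in ("LG", "HG"):
    print(
        group,
        percent(*produced_on_first[group]),
        percent(*improved_on_second[group]),
        percent(*new_units[group]),
    )
# LG 60.0 66.67 30.0
# HG 37.5 66.67 6.25
```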
Analysis of misinterpretations, operationalized as incorrect recall protocols of the original texts (“to become a swimmer” for Idea Unit 203,
to become a singer), shows that repetition is beneficial for both groups as
well. In order to clarify this point, misinterpretations of Idea Unit 112
(Before she went back) are shown here as an example. In LG, only 1 participant recalled this idea unit correctly on the first listening. Thus, the
other 19 participants did not get points for this idea unit. Of the 19 participants, 12 did not write any protocols for this idea unit; 7 provided
incomplete protocols and were not assigned a point. On the second
listening, of the 12 participants who did not write any protocols on the
first listening, 3 did not produce any protocols; 1 wrote correct protocols;
and 8 gave almost correct protocols but misinterpreted the conjunction
before in this unit (Before she went back), as “after she went back” or “on
the way back.” The 7 participants who provided incomplete protocols
on the first listening did not improve on the second listening. In HG,
the number of participants who produced no protocols on the first listening was smaller (n = 6) than in LG. Of the 6 participants, 2 did not recall
anything on the second listening; 1 produced correct protocols; and
3 gave almost correct protocols but failed to recall the word before. Of
those who wrote almost correct protocols, that is, misinterpretations, on
the first listening (n = 6), 3 produced correct protocols on the second
listening. Thus, this example shows that repetition helped both groups
understand the text further even though the protocols were not assigned
a point.

Guided by three research questions, this study provided the following
findings. First, repetition was shown to have facilitated listening comprehension. Moreover, repetition reduced the production of idiosyncratic
recall protocols. In other words, repetition led to more precise comprehension of the passages. Second, this study did not find any interactional
effect between repetition and proficiency levels. That is to say, repetition
was effective for HG and LG to the same degree. Third, repetition helped
both groups reduce production of idiosyncratic recall protocols, although
LG produced more idiosyncratic recall protocols than HG. Therefore,
this study found that repeated exposure facilitated listening comprehension for both HG and LG and did not support the argument that the
effect of repetition varies according to proficiency level. Analysis of misinterpretations also supported these findings. It is important to note that
the findings of this study should be limited to the case in which learners
have sufficient L2 ability to understand the lexical items of the listening passages.
These findings lend support to the results of Cervantes and Gainer
(1992) and Iimura (2007) and give evidence against the studies of Chang
and Read (2006) and Lund (1991), who argued for the interactional
effect between repetition and proficiency levels. Here it is worthwhile to
examine the results of Chang and Read, in particular in terms of the
effect between repetition and proficiency levels; although Chang and
Read argued for a differential effect of repetition for different proficiency levels, another interpretation seems possible. They argued that
“LLP [low listening proficiency] learners benefited less than HLP [high
listening proficiency] learners from PQ [preview of the questions]
and RI [repetition of the input],” mainly based on the findings that the
high listening proficiency group scored better than the low listening
proficiency group in these two conditions (p. 389). Because all four
conditions in Chang and Read’s study included previewing the questions, they stated that “in effect the PQ [preview of the questions] group
was a comparison group to provide a basis for evaluating the enhanced
listening support experienced by the other three groups” (p. 385).
However, they did not make such comparisons in their discussion of the
results. It seems natural that the high listening proficiency group outperformed the low listening proficiency group because the two proficiency groups differed in their proficiency levels. For the condition of
the preview of the questions, the mean scores of the two proficiency
groups were 18.39 and 14.91, respectively. If these mean scores were
treated as a baseline, both proficiency groups in the repetition condition obtained higher means (20.47 and 16.44, respectively) although
the difference between the two conditions was not significant. Thus,
another possible interpretation of Chang and Read’s results is that repetition may have improved the performance of both proficiency groups,
but the changes were not statistically significant. In other words, their
results may not provide evidence that repetition affects different proficiency levels differently.
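The baseline comparison described above can be made explicit with the four mean scores cited from Chang and Read. This sketch only computes descriptive differences; it does not test their significance, and the variable names are hypothetical.

```python
# Mean scores cited above for the two proficiency groups.
preview_only = {"HLP": 18.39, "LLP": 14.91}  # baseline condition
repetition = {"HLP": 20.47, "LLP": 16.44}    # repetition condition

for group in ("HLP", "LLP"):
    gain = round(repetition[group] - preview_only[group], 2)
    print(group, gain)
# HLP 2.08
# LLP 1.53
```

Both groups show a descriptive gain over the baseline, which is the pattern the alternative interpretation relies on.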
Therefore, aside from Chang and Read (2006), a statistically significant interactional effect between repetition and proficiency level was
reported only by Lund (1991). It should be noted that he found a statistically significant interactional effect only in one of the two analyses of
the recall protocols, that is, in the lexical item analysis but not in the
idea unit analysis. Although the current study used free written recall
tasks like Lund’s study, it did not carry out lexical item analysis. It is possible that detailed scoring systems may be necessary to detect the differential effect of repetition. If this is correct, it is plausible that some
studies did not find the differential effect of repetition: The studies
(Cervantes & Gainer, 1992; Chang & Read, 2006; Iimura, 2007) used a
partial dictation task, a multiple-choice test, and an open-ended question task, respectively, which require test takers not to understand everything in the passages but to listen to part of the passages. Thus, the
mixed results of the previous studies may be due to different analysis methods.


In conclusion, several limitations and directions for future research are
addressed. First, only one text type, a monologue narrative, was used as
the listening materials. Different text types may yield different results. For
example, redundancy and repetition are already abundant in oral conversations (e.g., Flowerdew & Miller, 2005, p. 51; Ur, 1984, p. 7). It will be
necessary to examine the interactional effect of repetition and proficiency
level with different text types that have varying degrees of redundancy and
repetition inherent in them. Second, this study used only idea unit analysis
for the scoring of recall protocols. Recall protocols can be analyzed on the
basis of idea unit and pausal unit (Alderson, 2000) as well as lexical unit
(Lund, 1991). If detailed scoring analyses, as discussed earlier, are necessary to detect the differential effect of repetition for different proficiency
levels, it is important to examine the effects of different scoring methods for protocol analysis. Third, in this study free written recall tasks were
shown to be a promising research instrument to measure listening comprehension in sufficient detail and to examine the learner’s comprehension
processes by analyzing idiosyncratic recall protocols or misinterpretation;
nevertheless, other testing tasks need to be considered because of the possible differential findings that may be attributed to the use of certain tasks.
Investigating the repetition–proficiency interaction effect using different
tasks will help to situate and evaluate the results of previous studies. For
example, dictation allows researchers to analyze listeners’ comprehension
processes, as free written recall does, and to score the dictated protocols at
the word level. One disadvantage of dictation is that the task requires test
takers to write in their L2; thus, test takers’ writing ability may confound listening comprehension. However, dictation can avoid the problem of previewing questions in investigations of the effects of repeated exposure.
Although the findings of this study should be interpreted with caution
because of these limitations, this study did not lend support to the argument that the effect of repetition varies according to proficiency level. In
other words, repetition of a listening passage in a listening test may not
lead to differential effects for test takers at different proficiency levels.
The author thanks Brian Wistner, Ken Urano, and Rebecca Ann Marck as well as the
anonymous reviewers for their helpful comments on earlier drafts of this article.

Hideki Sakai is an associate professor in the Faculty of Education at Shinshu University,
Nagano, Japan. His current research interests include second language interactional
studies, classroom second language acquisition, and psycholinguistic aspects of listening and speaking.

Alderson, J. C. (2000). Assessing reading. Cambridge University Press.
Berne, J. E. (1995). How does varying pre-listening activities affect second language
listening comprehension? Hispania, 78, 316–329.
Brindley, G. (1998). Assessing listening abilities. Annual Review of Applied Linguistics,
18, 171–191.
Buck, G. (2001). Assessing listening. Cambridge University Press.
Carrell, P. L. (1985). Facilitating ESL reading by teaching text structure. TESOL
Quarterly, 19, 727–752.
Cervantes, R., & Gainer, G. (1992). The effects of syntactic simplification and repetition on listening comprehension. TESOL Quarterly, 26, 767–770.
Chang, A. C.-S., & Read, J. (2006). The effects of listening support on the listening
performance of EFL learners. TESOL Quarterly, 40, 375–397.
Corrigan, A., Dobson, B., Kellman, E., Spaan, M., & Tyma, S. (1993). English Placement
Test. Ann Arbor: University of Michigan, English Language Institute, Testing and
Certification Division.
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). London: Sage.
Flowerdew, J., & Miller, L. (2005). Second language listening: Theory and practice.
Cambridge University Press.
Harmer, J. (1998). How to teach English. Harlow, England: Longman.
Iimura, H. (2007). The listening process: Effects of question types and repetition.
Language Education & Technology, 44, 75–85.
Lund, R. J. (1991). A comparison of second language listening and reading comprehension. Modern Language Journal, 75, 196–204.
Obunsha. (2004). 2004 nendoban eiken jun 2 kyu zenmodanishu CD (The 2004 collection of
the pre-second grade STEP tests: CD). Tokyo: Author.
Rost, M. (2002). Teaching and researching listening. Harlow, England: Pearson
Sherman, J. (1997). The effects of question preview in listening comprehension tests.
Language Testing, 14, 185–213.
Society for Testing English Proficiency, Inc. (n.d.). EIKEN Test in Practical English
Proficiency. Tokyo: Author. Retrieved May 30, 2009, from http://stepeiken.org/
Thompson, I. (1995). Testing listening comprehension. AATSEEL Newsletter, 37,
Ur, P. (1984). Teaching listening comprehension. Cambridge: Cambridge University Press.

Listening Passages (Obunsha, 2004)
Passage 1 (Mary’s Lunch Time)
(101) Mary works / (102) as a secretary. / (103) She [usually] has lunch / (104) usually / (105)
at a noodle shop / (106) near her office. / (107) But [yesterday] it was closed, / (108) yesterday /
(109) so she went / (110) to an Italian restaurant / (111) and had some spaghetti. / (112)
Before she went back / (113) to her office, / (114) she stopped / (115) at a café / (116) and
had dessert.

Passage 2 (Nancy’s Dream)
(201) When Nancy was a high school student, / (202) she wanted / (203) to become a singer. /
(204) But [while she was at college], she met a great English teacher, / (205) while she was at
college, / (206) Mr. Porter. / (207) He taught her many things, / (208) including how to enjoy
poetry / (209) and write short stories. / (210) Now / (211) Nancy wants / (212) to become a



