This and all episodes of this podcast are available to study as a lesson on LingQ.
Steve had a very interesting conversation with Paul Nation, a leading world expert on vocabulary acquisition and on language learning in general. A quick search on Google will show many references to Paul Nation’s contributions to the field of language learning.
Steve: We’re talking today with Paul Nation, who is a leading expert on language learning, and English language learning in particular, and who’s located at the University of New Zealand, I believe.
Paul: Hi, how are you?
Steve: Fine, thank you. Is it, in fact, the University of New Zealand?
Paul: No, it used to be about 100 years ago, but now we’re Victoria University of Wellington.
Steve: Okay, I’m sorry.
Paul: Named after Queen Victoria because it was started when she was on the throne.
Steve: We last met when I was in Taiwan about two or three years ago, and you spoke to an audience of about 100 eager English teachers there.
Paul: Oh yeah.
Steve: You explained your four threads and they were very interesting.
Paul: Four strands.
Steve: Four strands, rather, and you were able to refer to your own experience in learning Japanese and I think other languages and so forth.
Maybe we could begin by my asking you to explain the four strands.
Paul: The idea behind the four strands is to make sure that there’s a range of opportunities for learning.
It’s really a way for a teacher or a course designer to check a course and make sure there’s a proper range of opportunities for the learners to learn the language.
The four strands are these: the first strand is the strand of meaning-focused input, which Stephen Krashen would call comprehensible input, and that’s learning through reading and through listening.
There are certain conditions which have to apply for that learning to take place.
From a vocabulary perspective, it means that only about 1 in 50 of the running words should be unknown to the people who are doing the reading or the listening.
Steve: 1 in 50?
Paul: 1 in 50.
We’ve done an experiment to show that.
It would be good to have more research on it; in fact, we’ve got a Ph.D. student who’s going to start doing more research on that, but it actually agrees with a figure that Michael West arrived at almost 80 years ago when he started designing the very first graded readers.
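The 1-in-50 figure is simply 98% coverage: 49 of every 50 running words are known. As a rough illustration only, here is a minimal Python sketch of that check. It assumes a crude regex tokenizer and a hypothetical set `known` of word forms the reader already knows. Paul’s counts use word families, which this sketch does not attempt to model.

```python
import re

def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def coverage(text: str, known: set[str]) -> float:
    """Share of running words (tokens) that are already known."""
    tokens = tokenize(text)
    if not tokens:
        return 1.0
    return sum(1 for t in tokens if t in known) / len(tokens)

def is_comprehensible(text: str, known: set[str], one_in: int = 50) -> bool:
    """True if at most 1 in `one_in` running words is unknown (1 in 50 = 98% coverage)."""
    tokens = tokenize(text)
    unknown = sum(1 for t in tokens if t not in known)
    # Integer comparison avoids floating-point trouble at exactly 98%.
    return unknown * one_in <= len(tokens)

# Usage: 49 known tokens plus 1 unknown token is exactly 1 in 50.
text = " ".join(["the"] * 49 + ["astonish"])
print(coverage(text, {"the"}))           # 0.98
print(is_comprehensible(text, {"the"}))  # True
```

Comparing integers rather than the float coverage keeps a text sitting exactly at 98% from being rejected by rounding error.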
Paul: So that’s the first strand: the strand of learning through input.
The second strand is the strand of learning through output.
I call it meaning-focused output, where the learners are focusing on conveying messages: getting messages across to listeners through speaking, or to readers through writing.
Having to produce language makes you pay attention to input in a different way, but it also provides a good opportunity for consolidating knowledge that you’ve already got.
The third strand is the strand of language-focused learning.
Rod Ellis and others call it form-focused instruction, but I’m not so happy with that name, because it doesn’t actually have to be instruction by a teacher, and it can focus on more than form; it can focus on meaning as well.
I call it language-focused learning, and that means deliberate learning of language features through studying the sound system, the spelling system, vocabulary and so on.
I’m actually very excited about a piece of Ph.D. research that one of our students has completed.
That student looked at whether learning vocabulary deliberately on word cards gives you the kind of knowledge, the implicit knowledge, which is needed for normal language use.
She actually found that this deliberate learning resulted in both explicit knowledge and implicit knowledge. That is different from the learning of grammar, where there seems to be evidence that deliberate learning gives no direct route into implicit knowledge.
With grammar it’s a rather indirect route, but for vocabulary it seems different.
Paul: Deliberately studied vocabulary is available as implicit knowledge for normal language use, so that’s the third strand.
Paul: Then the fourth strand is the strand of fluency development and fluency development simply means getting good at using what you already know.
This should be at every level of a language course, so even when you’ve just learned the numbers in a very elementary course, you should learn to recognize those numbers very quickly, so that if someone says 98, you get 98 quickly and understand what they’re saying.
The idea with fluency development is that you get really fast at using what you’ve already learned.
Steve: Oh, okay.
Paul: Now three of these four strands, meaning-focused input, meaning-focused output and fluency development, are message-focused strands.
They would fit nicely into a communicative approach to language teaching, for example.
The language-focused learning strand is a deliberate study strand so it’s a bit different from the other three, but I think you have to have all four in a course and you have to have roughly equal proportions of the four.
Steve: You know, it’s very interesting, because of what we’re doing with our system, and I very much appreciate your agreeing to have an interchange with me, because I’m not a university professor. I’m someone who has been in business and who has a great interest in languages, and we’ve developed this system called LingQ.
Listening to the things that you’ve described here, I’d say that what we do is both similar and different.
We place a lot of emphasis on input, and 1 in 50 unknown words strikes me as very low, and I’ll explain why.
I would agree with you if I were reading a book and I had nothing else to help me.
I have been learning Russian using our system, starting from scratch, and you can’t be at 1 in 50 because initially you know none of the words.
Paul: That’s right.
Steve: What we do is, you read something, and we ask people to listen to it three, four, five times and to read it.
They look up unknown words and the words go to a database.
This then builds up a database that they can review as flashcards, and it generates statistics, etc.
I listened to what you might call learner language for a while, reading it and listening to it 20, 30, 40 times, and then I moved into authentic content. Of course, when I first went into authentic content, our system told me that there were 40 or 50% new words there for me, so that was very hard going, but I did it.
Now I’m down to about 15 to 20% new words if I’m reading Tolstoy (…) or whatever, but our learners say that they like to be between 10 and 20%.
Granted, we’re talking about total words, so we’re not talking about word families.
People are quite comfortable as long as they’re reading on a computer with access to an online dictionary where stuff is going into a database.
Our system highlights words they have previously saved so they can review them. So yes, input; yes, comprehensible.
We tend to encourage people to deal with something that’s a little more challenging and a lot of our learners say they’re happy doing so.
I’m not sure if that contradicts what you’re saying, but that’s sort of what we’re doing.
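Steve’s 10 to 20% and Paul’s 1 in 50 may not be measuring the same thing. We don’t know exactly how LingQ computes its new-word figure, so the sketch below is an assumption that simply contrasts two possible measures: the share of distinct word forms (types) that are new, and the share of running words (tokens) that are new. Because familiar words tend to repeat, the type-based figure tends to be the higher of the two.

```python
import re
from collections import Counter

def new_word_percentages(text: str, known: set[str]) -> tuple[float, float]:
    """Return (% of distinct forms that are new, % of running words that are new)."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)  # word form -> number of occurrences
    new_forms = [form for form in counts if form not in known]
    pct_types = 100 * len(new_forms) / len(counts)
    pct_tokens = 100 * sum(counts[f] for f in new_forms) / len(tokens)
    return pct_types, pct_tokens

# Usage with a toy text and a hypothetical known-word set.
types_pct, tokens_pct = new_word_percentages(
    "The cat saw the dog and the cat ran", {"the", "cat", "and"}
)
print(round(types_pct, 1), round(tokens_pct, 1))  # 50.0 33.3
```

In the toy text, half of the distinct forms are new, but only a third of the running words are, because "the" and "cat" recur.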
Paul: The way I’d look at it is that, first of all, the 98% coverage figure is for unassisted reading.
Steve: Okay, right.
Paul: That means you’re reading without a dictionary and without that help.
I guess it’s sort of, how would you say, an almost intuitive personal observation of whether, in your reading, most of your focus is on language features or most of your focus is on getting the message.
If you’re doing assisted reading, as you suggest, that assisted reading could be meaning-focused input as long as the major focus is on the message of the text.
Steve: Let’s put it this way: the way we do it, what we try to suggest, and the way I’m motivated, is that I’m interested in the story.
If we can get our learners, again, to select content that they like, which in my case is 19th-century literature, a bit of an esoteric interest, you know, then I don’t mind going at it with 20% unknown words, because I’m interested in what I’m doing.
As I listen to it for the third and fourth time, read it for the second and third time, and then go through all my flashcards, the whole experience is enjoyable for me.
Paul: You could argue that what you’re doing is really covering three strands of the course.
When you start off with your reading you’re clearly doing language-focused learning because there is so much that is unknown.
Paul: We’ve done studies on English, counting proper nouns as words that can be considered known, so they are part of that 98% coverage.
To get 98% coverage of English you actually need about 9,000 words, or for a novel about 8,000 words.
Steve: Now you’re talking word families.
Paul: I’m talking 8,000 to 9,000 word families to get 98% coverage of what we call the running words, or tokens.
Steve: But I’ve read some of your material and you suggest a 1 to 1.6 ratio…
Paul: …no, no, no.
Steve: …of word families to total word count.
I read that in something you wrote somewhere.
Paul: Yeah, that’s old.
Steve: Oh, is that old?
Paul: Even though one of the first principles students have to follow when they enter my course is that you respect age, that’s one I wouldn’t fight for now.
Steve: I’ll stop quoting you then.
Paul: The problem with that figure is it’s dependent on the length of the text because you’re looking at a variant of type-token ratio and type-token ratio is very strongly dependent on text length.
Steve: Oh, I see.
Paul: If you have a really long text, then that ratio, the number of running words to the number of different words, can go up, because words get repeated more often.
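Paul’s point about text length can be seen with a few lines of Python. This sketch computes the type-token ratio (distinct words divided by running words) on longer and longer stretches of the same toy sentence. As repeats accumulate, the ratio falls, so any fixed ratio depends on how much text was counted. The sentence and the whitespace tokenization are illustrative only.

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)

def ttr_by_length(tokens: list[str], lengths: list[int]) -> list[float]:
    # Measure the same text cut off at different lengths.
    return [type_token_ratio(tokens[:n]) for n in lengths]

tokens = "the cat sat on the mat and the dog sat on the cat".split()
for n, ratio in zip([4, 8, 13], ttr_by_length(tokens, [4, 8, 13])):
    print(n, round(ratio, 2))
# 4 1.0
# 8 0.75
# 13 0.54
```

The inverse, running words per distinct word, rises with length in the same way, which is why a single ratio measured on one text does not carry over to a longer one.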
Steve: I’m not talking about number of times it’s repeated, I’m saying in English where you have “go” and “going”.
Paul: You’re saying how many members in a word family.
Steve: Yeah, because it just so happens that in our system we consider every different form of the word as a different word.
What we do in our system is when you save a word we automatically capture all phrases in all of your content that use this word and “going” is used differently than “go”.
You gather a bunch of phrases where the word “going” is used, so you get credit for “go”, “going” and “went”; they’re all different.
Steve: I understood you in one of your articles to say that the difference between a word… like, if you say 9,000 word families, I would say that on our count you need 15,000 words.
Paul: Word types.
Steve: Word types.
Paul: Yeah, okay. Well, there’s a quick and easy way to find this out.
Paul: One way is to go to my website and download the program called The Range Program.
There’s a version there based on the British National Corpus, which goes up to the first 14,000 word families of English.
If you run that program and look at the figures just below the table it will tell you how many families are in each list, which is 1,000, but it will also tell you how many types there are in the same list.
There’s a very, very big difference in the number of types amongst the high-frequency families compared to the lower-frequency families.
Steve: No doubt; far more types in the high-frequency ones.
Paul: Yeah, so if you look at the first 1,000 (I can’t remember the figures) there’d probably be five or seven or eight members of a family, whereas if you look at, say, the 8,000 or 9,000 level, there are barely two members to a family.
Steve: For sure, yeah, okay.
Paul: So you can’t keep a standard figure for that, because it differs so much from one frequency level to another.
The other point I’d raise about that is you say that each word is used differently.
Paul: When you look at production, (…) speaking and writing, then I think the word type is probably the best unit for counting.
But if you’re looking at reception, that is, understanding through reading and listening, I think the word family is a better unit for counting.
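The difference between counting word families and counting word types, and Paul’s observation that high-frequency families have more members, can be sketched as follows. The family mapping and frequency bands are hypothetical toy data, not output from the Range program. The code just groups types under a headword and averages the family size per 1,000-word band.

```python
from collections import defaultdict

def types_per_family(family_of: dict[str, str], band_of: dict[str, int]) -> dict[int, float]:
    """Average number of word types per family, for each frequency band.

    family_of maps each word type to its family headword;
    band_of maps each headword to its 1,000-word frequency band.
    """
    types_in_band: dict[int, set[str]] = defaultdict(set)
    families_in_band: dict[int, set[str]] = defaultdict(set)
    for word_type, headword in family_of.items():
        band = band_of[headword]
        types_in_band[band].add(word_type)
        families_in_band[band].add(headword)
    return {b: len(types_in_band[b]) / len(families_in_band[b])
            for b in sorted(families_in_band)}

# Hypothetical toy data: one high-frequency family, one low-frequency family.
family_of = {
    "go": "go", "goes": "go", "going": "go", "went": "go", "gone": "go",
    "astonish": "astonish", "astonished": "astonish",
}
band_of = {"go": 1, "astonish": 9}
print(types_per_family(family_of, band_of))  # {1: 5.0, 9: 2.0}
```

With real data the averages would come from a full family list; the point is only that one conversion factor between families and types cannot hold across all bands.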
Steve: But one of the things we’re trying to do in our system is get people, eventually, to be able to use them.
When they save a word in the form of say “going” they will collect 10 sample phrases of that word in use, which they’re supposed to practice.
Eventually they’ll show up when they go to write, so they’ll find “going” is used differently than “go”.
We don’t necessarily treat it as purely receptive or purely productive.
Paul: That’s fair enough.
Steve: I don’t think I disagree with what you’re saying.
Paul: No, I don’t think we’re disagreeing with each other.
Paul: John Sinclair would agree with you in terms of output, because he argues that even different inflections, you know, the “-ing” or the plural “-s”, result in a word having different collocates.
Steve: Exactly, and that’s what you’re looking for: which words are used with which words.
I find in Russian, where there’s a tremendous amount of inflection, that I just go ahead and save all the different forms of these words.
Our words show up in our vocabulary area sorted in different ways, but if I look at them alphabetically or search by roots, I’ll find 7, 8, 10, 15 words that are all very similar, and then I can just look at them and get a sense of how each one is used.
We like saving individual types, as you call them.
Paul: I think you’re also doing a good thing there.
At the moment, we’ve got a Ph.D. student just starting who’s looking at how many lower-frequency words you get access to if you know the first 1,000 words of English.
Say you come across “astonish”, “astonish” actually has the root of “stone”, so when you’re astonished you turn to stone.
Paul: Now maybe that doesn’t help with astonish very much, but what he’s looking at is which roots in the first 1,000 words, if you know them, will help you with words from the lower-frequency levels.
No one’s actually quantified that in a very systematic way.
Steve: But you know what, based on my experience with Russian I would look at it quite differently.