Myth #1: Language is composed of words

This is as real as a word ever gets.

So you believe you communicate using words? You may wish to reconsider. For a start, there seems to be no consensus about what a word is. This is the first in a series of posts about doubting the obvious in linguistics.

If nobody is one word, how many words is no one?
How many words are there in don’t know? But then, how many in dunno?
Is walked a form of walk? If yes, how about has walked?
Is better a form of good? If yes, is more interesting also a form of interesting?
If lifestyle is one word, are life-style and life style also one word (variants of lifestyle)?
Is bank one word with two meanings (central bank and river bank) or two words with the same form?
Does that change if the meanings are related, as in tea: the plant, the leaves, the processed leaves, the beverage, the occasion of consuming the beverage?
Is favour a variant of favor, or a separate word?
Is fevor a misspelled variant of the above, or a separate word? Or maybe it is not a word at all?
Is French orange the same word as English orange? If yes, is French pomme the same word as English apple as well? Or perhaps French prune and English prune? Or even French pain and English pain?
Is York a word? If yes, how many words is New York?
Is OK a word? If not, does it become one when spelled okay?
The English vocabulary grows daily, as new stuff is discovered, invented or imported. When exactly does something like kiiking become a word in English?

Linguists’ answers to such questions vary considerably. Internal inconsistency is common too. For instance, it is perfectly acceptable to consider river bank and central bank two separate words, but still talk about using THE word in a new meaning (say, battery bank). Which of the two, then? It is also normal to consider fevor a nonword in research, but still mark it as a misspelled word when grading student essays. Few translators would consider pomme and apple the same word, but when translating the sentence “He used the word apple” into French they wouldn’t hesitate to claim that he used the word pomme instead. It would be normal for a writing guide to consider life style two words, but still stipulate that IT should be spelled lifestyle. Proper names and abbreviations are words in most contexts, but not in Scrabble, for instance. And so on.

To generalise a bit, there are at least the following issues with defining a word.

  • Length of a word, or the segmentation problem. Speech is a continuous signal with no measurable boundaries between words. In writing, many languages conventionally separate words with spaces, but where those spaces go is a matter of convention only, and the convention could just as well be different. Conventions vary considerably across the world’s languages, to the point of not using spaces at all, as in Chinese, Japanese or Thai, or of letting the space distinguish meanings, as in Estonian.
  • Variation. While writing is more or less consistent, in speech there is no such thing as exactly the same word (if measured accurately enough), even for one speaker in the same situation. There is even more variation across situations, between speakers and over time. Some of this variation has also made it into writing, as in do not know, don’t know and dunno.
  • Words vs nonwords. The English language contains millions of words, and no single person can ever hope to know all of them; in a text corpus of any size, roughly half of the distinct words occur only once. There is no obvious way of determining the wordness of a character string: it might really not be a word, but it might also be a well-known technical term in a little-known specialised field, or a new buzzword that Google’s indexing robots will only pick up tomorrow.
  • To consider or not to consider the meaning. Lexicographic tradition does consider the meaning, and even the history of the meaning, so the two banks above are separate words with coinciding forms (homonyms), while tea is one word with many meanings (a polyseme). First, this is way too complicated. In, say, language technology it is unnecessary to know whether the first moneychangers sat by the river or not, which is what would make the meanings related or unrelated. Even if an archaeologist discovered that they in fact did, this would have no effect on how people communicate or how language technology works. Second, and more importantly, the whole idea that words have meanings is mathematically impossible, but that will be the topic of a future post.
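To make the segmentation problem concrete, here is a minimal Python sketch. It is not taken from any linguistic toolkit. It applies three hypothetical tokenisation conventions to one sentence assembled from the examples above: splitting on spaces, splitting on anything that is not a letter, and splitting on spaces while merging entries from a small, made-up list of multi-word units. The names `split_on_spaces`, `split_on_letters`, `split_with_lexicon` and `MULTIWORD` are illustrative assumptions, not an established API.

```python
import re

# Toy text built from the post's own examples (illustrative, not corpus data).
TEXT = ("No one knows. Nobody knows. I don't know, dunno. "
        "New York life style, lifestyle, life-style.")

# Hypothetical lexicon of multi-word units for convention C (lowercased).
MULTIWORD = {("no", "one"), ("new", "york"), ("life", "style")}


def split_on_spaces(text: str) -> list[str]:
    """Convention A: a word is whatever lies between spaces, minus edge punctuation."""
    return [t.strip(".,") for t in text.split() if t.strip(".,")]


def split_on_letters(text: str) -> list[str]:
    """Convention B: a word is a maximal run of letters, so ' and - also split words."""
    return re.findall(r"[A-Za-z]+", text)


def split_with_lexicon(text: str, units: set[tuple[str, ...]]) -> list[str]:
    """Convention C: like A, but listed multi-word units are merged into one word."""
    tokens = split_on_spaces(text)
    longest_first = sorted(units, key=len, reverse=True)
    words, i = [], 0
    while i < len(tokens):
        for unit in longest_first:
            window = tuple(t.lower() for t in tokens[i:i + len(unit)])
            if window == unit:
                words.append(" ".join(tokens[i:i + len(unit)]))
                i += len(unit)
                break
        else:  # no listed unit starts here: keep the single token
            words.append(tokens[i])
            i += 1
    return words


if __name__ == "__main__":
    print(len(split_on_spaces(TEXT)))                # 15
    print(len(split_on_letters(TEXT)))               # 17 ("don't" -> "don", "t")
    print(len(split_with_lexicon(TEXT, MULTIWORD)))  # 12 ("No one", "New York", ...)
```

None of the three totals is more correct than the others. Each one encodes a different answer to the questions at the top of this post: whether no one is one word or two, whether don’t splits at the apostrophe, and whether life style counts as a variant of lifestyle.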

Words might exist as explanatory devices for linguists, like orbits do for astronomers. But to communicate successfully, a person doesn’t need to know anything about words or segmentation or language history, just as planets don’t care about their orbits. They just move in the most natural way. Think of any preschool child you know and you will see humans doing the same.

