Dr. Damián Blasi - "Under the Shadow of the (Language) Tree"

Duration: 26 mins 24 secs


About this item
Description: A recording of Dr. Damián Blasi's (Harvard University) seminar presentation, "Under the Shadow of the (Language) Tree"—part of the "Cultures at the Macro Scale" seminar series.
 
Created: 2021-08-01 14:49
Collection: Culture at the Macro-Scale
Publisher: University of Cambridge
Copyright: Damián Blasi
Language: eng (English)
Distribution: World (not downloadable)
Keywords: Phylogenetics; Linguistics; Creole; Language change;
Explicit content: No
Aspect Ratio: 4:3
Screencast: No
Bumper: UCS Default
Trailer: UCS Default
Transcript:
00:01
Welcome, all of you, to my presentation, "Under the Shadow of the Language Tree". I am Damián Blasi, a Harvard Data Science Initiative fellow and Branco Weiss fellow, currently based at the Department of Human Evolutionary Biology at Harvard University. I'm extremely excited about this series of meetings, and I've been delighted and impressed by its previous iteration. So I really look forward to discussions and interactions with each of you afterwards.

00:28
This is the main thematic axis we participants were given: we hope to address how cultural groups and their boundaries should be modeled, and what implications new or revised methods might have for our theories of culture and cultural change. I'll be doing something along these lines, but slightly different. First of all, rather than proposing how things should be, I'll tell you about the most influential way in which people think about languages and language groups in the context of cultural evolution at a macroscopic scale. This, I will argue, is a consequence of the famous "tree model", which I'll describe in some detail in the first part of the presentation. After that, I'll give you just a taste of some interesting phenomena in relation to language that escape the tree model, and which I think deserve to be more widely known among those studying the dynamics of human culture.

01:27
Okay, so let's get started with the first part, which is the description and discussion of the tree model. My object of study in general is linguistic diversity, so I would like to give you some general factoids about it just to get started. There are roughly 6,500 languages spoken and signed in the world today, and about 40% of that diversity remains to be properly described. For many languages within that 40%, we might have only a minimal sketch of the grammar, and many times we have much less than that: a simple word list, for instance. We continue encountering new languages, as well as new linguistic features that keep adding to the wealth of diversity we are familiar with. Roughly four languages go extinct every year, and the general trend is for most languages to shrink through time. This will only get worse as a handful of languages take over.

02:23
I'm telling you all of this as a way of foreshadowing part of my conclusion, which is that we don't know as much as we think we know about language. And this is a big loss because, in spite of what I just said, we have better global coverage for languages than for any other complex conventionalized behaviour. So understanding how languages end up being the way they are is an obvious resource for thinking about other cultural systems more generally.

02:53
So then, how do languages end up being the way they are? Regardless of how we try to answer this question, we'll have to deal sooner or later with a tree model. The core idea is very simple, and I'm sure most of you will be familiar with it. Languages and their histories can be represented successfully in a tree structure, which, as a first approximation, simply means that each language descends from at most one other language. This model has more or less dominated the discussion on language history for over 160 years. I'm showing you here one of the most recent published language trees, on Sino-Tibetan, from a paper published last year, alongside what is, for many, the first instance of a tree model, based on Indo-European and dating back to 1861.
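
A minimal sketch of the "at most one parent" constraint, in Python and purely for illustration (the function name violates_tree_model and the example edges are hypothetical, not taken from the talk): a history is stored as (ancestor, descendant) pairs, and any language recorded with more than one ancestor falls outside the tree model. The second example reuses the three-ancestor "La Langue Roman" node from the early tree that comes up later in the talk.

```python
from collections import defaultdict

def violates_tree_model(edges):
    """Return the languages recorded with more than one ancestor.

    `edges` is a list of (ancestor, descendant) pairs. Under the tree model
    each language descends from at most one other language, so any
    descendant with two or more distinct ancestors breaks the model.
    """
    parents = defaultdict(set)
    for ancestor, descendant in edges:
        parents[descendant].add(ancestor)
    return sorted(lang for lang, ps in parents.items() if len(ps) > 1)

# Hypothetical, simplified histories for illustration only.
tree_edges = [("Latin", "Spanish"), ("Latin", "French"), ("Latin", "Italian")]
network_edges = tree_edges + [
    ("Gaulish", "La Langue Roman"),
    ("Latin", "La Langue Roman"),
    ("Greek", "La Langue Roman"),
]

print(violates_tree_model(tree_edges))     # []
print(violates_tree_model(network_edges))  # ['La Langue Roman']
```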

03:43
But as a side note, it turns out that one could go back further in time. The earliest attested version of a tree of languages goes back to the 1800s, by a relatively obscure French intellectual, 53 years before the publication of Darwin's famous doodle, which is quite interesting, right?

04:05
Now, there are, and there have been, several ways of producing these trees from data and expert knowledge. But I'll be discussing mainly the principal techniques and ideas, and the most influential methods that have been used in relation to macroevolutionary processes and large-scale language processes in general. If you're interested in the more technical aspects, I recommend reading our 2017 paper in the Journal of Language Evolution, or a very nice recent paper by Paul Heggarty in the Annual Review of Linguistics, where he goes into detail about the methods, the ideas and the assumptions.

04:49
Now, the standard practice, if we want to build a tree, is to start with a number of historically related languages and some type of word list covering the same meanings across these languages; in the case here, we have six different words for six Indo-European languages. Then we identify sets of words across languages for which we can infer a common ancestry by positing a regular sound correspondence between the languages. For instance, we have the words for "one" in Hindi and Persian: "ek" and "yek". We can look at the rest of the vocabulary in those languages and find that words that in Persian start with "ye" do not have the "y" sound in their Hindi correspondents. So what we suggest is that these two words derive from a common ancestor, because they follow a regular pattern that can only be explained by common ancestry. We can keep doing this across sets of languages and meanings. We call these sets cognate sets. Historically, linguists identified these sets based on extensive knowledge of the languages involved and the range of possible sound correspondences. But nowadays this has changed, and there are very efficient automated methods that are able to discover cognate sets with little to no supervision.
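
As a toy illustration of looking for regular correspondences, the sketch below tallies which word-initial sounds pair up across two word lists. It assumes two made-up languages, A and B, written with one letter per sound and with words already matched by meaning; the function name initial_correspondences is hypothetical. Real comparative work aligns whole words segment by segment over far more data, so this only captures the intuition that a recurring pairing is evidence while a one-off match is not.

```python
from collections import Counter

def initial_correspondences(pairs):
    """Count how often each pairing of word-initial sounds occurs.

    `pairs` holds (word_in_A, word_in_B) for the same meaning in two
    languages. A pairing that recurs across many meanings is a candidate
    regular sound correspondence; a single match may just be chance.
    """
    return Counter((a[0], b[0]) for a, b in pairs)

# Hypothetical word pairs for two invented languages A and B.
pairs = [
    ("pana", "fana"),
    ("piru", "firu"),
    ("polo", "folo"),
    ("tema", "tema"),
    ("kasu", "mori"),  # unrelated words: no shared history expected
]
counts = initial_correspondences(pairs)
print(counts.most_common(1))  # [(('p', 'f'), 3)]
```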

06:10
Then we aggregate the evidence given by these cognate sets. And this is exactly the critical part, because we transform shared cognates between languages into history, which is ultimately what we care about. As a first-order approximation, one could say that the more cognates two languages share, the more recently their common ancestor existed. But not all cognate sets provide the same amount of evidence of shared history. Again, I'm glossing over all of the details, but I hope the main gist of the idea and the method is clear. So we can continue doing this by aggregating increasingly larger sets of languages, and we end up deriving a whole tree that binds together all of the languages for which we have evidence of cognacy. And voilà! We have the tree.
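
To make the "more shared cognates, more recent common ancestor" approximation concrete, here is a minimal sketch using UPGMA, a classic distance-based clustering method swapped in for illustration; it is not presented as the method behind the trees shown in the talk, and much recent work relies on probabilistic (for example Bayesian) phylogenetic inference instead. The languages, meanings and cognate-class codes are hypothetical, and the sketch assumes every pair of languages shares at least one coded meaning. UPGMA also weights every cognate set equally and assumes a constant rate of change, exactly the kind of simplification the caveat about uneven evidence warns against.

```python
from itertools import combinations

def cognate_distance(x, y):
    """Share of common meanings for which two languages are NOT cognate."""
    shared = [m for m in x if m in y]  # assumes at least one shared meaning
    return sum(x[m] != y[m] for m in shared) / len(shared)

def upgma(langs):
    """Build a rooted tree topology from cognate data with UPGMA.

    `langs` maps a language name to {meaning: cognate_class_id}. The two
    closest clusters are merged repeatedly; the distance from the merged
    cluster to any other is the size-weighted average of the old distances.
    Returns nested tuples, e.g. (("A", "B"), ("C", "D")).
    """
    names = list(langs)
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {
        (i, j): cognate_distance(langs[names[i]], langs[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair, keys stored with i < j
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree

# Hypothetical cognate-class codes: four invented languages, five meanings.
data = {
    "A": {"one": 1, "two": 1, "water": 1, "hand": 1, "eye": 1},
    "B": {"one": 1, "two": 1, "water": 1, "hand": 2, "eye": 1},
    "C": {"one": 1, "two": 2, "water": 3, "hand": 3, "eye": 2},
    "D": {"one": 1, "two": 2, "water": 3, "hand": 4, "eye": 3},
}
print(upgma(data))  # (('A', 'B'), ('C', 'D'))
```

A and B differ in one meaning out of five and are joined first; C and D differ in two and are joined next, giving two subgroups under a single root.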

07:00
Now, this model has brought us immense insights into the nature of language dynamics and human history. It's worth noting that a lot of these developments have happened within the last 25 years or so, which is fairly recent, thanks mainly to methodological developments in phylogenetic inference. Combining the tree model philosophy with modern phylogenetic tools has made it possible to test and discover the time depth of important events in history, like demographic expansions, the peopling of different regions of the world, or the synchronisation of linguistic diversification with the domestication of plants or animals. Once these trees have been obtained through this method, they can be used along with tools from evolutionary biology for studying linguistic evolution as it happens within those phylogenies. And this has revealed many interesting facts about the cultural evolution of language.

07:58
Now, a few decades before these developments, the tree model gained huge momentum from early studies linking culture and genes, like the ones pioneered by Cavalli-Sforza. The idea that was pushed forward is that the language tree captures quite well the underlying history of the people who use the languages, and not only the history of the languages themselves. And this triggered a peculiar generation of human geneticists who were interested in learning about old sounds and Eurasian words for "wheel", among other oddities.

08:36
Now, if we put all these things together, we end up with an effective definition of what kind of thing a language is when looked at from the vantage point of the tree model. And I want to be extremely clear about this. Rather than approaching this in a bottom-up manner, thinking hard about what kind of thing a language is, I'm deriving the notion of language from what is a very successful model of language history.

09:05
So what is a language according to the tree model and its satellites? First, languages are entities subject to vertical transmission in the form of a tree, naturally. Second, keeping track of cognate sets (sets of words) is a way of keeping track of the history of the whole language. And third, language history flows in parallel with the genetic history of its associated population. This way of thinking about what a language is, through the tree model, has become standard within the wider field of cultural evolution, as illustrated, for instance, in this chapter of the superb book by Alex Mesoudi. And perhaps the most glaring proof of the dominance of this notion of language is that other domains of human culture and behaviour have adopted language trees as the best, and sometimes the only, available proxy for the history of societies and their peoples. You see here just a sample of the cultural traits that people have investigated with language trees, ranging from kinship to marriage patterns and social stratification.

10:17
But there is another reason why this notion of language has taken over and has been so successful. Let's go back to the figure. Do you notice anything strange in the language tree? If you look closer, you can see that the so-called "La Langue Roman" (and I apologise to those who actually speak French) is derived from Gaulish, Latin and Greek. So the very first depiction of a language tree already violates the condition of "no more than one ancestor": not the greatest start for the tree model.

10:53
But this illustrates something very important, which is that even those who use the tree model know that it doesn't necessarily accommodate all possible cases of language history. They have dealt again and again with its limitations. And the conclusion of a lot of work that I'm not going to discuss at all seems to me to be that the definition of language we have just discussed, derived from the tree model, is robust and reliable, and that while exceptions might occur, they are just that: exceptions. Those might require other methods and ways of thinking. But the widespread success, the mainstream appeal and the illustrious history of the tree model have kept researchers away from those margins. So one comes across the cases that lie outside the scope of the tree model only by accident.

11:47
And this is what happened to me many years back when I started investigating creole languages: I was expecting to find something very, very different from what it ended up being. Creole languages emerge in highly multilingual societies where different groups of people do not share a common language. Historically common settings for creoles have been slavery, maroon communities, trading posts, indentured worker communities, and many others. The children of these people grow up together and, in many circumstances, do not receive direct instruction in any specific language. They are just passively exposed to the surrounding languages and to the basic communication code that adults use (I'll tell you a bit more about that in a moment). But as these children become adults, they end up developing a new language, the so-called "creole" language.

12:46
Now, very early in the history of the study of creole languages, back to the 60s at least, it was noticed that creoles around the world seem to be quite different, linguistically speaking, from non-creoles, and this ended up fueling the "creole exceptionalism hypothesis", which suggests that creoles are unique because they undergo a transmission bottleneck. This is more or less what the hypothesis says. We start, as I said, with at least two mutually unintelligible languages spoken in the context of the creole genesis; there could be many more languages than two involved, of course. For instance, in the sugar cane plantations of Hawaii in the late 19th century, one could find speakers of languages from Europe, the Americas, East Asia, the Pacific and Southeast Asia, all within the same community. These people need to communicate with each other, and they develop a pidgin based mostly on the words and expressions of the dominant language, often the European language in the case of colonial setups. Now, a pidgin is not a full natural language: it is mostly made up of fixed expressions with very limited productivity and a very narrow scope. You can't refer to an open world with a pidgin, which is what natural languages allow you to do; you can do much less than that.

14:11
So this is exactly the so-called transmission bottleneck, because the pidgin is unable to preserve the properties that make a language a language: its syntax, morphology, semantics, etc. You can't retrieve the full-fledged scope of natural language out of just a pidgin. I think a good analogy would be trying to reconstruct a language, let's say Spanish, out of a bunch of traveller's phrases. Of course, you can do many things with phrases like "no entiendo" or "no hablo español", but thinking that from there you can derive a whole grammar is extremely unfeasible.

14:54
Now, what happens then is that this pidgin is acquired by the local children and transformed into a full-fledged language, which is the creole language. Just for those of you who have never encountered a creole before, this is a picture I took about a month ago in San Andres, a coral island that belongs to Colombia, just off the coast of Nicaragua. Part of the population speaks San Andres Creole, or simply "Creole", as people call it, so some of the public announcements are in this language as well, naturally. Obviously, the picture gives away a lot of the message, but if you read through the lines in Creole, you'll perhaps recognise many familiar-sounding English words. The creole exceptionalism hypothesis would tell us that at some point in the past there was a pidgin on this island, and that it was based on English. But the grammar of San Andres Creole did not come from English; it was created from scratch, so to speak, because of the transmission bottleneck we discussed before.

15:56
But you might ask how, and perhaps more importantly why, all creoles look alike if they are created from scratch and presumably from different inputs. I will tell you about two types of explanations that have been given in this respect. The most radical proposal is associated mainly with a BBS paper from 1982 authored by Derek Bickerton. In that paper, Bickerton argues that creoles look alike because children enrich the pidgin with a mental blueprint of grammar that all humans come equipped with. So, in a way, creoles are the default state of language: what languages presumably looked like before they were steered away from the biologically given grammar.

16:43
A different answer can be exemplified by John McWhorter's latest book, "The Creole Debate", which came out in 2018. The argument goes that creoles end up looking alike not because of a common hardwired bias, but because they are the result of the practical need for communication. If you have to come up with a new language because you just need to talk with other people, then you'll probably avoid a lot of stuff that is very common in our languages, like conjugation classes, nominal inflection, different verb forms for different tenses and aspects, etc. So, in a way, the transmission bottleneck gets rid of all of these unnecessary things, and then humans simply build what they need for communication.

17:27
But there is a problem with this hypothesis. Its proponents have based their generalisations on very, very few creoles, as few as five sometimes, and they analyse different sets of linguistic traits. And sometimes these generalisations and the proposals built on them have nothing in common. So everyone agrees that creole languages are exceptional and different, and that there has been a transmission bottleneck, but they don't agree on exactly what creoles have in common and what it is that makes them different from other languages. So in reality, we don't know whether creoles are, after all, a distinguishable group of languages, much less whether this is due to a transmission bottleneck plus some innate blueprint, or to a tendency towards efficient communication. We simply don't know.

18:13
Now, in 2017, my collaborators and I provided an empirical test of the hypothesis. We did so by gathering data covering as many creoles as we could, shown as blue dots on the map, and a curated set of non-creole languages for comparison, shown as red dots.

18:32
So here's a sketch of what the data look like. For each language we have its status, whether it is a creole or not, and then a number of linguistic features: for instance, the presence of tones, the basic word order of the language, whether there is grammatical gender in the language, etc. We can now test the first half of the hypothesis: can we discriminate creoles from non-creoles based solely on grammatical structure? What we do is deploy a number of binary classifiers for this task (you can ask me about the details later, if necessary). And the answer we get is yes: creoles stand out as a distinguishable group of languages on the basis of their linguistic structure. We can efficiently classify whether a language is a creole or a non-creole just based on the grammatical information that we have for it.
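
The talk does not say which classifiers were used. As a rough, hypothetical illustration of the idea, the sketch below fits a logistic regression by plain gradient descent to a handful of invented languages coded for three binary features of the kind mentioned (tones, a basic word order feature, grammatical gender). The function names, the data and the labels are all made up, so the output says nothing about real creoles; a real analysis would also evaluate on held-out languages rather than on the training set.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent.

    X is a list of 0/1 feature vectors, y a list of 0/1 labels (1 = creole).
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the gradient of the average log-loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (creole) if the linear score is positive, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

# Hypothetical languages coded as [has_tones, is_SVO, has_gender].
X = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
y = [1, 1, 0, 0, 0, 0]  # invented labels: 1 = creole, 0 = non-creole

w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X] == y)  # True
```

Logistic regression is chosen here only because it is short to write without libraries and its weights can be read directly as the features that push a language towards one class or the other.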

19:25
However, the reason why we can distinguish creoles from non-creoles might not be related to a transmission bottleneck. So we took a look at the languages that are incorrectly classified, and the emerging picture is very suggestive, to say the least. Creoles that are classified as non-creoles are found far from the Caribbean and the principal European colonial routes. And the non-creoles that are classified as creoles are languages like English and Spanish, or Yoruba, which is a major West African language. Even more, the features that best allow us to distinguish creoles from non-creoles are features found in Europe and West Africa.

20:05
So the reason why creoles seem to form a consistent group should be extremely clear now: it simply reflects the fact that most creoles emerged in colonial settings established over the last 500 years by Western European powers, involving enslaved people from West Africa. That explains the odd features that single out creoles from non-creoles, and that lead languages to be misclassified as well. The creole languages resulting from these circumstances are a consistent group of languages in the same way that, say, Romance or Germanic languages are: because of a common history. Not a tree history, but a history nonetheless.

20:45
So this finding makes the transmission bottleneck idea unnecessary, which is actually consistent with the fact that for the overwhelming majority of creoles, we don't have any good trace of a pre-existing pidgin.

20:59
Now, if the accounts we have are accurate, the children who started the creoles did not benefit from any directed instruction. They were sometimes even forced to work and, in general, they endured the horrible upbringing circumstances one would expect. In spite of all of this, they nevertheless managed to learn extremely complex language structures from the languages that surrounded them. This is not only an incredible feat of human resilience, but also a powerful wake-up call regarding our narrow conceptualization of language development, which emphasizes explicit instruction and directed speech over, for instance, implicit learning.

21:41
Now, what I did not anticipate when dealing with creoles is that we find an extremely interesting pattern in how creole structure emerges, one that clearly reveals that creole languages are not just randomly sampling from the features of their linguistic ancestry. This is the part I find most interesting when it comes to discussions about cultural transmission. On the one hand, we have the lexifier bundle of features. Creoles inherit from one language, commonly the colonial language in the most typical cases, the majority of their vocabulary. So from this large, powerful colonial language they get the vocabulary, but they also get some word order features and a number of speech sounds.

22:35
But then there is this other clear contribution, which is much more diverse. It is composed of many things that require non-trivial acquisition, like the production of tones, the subtle semantics involved in framing a verb in time, and many other examples that were drawn from the other, substrate languages. At the same time, the societies that provide these features, the substrates, are the ones that contribute most of the genetic makeup of the populations that end up creating creole languages. So this is extremely important.

23:11
Why? Because creole languages, which are perfectly normal languages in terms of their structure and function, fall outside the notion of language we discussed before: they clearly have more than one ancestor. Keeping track of the history of the words in a creole tells you only the part of its ancestry associated with the colonial language. And this component of its history does not run parallel to the history of the human populations that use the creole, which contribute other linguistic structures but not necessarily the bulk of the vocabulary, which is what we trace in most tree model methods.

23:53
Now, before I wrap up, I want to raise one last point about how typical tree histories are, compared to dynamics like the ones we observe in creole languages. While European colonialism might be relatively recent, large empires, indentured work and slavery are unfortunately ancient institutions with worldwide prevalence. So the possibility that many more languages in the past emerged as creoles is, in my opinion, a very real one. In any case, I think this is at least as plausible as thinking that the historical circumstances that have given us the best cases of language trees, which are large and fast demographic expansions over entire continents, are typical of most human societies or of most of human history.

24:49
So, to finish, I would like to add that the tree model is not alone: there are other models in the study of language, interestingly also dating from more or less the same time, for which one could make a similar case. I'm thinking here, for instance, of the idea that sign and meaning are independent of each other, or that the process of language change has remained largely the same over human history: two ideas that are referred to with the labels "arbitrariness of the sign" and "uniformitarian hypothesis". These ideas are simple, and very elegant as well. In principle, they account for many observed phenomena, and because of this, as in the case of the tree model, they have shaped the very notion of what kind of thing a language is. But as with the tree model, they eventually fail when we look at languages and language families outside of the few lucky ones that have gotten most of the attention. And this is what I've been trying to show with my own research.

25:51
And that is what I wanted to tell you. I hope we get to talk more about all of these things, and I look forward to the discussions.
Available Formats
Format          Quality    Bitrate            Size
MPEG-4 Video    640x360    727.14 kbits/sec   140.60 MB
WebM            640x360    336.13 kbits/sec   65.00 MB
MP3             44100 Hz   249.76 kbits/sec   47.81 MB