The list of most appropriate concepts was established through careful evaluation of concept lists used in similar studies (, section 3), and lexical cognates were identified by experts in Sino-Tibetan historical linguistics using the comparative method supported by state-of-the-art annotation techniques.

Second, we apply Bayesian phylogenetic methods to these data to estimate the most probable tree, outgroup, and timing of Sino-Tibetan under a range of models of cognate evolution; similar methods have been applied to several other families of languages, including Indo-European (18–20), Austronesian (12), Semitic (21), and Bantu (22).
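As an illustration of how cognate judgments can be turned into input for phylogenetic models of this kind, the sketch below codes each cognate set as a binary presence/absence character per language. This coding is a common convention assumed here for illustration, not a description of the procedure used in this study; the language labels, concepts, and cognate-set identifiers are hypothetical.

```python
# Hypothetical toy records: (language, concept, cognate_set_id).
# None of these labels come from the study's dataset.
records = [
    ("LangA", "hand", "hand-1"),
    ("LangB", "hand", "hand-2"),
    ("LangC", "hand", "hand-2"),
    ("LangA", "sun", "sun-1"),
    ("LangB", "sun", "sun-1"),
    ("LangC", "sun", "sun-1"),
]


def to_binary_matrix(rows):
    """Code each cognate set as one binary character per language.

    Returns the sorted cognate-set identifiers (the columns) and a
    dict mapping each language to its row of 0/1 values.
    """
    languages = sorted({language for language, _concept, _cid in rows})
    cognate_ids = sorted({cid for _language, _concept, cid in rows})
    attested = {(language, cid) for language, _concept, cid in rows}
    matrix = {
        language: [1 if (language, cid) in attested else 0 for cid in cognate_ids]
        for language in languages
    }
    return cognate_ids, matrix


cognate_ids, matrix = to_binary_matrix(records)
for language, row in matrix.items():
    print(language, row)
```

The sketch treats every unattested cognate set as absent. A real analysis would typically need to distinguish a concept for which no word was recorded (missing data) from a word that belongs to a different cognate set.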

The past 10,000 y have seen the rise, at the western and eastern extremities of Eurasia, of the world’s two largest language families.

Together, these families account for nearly 60% of the world’s population: Indo-European (3.2 billion speakers) and Sino-Tibetan (1.4 billion).

Based on a dataset of 50 Sino-Tibetan languages, we infer phylogenies that date the origin of the language family to around 7200 B.

A second group presents Sino-Tibetan basal topology as a rake, with Chinese being one of several primary branches (10).

A third group places Chinese in a lower-level subgroup with Tibetan (15, 16).

Third, we examine Sino-Tibetan expansion under the two most probable phylogenetic scenarios through a consideration of the family’s plant and animal domesticates, the regions where they are earliest attested archaeologically, and the distribution of the corresponding cognate sets across the family’s branches.

Of the 3,333 cognate sets distributed over 9,160 lexical items, 90% are shared by fewer than five languages.
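A summary statistic of this kind can be computed by counting, for each cognate set, the number of distinct languages in which it is attested. The following is a minimal sketch on hypothetical toy data; the record layout, labels, and function names are assumptions made for illustration and do not reproduce the study's dataset or its figures.

```python
from collections import defaultdict

# Hypothetical toy records: (language, concept, cognate_set_id).
records = [
    ("LangA", "hand", "hand-1"),
    ("LangB", "hand", "hand-2"),
    ("LangC", "hand", "hand-2"),
    ("LangA", "sun", "sun-1"),
    ("LangB", "sun", "sun-1"),
    ("LangC", "sun", "sun-1"),
    ("LangA", "dog", "dog-1"),
    ("LangB", "dog", "dog-2"),
    ("LangC", "dog", "dog-3"),
]


def languages_per_cognate_set(rows):
    """Map each cognate set to the number of distinct languages attesting it."""
    members = defaultdict(set)
    for language, _concept, cognate_id in rows:
        members[cognate_id].add(language)
    return {cognate_id: len(langs) for cognate_id, langs in members.items()}


def share_below(coverage, threshold):
    """Fraction of cognate sets attested in fewer than `threshold` languages."""
    small = sum(1 for count in coverage.values() if count < threshold)
    return small / len(coverage)


coverage = languages_per_cognate_set(records)
# With three toy languages, a threshold of 2 isolates singleton sets;
# the sentence above uses a threshold of five languages.
share = share_below(coverage, threshold=2)
print(f"{share:.0%} of cognate sets occur in fewer than 2 languages")
```

Sets attested in very few languages carry little information about deeper branching, which is one reason a distribution like this is worth reporting alongside the raw counts.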

