Background: India is a patchwork of tribal and non-tribal populations that speak many different languages from various language families. Indo-European, spoken across northern and central India, and also in Pakistan and Bangladesh, has been frequently connected to the so-called “Indo-Aryan invasions” from Central Asia ~3.5 ka (thousand years ago) and the establishment of the caste system, but the extent of immigration at this time remains extremely controversial. South India, on the other hand, is dominated by Dravidian languages. India displays a high level of endogamy due to its strict social boundaries, and high genetic drift as a result of long-term isolation. Together with a very complex history, these factors make the genetic study of Indian populations challenging.

Results: We have combined a detailed, high-resolution mitogenome analysis with summaries of autosomal data and Y-chromosome lineages to establish a settlement chronology for the Indian Subcontinent. Maternal lineages document the earliest settlement ~55–65 ka (thousand years ago), and major population shifts in the later Pleistocene that explain previous dating discrepancies and violations of neutrality. Whilst current genome-wide analyses conflate all dispersals from Southwest and Central Asia, we were able to tease out from the mitogenome data distinct dispersal episodes dating from the Last Glacial Maximum to the Bronze Age. Moreover, by comparing the different genetic systems, we found an extremely marked sex bias.

Conclusions: Maternal lineages primarily reflect earlier, pre-Holocene processes, and paternal lineages predominantly episodes within the last 10 ka. In particular, genetic influx from Central Asia in the Bronze Age was strongly male-driven, consistent with the patriarchal, patrilocal and patrilineal social structure attributed to the inferred pastoralist early Indo-European society. This was part of a much wider process of Indo-European expansion, with an ultimate source in the Pontic-Caspian region, which carried closely related Y-chromosome lineages, a smaller fraction of autosomal genome-wide variation and an even smaller fraction of mitogenomes across a vast swathe of Eurasia between 5 and 3.5 ka.