The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
 Gene Expression Blog
Structure in 1000 Genomes South Asian Data

[Figure: Est100_1, PCA plot of South Asian samples]

I’m currently trying to figure out how best to integrate 1000 Genomes data, along with Estonian Biocentre, and HGDP. First, I converted the VCF files from the 1000 Genomes into pedigree format. I’ll put that up on GitHub in the next few days. Then I filtered the results for SNPs which are found in the HGDP. Finally, I intersected the results with the Estonian Biocentre data sets. I was left with ~250,000 markers after quality control (e.g., removing markers which are missing in more than 0.1% of the 5000+ samples).
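
The intersect-then-filter logic above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline I actually ran: the function names, the "00" missing-genotype code, and the in-memory dictionary layout are all hypothetical. In practice a tool such as PLINK handles this at scale (its --geno option applies the same kind of per-marker missingness threshold).

```python
from typing import Dict, Iterable, List, Set

MISSING = "00"  # hypothetical code for a missing genotype call


def shared_markers(marker_sets: Iterable[Set[str]]) -> Set[str]:
    """Return the SNP IDs present in every data set."""
    sets = list(marker_sets)
    return set.intersection(*sets) if sets else set()


def missing_rate(calls: List[str]) -> float:
    """Fraction of samples with no call at one marker."""
    return sum(call == MISSING for call in calls) / len(calls)


def filter_markers(
    genotypes: Dict[str, List[str]],
    keep: Set[str],
    max_missing: float = 0.001,  # 0.1%, the threshold quoted in the post
) -> Dict[str, List[str]]:
    """Keep markers that are shared across data sets and pass the missingness cutoff."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if snp in keep and missing_rate(calls) <= max_missing
    }
```

With 5000+ samples, a 0.1% cutoff means a marker is dropped once it is missing in more than about five individuals, which is why so few markers survive relative to the raw intersection.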

In particular, I’m curious about the population structure of the South Asian data. When you sample Chinese or Japanese or the English you need to be geographically diverse, but you don’t have to worry about social stratification too much.* Not so with South Asia. You have to be careful who and where you sample, because the variation doesn’t cleanly follow geography. In the Diaspora, for example, wealthier and higher status groups tend to be overrepresented. In the mid-2000s Noah Rosenberg’s lab published “Low Levels of Genetic Divergence across Geographically and Linguistically Diverse Populations from India.” In the paper itself the authors cautioned that their samples were from the United States, so one should be careful about accepting the idea that they represent the geographic variation in South Asia well. In hindsight it seems likely that their selection bias was too great to support robust conclusions, even with over 700 microsatellites.

Above is a PCA plot I generated for South Asians. I’m not quite sure of the coding of some of the Estonian Biocentre populations, so don’t take that as gospel. I was more curious about the distribution of the 1000 Genomes samples, since they are likely to be widely used in the near future.
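
For readers who haven’t seen how a plot like this is built, here is a dependency-free sketch of the idea. It is not the software I used, and everything in it is an assumption made for illustration: genotypes are coded as 0/1/2 allele counts with no missing data, each SNP is centred and scaled by its binomial standard deviation, and only the first principal component is extracted, by power iteration on the sample-by-sample matrix. Real analyses would use something like smartpca or PLINK’s --pca on the full marker set.

```python
import math
from typing import List

Matrix = List[List[float]]


def standardize(genotypes: Matrix) -> Matrix:
    """Centre and scale each SNP column of a samples x SNPs matrix of 0/1/2 counts."""
    n = len(genotypes)
    columns = []
    for column in zip(*genotypes):
        p = sum(column) / (2.0 * n)  # allele frequency at this SNP
        if 0.0 < p < 1.0:  # monomorphic SNPs carry no information
            sd = math.sqrt(2.0 * p * (1.0 - p))
            columns.append([(g - 2.0 * p) / sd for g in column])
    return [list(row) for row in zip(*columns)]


def first_pc(z: Matrix, iterations: int = 200) -> List[float]:
    """Unit-length sample coordinates along PC1, found by power iteration."""
    n = len(z)
    # Sample-by-sample similarity (Gram) matrix of the standardized genotypes.
    gram = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
            for i in range(n)]
    v = [1.0 / (i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variation left to project onto
            return v
        v = [x / norm for x in w]
    return v
```

Samples from populations with different allele frequencies end up with opposite signs on the returned axis, which is what shows up as separate clusters on the plot. The sign of the axis itself is arbitrary.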

First, let’s focus on the Bengalis from Bangladesh:

[Figure: Parents]

I was frankly surprised at how genetically homogeneous this group is. The two overlapping black dots are my parents. It seems clear that my family comes from a region of Bangladesh which likely has more East Asian ancestry than is the norm. This makes geographic sense, since my family’s roots are in the eastern part of eastern Bengal. Though it is hard to see on this plot, a small group of Bengali individuals, specifically six, reside in a tight cluster amidst samples from Tamil Nadu. The fact that they aren’t randomly scattered indicates to me that there’s some genuine structure here. I suspect that this is evidence of a group which has been assimilated, but retained its separate caste-community identity.

But overall there is a major contrast between the Bengali samples from Bangladesh and the previous Gujarati samples, now also in the 1000 Genomes (you can see the Patel cluster on the other side of the Bengalis, as it bulges out). The non-Patel Gujaratis were genetically varied, some very similar to individuals from Pakistan. In contrast there isn’t that sort of cline among the Bengali samples (it doesn’t look like they sampled any Bengali Brahmins in this data set, at least those of full heritage). The Punjabi samples were collected from Lahore, and they range from many individuals who are little different from Pathans to some whose genetic background resembles that of middle castes in Southern India. I don’t know what’s going on here, but there has been some back migration of laborers into the Punjab historically. I believe this is the origin of some low caste groups who are now Christian. Both the Telugu and Tamil samples have a few Brahmins in them. This is clear in the following plot:

[Figure: SIndia]

The two Brahmins are from Tamil Nadu. You notice that several of the 1000 Genomes Tamil and Telugu samples are rather close to them. South Indian Brahmins tend to be genetically very similar, so almost certainly that’s what these individuals are if they are placed here on the PCA. Though the Tamil samples are relatively tightly clustered, the Telugu break out into several groups. One of the major 1000 Genomes groups overlaps perfectly with Velamas, a middle caste from Andhra Pradesh. The individuals who are Telugu speakers between the Velamas and Brahmins may be of mixed heritage. I don’t know.

Ultimately I’d like to do some TreeMix and pairwise comparisons between these populations. But to do that I’m going to have to clean them up a bit so that they make sense as…populations.
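
One way the cleanup could go, sketched under assumptions (this is not a method I have settled on): treat each labeled group’s median position on the first two PCs as its centre, and flag any sample that sits much farther from that centre than is typical for the group. The function name, the cutoff `k`, and the input layout are all hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def flag_outliers(
    coords: Dict[str, Point],   # sample ID -> (PC1, PC2)
    labels: Dict[str, str],     # sample ID -> population label
    k: float = 3.0,             # hypothetical cutoff, in multiples of the typical distance
) -> List[str]:
    """Return sample IDs lying unusually far from their population's median position."""
    by_pop: Dict[str, List[str]] = {}
    for sample, pop in labels.items():
        by_pop.setdefault(pop, []).append(sample)

    flagged: List[str] = []
    for samples in by_pop.values():
        cx = median(coords[s][0] for s in samples)
        cy = median(coords[s][1] for s in samples)
        dist = {s: math.hypot(coords[s][0] - cx, coords[s][1] - cy)
                for s in samples}
        typical = median(dist.values())
        if typical == 0.0:  # all samples coincide; nothing to flag
            continue
        flagged.extend(s for s in samples if dist[s] > k * typical)
    return sorted(flagged)
```

Medians are used instead of means so that a handful of misplaced individuals, like the six Bengalis sitting among the Tamil Nadu samples, do not drag the centre toward themselves. Flagged samples would still need a look by eye before being dropped or relabeled.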

* The outcaste group in Japan only crystallized during the Tokugawa period. Not long enough to be genetically that distinct from the broader Japanese population.

 
• Category: Science • Tags: 1000 Genomes, Genetics 
  1. Hi Razib,

    If you don’t mind me asking, how exactly were you able to include your parents in this analysis? Did you convert their 23andme raw data into a different file format?

    -Beowulf

    • Replies: @Razib Khan
    yep. to pedigree format.
  2. Zack: Punjab saw a huge population exchange in 1947. Plus there has always been migration into Punjab from the east as well as the west. I am not sure if 1000 Genomes filtered participants based on ethnicity, but there are also Pathans in Lahore, recent as well as several generations old.

    BTW I have several Bengali Brahmins in my dataset.

    • Replies: @Razib Khan
    yes, that's why i said doesn't look like there are any BB in this data set. knowing bangladesh i wouldn't be surprised if they just screened out any hindus just because....

    re: samples. the description is actually pretty vague: https://catalog.coriell.org/1/NHGRI/Collections/1000-Genomes-Collections/Punjabi-in-Lahore-Pakistan-PJL

    contrast with bangladesh, https://catalog.coriell.org/1/NHGRI/Collections/1000-Genomes-Collections/Bengali-in-Bangladesh-BEB where all four grandparents were identified as bengali

  5. Vijay: R.K.

    Can I ask which district your parents came from? Comilla? I ask because the genetic structure of people in Tripura is quite different from the genetic structure of Cachar in India. The East Asian ancestry in Bengalis in India is more pronounced in Jalpaiguri and Coochbehar than in Silchar, Cachar and Tripura. The data I have is on paper, and is on mtDNA only.

    • Replies: @Razib Khan
    homna and chandpur. though my ancestry is a bit 'cosmopolitan', e.g. maternal grandmother's father was from noakhali. re: bengalis in tripura, i don't know the details, but i'd assume they are from *all over* east bengal. so they might be less east asian than people from comilla?

    "fun" fact, my maternal grandmother was almost killed by a stampeding elephant of the maharani of tripura.
  6. Vijay: Sorry, I pressed enter too fast, but I have another comment. Often, Tamil Brahmins are considered to be representative of Tamils; they are not. Approximately 100 years ago, they were primarily concentrated in 3-4 districts (Tanjavore, Thirunelveli, Palakkad, and South Arcot/Chengalpattu). The Brahmin population in Tamil Nadu is no more than 1 million in a total of 70 million. The genetic conformity of the Brahmin population is attributed to a small population, origins in approximately 800 villages (1/2 streets) in 4 districts, and 100% intermarriage. This is comparable, with more scatter, in Andhra for Telugu Brahmins.

    In contrast, Andhra represents 84 million people spread over some 275,000 square km. My point here is that TN Brahmins, being no more than a small relocation and admixture of Indo-Aryans over 10 centuries, are not important to Indian population history. We can drop Brahmin samples, and move forward.

    • Replies: @Razib Khan
    yes, i know the distinctiveness of brahmins in south (thanks to zack above and harappa DNA!) though his results suggest that they're all rather close to each other in the south, so there may have been a single founding event.
  9. I am not surprised about some Punjabis being genetically similar to middle caste South Indians. I live in Delhi, where there are a significant number of Punjabis who are basically refugees from Pakistan’s Punjab who came here after partition. I can see people with dark complexion and all kinds of features. There have always been lower and middle castes in all parts of India. Even in North India Brahmins form a small percentage, and that is the reason political parties have to resort to caste politics, because Brahmins don’t fetch enough votes. Except for Jats, some Punjabis, Rajasthanis and Brahmins, the rest of them are all between brown and dark brown. On the other side you have lower caste Tamils and people from southern Andhra who are almost black. The rest all fall between brown and dark brown. Note: though Karnataka is in the South, it has had admixture through the west coast (same with Kerala) and Maharashtra (the same rulers ruled Maharashtra and significant parts of Karnataka for many centuries), so Karnataka people look more like people from Maharashtra. Kerala also has had admixture. You can clearly differentiate between a Tamil and people from Karnataka. I see most genetic studies ignore Karnataka and the lower and middle castes of North India. Usually studies only consider Brahmins of UP and lower caste Tamils. I also see that no one has done research on admixture in South India through the west coast.

  10. Example: Aishwarya Rai is from the western coast of Karnataka. People from Mangalore in Karnataka have had more admixture and they are lighter than all other South Indian groups, and that includes even others in Karnataka.
