Genome hackers show that no one's DNA is anonymous anymore
In 201?, a young computational biologist, Yaniv Erlich, shocked the research community by showing how to reveal the identities of people listed in an anonymous genetic database using only an Internet connection. Regulators responded by restricting access to anonymized biomedical genetic data sets. A spokesperson for the National Institutes of Health said at the time: “The chances that this will happen are small for most people, although not zero.”
Fast forward five years, and the amount of DNA information stored in digital databases has exploded, with no sign of slowing down. Consumer companies like 23andMe and Ancestry have compiled genetic profiles for more than 12 million people, according to recent estimates. Customers who download their data can choose to add it to public genealogy sites such as GEDmatch, which gained notoriety this year for its role in leading police to a suspect in the Golden State Killer case.
A study by Erlich, published in the journal Science in October 201?, found that more than 60% of Americans of European descent can be identified through their DNA using open genealogy databases, regardless of whether they ever submitted their own DNA.
“The upshot is that it doesn’t matter whether you have been tested or not,” says Erlich, chief science officer at MyHeritage, the third-largest consumer genetics company after 23andMe and Ancestry. “You can be identified because the databases already cover most of the US, especially people of European origin.”
To derive these estimates, Erlich and his colleagues from Columbia University and the Hebrew University of Jerusalem analyzed the MyHeritage database, which contains ??? million anonymized users, mostly white, like the vast majority of the world's genetic databases. Treating each user as a “target,” they counted how many relatives shared large stretches of matching DNA with that person, and found that 60% of searches turned up a second cousin or closer. That level of kinship was all investigators needed to find the Golden State Killer suspect and to crack 17 other cases, using what law enforcement calls a search for distant relatives. To confirm the finding, Erlich's team uploaded 30 genetic profiles to GEDmatch and saw similar results: 76% of searches returned relatives no more distant than second cousins.
Such a match yields a list of roughly 850 candidates, depending on how many children the target's ancestors had. From that starting point, basic demographic information quickly narrows the list. Public records that place a person's residence to within 160 km cut the pool of candidates in half. Knowing the person's age to within five years excludes 9 out of 10 of those remaining. Sex, which can be determined from the genetic data, brings the list down to about 16 people. An exact year of birth can leave just one or two candidates.
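The narrowing described above is simple multiplication, and a minimal sketch makes the arithmetic explicit. The starting count and the first two fractions are the rounded figures quoted in the text; the assumption that sex removes half of the remaining candidates is ours, as are the stage labels. With these rounded inputs the list ends in the low twenties rather than at exactly 16, presumably because the quoted fractions are rounded, so the result is best read as an order of magnitude.

```python
# Rounded figures from the article; the 0.5 for sex is an assumption.
START_CANDIDATES = 850
FILTERS = [
    ("residence within 160 km", 0.5),  # "cut the pool in half"
    ("age within five years", 0.1),    # "excludes 9 out of 10"
    ("sex", 0.5),                      # assumed: roughly half remain
]


def narrow(start, filters):
    """Apply each filter in turn and return (label, candidates left) pairs."""
    remaining = float(start)
    steps = []
    for label, keep_fraction in filters:
        remaining *= keep_fraction
        steps.append((label, remaining))
    return steps


if __name__ == "__main__":
    print(f"start: {START_CANDIDATES} candidates")
    for label, remaining in narrow(START_CANDIDATES, FILTERS):
        print(f"after {label}: about {remaining:.0f}")
```

Each additional attribute multiplies the list by a small fraction, which is why a handful of public facts is enough to go from hundreds of candidates to a few.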
To demonstrate how easy the process is, the researchers chose an anonymous woman from the 1000 Genomes Project, a public catalog of genomic data, who was married to a man Erlich had previously identified in his widely read 2013 paper. They reformatted her DNA data to resemble that of a typical consumer-service customer and uploaded it to GEDmatch. The service found two relatives, one in North Dakota and one in Wyoming. The matches pointed to distant kinship, with a common ancestor four to six generations back. After an hour of combing through public records, the team found the ancestral couple the two matches had in common. From there, the researchers traced the genealogies of hundreds of descendants and worked out the identity of their target. It all took one day.
Erlich believes the day is not far off when such a search can be run on anyone who has left their DNA somewhere. The study found that once a genetic database covers about 2% of the adult population of a given ethnic group, a match no more distant than a second cousin can be found for almost any person in that group. The databases are richest for people of American or European ancestry, and for them this milestone could be reached within a few years if interest in recreational DNA tests holds at its current level. Based on the latest US census, two percent of that population is only about four million people.
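Why such a small coverage figure can be enough is easier to see with a toy calculation. The sketch below is not the model used in the study: it assumes each of a target's relatives is in the database independently with probability equal to the coverage, and the relative counts in the loop are hypothetical values chosen only for illustration.

```python
def match_probability(coverage, n_relatives):
    """Chance that at least one of n_relatives is in the database.

    Assumes every relative is included independently with probability
    `coverage` (a simplification, not the study's method).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    return 1.0 - (1.0 - coverage) ** n_relatives


if __name__ == "__main__":
    coverage = 0.02  # the 2% threshold mentioned in the text
    # Hypothetical numbers of relatives within the relevant degree of kinship.
    for n_relatives in (50, 100, 200):
        p = match_probability(coverage, n_relatives)
        print(f"{n_relatives} relatives, {coverage:.0%} coverage: {p:.1%}")
```

Under these assumptions the probability climbs quickly with the number of relatives, because only one of them has to be in the database for the target to become findable.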
Such a resource would greatly increase the number and variety of suspects whose data law enforcement can reach during investigations. Criminal offender databases, in which police store the DNA of nearly 17 million people (convicted offenders and, in some states, people who were merely arrested), disproportionately contain data on Black and Latino people. Since the early days of DNA testing, technical incompatibilities between methods have put a wall between offender databases and the databases of people who submit DNA for recreation or research. Police collect and analyze highly variable non-coding parts of the genome, counting the number of repeats in stretches of “junk” DNA. The result is essentially just a sequence of numbers, and it says nothing about a person's traits. It is, however, unique to each person, something like a barcode or a fingerprint. The method is also fast and cheap, which makes it ideal for police purposes.
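To make the "sequence of numbers" concrete, here is a minimal sketch of how such a repeat-count profile might be represented and compared. The locus names are real forensic markers, but the repeat counts are invented for illustration, and the exact-match comparison is a simplification rather than a description of how any real forensic system works.

```python
from typing import Dict, Tuple

# A profile maps each locus to the repeat counts on the two chromosome copies.
Profile = Dict[str, Tuple[int, int]]

# Invented example values, not real data.
crime_scene: Profile = {
    "D3S1358": (15, 16),
    "vWA": (16, 17),
    "FGA": (21, 24),
    "TH01": (6, 9),
}


def normalize(profile: Profile) -> Profile:
    """Sort each pair so that allele order does not affect comparison."""
    return {locus: tuple(sorted(pair)) for locus, pair in profile.items()}


def same_profile(a: Profile, b: Profile) -> bool:
    """True when both profiles cover the same loci with identical counts."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    suspect: Profile = {
        "D3S1358": (16, 15),  # same alleles, listed in the other order
        "vWA": (16, 17),
        "FGA": (21, 24),
        "TH01": (6, 9),
    }
    print("match" if same_profile(crime_scene, suspect) else "no match")
```

A comparison like this can say whether two samples come from the same person, which is why the format suits police work but has been of little use for finding relatives.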
Medical and recreational DNA records, by contrast, contain either a full sequence or genotype arrays: sets of variations that occur at single positions in the genome. These are single nucleotide polymorphisms (SNPs), and they are what account for your green eyes or curly hair or a predisposition to heart disease. They are also much more useful for finding relatives. Because the two types of database are incompatible, investigators in the Golden State Killer case had to extract DNA from old samples, create a SNP profile, and upload it to GEDmatch. But that step may no longer be necessary.
Another paper, published in October in the journal Cell, demonstrated for the first time how to search for distant relatives using data from criminal databases. Noah Rosenberg's group at Stanford University had already shown that records in the two kinds of database can be linked by comparing SNPs with the non-coding repeats located near them. That work was published last year and attracted little attention. “Silence,” says Rosenberg. But his latest work, which examines the cross-compatibility of the two kinds of database, has taken on new meaning in light of the Golden State Killer case.
“This could expand the reach of forensic genetics and potentially help solve even more cold cases,” says Rosenberg. “At the same time, it would expose participants in these databases to forensic searches that they probably did not expect.”
Legal experts see a bigger problem in what Rosenberg's work implies: the DNA profiles stored in police databases contain more information than previously thought. They can be used to accurately predict the coding regions of the genome, the ones associated with green eyes, curly hair, and heart problems. “All the Supreme Court decisions holding that existing offender databases do not violate the Fourth Amendment rest on the assumption that nothing can be extracted from this junk DNA,” says Andrea Roth, director of the Center for Law and Technology at the University of California, Berkeley. “And now that is all turning to dust.”
Rosenberg did not release any software with the paper, so it will take some time before anyone can run such calculations in practice. But he says that anyone with access to several databases has all the information needed to start using the technique. That means the built-in privacy protection could fall fairly quickly. The paper is meant as a warning that shows regulators what modern technology can do, and Rosenberg hopes it will start a long-overdue discussion about how genetic information is stored and used.
Erlich and his co-authors went further and recommended changes that would allow resources like GEDmatch, which provide an important service for people searching for missing relatives and for adoptees looking for their biological parents, to stay online and be safe. They called on the US Department of Health and Human Services to revise the scope of protected health information so that it includes anonymized genomes. They also described a cryptographic strategy that would create a verifiable chain of custody, so that databases can flag users who try to analyze genetic data that is not their own. But even if every company offering genome services were brought into such a system, it might not be enough.
“I think the upshot is that everyone will now be under genetic surveillance unless we regulate the government’s ability to conduct genetic searches,” says Roth. She proposes a system similar to California’s regulation of the more traditional search for relatives in criminal databases. Those searches can be used only to investigate violent crimes such as murder, and their scope is limited so as not to sweep in information about hundreds of innocent people. Oversight committees can prevent the careless disclosure of sensitive information if, say, someone’s father turns out not to be the biological father. “It’s ironic,” says Roth. “If your relative is in the CODIS [criminal offender] database, you have far more genetic privacy rights than if you have a relative in GEDmatch.” But once enough of your DNA is out there, it doesn't matter whether you want to be found or not. Opting out is no longer an option.