[FORBES] Blockchain: The Missing Link Between Genomics and Privacy?

One major concern is over privacy and security of genomic data. For instance, policymakers in Washington recently threatened to make genomic data necessarily available to employers. Complicating the privacy issue is the fact that there’s no clear legal owner of genomic data; that data was found to be unpatentable and, because it lacks authorship or a creator (legally), cannot be copyrighted.

It seems the only way to have any control over your genomic data is to simply not get tested, keeping that data hidden “in” one’s own body. But this defeats the entire point of most modern medical genomic technology.

Philosopher David Koepsell has been pushing the frontiers of ownership and protection of genomic data for nearly a decade. In 2009, his book Who Owns You foreshadowed the lawsuit, filed a couple of months later by the ACLU and other parties, against Myriad Genetics over its patents on the BRCA genes. Those patents would have allowed Myriad to effectively monopolize the testing market for breast and ovarian cancer.

Myriad eventually lost its patents; the US Supreme Court and the High Court of Australia invalidated them on the grounds that the patents were improperly applied to naturally occurring things rather than inventions. But these cases still left something critical unresolved, which Dr. Koepsell has been working on: Given that individuals do not legally “own” their genomic information today, how can we best ensure the security and privacy of our personal genomic data?

Recently, Koepsell and his long-time collaborator and partner, Dr. Vanessa Gonzalez—having together authored many publications about genomic data and privacy—started a software company to solve the problem. It’s unusual for academics, especially a philosopher, to stray this far outside their comfort-zones, but they have a plan to build a technical solution based on blockchain technology.

In the interview below, Koepsell elaborates on the ethical problems, which have been a roadblock to scientific progress:


A: I’ve been an academic for more than a decade and became familiar with the typical academic trajectory of research and publication about issues both abstract and practical. I got very lucky that a book I wrote—about what I considered an interesting abstract and metaphysical problem—came out very shortly before a lawsuit was filed to actually address what many considered to be injustices relating to gene patents. I was thrust into the world of policy as a result, and the outcome was extremely satisfying, as well as an educational experience in itself.

When the non-patentability of genes was resolved by the US Supreme Court’s unanimous Myriad decision, there were still a number of unresolved issues that promised to hinder genomic science and medicine. I did as usual: I wrote about them, gave lectures and courses, and little more. But during that research and teaching, I realized that I had also discovered a rather exciting solution, one that was not dependent upon policy, and one that could be solved with blockchain technology.

At first, I considered the usual academic route: to apply for grants and hire postdocs, then write articles and books. I found some collaborators, including my wife, and we began researching and writing. But we quickly realized we could not create the solution without a lot of grant money, which would take at least a year and was by no means certain. Meanwhile, the architecture of the solution we imagined was clear to us. So we found seed money, hired a developer, and got to building it. To do this in a reasonable way, we needed to form a company, which we did: Encrypgen, LLC, incorporated in Florida.

A: At my previous job as a professor at the Delft University of Technology in Holland, our section on technology and values was deeply involved in research on “designing for values,” sometimes also referred to as “value-sensitive design.” The traditional engineering idea that ethics and technology are two entirely separate fields is mistaken. When we design objects and services, we are always incorporating values of some kind into them.

Blockchain bakes in the values of privacy, security, and ownership. This is the insight that economist Hernando de Soto had when he began to embrace blockchain solutions for tracking land titles. We realized we could use the same technology, already built around privacy, security, and ownership (the values that are also necessary for genomic data protection), without blocking scientific progress. Indeed, blockchains can safely hide your genome from prying eyes and, at the same time, allow scientists to gather anonymized information about large populations, and then use that data only with the donor’s permission.

A: Genomic data is extremely sensitive. Most people are not aware that your DNA contains information about your life expectancy, your proclivity to depression or schizophrenia, your complete ethnic ancestry, your expected intelligence, maybe even your political inclinations. Within a decade or two your genome will likely reveal even more.

For example, you may be denied health insurance because you carry genes linked to breast cancer. Or eventually you may be denied a promotion because, according to your DNA, your skill set does not fit the expected job profile. Your genomic data can be misused by potential employers, unscrupulous corporations, and governments in any number of unexpected ways. It must therefore be kept private.

But anonymized genomic data is also vital for scientific progress. Personalized medicine became possible only by analyzing genomic data from thousands of donors. Genomic data helps scientists create better treatments for specific groups, understand the role of genetics in diseases and immunities, and pursue other important research. When we know a person’s genetic makeup, we can tailor treatments to that individual specifically. This cuts costs, increases efficiency, and targets the treatment to the person in the best way possible.

The apparent conflict between privacy and scientific progress makes the promise of genomic science and medicine particularly tricky. This conflict will intensify as costs go down and more secrets are unlocked. The question will soon be very pressing: How can we store our data for personal, medical use safely and without risking its misuse, especially as hackings of sensitive data are so common? Luckily, blockchains tailored to these values can reconcile this conflict.

A: It has become increasingly apparent that blockchains offer powerful applications well beyond cryptocurrencies, which is where much of the public attention is presently focused. The reason bitcoin is so useful and valuable is that the blockchain creates a distributed ledger: an immutable, distributed record which is also nearly impossible to hack. The owner of a bitcoin account has absolute control over their asset. This is a perfect solution for storing any kind of highly sensitive data, and there is a great deal of interest in applying the tech to applications other than cryptocurrencies.

For example, DARPA is looking into using blockchain for protecting nuclear weapons data. Blockchain solutions are being developed for tracking diamonds, for intellectual property rights, and for real-world logistics. People have been thinking about using blockchain for genomics for a while; we just wanted to do it a certain way, and to do it first, to maximize the ethical safeguards we think are most important. The major obstacle for everyone has been dealing with such large datasets, and keeping a large blockchain running above a slow crawl given how the processing of blockchain data typically works.

A: Well, for individuals, it will be a safe place to store their genomic data. If you get tested and want to always be able to access that data, then store it on the Gene-Chain free of charge. Your data will be more secure there than with most other solutions, including carrying around a USB stick, which you could lose. On the Gene-Chain, your data is encrypted and virtually unhackable, and you can provide a time-limited key to your doctor or others with whom you want or need to share the data. You can also choose exactly what part of your genome to share. Because the data has a unique signature, we can also track misuse of it.
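To make the idea of a time-limited key for one part of a genome concrete, here is a minimal sketch. It is not the Gene-Chain's actual design, which the interview does not describe: the function names `issue_grant` and `check_grant`, the pipe-separated token layout, and the use of an HMAC signature are all assumptions made for illustration.

```python
import hashlib
import hmac
import time


def issue_grant(owner_secret: bytes, grantee: str, region: str, expires_at: int) -> str:
    """Create a signed token letting `grantee` read `region` until `expires_at` (Unix time).

    Hypothetical format: "grantee|region|expires_at|signature".
    The grantee and region names must not contain the "|" character.
    """
    message = f"{grantee}|{region}|{expires_at}"
    tag = hmac.new(owner_secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{message}|{tag}"


def check_grant(owner_secret: bytes, token: str, grantee: str, region: str, now=None) -> bool:
    """Return True only if the token is authentic, matches the request, and has not expired."""
    now = int(time.time()) if now is None else now
    try:
        token_grantee, token_region, expiry_text, tag = token.split("|")
        expires_at = int(expiry_text)
    except ValueError:
        return False  # malformed token
    message = f"{token_grantee}|{token_region}|{expiry_text}"
    expected = hmac.new(owner_secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(tag, expected)
        and token_grantee == grantee
        and token_region == region
        and now < expires_at
    )
```

A donor could call `issue_grant(secret, "dr_smith", "BRCA1", int(time.time()) + 3600)` to give a doctor one hour of access to a single region. The same token fails the check for any other region, for any other grantee, or after the hour is up.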

Scientists will have access to the metadata (the data about the data, such as age, ethnicity, gender, etc.) and can search for potential subjects whose data might be useful and interesting for their studies. That search reveals nothing specific or personal about the donor, and it does not give them access to the genomic data itself. But they can make a request to the donor, who can then choose whether to allow its use, possibly negotiate terms for getting paid for that use, and finally go through the ethical consent procedure necessary for their particular jurisdiction.

All of this will mean a real revolution in genomic science, as well as provide donors of data with greater protection than ever. Blockchain also enables a lot of that functionality because one of its strengths, besides extremely strong encryption, is managing transactions. Research institutes and companies with large amounts of genomic data can purchase licenses to store their data without worrying about the ethics, which are baked into the product and its transactions, allowing them to focus on the science instead.

A: Your bank has an encrypted database containing the authoritative and only copy of your current balance. Although the bank backs up and secures the data, it can still be hacked or manipulated—which happens a lot more often than banks care to reveal. The TV series Mr. Robot tells how a group of determined hackers can cause a major financial-data Armageddon. That is virtually impossible on a blockchain, such as the one that supports bitcoin. Hack one node and the remaining tens of thousands of nodes will immediately reject the manipulated records.
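The rejection of manipulated records can be illustrated with a toy hash-chained ledger. This is a simplified sketch, not bitcoin's actual protocol: there is no mining or networking, and the helper names (`build_chain`, `is_valid`, `network_accepts`) and the simple majority vote are assumptions made for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link each record to its predecessor through the predecessor's hash."""
    chain, prev = [], GENESIS_HASH
    for index, payload in enumerate(payloads):
        digest = block_hash(index, prev, payload)
        chain.append({"index": index, "prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def is_valid(chain):
    """Check that every block's hash matches its contents and its link to the previous block."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev:
            return False
        if block_hash(block["index"], block["prev"], block["payload"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


def network_accepts(candidate, peer_copies):
    """Accept a copy only if it is internally valid and most peers hold the same latest hash."""
    if not is_valid(candidate):
        return False
    matching = sum(1 for peer in peer_copies if peer[-1]["hash"] == candidate[-1]["hash"])
    return matching > len(peer_copies) / 2
```

Editing a record on one node changes that block's hash, so `is_valid` fails on the edited copy. Even if the attacker rebuilds the whole chain with fresh hashes, its latest hash no longer matches the copies held by the other nodes, so `network_accepts` rejects it.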

This is why Bank of America, Merrill Lynch, Santander, Royal Bank of Canada, PwC, and many other banks, insurers, and other financial institutions are conducting intensive research in blockchain technology. It’s because blockchains are much more difficult to hack.

A mathematical result on the Byzantine Generals problem shows that it takes coordination between at least one third of the nodes to successfully attack a blockchain network. It doesn’t mean that data is 100% secure, but attacking the chain itself successfully requires orders of magnitude more time and resources than attacking a system protected by any other security measure. That’s another reason the military and banks are interested in blockchains.
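The one-third figure comes from the classical Byzantine fault tolerance bound: agreement is guaranteed only while a network of n nodes has at most f faulty or malicious ones, with n at least 3f + 1. The sketch below simply evaluates that bound; the function names are illustrative. Proof-of-work chains such as bitcoin are usually analyzed with a different threshold, based on a majority of computing power, which this sketch does not model.

```python
def max_faulty_nodes(total_nodes: int) -> int:
    """Largest number of Byzantine nodes f such that total_nodes >= 3 * f + 1."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be positive")
    return (total_nodes - 1) // 3


def min_nodes_for(faulty: int) -> int:
    """Smallest network size that still tolerates `faulty` Byzantine nodes."""
    if faulty < 0:
        raise ValueError("faulty must be non-negative")
    return 3 * faulty + 1
```

Under this bound, a network of 10,000 nodes tolerates up to 3,333 colluding nodes, so an attacker would need to control at least 3,334 of them.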

Another benefit of this parallelism is that blockchains are essentially built as a peer-to-peer technology: available everywhere, always. Blockchains have limitations too, mostly relating to size and speed, but they offer unprecedented levels of privacy, security, and ownership.

A: Absolutely. Storing and sharing genomic data is a technical problem that many are trying to solve, and computation has been a great boon for the research. A raw genomic data file is about 5 to 6 gigabytes of information given that the human genome is 3 billion base pairs long, and there is a lot of important associated tagging of that data done when a genome is sequenced.

So far, doing whole-genome sequencing has been very expensive, and the amount of such data has been pretty unmanageable. The 1000 Genomes Project has put the results of its full genome sequencing on the web, and you can download those sequences and play around with them for free, but managing and sifting through the data is unwieldy. The project uses a reference-file method to compress its data. Because human genomes are 99.5% similar, we can basically ignore all the stuff that is the same from person to person and just focus on the differences, which is a really good way to make the useful datasets smaller and easier to work with.
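The reference-file idea can be sketched in a few lines. This is a deliberately simplified illustration, not the project's actual file format: it assumes the sample and the reference have the same length and differ only by single-base substitutions, while real genomes also contain insertions and deletions. The function names are hypothetical.

```python
def diff_against_reference(reference: str, sample: str):
    """Return (position, base) pairs wherever the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only, so lengths must match")
    return [
        (position, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def reconstruct(reference: str, variants) -> str:
    """Rebuild the full sample from the shared reference plus the stored differences."""
    bases = list(reference)
    for position, base in variants:
        bases[position] = base
    return "".join(bases)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
variants = diff_against_reference(reference, sample)  # [(4, 'T'), (9, 'A')]
assert reconstruct(reference, variants) == sample      # nothing is lost
```

Only two of the ten positions need to be stored here. At the 99.5% similarity quoted above, roughly 0.5% of 3 billion base pairs, or about 15 million positions, would differ between one person and the reference.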

Even so, once you get a lot of these files, you are still talking about a significant amount of data to process, so we have to come up with better ways to compress the data without loss. In our case, we are using “deep learning” techniques, which essentially use artificial agents to do the math for us.

A: Legal protections for genomic data mean two things: ensuring the greatest possible form of protection for the data by data curators, and punishing those who misuse the data. Ensuring the greatest means of protecting the data is largely a technical problem, and so far the solutions out there are inadequate. There may well be a social need to punish those who hack and misuse the data, but until the technology is adequate to the task given the risks, we cannot expect the law alone to properly protect people.

Also, because the Myriad case made owning genomic data more or less impossible, a technical solution may be the best way to give some sort of property protection to people for their data. The law and technology can work hand-in-hand to solve these types of problems, but sometimes technology may have to lead the way.

A: Some is stored publicly, like the 1000 Genomes Project data, although it is de-identified only to the extent that names have been removed. But a recent Harvard study showed that public data could be pretty easily re-identified and the donors could be determined for about half the data. That should definitely concern people who thought their donated data would aid science without revealing sensitive information about themselves.

Commercial entities and research institutes that store data privately use anything from cloud-based Oracle databases to hard drives locked in safes or filing cabinets. These methods may satisfy current ethics guidelines, but they certainly won’t remain state of the art as blockchain surpasses them, nor are they ideal for use in science. We should certainly demand better solutions for all our important personal data, and genomics, if it is to achieve its scientific and medical potential, should be at the forefront of our privacy and security demands.

Patrick Lin, I write about technology, ethics and philosophy.
Opinions expressed by Forbes Contributors are their own.