I uploaded my whole genome sequence data to the cloud.

I got genomed by Illumina.

In March 2014, my wife and I “got genomed” by enrolling in Illumina’s Understand Your Genome (UYG) program. UYG requires participants to have their physicians order this whole genome sequence (WGS) test because of uncertainty surrounding the delivery of genomic results in the U.S. Illumina is careful to point out that the service “…has not been cleared or approved by the U.S. Food and Drug Administration” and “you will not receive medical results, or a diagnosis, or a recommendation for treatment.” Our family physician signed the request in November 2013, and we received our results in February. Fortunately, there were no surprises, although for now the UYG program only covers these Mendelian disorders. We flew to San Diego a few weeks later to listen to talks by genomic researchers and discuss our results with genetic counselors. As part of this one-day seminar, we each received an iPad Mini pre-loaded with our results, as well as a portable hard drive containing our raw sequence data.

I received my WGS data on this encrypted hard drive (about 100GB).

After we arrived home, the next step was to find a public “home” for my sequence data (to share without restrictions). What I learned is that uploading your genome anywhere is a challenge, mostly because the dataset is so big.

I looked at Dropbox, Evernote and Figshare, but their storage models do not scale well for genomic data. I tried Sage Bionetworks, but the BAM file was too large to upload. I settled on Amazon Web Services (AWS) and created an anonymous FTP server using the Amazon Elastic Compute Cloud (EC2). (I spent a bunch of time working with Amazon’s Simple Storage Service (S3) using this article, but the 5GB file size limit of s3fs nixed that.)

About my whole genome sequence data

My genome data and results are now in the public domain, freely available to download under a Creative Commons (CC0) license. The files are large (uploading them took two days over a 3Mbps connection), so you may want to read the clinical report and sample report instead.
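The upload time is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and a perfectly sustained link with no protocol overhead, so real transfers will be somewhat slower. The function name `transfer_hours` is illustrative only.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Estimate hours to move size_gb gigabytes over a link_mbps connection.

    Assumes decimal units (1 GB = 1e9 bytes, 1 Mbps = 1e6 bit/s) and a
    fully sustained link with no protocol overhead.
    """
    bits = size_gb * 1e9 * 8            # bytes -> bits
    seconds = bits / (link_mbps * 1e6)  # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # File sizes as listed in the post.
    for name, size_gb in [("BAM", 78.1), ("VCF", 2.4)]:
        print(f"{name}: {transfer_hours(size_gb, 3):.1f} h at 3 Mbps")
```

At a sustained 3 Mbps, the BAM file alone works out to roughly 58 hours, in the same ballpark as the multi-day upload; the time scales inversely with bandwidth, so a 30 Mbps link would need about a tenth of that.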

  • BAM file checksum: 2529521235 (78.1GB uncompressed)
  • VCF file checksum: 4165261022 (2.4GB gzip compressed)
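The post does not say which tool produced these checksums. Their form (a single decimal number below 2^32) matches the output of the POSIX `cksum` utility, so the sketch below assumes that; if the assumption holds, running `cksum` on a downloaded file and comparing the first number is the quickest integrity check. For illustration, here is the same calculation in pure Python: a CRC-32 with polynomial 0x04C11DB7 over the file contents, followed by the file length in bytes, then bitwise complemented. It is far too slow for a 78GB BAM file, and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

POLY = 0x04C11DB7  # CRC-32 polynomial used by POSIX cksum (MSB-first, not reflected)


def _make_table() -> List[int]:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC updates."""
    table = []
    for i in range(256):
        c = i << 24
        for _ in range(8):
            c = ((c << 1) ^ POLY) if c & 0x80000000 else (c << 1)
            c &= 0xFFFFFFFF
        table.append(c)
    return table


TABLE = _make_table()


def crc_update(crc: int, data: bytes) -> int:
    """Feed bytes into the running CRC (initial value 0, not yet complemented)."""
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ TABLE[((crc >> 24) ^ b) & 0xFF]
    return crc


def cksum_chunks(chunks: Iterable[bytes]) -> Tuple[int, int]:
    """Return (checksum, byte_count) computed the way POSIX cksum does."""
    crc, length = 0, 0
    for chunk in chunks:
        crc = crc_update(crc, chunk)
        length += len(chunk)
    n = length
    while n:  # append the length, least-significant octet first, no padding
        crc = crc_update(crc, bytes([n & 0xFF]))
        n >>= 8
    return (~crc) & 0xFFFFFFFF, length


def cksum_file(path: str, block_size: int = 1 << 20) -> Tuple[int, int]:
    """Stream a file through cksum_chunks in block_size pieces."""
    with open(path, "rb") as f:
        return cksum_chunks(iter(lambda: f.read(block_size), b""))


# Checksums as published in the post.
EXPECTED = {"BAM": 2529521235, "VCF": 4165261022}
```

For example, `cksum_file("my_genome.bam")[0] == EXPECTED["BAM"]` (the file name is hypothetical) would indicate an intact download, provided the published values really are `cksum` output.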

Questions about FTP? See this FAQ.

Now that I have my genome in the cloud, I’ll start playing with analysis tools like STORMSeq. Stay tuned!

A step forward: Consent for Clinical DNA Sequencing at the Iowa Institute of Human Genetics

During a recent podcast on Mendelspod.com, Colleen Campbell at the Iowa Institute of Human Genetics (IIHG) described the process of introducing pharmacogenomic testing and clinical exome sequencing at the University of Iowa. The project started small, but included pharmacogenomic testing for clopidogrel, as well as whole exome sequencing (WES). At IIHG, WES is intended for diagnostic odyssey patients; patients with a large list of differential diagnoses (where WES is more economical than multiple, individual genetic tests); and patients with atypical presentations of disease. (Today, WES provides a diagnostic answer about 25% of the time.)

As part of the process, patients complete this plain-language informed consent form that explains the benefits and risks associated with genetic testing. The form lets patients decide how to receive information about incidental and secondary findings. More importantly, the consent form lets patients easily contribute their health information for future research. Unless patients opt out, their DNA samples and genetic data can be:

  • Compared with genetic information from others to improve future tests
  • Stored for future studies
  • Placed in a national repository (without identifying information)
  • Used to develop future products and services
  • Published in research studies (results, without identifying information)
  • Made into cell lines (from the DNA blood sample)

The consent form also lets patients opt in so that IIHG can use their genetic information in future research studies (beyond the original purpose of the test).

IIHG has done an exemplary job of involving an entire community in integrating genomics into clinical practice. By educating hospital staff, patients and the community, IIHG is helping genomic medicine slowly take root.

Note: I would not be surprised to see IIHG presenting their results at conferences over the next year, including AHIMA, AMIA, ANIA, ASHG and HIMSS.

Paper: Big Desire to Share Big Health Data


Today I presented this paper about sharing personal health data at the 2014 AAAI Spring Symposium Series, hosted at Stanford University. The paper, co-authored with Melanie Swan, summarized the results of an online survey to gauge consumer attitudes toward sharing health information. Here’s the abstract:

Sharing personal health information is essential to create next generation healthcare services. To realize preventive and personalized medicine, large numbers of consumers must pool health information to create datasets that can be analyzed for wellness and disease trends. Incorporating this information will not only empower consumers, but also enable health systems to improve patient care. To date, consumers have been reluctant to share personal health information for a variety of reasons, but attitudes are shifting. Results from an online survey demonstrate a strong willingness to share health information for research purposes. Building on these results, the authors present a framework to increase health information sharing based on trust, motivation, community, and informed consent.

The take-home messages from the paper are:

  1. Consumers are willing to share health data under the right conditions.
  2. Education seems to play a strong role.
  3. Consumers want to be connected to their data.
  4. Develop models to encourage sharing. 

My favorite part of the talk was explaining how I repeated the survey using an online market research tool. Our original respondents were highly educated (59% had a Master’s level education or higher), so I wondered if education played a role in their willingness to share. In less than two hours, I posted the survey and received 100 responses (compared with the nine months it took to receive 128 IRB-consented responses). This time, about 20% of the respondents had a Master’s level education or higher, still above the US average of 10%, according to the US Census Bureau. Nevertheless, overall attitudes toward sharing were similar. In particular, respondents who were not willing to share their health information tended to have little or no college experience. Although both surveys relied on convenience samples, the results suggest that education plays a role, perhaps because education can change our perception of the risks and benefits associated with sharing health data. Interestingly, these results and conclusions were similar to those in a recent report published by the Health Data Exploration project, sponsored by the Robert Wood Johnson Foundation. More information about this project:

The survey is ongoing! It takes just five minutes, so please add your voice here.

Autism Hackathon in San Francisco

This weekend I collaborated with Melanie Swan at the Autism Hackathon in San Francisco. Sponsored by Twilio and supported by Autism Speaks, this hackathon brought together 50+ developers and designers who created prototype applications for the autism community. At the end of the 24-hour event, a dozen teams presented 5-minute “pitches” for their ideas.

More here: http://www.autismspeaks.org/news/news-item/autism-speaks-and-twilio-team-hacking-autism

Our entry, “MindFlower,” is an “eLabor Marketplace for ASD Solvers.” The idea: get paid for solving puzzles like the ones in FoldIt.

For more information about MindFlower, see these slides on slideshare.net.

Note: MindFlower is just a concept, not an actual business or organization.

(Image credit: Kimberly Pickard)

Restless Legs Syndrome and Niacin Study #2: Quantified Self Meetup in San Francisco

I will be presenting results from my second self-tracking study at the Quantified Self San Francisco meetup, hosted at Microsoft, later tonight.

Experiment

By participating in this crowdsourced study on Genomera, I tested niacin supplementation as a potential treatment for Restless Legs Syndrome (RLS).

Methods

This experiment had two main differences from the first one. First, I tapered off my current medication, clonazepam, after ramping up with niacin. Second, I increased the daily niacin dose from 500mg to 2000mg, which meant that the ramp-up was also much longer.

RLS-Study2-Pickard.xls

I recorded some sliding scale measurements of RLS sensation, leg jerks, etc. in a spreadsheet (see above). Aggregated measurements are also available to Genomera’s members.

Results

As in the first experiment, niacin did not improve my RLS symptoms, even at the higher dose. However, RLS severity was lower after tapering off clonazepam, perhaps due to the niacin. Since the first experiment, I also started taking an iron supplement to increase my ferritin level, which might also account for the reduced RLS severity. As before, I saw my doctor after the experiment to discuss the results. We changed my medication to Mirapex, which is also commonly used to treat RLS. Compared to clonazepam, I feel more alert. The RLS symptoms remain under control, and amazingly, feeling returned to my sciatic nerve about one month ago; I can feel it all the way down to the top of my left big toe. I am unsure what this means, but since I injured my back 30 years ago, it seems significant.

Finally, I wanted to mention that my psoriasis flared once I started taking niacin at 2.0g/day. Subsequently, I read several articles discouraging psoriatics from taking large doses of niacin.

Overall, this QS journey has been worth it. I learned more about my RLS, but more importantly, how to ask better questions that improved my health.

Link to slides on slideshare.net

 

Sage Synapse: A home for open medical data


I just posted my 23andMe data to Sage Synapse, a collaborative space that allows scientists to share and analyze data together. After authenticating with Synapse, you can access the data here: https://synapse.sagebase.org/#Synapse:syn1444765
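For readers who download the file, a minimal parser might look like the sketch below. It assumes the data follow 23andMe’s usual raw-data layout: tab-separated columns for rsid, chromosome, position and genotype, with header lines starting with `#` and `--` marking a no-call. The function and class names are hypothetical, and the toy records are made up.

```python
import io
from typing import Dict, Iterable, NamedTuple


class Call(NamedTuple):
    chromosome: str
    position: int
    genotype: str  # e.g. "AG"; assumed "--" for a no-call


def parse_23andme(lines: Iterable[str]) -> Dict[str, Call]:
    """Map rsid -> Call, skipping '#' header/comment lines and blank lines."""
    calls: Dict[str, Call] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        calls[rsid] = Call(chromosome, int(position), genotype)
    return calls


if __name__ == "__main__":
    # Toy input in the assumed layout; the IDs and positions are made up.
    sample = (
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs0000001\t1\t1000\tAA\n"
        "rs0000002\tX\t2000\t--\n"
    )
    calls = parse_23andme(io.StringIO(sample))
    no_calls = [rsid for rsid, call in calls.items() if call.genotype == "--"]
    print(len(calls), no_calls)  # 2 ['rs0000002']
```

To read a real download, pass an open file instead of the `StringIO`, e.g. `with open(path) as f: calls = parse_23andme(f)`. A dictionary keyed by rsid makes it cheap to look up specific SNPs.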

Here’s a short video introduction to the Synapse platform:

I will be adding more data to Synapse in the near future.

Trust, but verify

Working with 23andMe exome data: my CF allele and the need for verification

This informative blog post from Dr. Jung Choi at Georgia Tech discusses how to use free, publicly available bioinformatics tools to interpret new exome sequence data from 23andMe. The post includes a response from 23andMe in the comments.

Some of the bioinformatics tools that Dr. Choi uses are:

The post highlights the challenges of mapping gene-protein interactions when reporting results.
