Monthly Archives: September 2014

I uploaded my whole genome sequence data to the cloud.

i-got-genomedI got genomed by Illumina.

In March 2014, my wife and I “got genomed” by enrolling in Illumina’s Understand Your Genome (UYG) program. UYG requires participants to order this whole genome sequence (WGS) test from their physicians due to uncertainties surrounding the delivery of genomic results in the U.S. Illumina is careful to point out that the service “…has not been cleared or approved by the U.S. Food and Drug Administration” and “you will not receive medical results, or a diagnosis, or a recommendation for treatment.” Our family physician signed the request in November 2013, and we received our results in February. Fortunately, no surprises, but the UYG program only covers these Mendelian disorders for now. We flew to San Diego a few weeks later to listen to talks by genomic researchers and discuss our results with genetic counselors. As part of this one-day seminar, we each received an iPad Mini that was pre-loaded with our results, as well as a portable hard drive that contained our raw sequence data.

illumina-wgs-hard-drive I received my WGS data on this encrypted hard drive (about 100GB).

After we arrived home, the next step was to find a public “home” for my sequence data (to share without restrictions). What I learned is that uploading your genome anywhere is a challenge, mostly because the dataset is so big.

I looked at DropboxEvernote and Figshare, but their storage models do not scale well for genomic data. I tried Sage Bionetworks, but the BAM file was too large to upload. I settled on Amazon Web Services (AWS) and created an anonymous FTP server using the Amazon Elastic Compute Cloud (EC2). (I spent a bunch of time working with Amazon’s Simple Storage Service (S3) using this article, but the 5GB file size limit of s3fs nixed that.)

About my whole genome sequence data

My genome data and results are now in the public domain, freely available to download under a Creative Commons (CC0) license. Uploading the data took two days over a 3Mbps connection, so you may want to read the clinical report and sample report instead.

  • BAM file checksum: 2529521235 (78.1GB uncompressed)
  • VCF file checksum: 4165261022 (2.4GB gzip compressed)

Questions about FTP? See this FAQ.

Now that I have my genome in the cloud, I’ll start playing with analysis tools like STORMSeq. Stay tuned!