In 2014, I uploaded my WGS data to the cloud and made it publicly available. In a previous post, I explained why I moved my WGS data from DNAnexus to Amazon. In this post, I explain the final step: attaching the S3 bucket to a web server. The goal was to replace the ftp server with a web server and make it easier to download my whole genome sequence data.
TL;DR: My genome is now available at http://genome.startcodon.org
I launched my first cloud server literally while in the clouds in May 2014. Cloud computing has changed so much, it’s unbelievable. Back then, I had to patch the Linux kernel by hand so that the ftp server would work on AWS. Today, uploading your genome using Amazon’s command line interface (CLI) to an AWS S3 storage bucket is relatively easy. Understandably, Amazon makes it challenging (but doable) to make your storage publicly available. I used the Apache Web Server and s3fs to share this information.
My first cloud server
Step 1. Install Apache
Depending on your flavor of Linux, your commands may vary. I am using Ubuntu 18.04 LTS running on a t2.micro EC2 server. Here are the commands I used to install the Apache HTTP Server.
Step 2. Install s3fs
s3fs allows allows you to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI. Again, your commands may vary depending on your flavor of Linux. Here are the commands I used to install s3fs.
About my whole genome sequence data
My genome data and results are now in the public domain, freely available to download under a Creative Commons (CC0) license with a HIPAA waiver. I have not converted my BAM files to CRAM yet, so you may want to read the clinical report and sample report to save bandwidth.
- BAM file checksum: 2529521235 (78.1GB uncompressed)
- VCF file checksum: 4165261022 (2.4GB gzip compressed)
Note: I decommissioned the ftp server after 6 years of faithful service.