With the recent announcement of the University of Michigan Research Computing Package (UMRCP), the BRCF Advanced Genomics Core (AGC) has been working with Information and Technology Services (ITS) to update our data delivery strategy. The Michigan Medicine version of the RCP has been one of the AGC’s data delivery mechanisms for over a year now. We have identified ways to improve data storage and access for the AGC research community as we transition to the UMRCP as our primary delivery method. Specifically, we will deliver our researchers’ AGC-generated data directly to their RCP Data Den archive storage space, while also delivering a subset of smaller files via U-M Dropbox for faster, easier access on their own computers.
Data Delivered and Automatically Archived (for Reproducible Research)
We at the BRCF Advanced Genomics Core know that a great deal of time, effort, and money went into generating the samples you sent to us for sequencing, so losing the resulting data after delivery, and then being unable to publish, would be disastrous. With our first data delivery improvement, we archive our researchers’ data at the moment of delivery, which further reduces the risk of catastrophic data loss and promotes reproducible research.
Over the last year, one option for AGC data delivery has been to “push” data directly to some of our Michigan Medicine researchers’ 10 TB RCP Turbo Research Storage space. “Turbo” is a high-capacity, reliable, secure, and fast storage solution for the short-term storage and analysis of larger (> 1 megabyte) files. While this has been very successful, the RCP Turbo space still requires researchers to copy their data into a larger, archival storage space for security and to free up storage space for new data and analyses. Additionally, researchers who use external devices as backups still risk losing all their data if the external device fails.
By delivering AGC data directly to our researchers’ 100 TB RCP Data Den storage allocation, the data will already be archived and replicated, so that it can always be recovered. However, Data Den is an archive storage service and is not meant to replace active storage services like Turbo. It is best suited to data that has been set aside and is not in active use. Researchers will therefore need to copy their data from Data Den to an active storage allocation, such as “scratch” or Turbo, for analysis. The original files will remain on Data Den, so you can always reproduce your analyses from the original files as new analytical methods, software, data, or references become available.
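As a rough illustration of the copy step described above, the sketch below builds a Globus CLI command that would copy one delivered run folder from Data Den to Turbo. It is only a sketch under stated assumptions: the collection IDs, paths, and label are hypothetical placeholders, not real AGC or ARC values, and the helper function name is made up for this example. The script only prints the command; it does not run a transfer.

```python
import shlex


def build_globus_transfer(src_collection, src_path, dst_collection, dst_path, label):
    """Return the argument list for a recursive `globus transfer` call.

    `globus transfer` takes SOURCE and DESTINATION arguments in the form
    COLLECTION_ID:PATH. `--recursive` copies a whole directory.
    """
    return [
        "globus", "transfer",
        "--recursive",
        "--label", label,
        f"{src_collection}:{src_path}",
        f"{dst_collection}:{dst_path}",
    ]


# Hypothetical placeholders: substitute the collection IDs and paths
# for your own Data Den and Turbo allocations.
DATADEN_COLLECTION = "DATADEN-COLLECTION-UUID"
TURBO_COLLECTION = "TURBO-COLLECTION-UUID"

cmd = build_globus_transfer(
    src_collection=DATADEN_COLLECTION,
    src_path="/example_lab/run_001/",
    dst_collection=TURBO_COLLECTION,
    dst_path="/example_lab/analysis/run_001/",
    label="copy-run-001-to-turbo",
)

# Print the command so it can be reviewed before running it in a shell
# where the Globus CLI is installed and you are logged in.
print(shlex.join(cmd))
```

Because this is a copy rather than a move, the originals stay in Data Den, which matches the archive-first approach described above.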
U-M Dropbox for “Personal Computer” sized files
While Data Den is a great storage solution for larger files that researchers need long term for publishing, it is not ideal for accessing summary and report files used to determine data quality and for secondary analyses. Since these smaller, highly informative files can be viewed on an average PC, we will also deliver them to you via U-M Dropbox for immediate access, in addition to archiving them in Data Den.
With sequencing data deliveries averaging 250 GB for all files, we cannot deliver all the run data using Dropbox. But we can use Dropbox for copies of web summary files, Loupe Browser files, and other small files, depending on the service provided. These “PC-sized” files will be made available using the Dropbox Transfer feature, which lets us send them to the researcher and PI who submitted the service request in MiCores. To reiterate, you also get these same files in the complete delivery package on Data Den, but Dropbox makes it convenient to look at these small files without first working out how to transfer them with Globus from your Data Den area to your laptop.
We will be rolling out this new “laptop consumable files delivery using Dropbox” over the next several months, going service by service. We’ll be starting with our Single Cell service lines and expanding from there.
At the BRCF Advanced Genomics Core, we anticipate that using the UMRCP will standardize our data delivery process, increase research reproducibility, and make collaborating easier for researchers across the University of Michigan. We will still use Globus to deliver your data to Data Den, so our researchers will need to follow instructions from the AGC and ARC to set up a space on Globus and grant the AGC “write” permission. However, delivering AGC data directly to Data Den also makes it easier for our researchers to set up their push-to target.
If you’re already using Turbo as your “push-to target”, we recommend working with ARC to create a new directory and Globus share in Data Den. Then send email@example.com an email with the Globus name or ID of the new share, and we’ll update our database. We’ll post help guides on our website (michmed.org/agc-data), and if you get really lost, we can jump on a Zoom call to help get you set up.
Questions, comments, or concerns?
Contact the AGC Data Team at firstname.lastname@example.org.