By Ian Lumb
Bright Computing offers a broadly applicable cluster management solution for infrastructure both on the ground and in the cloud, as described in this article. (Incidentally, Bright Computing received a “Best of Show” award at Bio-IT this year.)
The Intersection of the Life Sciences and IT Infrastructure
That DNA is a molecule capable of codifying genetic instructions has been known for about six decades. For almost this entire time, researchers in the Life Sciences have been the victims of their own informatics-driven success. Resoundingly demonstrated on the global stage during the throes of the Human Genome Project (HGP), this self-induced success was arguably inevitable once Watson and Crick had worked out the structure of the molecule. IT infrastructure was initially on its own innovation track, with progress charted by Moore’s Law and the like. Since the HGP brought the two together, the intersection between the Life Sciences and IT infrastructure continues to define an advancing front of mutual challenge and opportunity.
True in the past and, regrettably, still the case today, internal silos for IT infrastructure mimic the departmental and divisional lines within a Life Sciences organization. Because the operational efficiency and effectiveness of a pan-organizational IT infrastructure were not a priority, underutilization of compute resources is typically as much of a concern as overutilization. Even in cases where a pan-organizational IT infrastructure has been made a priority, the reality is that very few organizations are able to realize the potential efficiencies and effectiveness in practice.
Ironically, for a discipline that is otherwise rife with progress, the opportunity cost to organizations within the Life Sciences arena can be significant. For example, anecdotal evidence suggests that innovative Life Sciences projects can be rendered non-starters – i.e., they may be postponed or even cancelled as early as the proof-of-concept phase. Whereas short-term impacts might be confined to a single, innovative project, the cumulative effect of repeating this innovation-stifling cycle bespeaks systemic organizational issues. In the fullness of time, such systemic issues can erode scientists’ morale and ultimately undercut an organization’s competitive edge, rendering it intellectually dormant.
Up to this point, the emphasis here has been on advocating on behalf of the Life Sciences researcher within an organization. It turns out, however, that the scientists themselves are exacerbating the situation significantly by assuming the IT role of systems administrator in addition to their science-oriented role. Unlikely to be ‘professionalized’ in systems administration as well as science, otherwise well-intentioned scientists can significantly multiply the inefficiencies and ineffectiveness of operational IT infrastructure, on top of the opportunity cost of time spent not doing science. Years ago, the HGP established the notion of scale for IT infrastructures within the Life Sciences. As those involved in both the private and public efforts to sequence the Human Genome can attest, scale is a consideration in and of itself. Expecting scientists to grapple with the scale of IT infrastructure common to the datacenters of today’s Life Sciences organizations is simply impractical.
IT Infrastructure On-The-Ground
Leaning once again on anecdotal evidence, forward-thinking organizations have decided it is time to recontextualize their operational mindsets. Employing creative strategies that seek to reframe organizational weaknesses as strengths, they are posing bold questions:
- Is there a benefit to making systems administration easy?
- Is there a benefit to having unlimited IT resources? Unlimited and readily usable IT resources?
With researchers right-focused on science, and (effectively) limitless, usable IT resources at their disposal, there exists the potential for a sea change to the practice of business. With scientists and IT in lockstep, reduced time-to-discovery becomes the new normal – i.e., a core competence that significantly improves the fortunes and stature of a Life Sciences organization.
Coincident with the undertaking of the HGP, organizations in the Life Sciences rapidly developed significant expertise in the management of their on-premise IT resources. For all of the reasons noted previously, from keeping scientists focused on science to the daunting challenge of scale, organizations in the Life Sciences have eased the operational burden of systems administration via the introduction of management software (e.g., Bright Cluster Manager). Ideally, such management software makes turnkey the matter of provisioning, monitoring and managing the large-scale IT infrastructures common to the datacenters of organizations in the Life Sciences. With a single-pane-of-glass view into the IT infrastructure (see, e.g., Figure 1), organizations in the Life Sciences continue to capitalize on systems administration as a core competence.
Figure 1: The single-pane-of-glass view provided by management solutions for IT infrastructure. This example is taken from Bright Cluster Manager.
Extending IT Infrastructure Into The Cloud
Those with domain expertise in the Life Sciences typically acknowledge the following requirements for cloud computing:
- Organizations in the Life Sciences seek to easily incorporate cloud capabilities into their existing IT infrastructure
- Organizations in the Life Sciences seek to control and/or manage the computational workloads that are ultimately executed in the cloud
- Organizations in the Life Sciences seek to provision, monitor and manage a scalable compute cluster in the cloud
- Organizations in the Life Sciences seek to easily manage clusters regardless of location – on-premise in their corporate datacenters or off-premise in the cloud
Thus organizational needs demand solutions that can extend IT infrastructure from corporate datacenters into the cloud. Forecast as the fastest-growing segment of cloud computing, these so-called hybrid clouds continue to spark interest and uptake in the Life Sciences. In one example, Bright Cluster Manager enables the extension of on-premise IT resources through the creation of instances in Amazon EC2. Solutions like this allow use cases common to the Life Sciences to be addressed; for example:
- Complementing on-site resources at large pharmaceutical companies with those available via the cloud. In this scenario, computational workloads that exceed site capacity are diverted to the cloud.
- ‘Acquiring’ an off-site datacenter via the cloud for a biotech start-up. In this scenario, computational workloads execute in the cloud almost exclusively.
Inherent in the ability to address such use cases are technical underpinnings that might include:
- Locality-based scheduling – Business, compliance, pricing and/or regulatory logic that determines where (i.e., on-site versus in-the-cloud) workloads should be executed, plus integrated workload managers that ensure this is indeed the case at run time.
- Data-aware scheduling – Provisioning of compute resources in the cloud that is cognisant of data locality.
- Persistence in the cloud – An organization’s ‘own’ resources that exist to deliver services such as in-the-cloud storage, in-the-cloud provisioning, etc., on an ongoing basis.
- Monitoring and managing the entire IT infrastructure – Because cloud instances are merely extensions of the IT resources available locally, it should be possible to monitor and manage the cumulative infrastructure on an operational basis.
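The scheduling underpinnings above can be illustrated with a small placement policy. What follows is a minimal sketch in Python, not the implementation used by Bright Cluster Manager or by any particular workload manager. The `Job` fields, the `place` and `schedule` functions, and the ordering of the rules (compliance first, then data locality, then on-site capacity) are all assumptions made for illustration.

```python
from dataclasses import dataclass

ON_SITE = "on-site"
CLOUD = "cloud"


@dataclass
class Job:
    """A hypothetical workload description (not a real workload-manager type)."""
    name: str
    cores: int
    data_location: str               # ON_SITE or CLOUD: where the input data lives
    must_stay_on_site: bool = False  # e.g., a compliance or regulatory constraint


def place(job: Job, free_on_site_cores: int) -> str:
    """Decide where a job should run under the assumed policy."""
    # Locality-based rule: compliance logic overrides everything else.
    if job.must_stay_on_site:
        return ON_SITE
    # Data-aware rule: run where the input data already resides.
    if job.data_location == CLOUD:
        return CLOUD
    # Capacity rule: workloads that exceed site capacity go to the cloud.
    if job.cores > free_on_site_cores:
        return CLOUD
    return ON_SITE


def schedule(jobs, on_site_cores: int) -> dict:
    """Place jobs in submission order, tracking the remaining on-site cores."""
    free = on_site_cores
    placements = {}
    for job in jobs:
        target = place(job, free)
        if target == ON_SITE:
            # A constrained job that does not fit would wait in the on-site
            # queue; this sketch does not model queueing, only core usage.
            free = max(0, free - job.cores)
        placements[job.name] = target
    return placements


jobs = [
    Job("align", cores=8, data_location=ON_SITE),
    Job("patient-data", cores=4, data_location=ON_SITE, must_stay_on_site=True),
    Job("assembly", cores=16, data_location=ON_SITE),
    Job("annotate", cores=2, data_location=CLOUD),
]
placements = schedule(jobs, on_site_cores=16)
for name, target in placements.items():
    print(f"{name}: {target}")
```

In this example the first two jobs fit on site, the third exceeds the remaining on-site capacity and is diverted to the cloud, and the fourth runs in the cloud because its data is already there. In practice, a policy of this kind would be expressed through the configuration of the integrated workload manager rather than hand-written code.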
With solutions available today, these and other Life Science use cases are making good on the promise of being able to easily incorporate cloud-based IT infrastructures into resources available on site.
From all perspectives, it is possible today to seamlessly incorporate IT infrastructure from the cloud into on-premise resources. From the end user’s perspective, cloud-based resources (e.g., compute servers) are provisioned, monitored and managed as bona fide resources. From the sysadmin’s perspective, the administrative aspects are greatly simplified – e.g., cloud-access specifics are entered only once, and provisioning of cloud nodes can be based on standard images.
In closing, it takes scientifically smart people to engage in the Life Sciences. If those same people make use of solutions that enable the extension of on-premise IT infrastructure into the cloud, then we certainly do have the case of Cloud Computing for Smart People.
For Further Reading
- B. Maddox, “DNA’s double helix: 60 years since life’s deep molecular secret was discovered”, http://www.guardian.co.uk/science/2013/feb/22/watson-crick-dna-60th-anniversary-double-helix.
- J.D. Watson, F.H. Crick, “Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid”, Nature, (1953); 171(4356): 737–738.
- Intel Corporation, “Moore’s Law Inspires Intel Innovation”, http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html.
- E. Uberbacher, “Computing the Genome”, Oak Ridge National Laboratory Review, (1997); 30(3 & 4), http://www.ornl.gov/info/ornlreview/v30n3-4/genome.htm.
- Wellcome Trust Sanger Institute, “The Human Genome Project”, http://www.sanger.ac.uk/about/history/hgp/.
- Celera Genomics, “Our History”, https://www.celera.com/celera/history.
- Bright Computing, Inc., “Bright Cluster Manager – Advanced Linux Cluster Management Software”, http://www.brightcomputing.com/Bright-Cluster-Manager.php.
- North Bridge, “Future of Cloud Computing 2012”, http://northbridge.com/2012-cloud-computing-survey.
- B.P. Rimal, E. Choi, I. Lumb, “A Taxonomy, Survey, and Issues of Cloud Computing Ecosystems”, in Cloud Computing: Principles, Systems and Applications, N. Antonopoulos & L. Gillam (eds.), (2012) 21–46, http://dx.doi.org/10.1007/978-1-84996-241-4_2.
- Gartner Inc., “Cloud Computing | Technology Research”, http://www.gartner.com/technology/topics/cloud-computing.jsp.
- Bright Computing, Inc., “Bright Cluster Manager – Cloud Bursting”, http://www.brightcomputing.com/Linux-Cluster-Cloud-Bursting.php.
- I. Lumb, “Extending into the Cloud: Two Use Cases from Bio-IT World”, http://info.brightcomputing.com/On-the-Bright-Side/bid/177402/Extending-into-the-Cloud-Two-Use-Cases-from-Bio-IT-World.
- R. Stober, “How to submit an SGE job to the cloud using the Bright CMSUB command”, http://info.brightcomputing.com/Cluster-Management-Tech-Tips/bid/175312/How-to-submit-an-SGE-job-to-the-cloud-using-the-Bright-CMSUB-command.
Ian Lumb is Solutions Architect with Bright Computing, Inc.
Ian holds a B.Sc. from Montreal’s McGill University as well as an M.Sc. in Physics & Astronomy from York University.
In this role, Ian is involved in crafting solutions to improve the efficiency and effectiveness of organizations as they grapple with the challenges and opportunities of managing their on-premise resources, as well as their extended resources in the cloud.