Enabling the Life Sciences: Cloud Computing for Smart People

By Ian Lumb

Bright Computing offers a broadly applicable cluster management solution for infrastructure both on the ground and in the cloud, as described in this article. (Incidentally, Bright Computing received a “Best of Show” award at Bio-IT this year.)


The Intersection of the Life Sciences and IT Infrastructure

That DNA is a molecule capable of codifying genetic instructions has been known for about six decades (e.g., [1]). For almost this entire time, researchers in the Life Sciences have been the victims of their own informatics-driven success. Resoundingly demonstrated on the global stage during the throes of the Human Genome Project (HGP), self-induced success was arguably inevitable once Watson and Crick had elucidated the structure of the molecule [2]. IT infrastructure was initially on its own innovation track, with progress charted by Moore’s Law [3] and the like. Since the HGP brought the two together, the intersection between the Life Sciences and IT infrastructure has continued to define an advancing front of mutual challenge and opportunity.

True in the past and, regrettably, still the case today, internal silos for IT infrastructure mimic the departmental and divisional lines within a Life Sciences organization. Because the operational efficiencies and effectiveness of a pan-organization IT infrastructure were not a priority, it is typical that underutilization of compute resources is as much of a concern as is overutilization. Even in cases where a pan-organizational IT infrastructure has been made a priority, the reality is that very few organizations are able to leverage the potential efficiencies and effectiveness in practice.

Ironically, for a discipline that is otherwise rife with progress, the opportunity cost to organizations within the Life Sciences arena can be significant. For example, anecdotal evidence suggests that innovative Life Science projects can be rendered non-starters – i.e., they may be postponed or even cancelled as early as the proof-of-concept phase. Whereas short-term impacts might be confined to a single, innovative project, the cumulative effect of repeating this innovation-stifling cycle bespeaks systemic organizational issues. In the fullness of time, such systemic issues can erode scientists’ morale and ultimately undercut an organization’s competitive edge, rendering it intellectually dormant.

Up to this point, the emphasis here has been on advocating on behalf of the Life Sciences researcher within an organization. It turns out, however, that the scientists themselves are exacerbating the situation significantly by assuming the IT role of systems administrator – in addition to their science-oriented role. Unlikely to be ‘professionalized’ in systems administration as well as science, otherwise well-intentioned scientists can significantly multiply the inefficiencies and ineffectiveness of operational IT infrastructure, in addition to incurring the opportunity cost of time spent not doing science. Years ago, the HGP established the notion of scale for IT infrastructures within the Life Sciences [4] [5] [6]. As those involved in both the private and public efforts to sequence the Human Genome can attest, scale is a consideration in and of itself. Expecting scientists to grapple with the scale of IT infrastructure common to the datacenters of today’s Life Sciences organizations is simply impractical.

IT Infrastructure On-The-Ground

Anecdotal evidence again suggests that forward-thinking organizations have decided it is time to recontextualize their operational mindsets. Employing creative strategies that seek to reframe organizational weaknesses as strengths, they are posing bold questions:

  • Is there a benefit to making systems administration easy?
  • Is there a benefit to having unlimited IT resources? Unlimited and readily usable IT resources?

With researchers right-focused on science, and (effectively) limitless, usable IT resources at their disposal, there exists the potential for a sea change to the practice of business. With scientists and IT in lockstep, reduced time-to-discovery becomes the new normal – i.e., a core competence that significantly improves the fortunes and stature of a Life Sciences organization.

Coincident with the undertaking of the HGP, organizations in the Life Sciences rapidly developed significant expertise in the management of their on-premise IT resources. For the reasons already given, from keeping scientists focused on science to the daunting challenge of scale, organizations in the Life Sciences have eased the operational burden of systems administration via the introduction of management software (e.g., Bright Cluster Manager, [7]). Ideally, such management software makes provisioning, monitoring and managing the large-scale IT infrastructures common to Life Sciences datacenters a turnkey matter. With a single-pane-of-glass view into the IT infrastructure (see, e.g., Figure 1), organizations in the Life Sciences continue to capitalize on systems administration as a core competence.

Figure 1: The single-pane-of-glass view provided by management solutions for IT infrastructure. This example is taken from Bright Cluster Manager [7].

Extending IT Infrastructure Into The Cloud

Those with domain expertise in the Life Sciences typically acknowledge the following requirements for cloud computing:

  • Organizations in the Life Sciences seek to easily incorporate cloud capabilities into their existing IT infrastructure
  • Organizations in the Life Sciences seek to control and/or manage the computational workloads that are ultimately executed in the cloud
  • Organizations in the Life Sciences seek to provision, monitor and manage a scalable compute cluster in the cloud
  • Organizations in the Life Sciences seek to easily manage clusters regardless of location – on-premise in their corporate datacenters or off-premise in the cloud

Thus organizational needs demand solutions that can extend IT infrastructure from corporate datacenters into the cloud. Forecast to be the fastest-growing segment [8], these so-called hybrid clouds [9][10] continue to spark interest and uptake in the Life Sciences. In one example, Bright Cluster Manager enables the extension of on-premise IT resources through the creation of instances in Amazon EC2 [11]. Solutions like this allow use cases common to the Life Sciences [12] to be addressed; for example:

  • Complementing on-site resources at large pharmaceutical companies with those available via the cloud. In this scenario, computational workloads that exceed site capacity are diverted to the cloud.
  • ‘Acquiring’ an off-site datacenter via the cloud for a biotech start-up. In this scenario, computational workloads execute in the cloud almost exclusively.
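
The two use cases above differ mainly in how much on-site capacity exists before work overflows to the cloud. The following is a minimal sketch of that overflow rule, not a description of how Bright Cluster Manager or any workload manager actually places jobs. The `Job` class, the `place_jobs` function, and the core counts are hypothetical names and values introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A hypothetical computational workload and the cores it requests."""
    name: str
    cores: int


def place_jobs(jobs, onsite_cores):
    """Fill on-site capacity first; divert anything that does not fit to the cloud.

    This is an illustrative first-fit rule under assumed inputs, not a real scheduler.
    """
    placement = {}
    free = onsite_cores
    for job in jobs:
        if job.cores <= free:
            placement[job.name] = "on-site"
            free -= job.cores
        else:
            placement[job.name] = "cloud"
    return placement


if __name__ == "__main__":
    jobs = [Job("align", 64), Job("assemble", 128), Job("annotate", 32)]
    # Large organization: some on-site capacity, overflow goes to the cloud.
    print(place_jobs(jobs, onsite_cores=128))
    # Start-up with no on-site cluster: every job runs in the cloud.
    print(place_jobs(jobs, onsite_cores=0))
```

Setting `onsite_cores` to zero reproduces the start-up scenario, in which workloads execute in the cloud almost exclusively, while a non-zero value approximates the capacity-overflow scenario of the large pharmaceutical company.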

Inherent in the ability to address such use cases are technical underpinnings that might include:

  • Locality-based scheduling – Business, compliance, pricing and/or regulatory logic that determines where (i.e., on-site versus in-the-cloud) workloads should be executed, plus integrated workload managers that ensure this is indeed the case at run time.
  • Data-aware scheduling – Provisioning of compute resources in the cloud that is cognisant of data locality (e.g., [11][13]).
  • Persistence in the cloud – An organization’s ‘own’ resources that exist to deliver services such as in-the-cloud storage, in-the-cloud provisioning, etc., on an ongoing basis.
  • Monitoring and managing the entire IT infrastructure – Because cloud instances are merely extensions of the IT resources available locally, it should be possible to monitor and manage the cumulative infrastructure on an operational basis.
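
To make the idea of locality-based and data-aware scheduling more concrete, the sketch below shows how compliance, data-locality and pricing rules might be combined into a single placement decision. It is an assumption-laden illustration only: the `Workload` fields, the `choose_site` function, and the ordering of the rules are hypothetical and are not taken from Bright Cluster Manager or any workload manager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    """A hypothetical workload description used only for this sketch."""
    name: str
    regulated: bool = False               # data that must remain on site
    data_location: Optional[str] = None   # "on-site", "cloud", or None if unknown
    max_cloud_cost: float = 0.0           # budget the owner accepts for a cloud run


def choose_site(workload, onsite_busy, cloud_cost_estimate):
    """Return "on-site" or "cloud" for a workload under assumed policy rules."""
    # Compliance/regulatory rule: regulated workloads never leave the site.
    if workload.regulated:
        return "on-site"
    # Data-aware rule: run where the input data already lives.
    if workload.data_location == "cloud":
        return "cloud"
    # Pricing rule: use the cloud only when the site is busy and the cost fits.
    if onsite_busy and cloud_cost_estimate <= workload.max_cloud_cost:
        return "cloud"
    return "on-site"


if __name__ == "__main__":
    reads = Workload("variant-calling", data_location="cloud")
    print(choose_site(reads, onsite_busy=False, cloud_cost_estimate=10.0))
```

In practice, a decision like this would be enforced by the integrated workload manager at run time; the function here only illustrates the kind of logic such a policy might encode.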

With solutions available today, these and other Life Science use cases are making good on the promise of being able to easily incorporate cloud-based IT infrastructures into resources available on site.

Figure 2: Resource locality versus organization size in two use cases [12]. In both scenarios, on-site IT infrastructure is extended into an Amazon EC2 cloud instance via Bright Cluster Manager.

From all perspectives, it is possible today to seamlessly incorporate IT infrastructure from the cloud into on-premise resources. From the end user’s perspective, cloud-based resources (e.g., compute servers) are provisioned, monitored and managed as bona fide resources. From the sysadmin’s perspective, the administrative aspects are greatly simplified – e.g., cloud-access specifics are entered once only, and provisioning of cloud nodes can be based on standard images.

In closing, it takes scientifically smart people to engage in the Life Sciences. If those same people make use of solutions that enable the extension of on-premise IT infrastructure into the cloud, then we certainly do have the case of Cloud Computing for Smart People.

For Further Reading

[1]. B. Maddox, “DNA’s double helix: 60 years since life’s deep molecular secret was discovered”, http://www.guardian.co.uk/science/2013/feb/22/watson-crick-dna-60th-anniversary-double-helix.

[2]. J.D. Watson, F.H.C. Crick, “Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid”, Nature, (1953); 171(4356): 737–38.

[3]. Intel Corporation, “Moore’s Law Inspires Intel Innovation”, http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html.

[4]. E. Uberbacher, “Computing the Genome”, Oak Ridge National Laboratory Review, (1997); 30(3 & 4), http://www.ornl.gov/info/ornlreview/v30n3-4/genome.htm.

[5]. Wellcome Trust Sanger Institute, “The Human Genome Project”, http://www.sanger.ac.uk/about/history/hgp/.

[6]. Celera Genomics, “Our History”, https://www.celera.com/celera/history.

[7]. Bright Computing, Inc., “Bright Cluster Manager – Advanced Linux Cluster Management Software”, http://www.brightcomputing.com/Bright-Cluster-Manager.php.

[8]. North Bridge, “Future of Cloud Computing 2012”, http://northbridge.com/2012-cloud-computing-survey.

[9]. B.P. Rimal, E. Choi, I. Lumb, “A Taxonomy, Survey, and Issues of Cloud Computing Ecosystems”, Cloud Computing: Principles, Systems and Applications, N. Antonopoulos & L. Gillam (eds.), (2012) 21–46, http://dx.doi.org/10.1007/978-1-84996-241-4_2.

[10]. Gartner Inc., “Cloud Computing | Technology Research”, http://www.gartner.com/technology/topics/cloud-computing.jsp.

[11]. Bright Computing, Inc., “Bright Cluster Manager – Cloud Bursting”, http://www.brightcomputing.com/Linux-Cluster-Cloud-Bursting.php.

[12]. I. Lumb, “Extending into the Cloud: Two Use Cases from Bio-IT World”, http://info.brightcomputing.com/On-the-Bright-Side/bid/177402/Extending-into-the-Cloud-Two-Use-Cases-from-Bio-IT-World.

[13]. R. Stober, “How to submit an SGE job to the cloud using the Bright CMSUB command”, http://info.brightcomputing.com/Cluster-Management-Tech-Tips/bid/175312/How-to-submit-an-SGE-job-to-the-cloud-using-the-Bright-CMSUB-command.

Contributor

Ian Lumb is a Solutions Architect with Bright Computing, Inc.

Ian holds a B.Sc. from Montreal’s McGill University as well as an M.Sc. in Physics & Astronomy from York University.

As Solutions Architect with Bright Computing, Inc., Ian is involved in crafting solutions to improve the efficiency and effectiveness of organizations as they grapple with the challenges and opportunities of managing their on-premise resources, as well as their extended resources in the cloud.

About the Newsletter

The IEEE Life Sciences Newsletter is a new initiative to bring forth interesting articles and informative interviews within the exciting field of life sciences every month. Please subscribe to the Newsletter to receive notification each month when new articles are published.

April 2013 Contributors

Nitish V. Thakor is a Professor of Biomedical Engineering at Johns Hopkins University, Baltimore, USA, as well as the Director of the newly formed institute for neurotechnology, SiNAPSE, at the National University of Singapore.

For about eight years, Ian Lumb had the good fortune to engage with customers and partners at the forefront of HPC plus Grid and Cloud computing. For all but one of those eight years, Ian was employed by Platform Computing Inc...

Dan Housman is a software veteran with a demonstrated track record of providing valuable and innovative decision support systems to large, complex organizations...

Phil Heyneker is Director of Solutions Marketing at Okta, with an extensive background in bringing SaaS solutions to market, and in helping enterprises adopt new technologies and products.