The past two decades have seen substantial growth in the size of the biological data sets used to create vaccines and medicines. These databases vary enormously in size and complexity, from small ones containing a few million entries to large ones that are petabytes in size. The need to deal effectively with this volume of data has driven increasingly sophisticated methods in computational molecular biology.
The latest impact on this evolving field is, of course, the current Covid-19 crisis. Everywhere in the world, institutions are throwing vast amounts of money and compute resources at problems, both to understand and model behaviour and to develop a vaccine against the virus. Here in the UK, Imperial College London and Oxford University are already only a few steps away from starting human trials of a vaccine.
To get to this stage, researchers will have had to collate and manipulate an array of real-time data, statistical indicators, and other types of data relevant to the coronavirus. In fact, we are living through a time when this process, now referred to as vaccinomics, or “the performance of large-scale, hypothesis-free, data-driven and holistic investigations”, is predominantly based on leveraging big data from many sources, including even things like social behaviour data.
The sheer amount of information available on molecular structure and dynamics affords great opportunities for insight, but it also creates new issues with handling, processing and moving data sets that can reach petascale. The traditional big data challenges of volume, velocity, and variety all come into play.
Covid-19 research is experiencing the same data handling issues that arise in cancer research: “Cancer research in the era of big data presents a major challenge: we need to collect vast amounts of data to understand the complexities of individual cancer patients but, the more data we collect, the more we actually limit the number and diversity of researchers who can access and interpret the data because big data questions require big data infrastructure.”
Cloud computing resources and the databases that leverage them are helping to solve many of these problems by offering near-unlimited, scalable compute and storage capacity. But having the data collated isn’t enough. Getting the insight we need to extrapolate from the data depends on our ability not just to gather it but also to interact with it. This opens up completely new avenues and requirements for distribution and sharing.
Part of that infrastructure challenge is how to provide insights through visualisation of the data. Visualisation becomes important because it is one of the best ways, and sometimes maybe even the only way, to discover crucial patterns when working at speed to solve the problems the teams in Oxford and London are dealing with.
However, existing GPU infrastructure and the approaches used to deliver 3D visualisation currently rely on old technology, which directly impacts our ability to solve those challenges.
The traditional method of streaming visual information as video ties up valuable cloud GPU resources that could be better used on the actual computational workloads. Existing streaming solutions used to deliver complex real-time 3D visualisation programs across teams take the same approach that is used for streaming movies: they turn your actions into a video of what you’ve done and send it back to you. While standard streaming works brilliantly for traditional activities, it doesn’t work well for these emerging medical technologies and use cases.
The specific cloud GPU compute needed has surprisingly limited availability and is not horizontally scalable, and the end result is a process that is prohibitively expensive. Furthermore, in the current lockdown situation, you are competing with others for the same GPU resource, including a surge of online gamers trying to reach their entertainment through it.
We need to start approaching the problem not by re-using old technology but by fundamentally changing our thinking. One of the key benefits of the cloud is its elasticity, and with new composable architectures we can open up completely new ways of streaming and distributing interactive visual information.
It is cost- and time-effective for commodity compute in the cloud to be fully utilised on the problems at hand. But with only limited numbers of costly cloud-based GPUs available for visualisation, our options are severely reduced. The amount of cloud GPU compute required, let alone its cost, would render the approach unusable, for large collaborative efforts in particular.
A much more powerful and elastic solution is to stream the visualisation, without using cloud GPU resources, to a client device whose own GPU renders the result locally at interactive speeds. This is where Polystream’s Command Streaming technology comes in.
Unlike with traditional approaches, the opportunities for sharing now scale horizontally, as a practically unlimited number of “visualisation windows” into the cloud compute workload can coexist. Utilising the scalable compute resources of the cloud also opens up near-real-time interactivity with vast molecular dynamics simulations (such as CSynth and BioBlox), visualised directly in 3D on any laptop or PC.
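To make the idea concrete, here is a deliberately simplified sketch of the general pattern: the server describes what to draw, and every client renders it on its own hardware. This is not Polystream's implementation or API. The `DrawCommand`, `Client` and `broadcast` names, the JSON encoding and the command vocabulary are all assumptions made purely for illustration, and the "rendering" is just recorded in a list rather than sent to a real GPU.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DrawCommand:
    """One hypothetical render instruction, e.g. 'draw this atom as a sphere'."""
    op: str
    args: dict


def encode_commands(commands):
    """Serialise one frame's commands into bytes (JSON is an assumption here)."""
    payload = [{"op": c.op, "args": c.args} for c in commands]
    return json.dumps(payload).encode("utf-8")


@dataclass
class Client:
    """Stands in for a laptop or PC whose own GPU would do the rendering."""
    name: str
    rendered: list = field(default_factory=list)

    def receive(self, message: bytes):
        # A real client would hand these to its graphics API;
        # this sketch only records what it was asked to draw.
        for item in json.loads(message.decode("utf-8")):
            self.rendered.append((item["op"], item["args"]))


def broadcast(commands, clients):
    """Encode once on the server, then give the same bytes to every client."""
    message = encode_commands(commands)
    for client in clients:
        client.receive(message)
    return len(message)


# One frame of a toy molecular scene, shared with three viewers.
frame = [
    DrawCommand("set_camera", {"position": [0.0, 0.0, 5.0]}),
    DrawCommand("draw_sphere", {"center": [0.0, 0.0, 0.0], "radius": 1.2}),
]
viewers = [Client("laptop-a"), Client("laptop-b"), Client("laptop-c")]
sent_bytes = broadcast(frame, viewers)
```

The point of the sketch is that adding a viewer adds no rendering work on the server side: the frame is encoded once, and each additional "visualisation window" only costs another copy of the same small message.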
There are significant benefits to be had from switching from cloud GPU resources to local GPU resources, and they will have a direct impact on the world of visualisation. By using a cloud GPU we are stuck with traditional video streaming and its limitations on visual quality, such as compression artefacts. By performing the rendering locally instead, we are able to deliver visuals at perfect fidelity. As our requirements become more stringent and the detail required to reason about complex visual data grows, this difference in quality will become more and more important.
And large collaborative efforts are what we need right now.
This example of the need to use resources effectively is just one of many. To take the use of 3D interactive content and applications to the next level, we must start fundamentally changing the conversations and building new pathways. To discover more, read how this is similar to the IoT challenge.