Tuesday, 1 May 2012
How visualisation uncovers the big picture of ‘Big Data’
Dr Rupert Ogilvie outlines how visualising your data can be the key to successfully managing it
Dr Rupert Ogilvie, Optimisation Consultant at Intergence, an IT optimisation consultancy based in Cambridge, has outlined how data visualisation can help organisations handle Big Data that is often too unwieldy to capture, manage and process within a reasonable amount of time.
According to Gartner, Big Data is “…the volume, variety and velocity of structured and unstructured data pouring through networks into processors and storage devices, along with the conversion of such data into business advice for enterprises.”
A recent report from the Centre for Economics and Business Research (CEBR) suggests that improved use of this Big Data could add £216 billion to the UK economy and create 58,000 jobs. Data visualisation can be a key tool in helping users explore and communicate data through graphic representations – enabling collaboration, the inference of connections and the drawing of conclusions that benefit a business's bottom line.
Big Data is the convergence of three Vs: volume, variety and velocity. Standard data management techniques can be appropriate for data that reflects just one of the Vs – for example, enormous datasets can be elegantly handled by well-configured relational databases, while variety and velocity can be handled by good process management and conventional BI practices. Big Data management, however, has to juggle the convergence of all three.
‘The Cloud’ is often talked about in the same breath as Big Data. But what is it about the cloud that makes it so appealing to those looking to utilise their Big Data? [It is important to remember that the notion of a cloud does not necessarily mean a public cloud such as Amazon EC2 or a SaaS service like SalesForce - but can be applied to any internal shared resource platform (private cloud) or mixture of the two (hybrid)].
Scalability is a big plus point for Big Data and the cloud – if the real-time feeds providing data suddenly rocket in volume due to an external event, the cloud can provision and utilise resources at speed, minimising the risk of data loss. Although all the data can in theory be stored in the cloud, the organisation using it can choose how much it needs to pull back for presentation and further analysis.
This flexibility in resource usage can be a problem for organisations – either when planning their upgrade path or when budgeting for their next cloud bill. Visualisation can help these organisations look at what was used and when, as well as tracking the usage trends over time for the future proofing of their private clouds.
Data Visualisation is all about telling a story – and Big Data visualisation is no different. As should be clear by now, any effort at tackling Big Data can potentially involve billions of data points which need to be woven together into business stories.
Tackling the 3 Vs
Often what is valuable in the data isn’t just the hard numbers, but the trends – how they change over time. Visualisation is an invaluable tool in identifying trends within massive data sets, spotting anomalies as well as outliers and providing a common framework in which to view the data from the many different data sources.
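The trend-and-outlier pass described above can be sketched in a few lines. This is a minimal illustration using hypothetical daily transaction counts and an assumed two-standard-deviation threshold – exactly the kind of anomaly a visual plot makes obvious at a glance:

```python
from statistics import mean, stdev

# Hypothetical daily transaction counts from one data source.
daily_counts = [120, 118, 125, 122, 119, 410, 121, 124, 117, 123]

# A simple trend summary: compare the averages of the two halves.
half = len(daily_counts) // 2
trend = mean(daily_counts[half:]) - mean(daily_counts[:half])

# Flag points more than 2 standard deviations from the mean as outliers.
mu, sigma = mean(daily_counts), stdev(daily_counts)
outliers = [(i, v) for i, v in enumerate(daily_counts)
            if abs(v - mu) > 2 * sigma]

print(trend)     # positive: the half-on-half average has risen
print(outliers)  # the day-5 spike stands out
```

In practice the thresholds and windows would be tuned per data source; the point is that a visual layer surfaces the same spike instantly, without the analyst choosing thresholds in advance.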
Visualisation allows the user to cut into and move between different granularities of data. From a high-level overview the user can drill down to those nuggets of data – which previously would have been discarded – to search out answers and perform deeper analysis. Subsetting and grouping can reduce the data density, allowing rapid summaries of different sections of the data and helping the user find the right level of information. Once they have found the main data set they require, they can manipulate it and drill down, identifying the underlying raw data and its sources.
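The overview-then-drill-down pattern can be sketched with plain grouping. The regions, services and response times below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical raw events: (region, service, response_ms).
events = [
    ("EMEA", "web", 120), ("EMEA", "web", 135), ("EMEA", "db", 40),
    ("APAC", "web", 310), ("APAC", "db",  55), ("APAC", "db", 48),
]

# High-level overview: group by region and summarise,
# reducing the data density to one number per region.
by_region = defaultdict(list)
for region, service, ms in events:
    by_region[region].append(ms)
overview = {r: sum(v) / len(v) for r, v in by_region.items()}

# Drill down into one region to recover the underlying raw records.
apac_raw = [e for e in events if e[0] == "APAC"]

print(overview)
print(apac_raw)
```

A visualisation tool performs the same subsetting interactively: the overview is the chart, and the drill-down is a click that re-queries the raw events behind one bar.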
Do you need static and real-time views of your data? Visualisation can show the state of a network or process at a single point in time, as well as stream the data to you in real time. Using advanced visualisation techniques it is possible to replay and rewind data to hunt for the root cause of problems and to see how trends shift over time. If a process has multiple inputs from different data sources, it is possible to see quickly whether the various inputs are being updated with sufficient regularity. When planning new processes and thresholds, the ability to pull up views showing the velocity of the required sources can provide valuable insight into the amount of work needed to scrub and clean the data.
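The replay/rewind idea amounts to reconstructing state at an arbitrary past instant from a timestamped event stream. A minimal sketch, with invented status events for a single network link:

```python
# Hypothetical timestamped status events for a network link: (t, status).
events = [(0, "up"), (5, "up"), (9, "degraded"), (12, "down"), (15, "up")]

def state_at(t):
    """Replay the stream up to time t -- the 'rewind' a visual timeline offers."""
    state = None
    for ts, status in events:
        if ts > t:
            break
        state = status
    return state

# Rewinding from the outage at t=12 shows degradation preceded it at t=9.
print(state_at(13))  # "down"
print(state_at(10))  # "degraded"
print(state_at(3))   # "up"
```

A real tool keeps such streams indexed so any point on the timeline can be scrubbed to instantly; the principle, though, is just replaying events up to the chosen moment.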
Having a common view of the data removes the worries associated with combining differently structured and semi-structured raw data. This template gives an organisation confidence that, as new data sources become available, the data they provide will fit in seamlessly with minimal change. As well as providing this common framework, visualisation allows an organisation to overlay and combine data from different sources in different views, tailored to different levels and departments of the organisation. Having a common visualisation tool for the whole organisation provides a solid collaboration and communication platform, helping to improve user workflows.
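The "common template" idea can be sketched as mapping each source format into one shared record shape before visualisation. The source formats and field names here are hypothetical:

```python
import json

# Hypothetical records from two sources: a structured CSV-style row
# and a semi-structured JSON blob with its own field names.
csv_row = ["2012-05-01", "router-3", "97.5"]
json_blob = '{"timestamp": "2012-05-01", "device": "switch-7", "uptime_pct": 99.1}'

# One common template that every source is normalised into.
def from_csv(row):
    return {"date": row[0], "device": row[1], "uptime_pct": float(row[2])}

def from_json(blob):
    d = json.loads(blob)
    return {"date": d["timestamp"], "device": d["device"],
            "uptime_pct": d["uptime_pct"]}

records = [from_csv(csv_row), from_json(json_blob)]
print(records)  # uniform records, ready to overlay in one view
```

Adding a new data source then means writing one more small adapter into the template, rather than reworking every downstream view.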
Intergence is exhibiting at the IDC event ‘Evolution of the Datacentre Conference’ on Tuesday 22 May, 2012.