Today, we are living through an explosion in the amount and quality of available information of every kind. Society is facing a deluge of data, and there is no sign that this glut will abate anytime soon.
The statistics on data creation below make one thing clear: no one is going to stop creating information.
- As of 2013, experts believed that 90% of the world’s data had been generated in just the two preceding years, 2011 and 2012.
- In 2018, more than 2.5 quintillion bytes of data were created every day.
- At the beginning of 2020, the digital universe was estimated to consist of 44 zettabytes of data.
- By 2025, approximately 463 exabytes will be created every 24 hours worldwide.
- As of June 2019, there were more than 4.5 billion people online.
- 80% of digital content is unavailable in nine out of every ten languages.
- In 2019, every 60 seconds, Google processed 3.7 million search queries, Facebook saw one million logins, and YouTube recorded 4.5 million video views.
- Netflix’s content volume in 2019 exceeded that of the entire US TV industry in 2005.
- By 2025, there will be an estimated 75 billion Internet-of-Things (IoT) devices in the world.
- By 2030, nine in every ten people aged six and above will be digitally active.
Source: SeedScientific
In the US, private companies now collect and sell as many as 75,000 individual data points about the average American consumer. And that number is minuscule compared with what is expected in the future.
Why so much interest in customer data? Because the right data can tell business decision makers which customers to avoid and which to exploit, based on the company’s strategy and its stated objectives.
While it’s important to appreciate the benefits of data, we also need to acknowledge and respond to its drawbacks.
Just as people often mistake credit cards for currency, information alone is of little use. The process of creating intelligence is not simply a question of access to information; rather, it is about asking the right questions and collecting the right data.
You need a lot of pixels in a photo in order to be able to zoom in with clarity on one portion of it. Similarly, you need a lot of observations in a dataset in order to be able to zoom in with clarity on one small subset of that data.
Source: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz.
Your business performance will not improve through Big Data alone. You need Rich Data. Deep Data. Even if it comes in the form of Small Data.
The biggest reason investments in data analytics fail to pay off, though, is that most companies are choking on data. They have lots of terabytes but few critical insights.
Instead of being adequately informed, they are excessively informed, not least because they take much of the data they already hold for granted. We are exceptional at storing information but fall short when it comes to retrieving it. As a result, we become overloaded.
Some important questions to consider before investing in new data:
- Is more information necessarily good?
- Does it really improve the decision-making process?
- Can you extract value from the information you already have?
- Are you overwhelmed but underserved by today’s information sources?
- How much of the data in your possession is useful, and how much of it gets in the way? That is, what is your data’s Signal-to-Noise ratio?
What problems ensue from information overload?
- Indecisiveness due to paralysis by analysis. Endless analysis is so overwhelming that it becomes difficult to know how and when to decide.
- Endless argumentation. In the era of limitless data, there is always an opportunity to crunch some numbers, spin them a bit, and prove the opposite.
- A total reliance on evidence-based decision making can undermine logical approaches to deliberation and problem solving. The solution is not always Big Data; human judgement and small data are often necessary. We cannot just throw data at any question. Data, whether Big or small, and humans complement each other.
The growth in the amount of data is not useful in and of itself without the ability to process it. Once data has been analyzed, it needs to be summarized in an easy-to-understand way and presented visually to enable decision makers to apply their own expertise and make their own judgements.
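As a minimal sketch of that summarize-then-visualize step (using pandas and matplotlib; the dataset, column names, and figures are invented for illustration), a raw table can be collapsed into one comparable number per category and plotted:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales records; names and numbers are made up for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East", "East"],
    "revenue": [120, 95, 130, 80, 150, 140],
})

# Summarize: one easy-to-understand number per region instead of raw rows.
summary = sales.groupby("region")["revenue"].sum().sort_values()

# Present visually so decision makers can apply their own judgement.
summary.plot(kind="barh", title="Revenue by region")
plt.xlabel("Revenue")
plt.tight_layout()
plt.show()
```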
Although Big Data offers us an opportunity to analyze new kinds of information and identify trends that have long existed but that we hadn’t necessarily been aware of, there are a few things it does not do well.
- Data analysis is quite bad at narrative and emergent thinking.
- It fails to analyze the social aspects of interaction or to recognize context. Human beings, by contrast, are good at telling stories that incorporate multiple causes.
- Big Data also fails to identify which correlations are more or less likely to be false. The larger and more expansive the datasets, the more correlations there are, both false and true (see the sketch below).
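To see why, here is a minimal simulation (hypothetical parameters, using NumPy): generate a couple of hundred mutually unrelated random variables and count how many pairs nonetheless look strongly correlated purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)
n_vars, n_obs = 200, 30  # 200 unrelated variables, 30 observations each

# Every column is pure noise: no variable truly relates to any other.
data = rng.standard_normal((n_obs, n_vars))

# Pairwise Pearson correlations between all columns.
corr = np.corrcoef(data, rowvar=False)

# Count pairs whose |correlation| exceeds 0.5, excluding self-correlations.
rows, cols = np.triu_indices(n_vars, k=1)
spurious = int(np.sum(np.abs(corr[rows, cols]) > 0.5))
print(f"{spurious} of {len(rows)} pairs look 'strongly' correlated by chance")
```

Every one of those “strong” correlations is false by construction, and their count grows with the number of variables; nothing in the correlation matrix itself flags them as spurious.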
Correlation versus Causality is a huge issue in data analysis. The mere fact that two random variables are correlated does not automatically imply causation.
To test for causality, not merely correlation, randomized controlled experiments (also called A/B testing) are necessary. A typical experiment works as follows:
- Participants are randomly divided into two groups.
- The treatment group receives the intervention: it is shown something new, or asked to do or take something.
- For the control group, the status quo is maintained.
- Each group is monitored for how it responds.
- The difference in outcomes between the two groups estimates the causal effect of the treatment (see the sketch below).
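As a minimal sketch of that last step (hypothetical conversion data; the group sizes, conversion rates, and the use of SciPy’s two-sample t-test are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical outcomes: 1 = user converted, 0 = did not.
# Assume 1,000 users were randomly assigned to each group.
control = rng.binomial(1, 0.10, size=1000)    # status quo, ~10% baseline
treatment = rng.binomial(1, 0.12, size=1000)  # new variant, ~12% (assumed)

# The estimated causal effect is the difference in group means.
effect = treatment.mean() - control.mean()

# A two-sample t-test gauges whether the difference could be mere chance.
t_stat, p_value = stats.ttest_ind(treatment, control)

print(f"Estimated lift: {effect:.1%}")
print(f"p-value: {p_value:.4f}")
```

Because assignment is randomized, any systematic difference between the groups can be attributed to the treatment; the p-value guards against mistaking chance variation for a real effect.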
Undertaking controlled experiments helps us learn which interventions work and which do not, and ultimately improves our decision making.
As you can see, the power of data lies in what business teams do with it. Clearly define enterprise data-use cases, aligning them with business strategy.
You don’t always need plenty of data to create key insights that inform decision making. You need the right data, blended with other insights and observations gathered offline.
Information is now plentiful and inexpensive to produce, manipulate, and disseminate. Almost anyone can add information. The big question is how to reduce it and base critical decisions on only a tiny sampling of all available data.