Data Visualization – A Look into the Biases, Questions & Types
“By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map.” – David McCandless
Over time we have become proficient at collecting data, but have we become any better at explaining what it all means? That question remains unanswered. So let's dig deeper: which do you consider more important in data visualization, the data or the audience? Doesn't this echo the chicken-and-egg question? Without any data, there is nothing to visualize and thus no audience; but without an audience, why would we make a graph in the first place?
What is Data Visualization?
Data visualization can be defined as the technique of converting raw data into graphical representations, which makes complex relationships within the data much easier to understand. It is also a component of the broader discipline of data presentation architecture (DPA), which seeks to identify, locate, manipulate, format, and present data in the most efficient way.
Every day we produce about 2.5 quintillion bytes of data, and by a widely cited estimate, 90% of all the data in the world was created in the last two years. With so much data, it has become increasingly difficult to manage it and make sense of it all. No single person could wade through data line by line, spot distinct patterns, and make observations. This is what makes data visualization so significant.
Behind the scenes – Heuristics & Biases
The human brain processes information in two ways, according to Daniel Kahneman:
System I (Unconscious mind) – Fast, automatic, and unconscious.
System II (Conscious mind) – Slow, logical, infrequent, and calculating.
Humans struggle to think in terms of statistics. To handle the volume of stimuli it encounters daily, the unconscious mind relies on heuristics, which leaves it open to biases. The most common biases are:
● Anchoring – A tendency to be swayed by irrelevant numbers.
● Availability – How readily events come to mind is not an accurate reflection of their actual probability. We tend to assume that events we remember easily are more likely to occur.
● Substitution – This refers to our tendency to substitute difficult questions with simpler ones.
● Optimism and loss aversion – Optimism and loss aversion give us the illusion of control, because we tend to consider only the possibility of known outcomes that we have already observed.
● Framing – The tendency for the context in which choices are presented to influence the choice we make.
● Sunk cost – This bias is often seen in the investing world when people continue to invest in an under-performing asset with poor prospects instead of getting out of the investment and into an asset with a more favorable outlook.
With Systems I and II (the unconscious and conscious mind) and these biases in mind, we should present data in a way that communicates correctly to our System I thought process, so that our System II thought process can then analyze it accurately. By some estimates, our unconscious System I can process about 11 million pieces of information per second, compared with only about 40 per second for our conscious System II.
Because our unconscious system processes much of its information through vision, data visualization is well suited to communicating patterns and insights from data sets. When someone looks at a data visualization, it takes less than 500 milliseconds for the eye and brain to process what are called the preattentive visual properties of an image.
In Information Visualization: Perception for Design, Colin Ware defines four preattentive visual properties:
- Color
- Form
- Movement
- Spatial positioning
These four components make up the composition of every data visualization and should be carefully considered when presenting data.
Questions, Types & Situations
Visualization is ultimately a great way to let your data speak. Like a joke, if you have to explain it, it has failed. Depending on what data you need to explore or explain and which analytical questions you want it to answer, you can pick one approach or another.
With dozens of chart types available, each suited to particular purposes, it is extremely important to choose the right one, and the way to do that is to ask the right questions before picking a visualization. In fact, picking the wrong chart type is one of the most common, and most critical, visualization mistakes.
So let's look at the questions to ask when choosing a visualization type, the corresponding chart types, and the situations in which each can be used.
★ Do you want to compare values? (Ranking type)
○ Bar Charts (and column charts) — for a straightforward comparison of quantitative values by category.
■ Revenue by product
○ Stacked charts — to add a look at the composition.
■ Sales Figures by product and by region
○ Radar charts — for a comparison of cyclic data.
■ air temperature by month
★ Do you want to explore the composition and part-to-whole relationships of something? (Composition type)
○ Pie charts (and donut charts) — for a basic look at the percentage composition of a value.
■ Web traffic by user age
○ Pyramid charts — to explore the composition of hierarchical data.
■ Employee salaries
○ Treemap charts — to look into complex hierarchical data.
■ Export directions by country of destination
○ Funnel charts — to measure the stages of a process.
■ Sales funnel
★ Do you want to track data over time? (Time series type)
○ Line Charts (and spline charts) — for a basic view revealing trends, peaks, and so on.
■ Visualize sales or web traffic
○ Area charts — as another option, e.g. for cumulative data.
○ Stock charts — for big data sets such as financial and stock market data.
■ Stock price change
○ Candlestick charts (and OHLC charts) — to add a look at the distribution of values within each period of time.
■ stock prices, showing the range of value fluctuations within each of many time periods
○ Sparkline charts — for a quick representation of the big picture, with no axes.
■ overview of sales performance over the last 12 months, or football season win/loss results
★ Do you want to analyze data distribution? (Distribution type)
Distribution charts help you understand outliers, central tendency, and the range of values in your data.
○ Dot (Scatter) charts — to examine trends in distribution and correlation between two variables.
■ visualize system interruptions by waiting time and by duration, or results of an experiment
○ Bubble charts — to consider three dimensions of data.
■ training data by sportsman, power, and pulse
○ Box-and-whisker charts — for a look at main distribution ranges and median values.
■ destinations by flight delay duration
○ Error charts — to inspect error distribution.
■ variability of product sales
○ Heat map charts — for a colored matrix-based view of multiple subcategories.
■ risk matrix
○ Range charts — to show the range between minimum and maximum values.
■ air temperature or processor downtime
○ Polar charts — for multivariate data from a spatial perspective.
■ radio signal distribution
★ Do you want to examine project data? (Project type)
○ Gantt chart — to keep an eye on activities on a project schedule.
■ to visualize project activities on a schedule, or planned vs. actual schedules
○ Resource chart — to review resource occupancy.
■ server status
★ Do you want to make sense of geographical data? (Geographical composition type)
○ Choropleth maps — to identify differences across geographical areas.
■ visualize systems of government across the globe
○ Dot maps — to understand geographical distribution trends.
■ points of sale or airplane crash locations
○ Bubble maps — to add a size variable into a visual.
■ earthquakes by magnitude
○ Connector maps — for a look at geographical connections.
■ airline routes
○ Flow maps — to explore how objects move between locations when the direction is important.
■ export directions
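The question-driven approach above boils down to a lookup: identify which analytical question you are asking, then shortlist the chart types that answer it. The Python sketch below encodes that idea. The dictionary keys and chart labels are hypothetical names chosen for this example (they simply mirror the groups listed above), not a standard taxonomy or any charting library's API.

```python
# A minimal sketch: map the analytical question being asked to the
# candidate chart types from the list above. The keys and chart labels
# are hypothetical names for this example, not a standard taxonomy.
CHART_OPTIONS: dict[str, list[str]] = {
    "comparison": ["bar", "stacked", "radar"],
    "composition": ["pie", "pyramid", "treemap", "funnel"],
    "time_series": ["line", "area", "stock", "candlestick", "sparkline"],
    "distribution": [
        "scatter", "bubble", "box_and_whisker", "error",
        "heat_map", "range", "polar",
    ],
    "project": ["gantt", "resource"],
    "geographical": [
        "choropleth", "dot_map", "bubble_map", "connector_map", "flow_map",
    ],
}


def suggest_charts(question: str) -> list[str]:
    """Return a copy of the candidate chart types for a question type.

    Raises KeyError listing the known question types if `question`
    is not one of them.
    """
    if question not in CHART_OPTIONS:
        known = ", ".join(sorted(CHART_OPTIONS))
        raise KeyError(f"unknown question type {question!r}; expected one of: {known}")
    # Return a copy so callers cannot accidentally modify the shared table.
    return list(CHART_OPTIONS[question])


if __name__ == "__main__":
    # Tracking data over time -> time series chart types
    print(suggest_charts("time_series"))
```

Keeping the mapping as plain data makes it easy to extend with new chart types, or to attach the example situations from the list above to each entry.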
Data Visualization & AI: Is It the Way of the Future?
Data visualization can help in building complex AI systems. Just as it helps organizations make better business decisions, it can also help users analyze AI model results in the ways described below. These visualizations are critical for increasing trust in AI models.
● Kaggle Notebooks, which show data scientists' work in progress, are full of data visualizations.
● Decision Trees & Forests can be visualized in a simplified way, which helps users understand how the model works.
● Data visualization can help explain the predictions made by AI models.
● AI models need to be audited at various levels, which makes visualizing them a great help.
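To make the decision-tree point more concrete, here is a minimal Python sketch that prints a small, hypothetical decision tree as indented text and traces the path a single prediction takes through it. The `Node` class, the churn example, its feature names, and its thresholds are all invented for illustration; in practice, libraries such as scikit-learn provide their own tree-visualization utilities (for example `sklearn.tree.export_text` and `sklearn.tree.plot_tree`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in a hypothetical binary decision tree.

    Internal nodes test `feature <= threshold`; leaf nodes carry a `label`.
    """
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # followed when feature <= threshold
    right: Optional["Node"] = None  # followed when feature > threshold
    label: Optional[str] = None     # prediction stored at a leaf


def render(node: Node, depth: int = 0) -> list[str]:
    """Return the tree as indented text lines, one per split outcome or leaf."""
    indent = "|   " * depth
    if node.label is not None:
        return [f"{indent}predict: {node.label}"]
    lines = [f"{indent}{node.feature} <= {node.threshold}"]
    lines += render(node.left, depth + 1)
    lines.append(f"{indent}{node.feature} > {node.threshold}")
    lines += render(node.right, depth + 1)
    return lines


def explain(node: Node, sample: dict[str, float]) -> tuple[str, list[str]]:
    """Walk `sample` down the tree; return the prediction and each decision made."""
    path: list[str] = []
    while node.label is None:
        value = sample[node.feature]
        if value <= node.threshold:
            path.append(f"{node.feature} = {value} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} = {value} > {node.threshold}")
            node = node.right
    return node.label, path


# Hypothetical churn model: the features and thresholds are invented.
TREE = Node(
    feature="monthly_visits", threshold=3,
    left=Node(label="churn"),
    right=Node(
        feature="support_tickets", threshold=2,
        left=Node(label="stay"),
        right=Node(label="churn"),
    ),
)

if __name__ == "__main__":
    print("\n".join(render(TREE)))
    label, path = explain(TREE, {"monthly_visits": 5, "support_tickets": 1})
    print(label, path)
```

For the sample in the example, `explain` returns `"stay"` together with the two decisions that led there, which is the kind of step-by-step account of a prediction that helps users trust (or question) a model. The indented text format is simply a choice made for this sketch.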
Over time, more and more data visualizations will be supported by AI, for example by AI systems that can draw realistic images from text descriptions and other data.
Data visualization, then, can be powerful, versatile, and informative. As we move into a future of interactivity, higher production value, and new methods of exploring data sets, data visualization will only become more important. When executed correctly, it can lead to positive change throughout an organization. However, its success depends on both creators and users being careful, thorough, and accurate in their analyses. In short, data visualization is like a story: the creator, as author, interprets the data, and the user, as reader, interprets the visualization.