Organizing Data

Visual Analytics and Distribution Analysis

Dr. Samir Orujov

ADA University - School of Business

Fall 2025

Chapter Overview

2.1 Variables and Data

Classification of data types and their importance in statistical analysis

2.2 Organizing Qualitative Data

Frequency distributions, pie charts, and bar charts

2.3 Organizing Quantitative Data

Histograms, dotplots, and stem-and-leaf diagrams

2.4 Distribution Shapes

Modality, symmetry, and skewness analysis

2.5 Misleading Graphs

Identifying and avoiding deceptive visualizations

Learning Objectives

📊

Data Classification

Distinguish between qualitative/quantitative and discrete/continuous data types

📈

Visualization Techniques

Create and interpret various graphical displays of data

🔍

Distribution Analysis

Identify distribution shapes, modality, and symmetry patterns

⚠️

Critical Evaluation

Recognize and avoid misleading graphical representations

Case Study: World's Richest People

Forbes 2013 Billionaires List

Each year, Forbes magazine publishes a comprehensive list of the world's richest people. In 2013, more than 50 reporters worked to compile the 27th annual World's Billionaires list.

Methodology

  • Individual asset valuation
  • Debt accounting
  • Public and private company stakes
  • Real estate, yachts, art, and cash
  • Immediate family wealth inclusion

Top 5 Richest (March 2013)

1. Carlos Slim Helu - $73B (Mexico)
2. Bill Gates - $67B (United States)
3. Amancio Ortega - $57B (Spain)
4. Warren Buffett - $53.5B (United States)
5. Larry Ellison - $43B (United States)

Variables and Data Types

Variable

A characteristic that varies from one person or thing to another

Variables

Qualitative

Non-numerical values

Examples:
  • Gender (Male, Female)
  • Blood type (A, B, AB, O)
  • Political affiliation

Quantitative

Numerical values

Discrete

Countable values

Examples:
  • Number of siblings
  • Cars owned
  • Finishing place
Continuous

Measurable values

Examples:
  • Height
  • Weight
  • Time

Interactive Classification Exercise

Classify the following variables:

Marathon finishing place (1st, 2nd, 3rd...)

Marathon finishing time (hours and minutes)

Number of TVs per household

Frequency Distributions

Frequency Distribution

A table that lists distinct values and their frequencies (counts)

Frequency: The number of times a particular value occurs

Example: Political Party Affiliations

Party | Frequency | Relative Frequency | Percentage
Democratic | 13 | 0.325 | 32.5%
Republican | 18 | 0.450 | 45.0%
Other | 9 | 0.225 | 22.5%
Total | 40 | 1.000 | 100.0%

Relative Frequency Formula

\[ \text{Relative Frequency} = \frac{\text{Frequency}}{\text{Total Number of Observations}} \]
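As a quick check of the formula, the short Python sketch below recomputes the relative frequencies and percentages in the political-party table. The counts are copied from that table; the helper name `relative_frequencies` is hypothetical, chosen only for this illustration.

```python
def relative_frequencies(counts):
    """Return {category: frequency / total} for a dict of frequency counts."""
    total = sum(counts.values())  # total number of observations
    return {category: freq / total for category, freq in counts.items()}

# Frequencies from the political-party table (40 observations)
party_counts = {"Democratic": 13, "Republican": 18, "Other": 9}

rel = relative_frequencies(party_counts)
for party, rf in rel.items():
    # party, frequency, relative frequency, percentage
    print(f"{party:<11} {party_counts[party]:>3} {rf:.3f} {rf:.1%}")
```

The relative frequencies always sum to 1 (up to rounding), which is a useful sanity check on any hand-built table.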

Pie Charts

Pie Chart

A circular graph divided into wedge-shaped pieces proportional to the relative frequencies

Construction Steps

  1. Obtain relative-frequency distribution
  2. Calculate angles: \( \text{Angle} = \text{Relative Frequency} \times 360° \)
  3. Draw wedges proportional to angles
  4. Label slices with values and percentages
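The angle calculation in step 2 can be sketched in Python using the relative frequencies from the political-party table. This is a minimal illustration that only computes the angles; the variable names are hypothetical and nothing is drawn.

```python
# Relative frequencies from the political-party table
rel_freq = {"Democratic": 0.325, "Republican": 0.450, "Other": 0.225}

# Step 2: each wedge's central angle is its share of the full 360 degrees
angles = {party: rf * 360 for party, rf in rel_freq.items()}

for party, angle in angles.items():
    print(f"{party:<11} {angle:6.1f} degrees  ({rel_freq[party]:.1%})")
```

The angles come to 117, 162, and 81 degrees and add up to 360. If matplotlib is available, `matplotlib.pyplot.pie` accepts the raw frequencies (13, 18, 9) and sizes each wedge by its share of the total, which amounts to the same angles; treat that as an optional shortcut rather than part of the construction steps.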

Bar Charts

Bar Chart

A graph that displays the categories on the horizontal axis and their frequencies (or relative frequencies) on the vertical axis, using bars separated by gaps

Pie Charts vs Bar Charts

Pie Charts

  • Show parts of a whole
  • Best for relative comparisons
  • Circular representation
  • Limited to one dataset

Bar Charts

  • Show individual values
  • Easy to compare magnitudes
  • Rectangular bars
  • Can display multiple datasets

Grouping Quantitative Data

Key Guidelines for Grouping

1. Appropriate Number of Classes

Use 5-20 classes: few enough to summarize the data effectively, but enough to show its relevant characteristics

2. Mutually Exclusive Classes

Each observation belongs to one and only one class

3. Equal Class Widths

Whenever feasible, all classes should have the same width

Three Grouping Methods

Single-Value Grouping

Each class represents one possible value

Best for: Discrete data with few distinct values

Example: Number of TVs (0, 1, 2, 3, 4, 5, 6)

Limit Grouping

Classes defined by lower and upper limits

Best for: Whole numbers with many distinct values

Example: 30-39, 40-49, 50-59 days

Cutpoint Grouping

Classes defined by cutpoints (boundaries)

Best for: Continuous data with decimals

Example: 120-under 140, 140-under 160 lbs
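To make the "under" convention of cutpoint grouping concrete, here is a minimal Python sketch, assuming classes of equal width that start at a known first cutpoint. The function `cutpoint_class` and the sample weights are hypothetical, chosen only to show where boundary values land; they are not data from the chapter.

```python
import math

def cutpoint_class(x, first_cutpoint, width):
    """Return the (lower, upper) cutpoints of the class 'lower - under upper'
    that contains x. Each class includes its lower cutpoint and excludes its
    upper one, so every value falls in exactly one class."""
    k = math.floor((x - first_cutpoint) / width)  # index of the class
    lower = first_cutpoint + k * width
    return lower, lower + width

# Classes 120 - under 140, 140 - under 160, ... (width 20, as in the example)
for weight in (120, 139.9, 140, 151.2):
    low, high = cutpoint_class(weight, first_cutpoint=120, width=20)
    print(f"{weight:>6} lbs -> {low} - under {high}")
```

A weight of exactly 140 lbs goes into 140-under 160, not 120-under 140, which is what keeps the classes mutually exclusive.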

Histograms

Histogram

A graph that displays the classes on the horizontal axis and their frequencies on the vertical axis, with bars that touch (no gaps between adjacent classes)

Types of Histograms

  • Frequency Histogram: Uses raw frequencies
  • Relative-Frequency Histogram: Uses relative frequencies
  • Percent Histogram: Uses percentages

Sample Data: Days to Maturity

70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51, 99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70
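A minimal Python sketch of limit grouping for these 40 values, using the classes 30-39 through 90-99 and printing a sideways text histogram with one `*` per observation. It relies only on `collections.Counter` from the standard library; the layout is an illustrative choice, not a standard format.

```python
from collections import Counter

days = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
        99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70]

# Limit grouping with classes 30-39, 40-49, ..., 90-99:
# a value's class is identified by its lower limit (value rounded down to a multiple of 10)
freq = Counter((d // 10) * 10 for d in days)

n = len(days)
for lower in range(30, 100, 10):
    f = freq[lower]
    # class, frequency, relative frequency, then one '*' per observation
    print(f"{lower}-{lower + 9}  {f:>2}  {f / n:.3f}  {'*' * f}")
```

Dividing each count by n = 40 (the third column) gives the relative-frequency histogram; its shape is identical because every bar is scaled by the same factor.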

Dotplots

Dotplot

A graph where each observation is plotted as a dot above a horizontal axis, with equal values stacked vertically

Advantages of Dotplots

  • Shows individual data values
  • Reveals patterns and clusters
  • Easy to construct and interpret
  • Good for comparing datasets
  • Works well with small to medium datasets

DVD Player Prices ($)

210, 219, 214, 197, 224, 219, 199, 199, 208, 209, 215, 199, 212, 212, 219, 210
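A character-based dotplot of these 16 prices can be sketched in Python with the standard library alone. The helper `text_dotplot` is hypothetical; it assumes whole-dollar prices and uses one column per dollar between the minimum and maximum.

```python
from collections import Counter

prices = [210, 219, 214, 197, 224, 219, 199, 199, 208, 209, 215, 199, 212, 212, 219, 210]

def text_dotplot(values):
    """Return the lines of a text dotplot: one column per whole-number value
    from min to max, with repeated values stacked as extra dots."""
    counts = Counter(values)          # values that never occur count as 0
    lo, hi = min(values), max(values)
    width = hi - lo + 1
    lines = []
    # Draw from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("•" if counts[x] >= level else " " for x in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * width)  # horizontal axis
    # Label only the two ends of the axis.
    lines.append(f"{lo}{' ' * (width - len(str(lo)) - len(str(hi)))}{hi}")
    return lines

print("\n".join(text_dotplot(prices)))
```

The tallest stacks (three dots each) appear at $199 and $219, which is exactly the kind of clustering a dotplot makes easy to spot.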

Stem-and-Leaf Diagrams

Stem-and-Leaf Diagram

Each observation is split into a stem (all but the rightmost digit) and a leaf (the rightmost digit)

Construction Steps

  1. Separate each observation into stem and leaf
  2. List stems vertically from smallest to largest
  3. Write leaves for each stem to the right
  4. Arrange leaves in ascending order

Sample Data: Days to Maturity

70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51
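An ordered stem-and-leaf diagram following the four construction steps can be sketched in Python. This assumes non-negative whole numbers (so `divmod(v, 10)` splits off the rightmost digit as the leaf); the helper name `stem_and_leaf` is hypothetical.

```python
sample = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51]

def stem_and_leaf(values):
    """Return {stem: leaves} for non-negative whole numbers. The leaf is the
    rightmost digit and the stem is all other digits. Every stem from the
    smallest to the largest is listed, even with no leaves, and leaves are sorted."""
    split = sorted(divmod(v, 10) for v in values)  # (stem, leaf) pairs in order
    stems = range(split[0][0], split[-1][0] + 1)
    return {s: "".join(str(leaf) for stem, leaf in split if stem == s) for s in stems}

for stem, leaves in stem_and_leaf(sample).items():
    print(f"{stem} | {leaves}")
```

Stem 4 has no leaves but is still listed, because step 2 calls for every stem from the smallest to the largest.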

Distribution Shapes: Modality

Modality

The number of peaks (highest points) in a distribution: one peak makes it unimodal, two peaks bimodal, and three or more multimodal

Distribution Shapes: Symmetry and Skewness

Symmetric

Can be divided into two mirror-image pieces

Bell-shaped

Uniform

Triangular

Skewed

Not symmetric; has one tail longer than the other

Right Skewed

Right tail is longer

Left Skewed

Left tail is longer

Real Data Analysis: Household Sizes

U.S. Household Size Distribution

Based on U.S. Census Bureau data from Current Population Reports

Household Size | Relative Frequency | Percentage
1 | 0.273 | 27.3%
2 | 0.340 | 34.0%
3 | 0.158 | 15.8%
4 | 0.137 | 13.7%
5 | 0.062 | 6.2%
6 | 0.021 | 2.1%
7+ | 0.009 | 0.9%

Distribution Analysis

Shape Analysis

  • Modality: Unimodal (peaks at 2 people)
  • Symmetry: Right skewed
  • Tail: Long right tail extends to large household sizes
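The shape reading above can be cross-checked with a rough Python sketch: count the peaks in the relative frequencies and compare how many classes sit on each side of the tallest one. The helper `count_peaks` is a hypothetical heuristic that treats every local maximum as a peak, so for other data sets it may report minor bumps that a person would ignore.

```python
sizes = ["1", "2", "3", "4", "5", "6", "7+"]
rel_freq = [0.273, 0.340, 0.158, 0.137, 0.062, 0.021, 0.009]

def count_peaks(freqs):
    """Heuristic: count strict local maxima in a list of class frequencies.
    Runs of equal neighboring values are merged first so a flat top counts
    once. Minor bumps are NOT filtered out."""
    merged = [f for i, f in enumerate(freqs) if i == 0 or f != freqs[i - 1]]
    peaks = 0
    for i, f in enumerate(merged):
        left = merged[i - 1] if i > 0 else float("-inf")
        right = merged[i + 1] if i + 1 < len(merged) else float("-inf")
        if f > left and f > right:
            peaks += 1
    return peaks

peak_index = rel_freq.index(max(rel_freq))
print("peaks:", count_peaks(rel_freq))
print("tallest class:", sizes[peak_index])
print("classes left of peak:", peak_index,
      "| classes right of peak:", len(rel_freq) - peak_index - 1)
```

Here it finds one peak, at household size 2, with only one class to its left and five to its right, consistent with a unimodal, right-skewed distribution.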

Misleading Graphs: Truncated Graphs

The Problem with Truncated Graphs

When the vertical axis doesn't start at zero, small differences can appear dramatically large

Unemployment Rate Example

Misleading (Truncated)

Y-axis starts at 7.0%

Accurate (Full Scale)

Y-axis starts at 0%

Impact Analysis

Actual Change

7.9% → 7.6% = 0.3 percentage points

Perceived Change

Truncated graph suggests a ~33% decrease: above the 7.0% baseline, the bars shrink from 0.9 to 0.6 units

Reality

About a 3.8% relative decrease (0.3 / 7.9), i.e., less than 4%
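The gap between the actual and perceived change can be reproduced with a few lines of Python arithmetic, assuming (as in the example) that the truncated axis starts at 7.0%. The variable names are illustrative.

```python
before, after = 7.9, 7.6   # unemployment rate (%)
baseline = 7.0             # where the truncated y-axis starts (%)

actual_change = before - after                  # in percentage points
relative_change = actual_change / before        # true relative decrease
# On the truncated chart the bars are only (rate - baseline) tall,
# so the visual shrinkage is measured from 7.0%, not from 0%.
perceived_change = actual_change / (before - baseline)

print(f"actual change:     {actual_change:.1f} percentage points")
print(f"relative decrease: {relative_change:.1%}")
print(f"visual decrease:   {perceived_change:.1%}")
```

It reports a relative decrease of about 3.8% but a visual decrease of about 33.3%: the truncated chart exaggerates the change by nearly a factor of nine.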

Other Misleading Graph Techniques

Improper Scaling

Using different scales or inappropriate intervals

3D Effects

3D perspective can distort relative sizes

Pictograms

Scaling an icon in both width and height, so its area (or volume) exaggerates differences that should be shown by height alone

Cherry-Picking Data

Selecting only favorable time periods or data points

How to Detect Misleading Graphs

Check the Axes

Ensure axes start at zero or note truncation symbols

Examine Scales

Look for consistent and appropriate intervals

Question the Context

Consider the full time period and data range

Look for Symbols

A break symbol (such as // or a zigzag) on an axis indicates that it has been truncated

Using Technology for Data Analysis

Statistical Software Options

Excel

  • Accessible and familiar
  • Good for basic analysis
  • Built-in chart tools
  • XLSTAT add-in for advanced features

Minitab

  • User-friendly interface
  • Comprehensive statistical tools
  • Excellent graphing capabilities
  • Educational focus

R / Python

  • Open source and free
  • Extremely powerful
  • Extensible with packages
  • Industry standard

Typical Analysis Workflow

  1. Data Import: Load data from files (CSV, Excel, databases)
  2. Data Cleaning: Handle missing values, outliers, and formatting issues
  3. Exploratory Analysis: Create frequency distributions, histograms, and summary statistics
  4. Visualization: Generate charts and graphs for interpretation
  5. Interpretation: Analyze patterns, trends, and distribution shapes

Practice Problem 1: Data Classification

World's Highest Temperatures

World Meteorological Organization data on the highest recorded temperatures by continent:

Rank | Continent | Location | Temperature (°F)
1 | N. America | Death Valley, CA | 134
2 | Africa | Kebili, Tunisia | 131
3 | Asia | Tirat Tsvi, Israel | 129
4 | Australia | Oodnadatta, Australia | 123

Classification Tasks

Question 1: Classify the data type for "Rank"

Question 2: Classify the data type for "Continent"

Question 3: Classify the data type for "Temperature"

Practice Problem 2: Frequency Distribution

TV Sets per Household

Data for 50 randomly selected households:

1, 1, 1, 2, 6, 3, 3, 4, 2, 4, 3, 2, 1, 5, 2, 1, 3, 6, 2, 2, 3, 1, 1, 4, 3, 2, 2, 2, 2, 3, 0, 3, 1, 2, 1, 2, 3, 1, 1, 3, 3, 2, 1, 2, 1, 1, 3, 1, 5, 1
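To check a hand-built frequency table for this problem, a short Python sketch using single-value grouping (one class per number of TVs) and `collections.Counter` from the standard library may help. The column layout is an illustrative choice.

```python
from collections import Counter

tvs = [1, 1, 1, 2, 6, 3, 3, 4, 2, 4, 3, 2, 1, 5, 2, 1, 3, 6, 2, 2, 3, 1, 1, 4, 3,
       2, 2, 2, 2, 3, 0, 3, 1, 2, 1, 2, 3, 1, 1, 3, 3, 2, 1, 2, 1, 1, 3, 1, 5, 1]

# Single-value grouping: each possible number of TVs is its own class
freq = Counter(tvs)
n = len(tvs)

print("TVs  Frequency  Relative Frequency")
for value in range(min(tvs), max(tvs) + 1):  # also lists values with zero count
    print(f"{value:>3}  {freq[value]:>9}  {freq[value] / n:>18.2f}")
print(f"{'Total':<5}{n:>9}  {sum(freq.values()) / n:>18.2f}")
```

Single-value grouping suits this variable because it is discrete with only seven distinct values (0 through 6).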

Case Study Analysis: Forbes Billionaires

Citizenship Distribution

Key Insights

  • United States dominates with 11 billionaires (44%)
  • Europe represented by 4 countries
  • Asia-Pacific shows strong presence (6 billionaires)
  • Geographic diversity reflects global economic power

Age Distribution Analysis

Mean Age

67.4 years

Youngest

39 years (Sergey Brin)

Oldest

93 years (Karl Albrecht)

Wealth Distribution

Distribution Characteristics

  • Shape: Right-skewed distribution
  • Peak: Most billionaires in $20-30B range
  • Outliers: Carlos Slim Helu at $73B
  • Concentration: 60% below $30B threshold

Chapter Summary

Key Concepts Mastered

Data Classification

  • Qualitative vs. Quantitative
  • Discrete vs. Continuous
  • Importance for method selection

Qualitative Data Organization

  • Frequency distributions
  • Pie charts and bar charts
  • Relative frequency calculations

Quantitative Data Organization

  • Grouping methods (single-value, limit, cutpoint)
  • Histograms and dotplots
  • Stem-and-leaf diagrams

Distribution Analysis

  • Modality identification
  • Symmetry and skewness
  • Shape classification

Critical Evaluation

  • Identifying misleading graphs
  • Truncation effects
  • Proper visualization principles

Congratulations!

Organizing Data - Complete

📊

Data Organization Expert

Ready for Descriptive Statistics and Numerical Measures

What's Next in Your Statistical Journey

Chapter 3

Numerical Descriptive Measures

Mean, median, mode, standard deviation

Chapter 4

Probability Concepts

Basic probability rules and applications

Chapter 5

Discrete Probability Distributions

Binomial, Poisson, and other distributions

Practice

Real-world applications

Business cases and data projects

Remember: Effective data organization is the foundation of all statistical analysis!

Questions & Discussion

Ready to tackle textbook exercises 2.1-2.5 and explore real datasets!

Office Hours: D312, by appointment

Email: sorujov@ada.edu.az