Data

Levels




  • Four measurement scales - ways to categorize variables that guide data analysis and help you choose the right statistical test and visualization technique.
    • nominal - names or labels
    • ordinal - the order matters
    • interval - the order matters and the distance between values is known
    • ratio - order and known intervals, plus a true zero, so ratios can be calculated

Qualitative Data

  • Nominal Variables - values have no order, such as nationality or gender.
    • Nominal scales label variables without assigning any quantitative value.
    • They could simply be called labels.
    • "Nominal" sounds like "names", and these scales work like names or labels.
    • At this level, you cannot do quantitative operations such as addition or division.
    • You can do basic counts using pandas' value_counts method.
    • Suitable graphs include bar charts and pie charts.
  • Ordinal Variables - values have a meaningful order.
    • The order of the values is significant, but the differences between them are not known.
    • They typically measure non-numeric concepts like satisfaction, happiness, or discomfort.
    • "Ordinal" sounds like "order": the order is what matters, and it is all you really get.
    • We can do basic counts as with nominal data, and we can also compare and rank values.
    • Bar and pie charts still apply, and we can now calculate medians and percentiles.
    • With medians and percentiles, stem-and-leaf plots and box plots become possible.
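
The counting and ordering described above can be sketched in a few lines. This is a minimal, stdlib-only illustration: `collections.Counter` stands in for pandas' value_counts, and the sample values, the five-step happiness scale, and all variable names are made up for the example.

```python
from collections import Counter

# Nominal data: labels with no order, so counting is the main operation.
nationalities = ["Kenyan", "Brazilian", "Kenyan", "Indian", "Brazilian", "Kenyan"]
nominal_counts = Counter(nationalities)
most_common_label = nominal_counts.most_common(1)[0][0]  # the most frequent label

# Ordinal data: labels with a meaningful order but unknown spacing.
# The scale below is a hypothetical survey scale, listed from lowest to highest.
scale = ["very unhappy", "unhappy", "neutral", "happy", "very happy"]
rank = {label: position for position, label in enumerate(scale)}
responses = ["happy", "neutral", "very happy", "unhappy", "happy"]

# Sorting by rank makes a median possible; with an odd number of
# responses the median is simply the middle one.
ordered = sorted(responses, key=rank.__getitem__)
median_response = ordered[len(ordered) // 2]

print(nominal_counts)
print(most_common_label)
print(median_response)
```

Note that the median is reported as a label, not a number: the ranks are used only for sorting, since the distance between "neutral" and "happy" is not defined.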

Quantitative Data


  • Two types of quantitative variables
    • Discrete Variables - values are countable and can take only certain values with nothing in between, like the number of heads in 10 coin tosses.
    • Continuous Variables - can take any numerical value within an interval or intervals, for example the height of a person.
Interval

  • Numeric scales where we know both the order and the exact differences between values.
  • Celsius temperature is an example because the difference between adjacent values is always the same.
  • Histograms visualize buckets of quantities and show the frequency of each bucket. Scatter plots graph two columns of data against the axes, showing each data point as a literal point on the graph.
  • There is no true zero: 0 °C does not mean "no temperature", and negative values are meaningful.
  • We can add and subtract values, but multiplying or dividing them is not meaningful.
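
A small sketch of why division breaks down on an interval scale, using the Celsius example. The Kelvin conversion is an addition for illustration (Kelvin has a true zero), and the two temperatures and the function name are hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the Kelvin scale, whose zero is absolute."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cool_c

# The ratio of the raw Celsius numbers suggests "twice as hot"...
naive_ratio = warm_c / cool_c

# ...but the same two temperatures on a scale with a true zero
# give a much smaller ratio, so the Celsius ratio says nothing physical.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c)
print(naive_ratio)
print(round(kelvin_ratio, 4))
```

The difference of 10 degrees survives the change of scale, while the ratio does not, which is the practical meaning of "we can add and subtract but not multiply or divide".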

Ratio

  • Tell us about order and the exact distance between units, and have an absolute zero.
  • Height and weight are examples.
  • Values can be added, subtracted, multiplied, and divided.
  • Central tendency can be measured by the mode, median, or mean.
  • Measures of dispersion such as the standard deviation and the coefficient of variation can be calculated from ratio scales.
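
As a rough sketch, the measures listed above can be computed with Python's built-in `statistics` module. The heights are invented sample data, and the sample standard deviation (`statistics.stdev`) is an assumption; use `statistics.pstdev` if the data is a full population.

```python
import statistics

# Hypothetical heights in centimetres (a ratio scale: 0 cm means no height).
heights_cm = [150.0, 160.0, 160.0, 170.0, 180.0]

# Central tendency
mode = statistics.mode(heights_cm)
median = statistics.median(heights_cm)
mean = statistics.mean(heights_cm)

# Dispersion: sample standard deviation, and the coefficient of variation,
# which is the standard deviation divided by the mean.
std_dev = statistics.stdev(heights_cm)
coefficient_of_variation = std_dev / mean

print(mode, median, mean)
print(round(std_dev, 2), round(coefficient_of_variation, 4))
```

The coefficient of variation divides by the mean, so it only makes sense on a ratio scale, where the zero point is not arbitrary.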



 References

 https://medium.com/@rndayala/data-levels-of-measurement-4af33d9ab51a

Stats Can Be Sexy

                                        

Visualizing Data for the Masses




Wells's 1903 argument

  • Physical science and advanced thinking require mathematical analysis skills.
  • Soon, citizen competence will include the ability to compute, analyze averages, and understand extremes.

Wilks's 1951 simplification


  • Statistical thinking will be as essential for citizenship as reading and writing (Marriott, 2014).


Wilks's breakdown of statistical thinking, according to Marriott (2014)

  • Six core concepts
    • Expectation and variance - understanding averages, maximums, and minimums.
    • Distribution - Recognizing patterns in data variation
    • Probability - Assessing the likelihood of events
    • Risk - Evaluating potential costs or dangers
    • Correlation - Identifying relationships between variables

Both thinkers highlight the need for data literacy in a world increasingly driven by information and analysis.

Marriott (2014) argues that the traditional definition of statistical thinking needs to be expanded to include three new concepts: data, cognition, and visualization.


Data

Data is the lifeblood of statistics, but it is not explicitly included in the current definition. Marriott (2014) highlights the risk posed by big data and data science: statisticians may be left behind if they do not embrace data in all its forms.

Marriott (2014) states that adding data to the definition of statistical thinking will not solve the problem on its own, but it will send an important message that statisticians are the original data scientists and embrace data in all its forms.


Cognition



The human ability to think statistically is limited, and according to Marriott (2014), Kahneman's book exposes cognitive errors made by laypeople and statisticians alike.

Dual-system thinking - Marriott (2014) states that Kahneman proposes two thinking systems:
  1. System 1 - fast, intuitive, prone to biases
  2. System 2 - slow, logical, effortful
  • Statistical thinking relies heavily on System 2.
  • Despite our natural cognitive limitations, Marriott (2014) reminds us that Kahneman offers strategies to mitigate errors: consciously engage System 2 in statistical reasoning, because System 1's instinctive responses can lead to erroneous judgments.


Visualization


  • Statisticians excel at visualization tools like histograms and box plots, yet struggle to communicate effectively through visuals.
  • Including visualization in the definition of statistical thinking emphasizes statisticians' ability to analyze and communicate data effectively.
  • Statisticians should embrace collaboration with other professionals like graphic designers and neuroscientists to keep up with evolving data trends and expertise.





References

Marriott, N. (2014), The future of statistical thinking. Significance, 11: 78-80. https://doi.org/10.1111/j.1740-9713.2014.00787.x
