
2024/01/21

Data Detective:

 Cracking the Case of Interval and Ratio Data!!!

🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊
Remember the thrilling mystery novels where clues whisper secrets and numbers hold hidden truths? Well, buckle up, fellow data detective, because this time we are cracking the case of interval and ratio data!



Gone are the days of simple yes-or-no answers (nominal data) or rankings without measurements (ordinal data). Now we are dealing with numbers that sing, dance, and reveal fascinating secrets about the world around us.



Interval data 


Imagine a thermometer. It displays degrees, from freezing cold to scorching hot, but there's no true zero. Zero on a Celsius or Fahrenheit thermometer does not mean the absence of heat; it is just an arbitrary starting point. That is interval data: numbers with equal differences, but no absolute reference point.


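Think of the markings on a ruler: each centimeter is the same size as the next, and interval data has that same equal spacing. What it lacks is a meaningful zero. Calendar years are a good example: the gap between 2000 and 2010 equals the gap between 2010 and 2020, but year zero does not mean the absence of time.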

🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️


Ratio data


Now, picture a fancy scale, measuring your weight from a precise zero. This, my friends, is ratio data. It has all the benefits of interval data (equal differences) but with an added superpower: a true absolute zero. Zero weight means no weight at all, not just some starting point in a system.

👀👀👀👀👀👀

Think of elapsed time: zero seconds on your stopwatch means no time has passed at all, not just some arbitrary starting point.

So what is the difference?

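Imagine a race. With ordinal data, you know who came first, second, and third, but not their times. With interval data, such as the clock time at which each runner crossed the line, you can measure the gaps between runners, but finishing at 10:30 is not "twice" finishing at 5:15. Ratio data, such as each runner's elapsed time, starts from a true zero, so you can calculate speeds, compare gaps, say that one runner took twice as long as another, and even try to predict future winners!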

Why does it matter? 🟥🟥🟥🟥🟥🟥

Choosing the right data type is like picking the perfect tool for the job. Using interval data for calculations that require a true zero can lead to meaningless results, like claiming that 20 degrees Celsius is twice as hot as 10 degrees Celsius. It is like trying to hammer a nail with a spoon.
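
Here is a small sketch of why ratios go wrong on an interval scale. The temperatures are made-up values, and the helper name celsius_to_kelvin is just an illustration, not something from a library. The idea: a difference in Celsius is meaningful, but a ratio only makes sense once the values are moved to a scale with a true zero, such as Kelvin.

```python
# Temperatures in degrees Celsius (interval scale: the zero point is arbitrary).
# These two readings are hypothetical example values.
cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = warm_c - cold_c

# The raw ratio suggests "twice as hot", but it depends on where zero sits.
naive_ratio = warm_c / cold_c


def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a ratio scale whose zero is absolute zero."""
    return celsius + 273.15


# On a ratio scale the comparison is meaningful, and far less dramatic.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"Difference: {difference} degrees")
print(f"Naive Celsius ratio: {naive_ratio}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")
```

The Celsius ratio comes out as 2.0, while the Kelvin ratio is only about 1.035, so "twice as hot" was an artifact of the arbitrary zero.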

So the next time you encounter a set of numbers, don't just stare blankly. Put on your detective hat and ask: interval or ratio? Numbers hold the key to understanding temperature changes, predicting economic trends, and even measuring the speed of that falling toast (ratio data, by the way).







 


Let's Sort

 Sorting it out: A Guide to Ordinal and Nominal Data

🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🧪🧪🧪🧪🧪🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟

Data, data, everywhere! Numbers dance across spreadsheets and charts burst with colorful bars... but not all data is created equal. Ordinal and nominal, ever heard of them?



Imagine a movie theatre:
  • The seat numbers tell you where to sit, but there is no inherent order or comparison between them. You would not claim that one seat is better than the other. This is basically nominal data.
  • Picture the rows: front row, middle row, back row! The front row sits closer to the screen, the back row farther, and the middle row falls somewhere in between. Each level has a definite rank compared to the others. This is ordinal data.
🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪

OK, what is the difference?

  • Nominal data:
    • Think categories, not ranks. Imagine hair color, political party, or music genre. These are nominal: just labels that group things together without implying any order or inherent relationship.
    • Surveys and questionnaires love them. Ask "What is your favorite color?" and you will get nominal data like blue, green, and purple: individual categories with no inherent order.
    • Counting and percentages are their forte. We can count how many people like each color, but we cannot say blue is greater than green.
  • Ordinal data:
    • Ranks matter! Think movie rows, exam grades, or clothing sizes. These levels have a clear order, each higher than the one below.
    • They tell you more than or less than. A student with an A outperformed someone with a C. A large shirt is bigger than a medium.
    • But beware of stretching the order! Ordinal data does not guarantee equal intervals between levels. The gap between an A and a B is not necessarily the same as the gap between a C and a D, and a B student is not necessarily twice as good as a D student.
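
To make the contrast concrete, here is a minimal sketch using only the Python standard library. The survey answers and shirt orders are made-up sample data, and names like SIZE_ORDER and size_rank are hypothetical helpers for this example. Nominal values get counted; ordinal values get an explicit order that we can sort and compare by.

```python
from collections import Counter

# Nominal: favorite colors are labels, so counting is the main operation.
# (Hypothetical survey answers.)
favorite_colors = ["blue", "green", "blue", "purple", "blue"]
color_counts = Counter(favorite_colors)
most_common_color, votes = color_counts.most_common(1)[0]
share_blue = color_counts["blue"] / len(favorite_colors)

# Ordinal: shirt sizes have an order, which we state explicitly.
SIZE_ORDER = ["small", "medium", "large"]
size_rank = {size: rank for rank, size in enumerate(SIZE_ORDER)}

# (Hypothetical shirt orders.)
shirts = ["medium", "large", "small", "large"]
sorted_shirts = sorted(shirts, key=size_rank.get)
largest = max(shirts, key=size_rank.get)
is_large_bigger = size_rank["large"] > size_rank["medium"]

print(f"{most_common_color} got {votes} votes ({share_blue:.0%})")
print(f"Sizes in order: {sorted_shirts}, largest: {largest}")
```

Note that the ranks 0, 1, 2 only encode order. Subtracting or averaging them would quietly assume equal gaps between sizes, which ordinal data does not promise.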
πŸ§ŠπŸ§ŠπŸ§ŠπŸ§ŠπŸ§ŠπŸ§ŠπŸ§ŠπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸŒ‘οΈπŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€πŸ‘€
Why does it matter?

Choosing the right data type is crucial for accurate analysis and meaningful conclusions. Using nominal data for calculations that assume order can lead to misleading results. Conversely, forcing ordinal data into strict mathematical operations, such as averaging ranks as if the gaps between them were equal, might not make sense.


Data is like legos: different pieces fit together in different ways. Knowing which type you're holding is key to building something insightful and robust.

So, next time you see data dancing around, do not be afraid to ask "Ordinal or nominal?" to unlock the hidden story within the numbers.

πŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆπŸ³οΈβ€πŸŒˆ

There are other data types out there, like interval and ratio data, each with their own quirks and strengths.

2023/12/30

Data Levels




🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟
  • Four measurement scales - ways to categorize different types of variables, choose the right statistical test and visualization technique, and guide data analysis.
    • nominal - names or labels
    • ordinal - order is important
    • interval - tells us about order and the exact difference between each item
    • ratio - order and interval values, plus the ability to calculate ratios, since a true zero can be defined

Qualitative Data 🟥🟥🟥🟥🟥🟥🟥🟥

  • Nominal Variables - values are not ordered, like nationality or gender.
    • Nominal scales are used for labeling variables without any quantitative value.
    • They could simply be called labels.
    • Nominal sounds like names, and these scales are like names or labels.
    • At this level, you cannot do any quantitative mathematical operations like addition or division.
    • You can do basic counts using pandas' value_counts method.
    • Suitable graphs include bar charts and pie charts.
  • Ordinal Variables
    • The order of the values is important and significant, but the differences between each one are not known.
    • They are typically measures of non-numeric concepts like satisfaction, happiness, or discomfort.
    • Ordinal sounds like order, and it is the order that matters. That is all you really get.
    • We can do basic counts as we do with nominal data, and we can also make comparisons and orderings.
    • Bar and pie charts still work, but now we can also calculate medians and percentiles.
    • With medians and percentiles, stem and leaf plots as well as box plots become possible.
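
As a rough sketch of the counting and median ideas above, the snippet below uses pandas. The nationality and satisfaction values are invented sample data, and the level names are an assumption for the example. It counts nominal labels with value_counts, then finds the median of an ordered categorical through its integer codes.

```python
import pandas as pd

# Nominal: nationality labels (hypothetical sample data).
nationality = pd.Series(["Kenyan", "Brazilian", "Kenyan", "Indian", "Kenyan"])
counts = nationality.value_counts()                # frequency of each label
shares = nationality.value_counts(normalize=True)  # proportion of each label

# Ordinal: satisfaction levels with an explicit order (hypothetical scale).
levels = ["unhappy", "neutral", "happy", "very happy"]
satisfaction = pd.Series(
    pd.Categorical(
        ["happy", "neutral", "very happy", "happy", "unhappy"],
        categories=levels,
        ordered=True,
    )
)

# Each level gets an integer code (0 to 3) that reflects its rank only.
codes = satisfaction.cat.codes

# With an odd number of answers, the median code lands on a real level.
median_level = levels[int(codes.median())]

print(counts)
print(shares)
print("Median satisfaction:", median_level)
```

The median works here because it only needs the order of the answers. A mean of the codes would treat the step from "unhappy" to "neutral" as equal to the step from "happy" to "very happy", which the data does not tell us.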

Quantitative Data🟦🟦🟦🟦🟦🟦🟦🟦🟦


  •  Two types of Quantitative variables
    • Discrete Variables - their values are countable and can only assume certain values with no intermediate values, like the number of heads in 10 coin tosses.
    • Continuous Variables - can assume any numerical value over a certain interval or intervals, for example, the height of a person.

Interval

  • Numeric scales where we know both the order and the exact differences between the values.
  • Celsius temperature is an example because the difference between consecutive degrees is always the same.
  • Histograms visualize buckets of quantities and show the frequency of each bucket. Scatter plots graph two columns of data on the axes and show each data point as a literal point on the graph.
  • Interval scales don't have a true zero: 0 degrees Celsius does not mean "no temperature". Negative numbers also have a meaning.
  • We can add and subtract values, but multiplying or dividing one value by another is not meaningful.
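
Python's own datetime module happens to mirror these rules, which makes for a handy illustration. Treating calendar timestamps as interval data is my own example here (the notes above use Celsius), and the specific times are made up. Timestamps can be subtracted, but Python refuses to multiply them, while durations, which do have a true zero, divide happily.

```python
from datetime import datetime, timedelta

# Two points on the calendar (interval scale; hypothetical values).
start = datetime(2024, 1, 21, 9, 0)
finish = datetime(2024, 1, 21, 9, 45)

# Subtracting two timestamps gives a duration, which is meaningful.
elapsed = finish - start

# Adding a duration to a timestamp is also fine.
later = finish + timedelta(minutes=15)

# Multiplying a timestamp makes no sense, and Python raises TypeError.
try:
    start * 2
    multiplication_allowed = True
except TypeError:
    multiplication_allowed = False

# Durations have a true zero, so they behave like ratio data.
ratio = timedelta(minutes=90) / elapsed

print("Elapsed:", elapsed)
print("Can multiply a timestamp?", multiplication_allowed)
print("90 minutes is", ratio, "times the elapsed time")
```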

Ratio

  • Tell us about order and the exact difference between units, and have an absolute zero.
  • Height and weight are examples of this.
  • They can be added, subtracted, multiplied, and divided.
  • Central tendency can be measured by mode, median, or mean.
  • Measures of dispersion such as standard deviation and coefficient of variation can be calculated from ratio scales.
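
The measures listed above can be sketched with Python's built-in statistics module. The heights are hypothetical sample values, and I am using the sample standard deviation (statistics.stdev); the population version would be statistics.pstdev. The coefficient of variation is simply the standard deviation divided by the mean, which only makes sense because height has a true zero.

```python
import statistics

# Heights in centimeters (ratio scale; hypothetical sample).
heights_cm = [150.0, 160.0, 160.0, 170.0, 180.0]

# Central tendency: all three measures are valid for ratio data.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
mode_height = statistics.mode(heights_cm)

# Dispersion: sample standard deviation and coefficient of variation.
std_dev = statistics.stdev(heights_cm)
coefficient_of_variation = std_dev / mean_height

# Ratios are meaningful too: the tallest is 1.2 times the shortest.
tallest_to_shortest = max(heights_cm) / min(heights_cm)

print(f"mean={mean_height}, median={median_height}, mode={mode_height}")
print(f"std dev={std_dev:.2f}, CV={coefficient_of_variation:.3f}")
print(f"tallest / shortest = {tallest_to_shortest}")
```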


 🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️

 References

 https://medium.com/@rndayala/data-levels-of-measurement-4af33d9ab51a

Stats Can Be Sexy

                                        

Visualizing Data for the Masses



🪟🪟🌡️🌡️🌡️🟥🟥🟥▶️▶️

Wells's 1903 argument

  • Physical science and advanced thinking require mathematical analysis skills
  • Soon, citizen competence will include the ability to compute, analyze averages, and understand extremes.

Wilks's 1951 simplification


  • "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write" (Marriott, 2014).


Wilks's breakdown of statistical thinking according to Marriott (2014).

  • Core concepts
    • Expectation and variance - understanding averages, maximums, and minimums.
    • Distribution - Recognizing patterns in data variation
    • Probability - Assessing the likelihood of events
    • Risk - Evaluating potential costs or dangers
    • Correlation - Identifying relationships between variables
Basically, both thinkers highlight the need for data literacy in a world increasingly driven by information and analysis.
🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟

Marriott (2014) argues that the traditional definition of statistical thinking needs to be expanded to include three new concepts: data, cognition, and visualization.


Data

Data is the lifeblood of statistics, but it's not explicitly included in the current definition. Marriott (2014) highlights the risk posed by big data and data science, suggesting that statisticians could be left behind if they do not embrace data in all its forms.

Marriott (2014) states that adding data to the definition of statistical thinking will not solve the problem on its own, but it will send an important message that statisticians are the original data scientists and embrace data in all its forms.


Cognition🟦🟦🟦🟦🟦🟦



According to Marriott (2014), the human ability to think statistically is limited, and Kahneman's book exposes cognitive errors made by laypeople and statisticians alike.

Dual system thinking - Marriott (2014) states that Kahneman proposes two thinking systems:
  1. System 1 - fast, intuitive, prone to biases
  2. System 2 - slow, logical, effortful
  • Statistical thinking relies heavily on System 2.
  • Despite our natural cognitive limitations, Marriott (2014) reminds us that Kahneman offers strategies to mitigate errors, encouraging the conscious engagement of System 2 in statistical reasoning, since System 1's instinctive responses can lead to erroneous judgments.


Visualization 🟥🟥🟥🟥▶️


  • Statisticians excel at visualization tools like histograms and box plots, but at the same time struggle with effective communication through visuals.
  • Including visualization in the definition of statistical thinking emphasizes statisticians' ability to analyze and communicate data effectively.
  • Statisticians should embrace collaboration with other professionals like graphic designers and neuroscientists to keep up with evolving data trends and expertise.



🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥


 References

Marriott, N. (2014), The future of statistical thinking. Significance, 11: 78-80. https://doi.org/10.1111/j.1740-9713.2014.00787.x
