The Three Musketeers of Math: Mean, Median, and Mode

 Mean, Median, and Mode




🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰🟰
Statistics is very intimidating to me and is kicking my ass this semester, so writing these blogs and relating them to something fun really helps me commit it to memory. So today I am introducing the Three Musketeers of Math: Mean, Median, and Mode. These swashbuckling statistics will help you make sense of any dataset, the way Zorro deciphers a secret message.

🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️

Meet the Crew

  • The Average Avenger: Mean is the sum of all the values of your data divided by the number of values. Think of it as sharing a pizza equally among your friends. Everyone gets the same share.
  • The Middle Mastermind: Median is the value that splits your data in half when ordered from least to greatest. Imagine lining up your friends by height. The median friend is smack dab in the middle, not the shortest or the tallest.
  • The Most Popular Posse: Mode is the value that appears most often in your data. It's like the friend who always shows up to parties, the life of the statistical soiree.

When to Call on Each Musketeer

Each Musketeer has their strengths and weaknesses. Mean is great for normally distributed data (think bell curve), but it gets thrown off by outliers (think of the friend who brought three extra pizzas), which skew the average. The median shines when you have skewed data or outliers, but it ignores how extreme the highest and lowest values are, so it doesn't use all the information the mean does. Mode is all about popularity, but it can be unreliable if there's no clear favorite value (think of friends who are all equally awesome in their own way), and a dataset can even have more than one mode.
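A minimal Python sketch of these strengths and weaknesses, using the standard library's statistics module. The numbers are made up for illustration: one extreme value (500) drags the mean far upward while the median barely notices, and when every value appears equally often, statistics.multimode returns all of them because there is no single favorite.

```python
from statistics import mean, median, multimode

# Illustrative data (not from the post): five typical values...
typical = [10, 20, 30, 30, 40]
# ...and the same values plus one extreme outlier.
with_outlier = typical + [500]

print("Typical:      mean =", mean(typical), " median =", median(typical))
print("With outlier: mean =", mean(with_outlier), " median =", median(with_outlier))

# When every value is equally common, there is no single most popular value:
# multimode returns every tied value, in the order first seen.
print("Modes of [3, 5, 7]:", multimode([3, 5, 7]))
```

Here the mean jumps from 26 to 105 while the median stays at 30.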

🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥


The Musketeers in action

Let's say you're tracking your video game scores: 10, 20, 30, 30, 40, 50.


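  • The Mean: (10 + 20 + 30 + 30 + 40 + 50) / 6 = 180 / 6 = 30
  • The Median: Order the scores (10, 20, 30, 30, 40, 50). With six scores there are two middle values, 30 and 30, so the median is their average: (30 + 30) / 2 = 30.
  • The Mode: 30 appears twice, making it the most popular score.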
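Python's built-in statistics module can double-check the arithmetic. A minimal sketch using the video game scores from the example above:

```python
from statistics import mean, median, mode

scores = [10, 20, 30, 30, 40, 50]  # the video game scores from the example

print("Mean:", mean(scores))      # (10 + 20 + 30 + 30 + 40 + 50) / 6
print("Median:", median(scores))  # average of the two middle values, 30 and 30
print("Mode:", mode(scores))      # the most frequent value
```

median() returns 30.0 rather than 30 because, with an even number of scores, it averages the two middle values.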
 

The Mean, Median, and Mode are not rivals; they're complementary! Use them together to paint a richer picture of your data.

The Incredible Experiment:

 A Superhero Guide to Independent and Dependent Variables


🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊

Ever dreamed of being a scientist, wielding potions and peering through microscopes? Let us embark on a scientific adventure, unraveling the secrets of independent and dependent variables, your super tools for understanding the world around you!


Imagine a superhero lab



  • Professor Potential, our wise mentor, mixing bubbling concoctions and spouting scientific wisdom.
  • Experiment X, our trusty robot sidekick, ready for any test.
  • The question: will Professor Potential's new super-strength serum actually work on Experiment X?


The Key Players

  • Independent Variable - aka The Twister: This is the variable we change or control in our experiment. Just like Professor Potential changing the formula of the serum, the independent variable gets twisted and turned to see its effect.
  • Dependent Variable - aka The Detector: This is the variable we measure or observe to see how it responds to changes in the independent variable. Experiment X will lift weights, and how much he lifts is the dependent variable, the detector of the serum's power.

The Big Showdown:


Professor Potential whips up different serums, changing the amount of a special ingredient (the independent variable). Experiment X gulps them down and lifts weights with all his might. We measure how much he lifts (the dependent variable). Does it skyrocket with each new formula?

The Reveal:


If Experiment X is suddenly bench pressing cars after the super-strength serum, it suggests that the independent variable (the serum formula) has a clear effect on the dependent variable (the weight lifted). If he is still struggling with tiny dumbbells, well, back to the lab!


  • The independent variable controls the show: it's the variable we twist and turn.
  • The dependent variable watches the results: it's the variable that changes (or doesn't) in response.
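To make the two roles concrete, here is a hypothetical sketch of how Professor Potential's lab notebook might look in Python. The doses, units, and weights are invented: each key is a level of the independent variable that we choose, and each list holds the dependent variable we measure on repeated trials.

```python
from statistics import mean

# Hypothetical experiment log (invented numbers, for illustration only).
# Keys: amount of the special ingredient in mg -> the independent variable we set.
# Values: kilograms lifted in three trials -> the dependent variable we measure.
trials = {
    0: [50, 52, 48],   # 0 mg acts as a control group with none of the ingredient
    10: [60, 58, 62],
    20: [75, 80, 70],
}

# Summarize the dependent variable for each level of the independent variable.
average_lift = {dose: mean(lifts) for dose, lifts in trials.items()}

for dose, avg in average_lift.items():
    print(f"{dose} mg -> average lift {avg} kg")
```

Averages that rise with the dose would point toward an effect; flat averages would send everyone back to the lab.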






Are Werewolves Real??

 Investigating the Moon and Relapse with Statistics

🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪

Ever heard that full moons bring out the crazies? Or maybe just a few extra patients in the emergency room? While the image of howling werewolves might be exaggerated, the question of a lunar influence on human behavior persists. Today, we'll put on our lab coats and use statistics to investigate the fascinating and often debated connection between moon phases and relapse rates.

🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️

The Suspect: Luna, the Earth's Satellite



Lunar lore spans centuries, with beliefs linking the moon to everything from tides to fertility. But for our investigation, the focus is on potential changes in human physiology or psychology based on the moon's phases. Some theories suggest gravitational or electromagnetic forces might play a role, while others point to altered sleep patterns or increased suggestibility under a full moon.

Gathering Evidence: The Power of Data

To test these theories, we need data. Lots of it. This means tracking relapse rates for specific conditions like addiction, along with mental health episodes, over time and alongside the corresponding lunar phases.

The Statistical Sleuthing:


Here's where the real fun begins! We can use various statistical tools to examine the data and see if there is any connection between phases and relapse. Let's explore some possibilities:
  • Chi-square test: This tests whether the observed distribution of relapses across lunar phases is different from what we would expect by chance.
  • Correlation coefficient: This measures the strength and direction of a linear relationship between a numeric lunar measure (for example, the fraction of the moon that is illuminated) and relapse rates.
  • Regression analysis: This allows us to control for other factors that might influence relapses, such as seasonality or weather, and see if the moon effect remains significant.
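Here is a minimal sketch of the chi-square idea in plain Python. The relapse counts are made up for illustration (they are not real data), and the sketch assumes each of the four phase bins covers an equal amount of time, so with no lunar effect we would expect equal counts in each. The 7.815 cutoff is the standard chi-square critical value for 3 degrees of freedom at the 0.05 level. In practice, scipy.stats.chisquare computes the same statistic along with a p-value.

```python
# Hypothetical relapse counts per lunar phase (invented for illustration, NOT real data).
observed = {
    "new moon": 52,
    "first quarter": 48,
    "full moon": 55,
    "last quarter": 45,
}

def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(counts)
    # Assumes each phase bin covers the same amount of time, so with no lunar
    # effect every bin should get an equal share of the relapses.
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

stat = chi_square_uniform(list(observed.values()))

# Critical value of the chi-square distribution with 4 - 1 = 3 degrees of
# freedom at the 0.05 significance level.
CRITICAL_VALUE = 7.815

print(f"chi-square statistic = {stat:.2f}")
print("Evidence of a lunar pattern at the 0.05 level:", stat > CRITICAL_VALUE)
```

With these made-up counts the statistic is about 1.16, well below 7.815, so the little full-moon bump would be consistent with chance.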
Even if some studies show a faint lunar link, it's crucial to remember that correlation does not imply causation: other, unknown factors could be at play.

So, are werewolves real? Based on current research, probably not. But the moon's influence on human behavior remains a captivating mystery. Statistical analysis helps us piece together the clues, but until the evidence speaks louder, we should maintain a healthy dose of skepticism and keep exploring.

Remember, science is a journey, not a destination. And in the realm of lunar mysteries, every full moon might just bring us a new chapter in the story.



Data Detective:

 Cracking the Case of Interval and Ratio Data!!!

🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊🧊
Remember the thrilling mystery novels where clues whisper secrets and numbers hold hidden truths? Well, buckle up, fellow data detective, because this time we are cracking the case of interval and ratio data!



Gone are the days of simple yes-or-no answers (nominal data) or rankings without measurements (ordinal data). Now we are dealing with numbers that sing, dance, and reveal fascinating secrets about the world around us.



Interval data 


Imagine a Celsius or Fahrenheit thermometer. It displays degrees, from freezing cold to scorching hot, but there's no true zero. Zero on these thermometers does not mean the absence of heat; it is just an arbitrary starting point. That is interval data: numbers with equal differences, but no absolute reference point.


Think of it like the markings on a ruler you've laid down somewhere on a table. Each centimeter is the same, but the 0 mark is simply where the ruler happens to start, and an object sitting at the 0 mark still exists. That zero is a position you chose, not an absence of anything.

🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️


Ratio data


Now, picture a fancy scale, measuring your weight with a precise zero. This, my friends, is ratio data. It has all the benefits of interval data (equal differences) but with an added superpower: a true absolute zero. Zero weight means no weight at all, not just some starting point in a system.

👀👀👀👀👀👀

Think of elapsed time: zero seconds on your stopwatch truly means no time has passed, so 20 seconds really is twice as long as 10 seconds.

So what is the difference????

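Imagine a race. If you only record the clock time at which each runner crosses the line, that is interval data: you can subtract to find the gaps between runners, but saying one finish time is "twice" another makes no sense. Record each runner's elapsed time from the starting gun instead, and you have ratio data: zero means the race just started, so you can calculate speeds, say one runner took twice as long as another, and maybe even predict future winners!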

Why does it matter??? 🟥🟥🟥🟥🟥🟥

Choosing the right data type is like picking the perfect tool for the job. Using interval data for calculations that require a true zero can lead to skewed results, like trying to hammer a nail with a spoon.
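Here is a small Python illustration of the spoon-and-nail problem, using temperature as the example. Dividing Celsius readings treats an arbitrary zero as if it meant "no heat", so 20 °C looks like "twice" 10 °C. Shifting both readings onto the Kelvin scale, which starts at absolute zero and therefore behaves like ratio data, gives a ratio of only about 1.035, while the 10-degree difference is the same on both scales. The temperatures are illustrative.

```python
def celsius_to_kelvin(celsius):
    """Shift Celsius onto the Kelvin scale, whose zero is absolute zero."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0  # illustrative temperatures in degrees Celsius

# Interval scale: this ratio depends on where the arbitrary zero sits.
celsius_ratio = warm_c / cool_c

# Ratio scale: Kelvin has a true zero, so this ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences, by contrast, come out the same on both scales.
celsius_gap = warm_c - cool_c
kelvin_gap = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

print(f"Celsius ratio: {celsius_ratio:.2f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
print(f"Gap: {celsius_gap:.1f} degrees C vs {kelvin_gap:.1f} K")
```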

So the next time you encounter a set of numbers, don't just stare blankly. Put on your detective hat and ask: interval or ratio? Numbers hold the key to understanding temperature changes, predicting economic trends, and even measuring the speed of that falling toast (ratio data, by the way).







 


Let's Sort

 Sorting it out: A Guide to Ordinal and Nominal Data

🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🧪🧪🧪🧪🧪🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟

Data, data, everywhere! Numbers dance across spreadsheets, charts bursting with colorful bars...but not all data is created equal. Ordinal and nominal, ever heard of them?



Imagine a movie theatre:
  • The seat numbers tell you where to sit, but there is no inherent order or comparison between them. You would not claim that one seat is better than another. This is basically nominal data.
  • Picture the rows: front row, middle row, back row! The front row sits closer to the screen, the back row farther, and the middle row falls somewhere in between. Each level has a definite rank compared to the others. This is ordinal data.
🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪🧪

OK what is the difference?

  • Nominal data:
    • Think categories, not ranks. Imagine hair color, political party, or music genre. These are nominal: just labels that group things together without implying any order or inherent relationship.
    • Surveys and questionnaires love them. Ask "What is your favorite color?" and you will get nominal data like blue, green, and purple: no inherent order, just individual categories.
    • Counting and percentages are their forte. We can count how many people like each color, but we cannot say blue is greater than green.
  • Ordinal data
    • Ranks matter! Think movie rows, exam grades, or clothing sizes. These levels have a clear order, each higher than the one below.
    • They tell you more than or less than. A student with an A outperformed someone with a C. A large shirt is bigger than a medium.
    • But beware of stretching the order!! Ordinal data does not guarantee equal intervals between levels. The gap between an A and a B is not necessarily the same as the gap between a C and a D, and a B student is not necessarily twice as good as a D student.
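A minimal Python sketch of the difference, using only the standard library. The colors and shirt sizes are made-up examples: nominal labels can only be counted, while ordinal levels need an explicit order (here a hypothetical SIZE_ORDER list) before comparisons and sorting make sense.

```python
from collections import Counter

# Nominal: favorite colors are only labels, so counting is the natural operation.
favorite_colors = ["blue", "green", "blue", "purple", "blue", "green"]
color_counts = Counter(favorite_colors)
print(color_counts.most_common())  # how many people chose each label

# Ordinal: clothing sizes have a meaningful order, which we spell out explicitly.
SIZE_ORDER = ["S", "M", "L", "XL"]
size_rank = {size: position for position, size in enumerate(SIZE_ORDER)}

# "More than / less than" comparisons are meaningful for ordinal data...
print("Is L bigger than M?", size_rank["L"] > size_rank["M"])

# ...and we can sort by rank. The ranks 0, 1, 2, 3 only encode order:
# they do not promise that the gaps between sizes are equal.
shirts = ["M", "XL", "S", "L", "M"]
sorted_shirts = sorted(shirts, key=lambda size: size_rank[size])
print(sorted_shirts)
```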
๐ŸงŠ๐ŸงŠ๐ŸงŠ๐ŸงŠ๐ŸงŠ๐ŸงŠ๐ŸงŠ๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐ŸŒก️๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€
Why does it matter??

Choosing the right data type is crucial for accurate analysis and meaningful conclusions. Using nominal data for calculations that assume order can lead to misleading results. Likewise, forcing ordinal data into arithmetic such as averaging assumes equal spacing between levels, which might not exist.


Data is like legos: different pieces fit together in different ways. Knowing which type you're holding is key to building something insightful and robust.

So, next time you see data dancing around, do not be afraid to ask: ordinal or nominal? The answer helps unlock the story hidden within the numbers.

๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ๐Ÿณ️‍๐ŸŒˆ

There are other data types out there, like interval and ratio data, each with their own quirks and strengths.

Data Levels




🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟
  • Four measurement scales - ways to categorize variables that help us choose the right statistical test and visualization technique, and that guide data analysis.
    • nominal - names/ labels
    • ordinal - order is important
    • interval - order plus known, equal spacing between values
    • ratio - order and equal spacing, plus a true zero, which makes ratios between values meaningful

Qualitative Data 🟥🟥🟥🟥🟥🟥🟥🟥

  • Nominal Variables - values have no order, e.g., nationality or gender.
    • Nominal scales are used for labeling variables without any quantitative value.
    • They could simply be called labels
    • nominal sounds like names and these scales are like names or labels.
    • At this level, you cannot do quantitative mathematical operations like addition or division.
    • You can do basic counts, for example with pandas' value_counts method.
    • Suitable graphs include bar charts and pie charts.
  • Ordinal Variables - 
    • the order of the values is important and significant but the differences between each one are not known.
    • typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.
    • Ordinal sounds like order, and order is what matters; that is all you really get.
    • We can do basic counts as we do with nominal data and have comparisons and orderings.
    • Bar and pie charts still work, but now we can also calculate medians and percentiles.
    • With medians and percentiles, stem-and-leaf plots and box plots become possible.
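Here is a sketch of both ideas with pandas, assuming it is installed. value_counts is the counting method mentioned in the nominal notes; storing the ordinal variable as an ordered pd.Categorical (with a hypothetical low/medium/high satisfaction scale) is one common way to tell pandas that order matters. The responses are invented.

```python
import pandas as pd

# Nominal: labels with no order, so counting is the main operation.
nationality = pd.Series(["Canada", "Mexico", "Canada", "Japan", "Canada"])
print(nationality.value_counts())

# Ordinal: satisfaction levels with an explicit order from lowest to highest.
levels = ["low", "medium", "high"]
satisfaction = pd.Series(
    pd.Categorical(
        ["medium", "high", "low", "high", "medium"],
        categories=levels,
        ordered=True,
    )
)

# Ordering makes comparisons, minimum, and maximum meaningful.
print(satisfaction >= "medium")
print("Lowest:", satisfaction.min(), "Highest:", satisfaction.max())

# Median via the integer codes (0 = low, 1 = medium, 2 = high).
# With an odd number of responses the median code is a whole number.
median_code = int(satisfaction.cat.codes.median())
print("Median satisfaction:", levels[median_code])
```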

Quantitative Data 🟦🟦🟦🟦🟦🟦🟦🟦🟦


  • Two types of quantitative variables
    • Discrete Variables - values are countable and can only take certain values, with nothing in between, e.g., the number of heads in 10 coin tosses.
    • Continuous Variables - can take any numerical value within a certain interval or intervals, e.g., the height of a person.
Interval

  • numeric scales where we know both the order and the exact differences between the values.
  • Celsius temperature is an example because the difference between each value is the same.
  • Graphs: histograms group quantities into buckets and show how often values fall in each bucket, and scatter plots graph two columns of data on the axes, showing each data point as a literal point on the graph.
  • Don't have a true zero - 0 °C does not mean "no temperature", and negative numbers also have a meaning.
  • We can meaningfully add and subtract values, but ratios between them (e.g., "20 °C is twice as hot as 10 °C") are not meaningful.

Ratio

  • tell us about order and the exact difference between units, and have an absolute zero.
  • height and weight are examples of this.
  • They can be added, subtracted, multiplied, and divided.
  • Central tendency can be measured by mode, median, or mean
  • Measures of dispersion such as standard deviation and the coefficient of variation can be calculated from ratio scales.
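To see why the coefficient of variation (standard deviation divided by the mean) belongs to ratio scales, here is a short Python sketch with invented numbers. On a ratio scale, converting units rescales every value, so the CV does not change; on an interval scale such as Celsius, moving the arbitrary zero (to Kelvin) changes the mean but not the spread, so the CV changes even though the temperatures are the same.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Ratio scale: hypothetical heights. Changing units (cm -> inches) multiplies
# every value by the same constant, so the CV stays the same.
heights_cm = [150, 160, 170, 180, 190]
heights_in = [h / 2.54 for h in heights_cm]
cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

# Interval scale: the same three temperatures in Celsius and in Kelvin.
# Shifting the zero point changes the mean but not the spread, so the CV changes.
temps_c = [10, 20, 30]
temps_k = [t + 273.15 for t in temps_c]
cv_c = coefficient_of_variation(temps_c)
cv_k = coefficient_of_variation(temps_k)

print(f"Heights: CV in cm = {cv_cm:.4f}, CV in inches = {cv_in:.4f}")
print(f"Temperatures: CV in Celsius = {cv_c:.4f}, CV in Kelvin = {cv_k:.4f}")
```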


🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️🌡️

 References

 https://medium.com/@rndayala/data-levels-of-measurement-4af33d9ab51a

Stats Can Be Sexy

                                        

Visualizing Data for the Masses



🪟🪟🌡️🌡️🌡️🟥🟥🟥▶️▶️

Wells's 1903 argument

  • Physical science and advanced thinking require mathematical analysis skills
  • Soon, citizen competence will include the ability to compute, analyze averages, and understand extremes.

Wilks's 1951 simplification


  • Statistical thinking will be as essential for citizenship as reading and writing (Marriott, 2014).


Wilks's breakdown of statistical thinking, according to Marriott (2014).

  • Six core concepts
    • Expectation and variance - understanding averages, maximums, and minimums.
    • Distribution - Recognizing patterns in data variation
    • Probability - Assessing the likelihood of events
    • Risk - Evaluating potential costs or dangers
    • Correlation - Identifying relationships between variables
Basically, both thinkers highlight the need for data literacy in a world increasingly driven by information and analysis.
🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟🪟

Marriott (2014) argues that the traditional definition of statistical thinking needs to be expanded to include three new concepts: data, cognition, and visualization.


Data

Data is the lifeblood of statistics, but it is not explicitly included in the current definition. Marriott (2014) highlights the challenge posed by big data and data science, suggesting that statisticians risk being left behind if they do not embrace data in all its forms.

Marriott (2014) states that adding data to the definition of statistical thinking will not solve the problem on its own, but it will send an important message: statisticians are the original data scientists and embrace data in all its forms.


Cognition 🟦🟦🟦🟦🟦🟦



The human ability to think statistically is limited, and, according to Marriott (2014), Kahneman's book exposes the cognitive errors made by people in general and by statisticians.

Dual-system thinking - Marriott (2014) states that Kahneman proposes two thinking systems:
  1. System 1 - fast, intuitive, prone to biases
  2. System 2 - slow, logical, effortful
  • Statistical thinking relies heavily on System 2.
  • Despite our natural cognitive limitations, Marriott (2014) reminds us that Kahneman offers strategies to mitigate errors, encouraging the conscious engagement of System 2 in statistical reasoning, since System 1's instinctive responses can lead to erroneous judgments.


Visualization 🟥🟥🟥🟥▶️


  • Statisticians excel at visualization tools such as histograms and box plots, yet they often struggle to communicate effectively through visuals.
  • Including visualization in the definition of statistical thinking emphasizes statisticians' ability to analyze and communicate data effectively.
  • Statisticians should embrace collaboration with other professionals like graphic designers and neuroscientists to keep up with evolving data trends and expertise.



🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥


                                                            References

Marriott, N. (2014), The future of statistical thinking. Significance, 11: 78-80. https://doi.org/10.1111/j.1740-9713.2014.00787.x
