All Seasons

Season 1

  • S01E01 What Is Statistics

    • January 24, 2018
    • YouTube

Welcome to Crash Course Statistics! In this series we're going to take a look at the important role statistics play in our everyday lives, because statistics are everywhere! Statistics help us better understand the world and make decisions, from what you'll wear tomorrow to government policy. But in the wrong hands, statistics can be used to misinform. So we're going to try to do two things in this series: show you the usefulness of statistics, and help you become a more informed consumer of them. From probabilities and paradoxes to p-values, there's a lot to cover, and there will be some math - but we promise, only when it's most important. But first, we should talk about what statistics actually are, and what we can do with them. Statistics are tools, but they can't give us all the answers.

  • S01E02 Mathematical Thinking

    • January 31, 2018
    • YouTube

Today we’re going to talk about numeracy - that is, understanding numbers. From really, really big numbers to really small numbers, it's difficult to comprehend information at these scales, but these are often the types of numbers we see most in statistics. So understanding how these numbers work, how best to visualize them, and how they affect our world can help us become better decision makers - from deciding whether we should really worry about Ebola to helping improve fighter jets during World War II!

  • S01E03 Measures of Central Tendency

    • February 7, 2018
    • YouTube

Today we’re going to talk about measures of central tendency - those are the numbers that tend to hang out in the middle of our data: the mean, the median, and the mode. All of these numbers can be called “averages” and they’re the numbers we tend to see most often - whether it’s polling and income equality in politics, batting averages in baseball (and cricket), or Amazon reviews. Averages are everywhere, so today we’re going to discuss how these measures differ, how their relationship with one another can tell us a lot about the underlying data, and how they are sometimes used to mislead.
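
The mean, median, and mode are easy to compute with Python's standard library. Here's a minimal sketch, using made-up salary numbers, of how the three "averages" can disagree when the data are skewed:

```python
# How the three averages can disagree on skewed data.
# The salaries here are invented purely for illustration.
import statistics

salaries = [35_000, 38_000, 41_000, 41_000, 45_000, 52_000, 250_000]

mean = statistics.mean(salaries)      # pulled upward by the one huge salary
median = statistics.median(salaries)  # the middle value; robust to outliers
mode = statistics.mode(salaries)      # the most frequent value

print(f"mean={mean:.0f}, median={median}, mode={mode}")
```

Because the one very large salary drags the mean well above the median, reporting only the mean here would paint a misleading picture - exactly the kind of thing the episode warns about.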

  • S01E04 Measures of Spread

    • February 14, 2018
    • YouTube

Today, we're looking at measures of spread, or dispersion, which we use to understand how well medians and means represent the data, and how reliable our conclusions are. They can help us understand test scores and income inequality, spot stock bubbles, and plan gambling junkets. They're pretty useful, and now you're going to know how to calculate them!
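
As an illustration, here's how the common measures of spread look when computed with Python's standard library over a small, invented set of exam scores:

```python
# Range, variance, and standard deviation for hypothetical exam scores.
import statistics

scores = [70, 72, 75, 78, 80, 95]

spread_range = max(scores) - min(scores)   # simplest measure: max minus min
variance = statistics.pvariance(scores)    # mean squared deviation from the mean
std_dev = statistics.pstdev(scores)        # square root of the variance

print(spread_range, round(variance, 2), round(std_dev, 2))
```

The standard deviation is usually the most convenient of the three because it's in the same units as the original data.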

  • S01E05 Data Visualization: Part 1

    • February 21, 2018
    • YouTube

    Today we're going to start our two-part unit on data visualization. Up to this point we've discussed raw data - which are just numbers - but usually it's much more useful to represent this information with charts and graphs. There are two types of data we encounter, categorical and quantitative data, and they likewise require different types of visualizations. Today we'll focus on bar charts, pie charts, pictographs, and histograms and show you what they can and cannot tell us about their underlying data as well as some of the ways they can be misused to misinform.

  • S01E06 Plots, Outliers, and Justin Timberlake: Data Visualization Part 2

    • February 28, 2018
    • YouTube

    Today we’re going to finish up our unit on data visualization by taking a closer look at how dot plots, box plots, and stem and leaf plots represent data. We’ll also talk about the rules we can use to identify outliers and apply our new data viz skills by taking a closer look at how Justin Timberlake’s song lyrics have changed since he went solo.

  • S01E07 The Shape of Data: Distributions

    • March 7, 2018
    • YouTube

When collecting data to make observations about the world, it usually just isn't possible to collect ALL THE DATA. So instead of asking every single person about student loan debt, for instance, we take a sample of the population, and then use the shape of our samples to make inferences about the true underlying distribution of our data. It turns out we can learn a lot about how something occurs, even if we don't know the underlying process that causes it. Today, we’ll also introduce the normal (or bell) curve and talk about how we can learn some really useful things from a sample's shape - like whether an exam was particularly difficult, how often Old Faithful erupts, or whether there are two types of runners that participate in marathons!

  • S01E08 Correlation Doesn’t Equal Causation

    • March 14, 2018
    • YouTube

    Today we’re going to talk about data relationships and what we can learn from them. We’ll focus on correlation, which is a measure of how two variables move together, and we’ll also introduce some useful statistical terms you’ve probably heard of like regression coefficient, correlation coefficient (r), and r^2. But first, we’ll need to introduce a useful way to represent bivariate continuous data - the scatter plot. The scatter plot has been called “the most useful invention in the history of statistical graphics” but that doesn’t necessarily mean it can tell us everything. Just because two data sets move together doesn’t necessarily mean one CAUSES the other. This gives us one of the most important tenets of statistics: correlation does not imply causation.
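
The correlation coefficient (r) mentioned here can be computed straight from its definition. This sketch uses invented study-hours and exam-score data, and pearson_r is just a hand-rolled Pearson correlation; r^2 is literally r squared:

```python
# Pearson correlation coefficient computed from its definition.
# The study-hours and exam-score pairs are made up for illustration.
import math

def pearson_r(x, y):
    """Correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

hours_studied = [1, 2, 3, 4, 5]
exam_scores = [55, 61, 68, 72, 80]

r = pearson_r(hours_studied, exam_scores)
print(round(r, 3), round(r ** 2, 3))
```

A high r here says the two variables move together very tightly - but, as the episode stresses, it says nothing about whether studying causes the higher scores.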

  • S01E09 Controlled Experiments

    • March 21, 2018
    • YouTube

    We may be living IN a simulation (according to Elon Musk and many others), but that doesn't mean we don't need to perform simulations ourselves. Today, we're going to talk about good experimental design and how we can create controlled experiments to minimize bias when collecting data. We'll also talk about single and double blind studies, randomized block design, and how placebos work.

  • S01E10 Sampling Methods and Bias with Surveys

    • March 28, 2018
    • YouTube

Today we’re going to talk about good and bad surveys. From user feedback surveys and telephone polls to those questionnaires at your doctor's office, surveys are everywhere - but because they're so easy to create and distribute, they're also susceptible to bias and error. So today we’re going to talk about how to identify good and bad survey questions, and how groups (or samples) are selected to represent the entire population, since it's often just not feasible to ask everyone.

  • S01E11 Science Journalism

    • April 11, 2018
    • YouTube

We’ve talked a lot in this series about how often you see data and statistics in the news and on social media - which is ALL THE TIME! But how do you know who and what you can trust? Today, we’re going to talk about how we, as consumers, can spot flawed studies, sensationalized articles, and just plain poor reporting. And this isn’t to say that all science articles you read on Facebook or in magazines are wrong - just that it's valuable to read those catchy headlines with some skepticism.

  • S01E12 Henrietta Lacks, the Tuskegee Experiment, & Ethical Data Collection

    • April 18, 2018
    • YouTube

Today we’re going to talk about ethical data collection. From the Tuskegee syphilis experiments and Henrietta Lacks’ HeLa cells to the horrifying experiments performed at Nazi concentration camps, many strides have been made - from Institutional Review Boards (or IRBs) to the Nuremberg Code - to guarantee voluntariness, informed consent, and beneficence in modern data collection. But as we’ll discuss, the complexities of research in the digital age raise many new ethical questions.

  • S01E13 Probability Part 1: Rules and Patterns

    • April 25, 2018
    • YouTube

Today we’re going to begin our discussion of probability. We’ll talk about how the addition (OR) rule, the multiplication (AND) rule, and conditional probabilities help us figure out the likelihood of sequences of events happening - from optimizing your chances of having a great night out with friends to seeing Cole Sprouse at IHOP!
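
The two rules are short enough to state directly in code. A sketch using a standard 52-card deck and a fair coin:

```python
# The addition (OR) and multiplication (AND) rules, plus a conditional
# probability, using a 52-card deck and a fair coin.
p_heart = 13 / 52
p_king = 4 / 52
p_king_of_hearts = 1 / 52

# Addition rule: subtract the overlap so it isn't counted twice.
p_heart_or_king = p_heart + p_king - p_king_of_hearts

# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_two_heads = 0.5 * 0.5

# Conditional probability: P(king | heart) = P(king and heart) / P(heart).
p_king_given_heart = p_king_of_hearts / p_heart

print(p_heart_or_king, p_two_heads, p_king_given_heart)
```

Forgetting to subtract the overlap in the addition rule is one of the most common probability mistakes - the king of hearts would otherwise be counted as both a king and a heart.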

  • S01E14 Probability Part 2: Updating Your Beliefs with Bayes

    • May 2, 2018
    • YouTube

Today we're going to introduce Bayesian statistics and discuss how this new approach to statistics has revolutionized the field - from artificial intelligence and clinical trials to how your computer filters spam! We'll also discuss the Law of Large Numbers and how we can use simulations to help us better understand the "rules" of our data, even if we don't know the equations that define those rules.
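
The Law of Large Numbers mentioned here is easy to see in a quick simulation. This sketch (with a fixed random seed so the run is repeatable) flips a simulated fair coin and watches the proportion of heads settle toward 0.5 as the number of flips grows:

```python
# Law of Large Numbers: the observed proportion of heads drifts toward
# the true probability (0.5) as the number of simulated flips increases.
import random

random.seed(0)  # fixed seed so the run is repeatable

def heads_proportion(n_flips):
    """Simulate n fair-coin flips and return the fraction that were heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

for n in (10, 1_000, 100_000):
    print(n, heads_proportion(n))
```

Small runs bounce around a lot; the big run hugs 0.5 - which is exactly why simulations can reveal the "rules" of a process without knowing its equations.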

  • S01E15 The Binomial Distribution

    • May 9, 2018
    • YouTube

Today we're going to discuss the Binomial Distribution and a special case of this distribution known as a Bernoulli Distribution. The formulas that define these distributions provide us with shortcuts for calculating the probabilities of all kinds of events that happen in everyday life. They can also be used to help us look at how probabilities are connected! For instance, knowing the chance of getting a flat tire today is useful, but knowing the likelihood of getting one this year, or in the next five years, may be more useful. And heads up: this episode is going to have a lot more equations than normal, but to sweeten the deal, we added zombies!
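
The binomial formula behind all of this fits in a few lines. The flat-tire probability below (10% per year) is invented purely for illustration:

```python
# Binomial probabilities: P(k) = C(n, k) * p^k * (1-p)^(n-k).
# The 10%-per-year flat-tire probability is a made-up example number.
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_flat_per_year = 0.10

# Chance of exactly one flat in the next five years...
p_one_flat = binomial_pmf(1, 5, p_flat_per_year)
# ...and of at least one (the complement of zero flats).
p_at_least_one = 1 - binomial_pmf(0, 5, p_flat_per_year)

print(round(p_one_flat, 3), round(p_at_least_one, 3))
```

Each individual year here is a Bernoulli trial (flat or no flat); the binomial distribution just counts how those trials add up over five years.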

  • S01E16 Geometric Distributions & The Birthday Paradox

    • May 16, 2018
    • YouTube

    Geometric probabilities, and probabilities in general, allow us to guess how long we'll have to wait for something to happen. Today, we'll discuss how they can be used to figure out how many Bertie Bott's Every Flavour Beans you could eat before getting the dreaded vomit flavored bean, and how they can help us make decisions when there is a little uncertainty - like getting a Pikachu in a pack of Pokémon Cards! We'll finish off this unit on probability by taking a closer look at the Birthday Paradox (or birthday problem) which asks the question: how many people do you think need to be in a room for there to likely be a shared birthday? (It's likely much fewer than you would expect!)
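
The birthday problem has an exact answer that's simple to compute: multiply together the chances that each successive person avoids all the birthdays taken so far, then take the complement.

```python
# Exact birthday-problem probability, ignoring leap years and assuming
# all 365 birthdays are equally likely.
def p_shared_birthday(n):
    """Chance that at least two of n people share a birthday."""
    p_all_unique = 1.0
    for i in range(n):
        p_all_unique *= (365 - i) / 365
    return 1 - p_all_unique

print(round(p_shared_birthday(23), 3))
```

At just 23 people the probability of a shared birthday already passes 50% - far fewer people than most of us would guess.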

  • S01E17 Randomness

    • May 23, 2018
    • YouTube

    There are a lot of events in life that we just can’t predict, but just because something is random doesn’t mean we don’t know or can’t learn anything about it. Today, we’re going to talk about how we can extract information from seemingly random events starting with the expected value or mean of a distribution and walking through the first four “moments” - the mean, variance, skewness, and kurtosis.

  • S01E18 Z-Scores and Percentiles

    • May 30, 2018
    • YouTube

Today we’re going to talk about how we compare things that aren’t exactly the same - or aren’t measured in the same way. For example, you might want to know whether a 1200 on the SAT is better than a 25 on the ACT. For this, we need to standardize our data using z-scores - which allow us to make comparisons between two sets of data as long as they’re normally distributed. We’ll also talk about converting these scores to percentiles and discuss how percentiles, though valuable, don’t actually tell us how “extreme” our data really is.
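
Standardizing with z-scores is a one-line formula: subtract the mean, divide by the standard deviation. The SAT/ACT means and standard deviations below are illustrative stand-ins, not official figures:

```python
# Standardizing scores from two different tests so they can be compared.
# The test means and standard deviations here are made-up example values.
from statistics import NormalDist

sat_z = (1200 - 1050) / 200   # z = (score - mean) / standard deviation
act_z = (25 - 21) / 5.5

# Convert each z-score to a percentile under a normal distribution.
sat_pct = NormalDist().cdf(sat_z) * 100
act_pct = NormalDist().cdf(act_z) * 100

print(round(sat_z, 2), round(sat_pct, 1))
print(round(act_z, 2), round(act_pct, 1))
```

Once both scores are on the z-score scale, comparing them is just comparing two numbers - that's the whole point of standardizing.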

  • S01E19 The Normal Distribution

    • June 6, 2018
    • YouTube

    Today is the day we finally talk about the normal distribution! The normal distribution is incredibly important in statistics because distributions of means are normally distributed even if populations aren't. We'll get into why this is so - due to the Central Limit Theorem - but it's useful because it allows us to make comparisons between different groups even if we don't know the underlying distribution of the population being studied.
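
A quick simulation makes the Central Limit Theorem concrete. Here the population is uniform - nothing like a bell curve - yet the means of repeated samples pile up in a roughly normal heap around the population mean:

```python
# Central Limit Theorem sketch: sample means are approximately normal
# even when the population is uniform on [0, 1).
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

sample_means = [
    statistics.mean(random.random() for _ in range(30))  # one sample of 30
    for _ in range(2_000)                                # ...drawn 2,000 times
]

# The means cluster tightly around the population mean of 0.5.
print(round(statistics.mean(sample_means), 3),
      round(statistics.stdev(sample_means), 3))
```

The spread of those sample means (the standard error) is much smaller than the spread of the population itself, which is why averages of samples are so much better behaved than individual observations.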

  • S01E20 Confidence Intervals

    • June 13, 2018
    • YouTube

Today we’re going to talk about confidence intervals. Confidence intervals allow us to quantify our uncertainty by defining a range of values for our predictions and assigning a likelihood that something falls within that range. And confidence intervals come up a lot - like the delivery windows you get for packages, or the margins of error pollsters cite during elections - and we use them instinctively in everyday decisions. But confidence intervals also demonstrate the tradeoff of accuracy for precision: the greater our confidence, usually the less useful our range.
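
One common way to build a 95% confidence interval for a mean uses the normal approximation: mean plus or minus about 1.96 standard errors. The delivery-time sample below is made up for illustration:

```python
# A 95% confidence interval for a mean via the normal approximation.
# The delivery times are invented sample data.
import math
from statistics import NormalDist, mean, stdev

delivery_days = [3, 5, 4, 6, 4, 5, 3, 4, 5, 4, 6, 5]

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
margin = z * stdev(delivery_days) / math.sqrt(len(delivery_days))
low, high = mean(delivery_days) - margin, mean(delivery_days) + margin

print(f"95% CI: {low:.2f} to {high:.2f} days")
```

Raising the confidence level to 99% would widen the interval - the accuracy-for-precision tradeoff the episode describes.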

  • S01E21 How P-Values Help Us Test Hypotheses

    • June 27, 2018
    • YouTube

Today we're going to begin our three-part unit on p-values. In this episode we'll talk about Null Hypothesis Significance Testing (or NHST), a framework for comparing two sets of information. In NHST we assume that there is no difference between the two things we are observing, and we compare our p-value to a predetermined cutoff to decide whether something seems sufficiently rare to let us reject the idea that these two observations are the same. This p-value tells us if something is statistically significant - but as you'll see, that doesn't necessarily mean the information is significant or meaningful to you.

  • S01E22 P-Value Problems

    • July 11, 2018
    • YouTube

    Last week we introduced p-values as a way to set a predetermined cutoff when testing if something seems unusual enough to reject our null hypothesis - that they are the same. But today we’re going to discuss some problems with the logic of p-values, how they are commonly misinterpreted, how p-values don’t give us exactly what we want to know, and how that cutoff is arbitrary - and arguably not stringent enough in some scenarios.

  • S01E23 Playing with Power: P-Values Pt 3

    • July 18, 2018
    • YouTube

    We're going to finish up our discussion of p-values by taking a closer look at how they can get it wrong, and what we can do to minimize those errors. We'll discuss Type 1 (when we think we've detected an effect, but there actually isn't one) and Type 2 (when there was an effect we didn't see) errors and introduce statistical power - which tells us the chance of detecting an effect if there is one.

  • S01E24 You know I’m all about that Bayes

    • July 25, 2018
    • YouTube

    Today we’re going to talk about Bayes Theorem and Bayesian hypothesis testing. Bayesian methods like these are different from how we've been approaching statistics so far, because they allow us to update our beliefs as we gather new information - which is how we tend to think naturally about the world. And this can be a really powerful tool, since it allows us to incorporate both scientifically rigorous data AND our previous biases into our evolving opinions.
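
Bayes' Theorem itself is one line of arithmetic. A classic sketch, with invented numbers, of updating a belief after a positive result on a medical test:

```python
# Bayes' theorem with made-up numbers: P(condition | positive test).
prior = 0.01           # P(condition): 1% of people have it
sensitivity = 0.95     # P(positive | condition)
false_positive = 0.05  # P(positive | no condition)

# Total probability of testing positive, with or without the condition.
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = sensitivity * prior / p_positive

print(round(posterior, 3))
```

Even with a 95%-sensitive test, the posterior here lands around 16%, because the condition is so rare to begin with - a counterintuitive update that's exactly what incorporating prior beliefs is for.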

  • S01E25 Bayes in science and everyday life

    • August 1, 2018
    • YouTube

Today we're going to finish up our discussion of Bayesian inference by showing how it can be used with continuous data sets and applied both in science and everyday life. From A/B testing of websites and getting a better understanding of psychological disorders to helping with language translation and purchase recommendations, Bayesian statistics really are being used everywhere!

  • S01E26 Test Statistics

    • August 8, 2018
    • YouTube

    Test statistics allow us to quantify how close things are to our expectations or theories. Instead of going on our gut feelings, they allow us to add a little mathematical rigor when asking the question: “Is this random… or real?” Today, we’ll introduce some examples using both t-tests and z-tests and explain how critical values and p-values are different ways of telling us the same information. We’ll get to some other test statistics like F tests and chi-square in a future episode.

  • S01E27 T-Tests: A Matched Pair Made in Heaven

    • August 15, 2018
    • YouTube

Today we're going to walk through a couple of statistical approaches to answer the question: "is coffee from the local cafe, Caf-fiend, better than that other cafe, The Blend Den?" We'll build a two-sample t-test, which tells us how many standard errors away from the mean our observed difference is in our tasting experiment, and then we'll introduce a matched-pair t-test, which allows us to remove variation in the experiment. All of these approaches rely on the test statistic framework we introduced last episode.
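
The two-sample t statistic described here can be computed by hand. This sketch uses Welch's (unpooled) form and invented taste ratings for the two cafes:

```python
# Welch's two-sample t statistic: the observed difference in means,
# measured in standard errors. The taste ratings are made up.
import math
import statistics

caf_fiend = [7.2, 8.1, 6.9, 7.8, 8.4, 7.5]
blend_den = [6.8, 7.0, 6.5, 7.3, 6.9, 7.1]

def two_sample_t(a, b):
    """Difference in sample means divided by its standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

print(round(two_sample_t(caf_fiend, blend_den), 2))
```

A t value well past the critical value for our chosen alpha would let us reject the null hypothesis that the two cafes' coffee tastes the same.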

  • S01E28 Degrees of Freedom & Effect Sizes

    • August 22, 2018
    • YouTube

    Today we're going to talk about degrees of freedom - which are the number of independent pieces of information that make up our models. More degrees of freedom typically mean more concrete results. But something that is statistically significant isn't always practically significant. And to measure that, we'll introduce another new concept - effect size.

  • S01E29 Chi-Square Tests

    • August 29, 2018
    • YouTube

Today we're going to talk about Chi-Square Tests - which allow us to measure differences in strictly categorical data like hair color, dog breed, or academic degree. We'll cover the three main Chi-Square tests - the goodness of fit test, the test of independence, and the test of homogeneity - and explain how we can use each of them to make comparisons.
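
The goodness of fit statistic is a short sum: for each category, square the gap between observed and expected counts and divide by the expected count. A sketch with hypothetical hair-color counts tested against an "everything equally common" expectation:

```python
# Chi-square goodness of fit statistic, computed by hand.
# The category counts are invented for illustration.
observed = [22, 17, 21]   # observed counts in each hair-color category
expected = [20, 20, 20]   # expected counts if all categories were equal

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))
```

The statistic is then compared to a chi-square distribution (here with 2 degrees of freedom, one less than the number of categories) to get a p-value.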

  • S01E30 P-Hacking

    • September 5, 2018
    • YouTube

    Today we're going to talk about p-hacking (also called data dredging or data fishing). P-hacking is when data is analyzed to find patterns that produce statistically significant results, even if there really isn't an underlying effect, and it has become a huge problem in science since many scientific theories rely on p-values as proof of their existence! Today, we're going to talk about a few ways researchers have "hacked" their data, and give you some tips for identifying and avoiding these types of problems when you encounter stats in your own lives.

  • S01E31 The Replication Crisis

    • September 26, 2018
    • YouTube

Replication (re-running studies to confirm results) and reproducibility (the ability to repeat an analysis on the same data) have come under fire over the past few years. The foundation of science itself is built upon statistical analysis, and yet there has been more and more evidence suggesting that possibly even the majority of studies cannot be replicated. This "replication crisis" is likely being caused by a number of factors, which we'll discuss, as well as some of the proposed solutions to ensure that the results we're drawing from scientific studies are reliable.

  • S01E32 Regression

    • October 3, 2018
    • YouTube

    Today we're going to introduce one of the most flexible statistical tools - the General Linear Model (or GLM). GLMs allow us to create many different models to help describe the world - you see them a lot in science, economics, and politics. Today we're going to build a hypothetical model to look at the relationship between likes and comments on a trending YouTube video using the Regression Model. We'll be introducing other popular models over the next few episodes.
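
A regression line like the one described can be fit directly from the least-squares formulas. The like and comment counts below are invented stand-ins for trending-video data:

```python
# Simple least-squares regression fit by hand.
# The like/comment counts are made up for illustration.
def fit_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

likes = [1_200, 3_400, 5_600, 8_100, 10_300]
comments = [80, 210, 340, 480, 650]

slope, intercept = fit_line(likes, comments)
print(round(slope, 4), round(intercept, 1))
```

The slope is the model's prediction of how many extra comments come with each additional like - the "regression coefficient" from the correlation episode.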

  • S01E33 ANOVA

    • October 10, 2018
    • YouTube

    Today we're going to continue our discussion of statistical models by showing how we can find if there are differences between multiple groups using a collection of models called ANOVA. ANOVA, which stands for Analysis of Variance is similar to regression (which we discussed in episode 32), but allows us to compare three or more groups for statistical significance.

  • S01E34 ANOVA Part 2: Dealing with Intersectional Groups

    • October 17, 2018
    • YouTube

Do you think a red minivan would be more expensive than a beige one? Now what if the car was something sportier, like a Corvette? Last week we introduced the ANOVA model, which allows us to compare measurements of more than two groups, and today we’re going to show you how it can be applied to look at data that belong to multiple groups that overlap and interact. Most things, after all, can be grouped in many different ways - a car has a make, model, and color - so if we wanted to predict the price of a car, it’d be especially helpful to know how those different variables interact with one another.

  • S01E35 Fitting Models Is like Tetris

    • October 24, 2018
    • YouTube

Today we're going to wrap up our discussion of General Linear Models (or GLMs) by taking a closer look at two final common models: ANCOVA (Analysis of Covariance) and RMA (Repeated Measures ANOVA). We'll show you how additional variables, known as covariates, can be used to reduce error, and show you how to tell if there's a difference between two or more groups or conditions. Between Regression, ANOVA, ANCOVA, and RMA, you should have the tools necessary to better analyze both categorical and continuous data.

  • S01E36 Supervised Machine Learning

    • October 31, 2018
    • YouTube

We've talked a lot about modeling data and making inferences about it, but today we're going to look toward the future at how machine learning is being used to build models that predict future outcomes. We'll discuss three popular types of supervised machine learning models: Logistic Regression, Linear Discriminant Analysis (or LDA), and K-Nearest Neighbors (or KNN).

  • S01E37 Unsupervised Machine Learning

    • November 7, 2018
    • YouTube

    Today we're going to discuss how machine learning can be used to group and label information even if those labels don't exist. We'll explore two types of clustering used in Unsupervised Machine Learning: k-means and Hierarchical clustering, and show how they can be used in many ways - from book suggestions and medical interventions, to giving people better deals on pizza!

  • S01E38 Intro to Big Data

    • November 14, 2018
    • YouTube

Today, we're going to begin our discussion of Big Data. Everything from which videos we click (and how long we watch them) on YouTube to our likes on Facebook says a lot about us - and increasingly sophisticated algorithms are being designed to learn about us from our clicks and not-clicks. Today we're going to focus on some of the ways Big Data impacts our lives, from what liking Hello Kitty says about us to how Netflix chooses just the right thumbnail to encourage us to watch more content. But Big Data isn't necessarily a good thing - next week we're going to discuss some of the problems that arise from collecting all that data.

  • S01E39 Big Data Problems

    • November 21, 2018
    • YouTube

    There is a lot of excitement around the field of Big Data, but today we want to take a moment to look at some of the problems it creates. From questions of bias and transparency to privacy and security concerns, there is still a lot to be done to manage these problems as Big Data plays a bigger role in our lives.

  • S01E40 Statistics in the Courts

    • November 28, 2018
    • YouTube

As we near the end of the series, we're going to look at how statistics impacts our lives. Today, we're going to discuss how statistics is often used and misused in the courtroom. We're going to focus on three stories in which three huge statistical errors were made: the handwriting analysis of French officer Alfred Dreyfus in 1894, the murder charges against mother Sally Clark in 1998, and the expulsion of student Jonathan Dorfman from UC San Diego in 2011.

  • S01E41 Neural Networks

    • December 12, 2018
    • YouTube

    Today we're going to talk big picture about what Neural Networks are and how they work. Neural Networks, which are computer models that act like neurons in the human brain, are really popular right now - they're being used in everything from self-driving cars and Snapchat filters to even creating original art! As data gets bigger and bigger neural networks will likely play an increasingly important role in helping us make sense of all that data.

  • S01E42 War

    • December 19, 2018
    • YouTube

Today we're going to discuss the role of statistics during war. From helping the Allies break Nazi Enigma codes and estimate tank production rates to finding sunken submarines, statistics have played, and continue to play, a critical role on the battlefield.

  • S01E43 When Predictions Fail

    • January 2, 2019
    • YouTube

    Today we’re going to talk about why many predictions fail - specifically we’ll take a look at the 2008 financial crisis, the 2016 U.S. presidential election, and earthquake prediction in general. From inaccurate or just too little data to biased models and polling errors, knowing when and why we make inaccurate predictions can help us make better ones in the future. And even knowing what we can’t predict can help us make better decisions too.

  • S01E44 When Predictions Succeed

    • January 9, 2019
    • YouTube

In our series finale, we're going to take a look at some of the times we've used statistics to gaze into our crystal ball, and actually got it right! We'll talk about how stores know what we want to buy (which can sometimes be a good thing), how baseball was changed forever when Paul DePodesta created a record-winning Oakland A's baseball team, and how statistics keeps us safe with the incredible strides we've made in weather forecasting. Statistics are everywhere, and even if you don't remember all the formulae and graphs we've thrown at you in this series, we hope you take with you a better appreciation of the many ways statistics impacts your life, and hopefully we've given you a more math-y perspective on how the world works.

Additional Specials

  • SPECIAL 0x1 Crash Course Statistics Preview

    • January 17, 2018
    • YouTube

Welcome to Crash Course Statistics! In this series we're going to take a closer look at how statistics play a significant role in our everyday lives. Now, this is a "math" course, and there will definitely be some math, but we're going to focus on how statistics is useful and valuable to you - someone who performs AND consumes statistics all the time. Statistics are everywhere, from batting averages and insurance rates to weather forecasting and smart assistants, and it's our hope that when you finish this series you'll have a better idea of the role statistics play in helping us better understand the world!