Treatment Group | In an experiment, this is the group that receives the manipulation or treatment being studied. |
Control Group | In an experiment, this group serves as the baseline: it does not receive the treatment, but may receive a placebo or fake treatment. |
Summary Statistic | A number that summarizes the data. |
Case | A single observation. |
Variable | Something that can be measured, described, or manipulated. |
Quantitative Variable | Also known as a numerical variable. It is a variable that can be measured on a numerical scale. The variable is either continuous or discrete. |
Continuous Variable | A quantitative variable where the measurements can take on any value. |
Discrete Variable | A quantitative variable that can be counted with whole numbers. |
Qualitative Variable | Also known as a categorical variable. It is a variable that takes a finite set of values, each of which falls into a particular group or category. The variable is either nominal or ordinal. |
Nominal Variable | A qualitative variable whose categories have no natural ordering. |
Ordinal Variable | A qualitative variable whose categories have a natural ordering. |
Independent Variable | Usually denoted by "x". It is a variable that does not change based on other variables. |
Dependent Variable | Usually denoted by "y". It is a variable that does change depending on the value of another variable. |
Population | The entire group of interest. |
Sample | A subset of the population or part of the group of interest. |
Bias | A systematic inclination or tendency that favors certain outcomes or cases, causing results to misrepresent the group of interest. |
Simple Random Sample | Referred to as an SRS, it is a sampling technique where each case has the same probability of being chosen. |
Non-response Bias | A type of bias that usually shows up in surveys. It occurs when a large proportion of those sampled do not respond, and the respondents may not be representative. |
Convenience Sample | A sample that is taken with cases that are easier to reach. |
Explanatory Variable | Also referred to as the predictor variable or independent variable. This variable is manipulated by the researcher and tries to explain the response variable. |
Response Variable | Also referred to as the dependent variable or outcome variable. This variable is the outcome that is being measured in an experiment. |
Observational Study | A study where nothing is being controlled or changed. |
Experiment | A study where variables are being controlled or manipulated. |
Placebo | A fake treatment or drug. |
Confounding Variable | Also referred to as a lurking variable. It is a variable that correlates with the dependent and independent variable. |
Prospective Study | An observational study where data is taken as time goes on. |
Retrospective Study | An observational study where data is taken after events have taken place. |
Stratified Sampling | A sampling technique where the population is divided into groups called strata and then SRS is used. |
Strata | Divided groups of a population that are formed when using stratified sampling. |
Cluster Sample | A sampling technique where the population is divided into many groups and a certain number of groups are randomly chosen. Then everyone in the chosen groups is sampled. |
Multistage Sample | Similar to cluster sampling, but instead of sampling everyone in each chosen cluster, we randomly sample a certain number of people within it. |
Blocks | In experimental design, similar individuals are put into groups called blocks to reduce variability. |
Blind | An experiment is called blind when the individuals do not know which treatment they are receiving, but the researchers do. |
Double Blind | An experiment is called double blind when neither the individuals nor the researchers know what treatment the individuals are taking. |
Scatterplot | A way to visualize the relationship between two quantitative variables, where the variables are plotted on an x-y graph. |
Dot Plot | A way to visualize one quantitative variable on a number line. |
Mean | Also referred to as the average or arithmetic mean. It is a way to measure the center of your data. |
Distribution | The shape of the data that shows how often values occur. |
Weighted Mean | A different way to compute the average that gives more importance to certain values. |
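As a reference for the mean and weighted mean entries above, one standard way to write them, using $x_1,\dots,x_n$ for the data values and $w_1,\dots,w_n$ for the weights (notation introduced here, not taken from the glossary):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$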
Histogram | A method to visualize the distribution of your data that uses rectangles to show the frequency of values. |
Right Skewed | A way to describe a distribution where there are some extreme high values. The mean will be higher than the median. In a histogram, there will be a long right tail. |
Left Skewed | A way to describe a distribution where there are some extreme low values. The mean will be lower than the median. In a histogram, there will be a long left tail. |
Symmetrical | A way to describe a distribution where the frequency of values below the mean mirrors the frequency of values above the mean. |
Mode | The value that appears most often in a data set. |
Unimodal | A distribution with one peak or one unique mode. |
Bimodal | A distribution with two peaks or two unique modes. |
Multimodal | A distribution with more than two peaks or more than two modes. |
Deviation | How far away a value is from the mean. |
Variance | A measurement of how spread out a data set is, expressed in squared units. |
Standard Deviation | A measurement of the spread of a data set. It is often chosen over the variance because of its mathematical properties and it is reported with the original units of the data. |
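For reference, the usual sample versions of the variance and standard deviation, with $\bar{x}$ the sample mean and $n$ the sample size (a sketch in my own notation):
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s = \sqrt{s^2}$$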
Box Plot | A method to visualize the five number summary of a data set. |
Median | Also called the second quartile. The midpoint of a data set, where 50% of the data is below this value. |
Interquartile Range | A measure of the spread of the middle 50% of a data set. |
First Quartile | The 25th percentile. 25% of the data is below this value. |
Third Quartile | The 75th percentile. 75% of the data is below this value. |
Outliers | Values that are extreme relative to the rest of the data. |
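One common way to connect the quartile, IQR, and outlier entries above: the interquartile range is the distance between the quartiles, and a frequently used (but not universal) rule flags values more than 1.5 IQRs beyond the quartiles as potential outliers:
$$\mathrm{IQR} = Q_3 - Q_1, \qquad \text{potential outlier if } x < Q_1 - 1.5\,\mathrm{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\mathrm{IQR}$$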
Robust Estimates | Statistics that change very little in the presence of high variability or extreme values. |
Contingency Table | A table that shows the frequency between two categorical variables. |
Frequency Table | A table that shows the frequency for one categorical variable. |
Relative Frequency Table | Similar to a frequency or contingency table, but instead shows the percentage or proportion rather than the count. |
Bar Plot | A method to visualize the frequency of one categorical variable. |
Segmented Bar Plot | A method to visualize a contingency table. |
Mosaic Plot | A method to visualize the frequency of either one or two categorical variables. |
Pie Chart | A method to visualize the frequency of one categorical variable. |
Law of Large Numbers | The sample mean will approach the population mean as the number of observations increases. |
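Stated compactly, with $\bar{x}_n$ the sample mean of $n$ observations and $\mu$ the population mean (symbols introduced here):
$$\bar{x}_n \to \mu \quad \text{as } n \to \infty$$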
Mutually Exclusive | Also referred to as disjoint. Two outcomes are called disjoint when they cannot happen at the same time. |
Venn Diagrams | A way to visualize which outcomes are shared and which are distinct among events, usually two or three. |
Sample Space | The set of all possible outcomes of a random process. |
Complement | The set of outcomes that are not in the event of interest. |
Marginal Probabilities | The probability of an event based on a single variable. |
Joint Probabilities | The probability of an event based on two or more variables. |
Conditional Probability | The probability of an outcome based on the outcome of another event. |
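In symbols, for events $A$ and $B$ with $P(B) > 0$ (a standard form, not taken from the glossary itself):
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$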
Tree Diagrams | A technique to visualize the outcomes of events and their probabilities. |
Random Variable | Usually represented as a capital letter, it is a random process with a numerical outcome. |
Expected Value | The long-run average value of a random variable. |
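For a discrete random variable $X$, one standard way to write the expected value (notation mine):
$$E[X] = \sum_{x} x \, P(X = x)$$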
Probability Density Function | A PDF describes the possible outcomes and their relative likelihood for a continuous random variable. |
Normal Distribution | A distribution that is symmetric, unimodal, and bell shaped. It is centered around the mean and tapers off on both ends. |
Standard Normal Distribution | A normal distribution with a mean of 0 and a standard deviation of 1. |
Parameters | A quantity that describes the population or distribution. |
Z-Score | The number of standard deviations a value is from the mean. |
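In symbols, with $\mu$ the mean and $\sigma$ the standard deviation (notation introduced here):
$$z = \frac{x - \mu}{\sigma}$$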
Percentile | The percentage of data that is below some value. |
Bernoulli Random Variable | A discrete random variable that has only two outcomes. |
Binomial Distribution | Describes the probability of having exactly k "successes" in n independent Bernoulli trials. |
Negative Binomial Distribution | Describes the probability of observing the kth "success" on the nth trial. |
Poisson Distribution | Describes the probability of a given number of events occurring over a fixed period of time. |
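For reference, the usual probability mass functions for the three distributions above, with $p$ the success probability, $n$ the number of trials, $k$ the number of successes, and $\lambda$ the average rate (symbols introduced here, not in the glossary):
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{(binomial)}$$
$$P(\text{$k$th success on trial } n) = \binom{n-1}{k-1} p^k (1-p)^{n-k} \quad \text{(negative binomial)}$$
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{(Poisson)}$$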
Population Mean | A parameter that describes the average of the population of interest. |
Sample Mean | A statistic that describes the average of the sample. |
Point Estimate | Considered a "best guess" to estimate an unknown population parameter. |
Sampling Variation | The variation in estimates that arises from multiple random samples. |
Sampling Distribution | The distribution of a statistic over repeated random samples. |
Standard Error | The standard deviation of the sampling distribution. |
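For the sample mean, the standard error is commonly estimated as follows, with $s$ the sample standard deviation and $n$ the sample size (notation mine):
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$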
Confidence Interval | A plausible range of values for a population parameter. |
Margin of Error | The amount added to and subtracted from a point estimate to form a confidence interval. |
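A common template connecting the two entries above, using $z^\star$ (or $t^\star$) for the critical value that matches the chosen confidence level (symbols introduced here):
$$\text{confidence interval} = \text{point estimate} \pm z^\star \times SE, \qquad \text{margin of error} = z^\star \times SE$$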
Null Hypothesis | The default claim, often a skeptical position of "no effect" or "no difference". |
Alternative Hypothesis | The claim to be tested. |
Type 1 Error | Also known as a "false positive". We reject the null hypothesis when the null hypothesis is true. |
Type 2 Error | Also known as a "false negative". We fail to reject the null hypothesis when the alternative hypothesis is true. |
Significance Level | The probability of a Type 1 error. It is also the value that is compared with the P-value. |
P-Value | The probability of observing data as extreme as, or more extreme than, what was observed, if the null hypothesis is true. A small p-value indicates strong evidence against the null hypothesis, which means rejecting it in favor of the alternative. |
Central Limit Theorem | As the sample size increases, the sampling distribution of the mean becomes approximately normal. |
Test Statistic | A summary statistic used to evaluate a hypothesis test and to calculate the p-value. |
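As one concrete example (not the only form a test statistic can take), the one-sample t statistic for a hypothesized mean $\mu_0$ can be written as:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$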
Statistically Significant | When the null hypothesis is rejected, results are deemed statistically significant, but they might not always be practically significant. |
Degrees of Freedom | Describes the shape of the t-distribution. |
Pooled Standard Deviation | A statistic calculated from two samples to better estimate a shared standard deviation. |
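A standard formula, with $s_1, s_2$ the sample standard deviations and $n_1, n_2$ the sample sizes (notation mine):
$$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$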
Rejection Regions | A range of values that determine when to reject the null hypothesis, given the test statistic. |
Power | The probability of rejecting the null hypothesis when it is false. |
Effect Size | A quantity used to determine what is practically significant. |
Analysis of Variance | A test that is used to analyze whether the means of multiple groups are equal. |
Mean Square Between Groups | Also known as Mean Square Treatment. It is the amount of variability between groups. |
Mean Square Error | The amount of variability within each group. |
Bonferroni Correction | A technique used when comparing the means of multiple groups. It keeps the overall probability of a Type 1 error from increasing as more comparisons are made. |
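For reference on the ANOVA-related entries above: the F test statistic compares the two mean squares, and the Bonferroni correction divides the significance level by the number of comparisons $K$ (symbols introduced here):
$$F = \frac{MSG}{MSE}, \qquad \alpha^\star = \frac{\alpha}{K}$$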
Pooled Proportion | A statistic calculated from two samples to estimate a shared proportion. |
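One common way to write it, with $\hat{p}_1, \hat{p}_2$ the sample proportions from samples of size $n_1, n_2$ (notation mine):
$$\hat{p}_{pooled} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$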
Predictor | Also known as the explanatory or independent variable. It is a variable used to estimate the outcome of another variable. |
Residuals | The difference between the observed value and the estimated value. |
Correlation | The strength of a linear relationship between two variables. Measured between -1 and 1. |
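One standard way to write the (Pearson) correlation for paired data $(x_i, y_i)$, with $s_x, s_y$ the sample standard deviations (notation introduced here):
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x \, s_y}$$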
Extrapolation | Estimating values that are beyond the range of the observed data. |
R-Squared | Measures how much of the variability in the response can be explained by the model. It is also used to evaluate how well a linear model fits. |
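In terms of sums of squares, with $\hat{y}_i$ the fitted values (a sketch, not the only equivalent form):
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$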
Indicator Variable | A variable used in a linear model to include a categorical variable as a predictor. |
High Leverage | A data point that is extreme within the predictor variable. |
Influential Point | A data point that has a major influence on the slope of the fitted line. |
Collinear | Two predictor variables that are correlated. |
Adjusted R-Squared | An R-squared measurement that adjusts for the number of predictor variables in the model. |
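A common form, with $n$ observations and $k$ predictors (symbols introduced here):
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$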
Diagnostic Plots | Various graphs to analyze the assumptions required for regression. |
Logistic Regression | A way to model and predict a categorical response variable with two categories. |
Logit Transformation | A function that maps a probability to the log-odds scale; its inverse maps any value back to a probability. |
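In symbols, the logit of a probability $p$ and its inverse (the logistic function), which maps any real value back to a probability (notation mine):
$$\mathrm{logit}(p) = \ln\frac{p}{1 - p}, \qquad p = \frac{e^{x}}{1 + e^{x}}$$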