 A dot plot is a graphical display of data in which each value is shown as a dot above its position on a number line.
 A box-and-whisker plot is a graphical way to display the median, quartiles, and extremes of a data set to show the distribution of the data.
 A stem-and-leaf plot is a method of organizing numerical data in order of place value.
 A frequency table shows the number of times each score appears.
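 As a minimal sketch of a frequency table, the counts can be tallied with Python's standard collections.Counter. The function name frequency_table and the sample scores are illustrative assumptions, not part of Statext.

```python
from collections import Counter


def frequency_table(scores):
    """Return (score, count) pairs sorted by score."""
    return sorted(Counter(scores).items())


# Hypothetical scores, used only for illustration.
scores = [3, 1, 2, 3, 3, 2]
for score, count in frequency_table(scores):
    print(score, count)
```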
 In statistics, a histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable.
 Kernel density estimation is a way to estimate the probability density function of a random variable.
 The standard score, z-value, z-score, or normal score, is a measurement of a score's relationship to the mean. A z-value of 0 means the score is the same as the mean. A z-score can also be positive or negative, indicating whether it is above or below the mean and by how many standard deviations. It is computed as z = (x - μ) / σ, where x is the score, μ is the mean, and σ is the standard deviation of the population.
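 The standard score can be sketched in a few lines, assuming the usual definition z = (x - μ) / σ. The function name z_score and the example numbers are hypothetical.

```python
def z_score(x, mu, sigma):
    """Standard score of x for a population with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# A score of 70 in a population with mean 60 and standard deviation 5
# lies two standard deviations above the mean.
print(z_score(70, 60, 5))
```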
 Covariance measures the degree to which two variables change together. The covariance of two variables is positive if they vary together in the same direction.
 A covariance matrix is a matrix whose element in the i, j position is the covariance between the i^{th} and j^{th} elements of a random vector.
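 A small sketch of both ideas follows. It assumes the sample covariance (n - 1 in the denominator), which is one common convention; the function names and data are illustrative only.

```python
def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def covariance_matrix(variables):
    """Matrix whose (i, j) entry is the covariance of variables[i] and variables[j]."""
    return [[covariance(vi, vj) for vj in variables] for vi in variables]


x = [1, 2, 3]
y = [2, 4, 6]
print(covariance_matrix([x, y]))
```

 The diagonal entries are the variances of the individual variables, and the matrix is symmetric.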
 A scatter plot is a graph of plotted points that shows the relationship between two sets of data.
 A parametric test makes assumptions about the parameters of the population distribution from which data are drawn.
 Normality tests are tests of the null hypothesis that a sample has been drawn from a normal distribution. Tests of normality include the quantile-quantile plot (Q-Q plot), the probability-probability plot (P-P plot), skewness, kurtosis, the Jarque-Bera test, the Shapiro-Wilk W test, the Kolmogorov-Smirnov normality test, the W/S normality test, D'Agostino's D normality test, D'Agostino's K-squared test, and the Ryan-Joiner test.
 A confidence interval for the mean gives an estimated range of values which is likely to include an unknown population mean, the estimated range being calculated from a given set of sample data.
 The one-sample t-test is used when we want to know whether our sample comes from a particular population. The one-sample t-test is used only for tests of the sample mean.
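 As a rough sketch, the one-sample t statistic can be computed as t = (mean - μ0) / (s / √n) with n - 1 degrees of freedom, where μ0 is the hypothesized population mean. The function name and the data are assumptions made for illustration; looking up the p-value is left to a t-table or a statistics package.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("sample has zero variance")
    return (mean - mu0) / (s / math.sqrt(n)), n - 1


t, df = one_sample_t([5, 7, 9], mu0=5)
print(t, df)
```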
 The two-sample t-test is used to test the significance of the difference between the means of two independent samples.
 The paired t-test determines whether two paired sets of data differ from each other in a significant way under the assumptions that the paired differences are independent and normally distributed.
 The one-way analysis of variance (one-way ANOVA) is used to compare means of two or more samples (using the F-distribution). The ANOVA tests the null hypothesis that samples in two or more groups are drawn from populations with the same mean values.
 When the decision from the one-way ANOVA is to reject the null hypothesis, it means that at least one of the means is not the same as the other means. What we need is a way to figure out where the differences lie, not just that there is a difference. This is where the Scheffé test, the Least Significant Difference (LSD) test, and Tukey's honestly significant difference (HSD) test come into play. They help us analyze pairs of means to see if there is a difference.
 The two-way analysis of variance is an extension of the one-way analysis of variance. There are two independent variables. The two-way ANOVA examines the influence of different independent variables on one dependent variable.
 The sample standard deviation is just a value calculated from a sample of data. Sometimes we compute the confidence interval for a standard deviation.
 A chi-square test for variance can be used to test if the variance of a population is equal to a certain value.
 To test the equality of the variances of two populations, an F-test can be used.
 Bartlett's test is used to test if three or more samples are from populations with equal variances.
 The Pearson correlation coefficient is a measure of the linear correlation between two variables. It requires both variables to be measured on an interval or ratio scale.
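 A minimal sketch of the Pearson coefficient, assuming the standard definition r = Σ(x_{i} - x̄)(y_{i} - ȳ) / √(Σ(x_{i} - x̄)^{2} Σ(y_{i} - ȳ)^{2}). The function name pearson_r and the data are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly linear, increasing
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly linear, decreasing
```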
 Simple regression analysis is a statistical process for estimating the relationship between one independent variable and one dependent variable. Depending on the functional model, simple regression analysis includes linear, quadratic, exponential, logarithmic, and sigmoid regression, among others.
 When there are two or more independent variables and one dependent variable, multiple regression analysis can be used. This test is used also when there is an interaction between two independent variables.
 Analysis of covariance is a combination of analysis of variance (ANOVA) and linear regression. It tests whether there is a significant difference between groups after controlling for variance explained by a covariate. A covariate is a continuous variable that correlates with the dependent variable.
 A nonparametric test makes no assumptions about the parameters of the population distribution from which data are drawn.
 Randomness tests are used to analyze the distribution pattern of a set of data. Randomness test for a sequence tests the null hypothesis that the observations in the sample are randomly generated. Randomness test for two samples tests the null hypothesis that data are randomly allocated into two samples.
 The one-sample sign test is a nonparametric equivalent of the parametric one-sample t-test.
 The Wald-Wolfowitz runs test is a nonparametric test that examines whether two populations differ in central tendency, variances, skewness, or any other distribution pattern.
 The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous; it does not require the data to be normally distributed. The Mann-Whitney U test is the nonparametric equivalent of the parametric two-sample t-test. U_{1} = n_{1}n_{2} + n_{1}(n_{1}+1)/2 - R_{1}, where n_{1} is the sample size for sample 1, and R_{1} is the sum of ranks in sample 1. U_{2} = n_{1}n_{2} + n_{2}(n_{2}+1)/2 - R_{2}, where n_{2} is the sample size for sample 2, and R_{2} is the sum of ranks in sample 2. The smaller value of U_{1} and U_{2} is the value of U, the test statistic of the Mann-Whitney U test.
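 The U statistic can be sketched as follows, assuming the common form U_{1} = n_{1}n_{2} + n_{1}(n_{1}+1)/2 - R_{1} (and likewise for U_{2}), with tied values given the average of the ranks they span. The helper names are hypothetical, and this is not necessarily how Statext computes it.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return U, the smaller of U1 and U2, from the pooled ranking."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # rank sum of sample 1
    r2 = sum(ranks[n1:])  # rank sum of sample 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2)


print(mann_whitney_u([1, 3, 5], [2, 4, 6]))
```

 U_{1} + U_{2} always equals n_{1}n_{2}, which is a convenient check on the arithmetic.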
 When there are two samples of data, the Kolmogorov-Smirnov test is used to test whether or not these two samples may come from the same distribution. This test does not require the assumption that the population is normally distributed.
 The Fligner-Policello test is a nonparametric test that compares two location parameters (medians) from distributions.
 The two-sample sign test can be used to test the hypothesis that the median of the differences between two paired random variables is zero.
 The Wilcoxon signed-rank test is designed to test a hypothesis about the median of a population distribution. It often involves the use of matched pairs, for example, before and after data.
 The Kruskal-Wallis H test is used to test whether multiple samples have come from the same distribution. This test does not assume that the data are normally distributed. It is the nonparametric analogue of the parametric one-way analysis of variance.
 Median test (Mood's Median Test) is a nonparametric method that tests the null hypothesis that the medians of the populations from which two or more samples are drawn are identical.
 The Jonckheere-Terpstra test (Jonckheere's trend test) is a nonparametric test for an ordered alternative hypothesis within an independent samples design.
 The Friedman test is used to detect differences in treatments across multiple test subjects. This test uses the chi-square distribution. It is a nonparametric equivalent of the parametric two-way analysis of variance. The test statistic is χ_{r}^{2} = [12 / (nk(k+1))] Σ_{j=1}^{k} R_{.j}^{2} - 3n(k+1), where R_{.j}^{2} is the square of the rank total for treatment j, n is the number of blocks, and k is the number of treatments.
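 A sketch of the Friedman statistic, assuming the usual form χ_{r}^{2} = [12 / (nk(k+1))] Σ R_{.j}^{2} - 3n(k+1) and, for brevity, no tied values within a block. The function name and the layout (one row per block, one column per treatment) are assumptions.

```python
def friedman_statistic(blocks):
    """Friedman chi-square for a list of blocks, each a list of k treatment values.

    Assumes no ties within a block; ties would need average ranks.
    """
    n = len(blocks)        # number of blocks
    k = len(blocks[0])     # number of treatments
    rank_sums = [0] * k
    for row in blocks:
        # Rank the k treatments inside this block, 1 = smallest value.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Three blocks that all order the treatments the same way.
print(friedman_statistic([[1, 2, 3], [2, 3, 4], [10, 20, 30]]))
```

 The statistic is compared against the chi-square distribution with k - 1 degrees of freedom; for small samples an exact or simulated reference distribution is often preferred.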
 The Quade test is used to detect differences in treatments across multiple test subjects. This test uses the F-distribution. It is a nonparametric equivalent of the parametric two-way analysis of variance.
 Page's L trend test is useful when trends between several variables are examined. The data sets must be arranged so that the rank sums of the treatments, or conditions, are in ascending order.
 The Skillings-Mack test is used to detect differences in treatments across multiple test subjects. It can be used for data sets with missing values. With no missing values, the Skillings-Mack test provides the same result as the Friedman test.
 The Durbin test is a nonparametric test for the balanced incomplete block design (BIBD). This test is used to detect differences in treatments across multiple test subjects. It reduces to the Friedman test in the case of a complete block design. The test statistic is T = [12(t-1) / (rt(k-1)(k+1))] Σ_{j=1}^{t} R_{j}^{2} - 3r(t-1)(k+1)/(k-1), where R_{j} is the rank sum for treatment j, t is the number of treatments, k is the number of treatments per block, b is the number of blocks, and r is the number of times each treatment appears.
 Monte Carlo simulation, or the Monte Carlo method, is a problem-solving technique used to approximate the probability of certain outcomes by running multiple trial runs using random variables. In Statext, Monte Carlo simulation is used in the Friedman test, the Quade test, the Skillings-Mack test, and the Durbin test.
 The Siegel-Tukey test is used to determine if one of two groups of data tends to have more widely dispersed values than the other.
 Levene's test is used to test whether two or more samples have equal variances. Some statistical tests assume that variances are equal across groups or samples, and Levene's test can be used to verify that assumption. Levene's test is an alternative to Bartlett's test and is less sensitive than Bartlett's test to departures from normality. If you are not sure that your data come from a normal distribution, then Levene's test can be a better choice.
 Spearman's rank correlation coefficient, like the Pearson r, measures the strength of the relationship between two variables. While the Pearson correlation coefficient requires both variables to be measured on an interval or ratio scale, Spearman's correlation coefficient only requires data that are at least ordinal. It is computed as ρ = 1 - 6 Σ d_{i}^{2} / (n(n^{2}-1)), where d_{i} = x_{i} - y_{i} is the difference between the ranks of the i^{th} pair and n is the number of pairs. When there are tied ranks, the Pearson correlation coefficient computed on the ranks can be used instead.
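 For data without ties, the rank-difference form ρ = 1 - 6 Σ d_{i}^{2} / (n(n^{2}-1)) can be sketched directly. The function names are hypothetical, and the sketch deliberately does not handle tied values.

```python
def ranks_no_ties(values):
    """Rank values from 1 (smallest) upward; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from rank differences (no-ties formula)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    rank_x, rank_y = ranks_no_ties(x), ranks_no_ties(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


print(spearman_rho([10, 20, 30, 40], [1, 3, 2, 4]))
```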
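 Kendall's coefficient W is a measure of concordance (agreement) among several sets of rankings. It is an extension of the Spearman correlation procedure to more than two groups. Assume there are m raters rating n subjects in ranking order from 1 to n. Then W = 12S / (m^{2}(n^{3}-n)), where S = Σ_{i=1}^{n} (R_{i} - R̄)^{2}, or W = [12 Σ_{i=1}^{n} R_{i}^{2} - 3m^{2}n(n+1)^{2}] / (m^{2}n(n^{2}-1)), where R_{i} is the sum of the ranks given to subject i and R̄ = m(n+1)/2 is the mean of the R_{i}.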
 Categorical data represent types of data which may be divided into groups. Examples of categorical data are sex, T/F, and educational level.
 Counting nominal data is a basic step before any statistical procedures.
 One sample proportion test is used to compare a sample proportion to a specific value, or a population proportion.
 Two sample proportion test compares two sample proportions to see if there is a difference.
 The chi-square goodness-of-fit test is used when you have one categorical variable from a population. It can be used to determine whether sample data are consistent with a hypothesized, or expected, distribution.
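 The goodness-of-fit statistic is commonly computed as the sum of (observed - expected)^{2} / expected over the categories. The sketch below assumes that form; the function name and the counts are made up for illustration.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic for matching lists of counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 60 observations over three categories, expected to be equally likely.
print(chi_square_gof([10, 20, 30], [20, 20, 20]))
```

 The statistic is referred to the chi-square distribution with (number of categories - 1) degrees of freedom when no parameters are estimated from the data.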
 The chi-square independence test is used when you have two categorical variables from a population. It can be used to determine whether there is a significant association between the two variables.
 The chi-square homogeneity test is applied to a single categorical variable from two different populations. It can be used to determine whether frequency counts are distributed identically across different populations.
 The G-test is a likelihood-ratio test that can be used in situations where chi-square tests are used.
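 Fisher's exact test is a statistical test to determine if there is an association between two categorical variables. Fisher's exact test is more accurate than the chi-square test of independence when the expected numbers are small. For a 2x2 table with cell counts a and b in the first row, c and d in the second row, and total n = a+b+c+d, the probability of obtaining any such set of values is given by the hypergeometric distribution: p = (a+b)! (c+d)! (a+c)! (b+d)! / (a! b! c! d! n!).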
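 The hypergeometric probability of a single 2x2 table can be sketched with math.comb. The cell labels a, b (first row) and c, d (second row) and the function name are assumptions for illustration; a full Fisher's exact test would sum such probabilities over all tables at least as extreme as the observed one.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2x2 table with fixed row and column totals."""
    n = a + b + c + d
    # Equivalent to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!)
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


print(table_probability(1, 3, 3, 1))
```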
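 The McNemar test is a test on a 2x2 contingency table when you want to test the difference between paired proportions. With b and c denoting the two discordant cell counts, one of the following may be used as the McNemar test statistic: χ^{2} = (b - c)^{2} / (b + c), or, with a continuity correction, χ^{2} = (|b - c| - 1)^{2} / (b + c).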
 The Cochran Q test is a nonparametric test to verify whether several treatments have identical effects. It is an extension of the McNemar test to three or more matched sets of frequencies or proportions. The test statistic is Q = k(k-1) Σ_{j=1}^{k} (X_{.j} - N/k)^{2} / Σ_{i=1}^{b} X_{i.}(k - X_{i.}), where k is the number of treatments, X_{.j} is the column total for the j^{th} treatment, b is the number of blocks, X_{i.} is the row total for the i^{th} block, and N is the grand total.
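 A sketch of Cochran's Q, assuming the standard form Q = k(k-1) Σ (X_{.j} - N/k)^{2} / Σ X_{i.}(k - X_{i.}) applied to a table of 0/1 outcomes with one row per block and one column per treatment. The function name and the data are illustrative.

```python
def cochran_q(data):
    """Cochran's Q for a list of blocks, each a list of k binary (0/1) outcomes."""
    k = len(data[0])                                          # number of treatments
    col_totals = [sum(row[j] for row in data) for j in range(k)]
    row_totals = [sum(row) for row in data]
    grand_total = sum(row_totals)
    numerator = k * (k - 1) * sum((c - grand_total / k) ** 2 for c in col_totals)
    denominator = sum(r * (k - r) for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every block is all 0s or all 1s")
    return numerator / denominator


outcomes = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(cochran_q(outcomes))
```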
 The degree of association between two variables can be assessed by a number of coefficients, such as the phi coefficient, the contingency coefficient, and Cramer's V.
 Binary logistic regression is used to analyze a data set in which there are one or more independent variables that determine an outcome. The outcome is measured with only two possible outcomes.
 Principal Components Analysis (PCA) reduces the number of observed variables to a smaller number of principal components which account for most of the variance of the observed variables. It is used when variables are highly correlated and tries to re-express the data as a sum of uncorrelated components. PCA is a powerful tool for analysing data. Its other main advantage is that once you have found these patterns in the data, you can compress the data by reducing the number of dimensions without much loss of information.
 A standard normal distribution is a normal distribution with mean=0 and variance=1. It is described by the probability density function: φ(x) = (1/√(2π)) e^{-x^{2}/2}.
 Suppose we have a random sample of size n drawn from a normal population with mean μ and standard deviation σ. Let m denote the sample mean and s the sample standard deviation. Then the quantity t = (m - μ) / (s/√n) has a Student's t-distribution with n-1 degrees of freedom.
 If Y_{1}, ..., Y_{r} are independent random variables, each with a standard normal distribution, then Q = Σ_{i=1}^{r} Y_{i}^{2} has a chi-square distribution with r degrees of freedom.
 If a random variable X has a chi-square distribution with m degrees of freedom and a random variable Y has a chi-square distribution with n degrees of freedom, and X and Y are independent, then F = (X/m) / (Y/n) follows an F-distribution with m and n degrees of freedom.
 The Central Limit Theorem says that if many samples of n elements are drawn from a population, the distribution of the means of those samples is approximately normal when n is large enough. It is very useful because it tells us that, if n is large enough, the distribution of sample means can be treated as a normal distribution even when the population itself is not normal.
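 The theorem can be illustrated with a small simulation. The sketch below draws repeated samples from a uniform distribution on [0, 1), which is clearly not normal, and collects the sample means; the function name, sample size, and seed are arbitrary choices made for the example.

```python
import random
import statistics


def sample_means(n, repeats, seed=0):
    """Means of `repeats` samples, each of n draws from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(repeats)]


means = sample_means(n=30, repeats=2000, seed=1)
# Theory: the means centre on 0.5 with standard deviation sqrt((1/12) / 30).
print(statistics.fmean(means), statistics.stdev(means))
```

 A histogram of the collected means would look roughly bell-shaped, even though each individual draw is uniform.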
 The binomial distribution is the distribution of the total number of successes in a given number of trials when there are exactly two mutually exclusive outcomes of a trial. If the random variable X follows the binomial distribution with parameters n and p, the probability of getting exactly k successes in n trials is given by P(X = k) = C(n, k) p^{k} (1-p)^{n-k}, where C(n, k) = n! / (k!(n-k)!).
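 The binomial probability can be sketched with math.comb, assuming the standard form P(X = k) = C(n, k) p^{k} (1-p)^{n-k}. The function name is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))
```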
 The Poisson distribution is a probability distribution which arises when counting the number of occurrences of a rare event in a long series of trials. A discrete random variable X is said to have a Poisson distribution with population mean λ if, for k = 0, 1, 2, ..., the probability function of X is given by P(X = k) = λ^{k} e^{-λ} / k!, where e is the base of the natural logarithm, and k! is the factorial of k.
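 A short sketch of the Poisson probability function, assuming the standard form P(X = k) = λ^{k} e^{-λ} / k!. The name poisson_pmf and the parameter name lam (for λ) are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when the mean number of occurrences is lam."""
    if k < 0 or lam <= 0:
        raise ValueError("k must be >= 0 and lam must be positive")
    return lam ** k * math.exp(-lam) / math.factorial(k)


# With a mean of 2 events, the chance of seeing none at all.
print(poisson_pmf(0, 2.0))
```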
 The inverse matrix of a square matrix A is a matrix A^{-1} such that AA^{-1} = A^{-1}A = I, where I is the identity matrix.
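 For the 2x2 case the inverse has a simple closed form, sketched below: swap the diagonal entries, negate the off-diagonal entries, and divide by the determinant. The function names are hypothetical; larger matrices are normally inverted with a linear algebra library.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


def multiply_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)
print(multiply_2x2(A, A_inv))  # should be close to the identity matrix
```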
