Introduction to Statistics and Data Analysis, 5th edition (2016)
Statistics terminology
- make informed judgments
- summarize the available data in a useful and informative manner
- We hope that this textbook will help you to understand the logic behind statistical reasoning, prepare you to apply statistical methods appropriately, and enable you to recognize when statistical arguments are faulty.
- quantifying the chance of an incorrect conclusion
- In general, data are continuous when observations involve making measurements, as opposed to counting.
- Sampling variability (the extent to which samples from the same population differ from one another and from the population) is a central idea in statistics.
- Be sure to include scales and labels on the axes of graphical displays.
In descriptive statistics, a box plot (or boxplot) is a method for graphically depicting groups of numerical data through their quartiles.
Interquartile range (IQR): the distance between the first and third quartiles, IQR = Q3 - Q1.
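The quartiles behind a box plot can be computed directly. The sketch below is only an illustration: the data are made up, the `inclusive` quartile method is one of several conventions (textbooks and software differ slightly), and the 1.5 x IQR fences are a common rule of thumb for flagging outliers rather than something stated in these notes.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # method="inclusive" treats the data as the whole group of interest;
    # other quartile conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

data = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # made-up sample with one large value

low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1

# Common (assumed) convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print("five-number summary:", (low, q1, med, q3, high))
print("IQR:", iqr)
print("outliers:", outliers)
```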
A PDF (probability density function) is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. This probability is given by the integral of this variable’s PDF over that range.
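As a rough check of the statement that a probability is the integral of the PDF over a range, the sketch below integrates a standard normal density numerically and compares the result with the difference of two CDF values. The choice of the normal distribution, the interval from -1 to 1, and the helper names `normal_pdf` and `integrate` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# P(-1 < X < 1) for a standard normal X, computed two ways.
p_numeric = integrate(normal_pdf, -1.0, 1.0)
p_exact = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(round(p_numeric, 4), round(p_exact, 4))
```

Both values should agree to several decimal places, while the probability of any single exact value is zero because the range has no width.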
The Mathematics Behind Principal Component Analysis (PCA): The central idea of principal component analysis is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set.
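A minimal two-variable sketch of this idea follows. It assumes PCA is carried out by taking the eigenvectors of the sample covariance matrix, which is the usual formulation; the function name `pca_2d` and the data are made up, and the closed-form eigenvalue formula used here only works for two variables.

```python
import math

def pca_2d(points):
    """First principal direction and share of variance it explains (2 variables)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector belonging to the larger eigenvalue.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)

    # Assumes the data are not all identical, so total variance is positive.
    explained = lam1 / (lam1 + lam2)
    return (vx / norm, vy / norm), explained

points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]  # made-up, strongly related x and y
direction, explained = pca_2d(points)
print(direction, explained)
```

Here a single direction carries most of the variation, so the two interrelated variables could be replaced by one component with little loss.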
Such a number describes roughly where the data are located or “centered” along the number line, and is called a measure of center.
A particular line is said to be a good fit to the data if the deviations from the line are small in magnitude.
Coefficient of determination: The proportion of variation in y that can be attributed to an approximate linear relationship between x and y.
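The two ideas above (small deviations from the line, and the proportion of variation in y attributed to the linear relationship) can be sketched as follows. This assumes the line is fitted by least squares; the function names and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: 1 - (residual variation / total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_resid / ss_total

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 3, 5, 4, 6]

intercept, slope = fit_line(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)
print(intercept, slope, r2)
```

For these data the fitted line is roughly y = 1.3 + 0.9x, and about 81% of the variation in y is attributed to the linear relationship.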
Ch6 Probability
- Mutually exclusive: Two events are mutually exclusive if they have no outcomes in common.
- describe the long-run relative frequency of occurrence of various types of outcomes
- uniform distribution
- normal probability plot of the data
- A sampling distribution describes the sample-to-sample variability in the values of a statistic.
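Sample-to-sample variability is easy to see by simulation. The sketch below draws many samples from a uniform distribution on (0, 1) and looks at how the sample mean varies; the population, the sample sizes, the number of repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics

def sample_means(n, repetitions, rng):
    """Means of repeated random samples of size n from Uniform(0, 1)."""
    return [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(repetitions)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

means_small = sample_means(5, 2000, rng)    # many samples of size 5
means_large = sample_means(50, 2000, rng)   # many samples of size 50

# The spread of each list approximates the standard deviation of the
# sampling distribution of the mean for that sample size.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print("center:", round(statistics.fmean(means_small), 3))
print("spread, n=5: ", round(spread_small, 3))
print("spread, n=50:", round(spread_large, 3))
```

The sample means center near the population mean of 0.5, and they vary less from sample to sample when the sample size is larger.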
Ch9 Estimation
- unbiased estimator of a population characteristic; population proportion
- the relationships between sample size, margin of error, and the width of a confidence interval
- construct and interpret a confidence interval for a population mean
- Unbiased statistic: A statistic whose mean value (the mean of its sampling distribution) is equal to the value of the population characteristic being estimated.
- Confidence interval: An interval of plausible values for a population characteristic. Note on the confidence level: it is the long-run proportion of intervals, built by the same method from repeated samples, that capture the population characteristic. It is not the probability that the characteristic lies inside one particular computed interval. With the same data and method, a higher confidence level gives a wider interval.
The confidence level provides information on how much “confidence” we can have in the method used to construct the interval estimate.
Confidence level: The success rate of the method used to construct a confidence interval.
Standard error: The estimated standard deviation of a statistic.
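The "success rate of the method" reading can be checked by simulation. The sketch below is a simplified case, not the textbook's procedure: it assumes a normal population with a known standard deviation so that a z interval applies, and the population values, sample size, number of trials, and seed are made up.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for a mean, assuming the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(len(sample))  # z critical value * standard deviation of the mean
    center = fmean(sample)
    return center - margin, center + margin

def coverage(mu, sigma, n, trials, confidence, rng):
    """Fraction of simulated intervals that capture the true mean mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        hits += low <= mu <= high
    return hits / trials

rate = coverage(mu=10.0, sigma=2.0, n=25, trials=4000,
                confidence=0.95, rng=random.Random(1))
print("observed success rate:", rate)
```

The observed rate should land close to 0.95: the confidence level describes how often the method succeeds over many samples, not the chance that one specific interval is right.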
bound on error of estimation
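One way to see how sample size and the bound on error are linked is to solve the margin-of-error formula for n. The sketch below assumes the z-based margin z * sigma / sqrt(n) for a population mean with a known (or guessed) standard deviation; the function name and the numbers are made up.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, bound, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. n >= (z * sigma / bound)^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / bound) ** 2)

n_half = required_sample_size(sigma=2.0, bound=0.5)      # bound on error of 0.5
n_quarter = required_sample_size(sigma=2.0, bound=0.25)  # bound on error of 0.25

print(n_half, n_quarter)
```

Halving the bound on error roughly quadruples the required sample size, because the margin of error shrinks with the square root of n.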
comparing two populations or treatments
Ch10 Hypothesis Testing
- A test of hypotheses is a method that uses sample data to decide between two competing claims (hypotheses) about a population characteristic.
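- Power of a test: The probability of rejecting the null hypothesis. It is usually evaluated when the null hypothesis is false, where it is the probability of correctly rejecting it.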
- estimate the difference between two population means or test hypotheses about this difference
- the differences between goodness-of-fit tests, tests for homogeneity, and tests of independence
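The power of a test, defined in the list above, can be computed in closed form for a simple case. The sketch below assumes an upper-tailed z test of H0: mu = mu0 against Ha: mu > mu0 with a known population standard deviation; the function name and all numbers are made up for illustration.

```python
import math
from statistics import NormalDist

def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0 in favor of mu > mu0) when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)              # rejection cutoff for the z statistic
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))    # how far the truth is from H0, in standard errors
    return 1 - std_normal.cdf(z_crit - shift)

at_null = power_upper_tail_z(mu0=100, mu_true=100, sigma=10, n=25)
small_n = power_upper_tail_z(mu0=100, mu_true=103, sigma=10, n=25)
large_n = power_upper_tail_z(mu0=100, mu_true=103, sigma=10, n=100)

print(round(at_null, 3), round(small_n, 3), round(large_n, 3))
```

When the null hypothesis is true, the rejection probability equals the significance level; when the true mean is above mu0, the power grows with the sample size.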