Indiana University Bloomington




H100 Statistical Literacy, Honors (3 cr.)
P: MATH M014 or equivalent. Permission of the Honors College is required.

How to be an informed consumer of statistical analysis. Experiments and observational studies, summarizing and displaying data, relationships between variables, quantifying uncertainty, drawing statistical inferences. S100 cannot be taken for credit if credit has already been received for any statistics course (in any department) numbered 300 or higher.

S100 Statistical Literacy (3 cr.)

How to be an informed consumer of statistical analysis. Experiments and observational studies, summarizing and displaying data, relationships between variables, quantifying uncertainty, drawing statistical inferences. S100 cannot be taken for credit if credit has already been received for any statistics course (in any department) numbered 300 or higher.

S300 Introduction to Applied Statistical Methods (4 cr.)
P: MATH M014 or equivalent

Introduction to methods for analyzing quantitative data. Graphical and numerical descriptions of data, probability models of data, inferences about populations from random samples. Regression and analysis of variance. Lecture and laboratory. Credit given for only one of the following: S300, CJUS K300, ECON E370 or S370, LAMP L316, MATH K300 or K310, PSY K300 or K310, SOC S371, SPEA K300.

S301 Applied Statistical Methods for Business (3 cr.)
P: MATH M118 or equivalent.

Introduction to methods for analyzing data arising in business, designed to prepare business students for the Kelley School's Integrative Core. Graphical and numerical descriptions of data, probability models, fundamental principles of estimation and hypothesis testing, applications to linear regression and quality control. Microsoft Excel is used to perform analyses. 3 hours lecture. Credit given for only one of the following: STAT S300 or S301 or S310, CJUS K300, ECON E370 or S370, LAMP L316, MATH K300 or K310, PSY K300 or K310, SOC S371, SPEA K300.

S303 Applied Statistical Methods for the Life Sciences (3 cr.)

Introduction to methods for analyzing data in the life sciences, designed for Biology, Human Biology, and premedical students. Graphical and numerical descriptions of data, probability models, fundamental principles of estimation and hypothesis testing, inferences about means, correlation, linear regression. Learning Objectives: (1) Demonstrate how statistical reasoning is used in biological, medical, and agricultural research; (2) raise awareness of basic statistical issues such as randomization, confounding, and the role of independent replication; (3) provide hands-on experience in the use of statistical methods to analyze data, thereby enabling students to associate different kinds of questions with appropriate statistical methods; (4) perform simple statistical analyses and interpret the results.

K310 Statistical Techniques (3 cr.)
N & M. P: MATH M119 or equivalent.

Introduction to probability and statistics. Elementary probability theory, conditional probability, independence, random variables, discrete and continuous probability distributions, measures of central tendency and dispersion. Concepts of statistical inference and decision: estimation, hypothesis testing, Bayesian inference, statistical decision theory. Special topics discussed may include regression and correlation, time series, analysis of variance, nonparametric methods. Credit given for only one of K310 or S300, ANTH A306, CJUS K300, ECON E370 or S370, MATH K300 or K310, POLS Y395, PSY K300 or K310, SOC S371, or SPEA K300.

S320 Introduction to Statistics (3 cr.)
P: MATH M212 or M301 or M303

S320 introduces the basic concepts of statistical inference through a careful study of several important procedures. Topics include 1- and 2-sample location problems, the one-way analysis of variance, and simple linear regression. Most assignments involve applying probability models and/or statistical methods to practical situations and/or actual data sets.

Prerequisites: No previous knowledge of probability is assumed; S320 is recommended for students who wish to take a single, self-contained semester of statistics that emphasizes analyzing data. We will use several basic concepts from calculus; hence, S320 has a prerequisite of MATH M212.

Who Should Take This Course?

As reflected by the large number of introductory statistics courses at IU, there are a great many different ways to begin the study of statistics. The best way to have a positive experience with statistics is to take a course that provides the kind of experience that you want to have.

The Department of Statistics offers three introductory statistics courses. STAT S100 emphasizes quantitative reasoning skills and statistical literacy. It should make you a more critical consumer of the quantitative information that you encounter in newspapers, magazines, etc.; however, it is not the purpose of S100 to introduce you to a variety of methods for analyzing experimental data.

Both STAT S300 and STAT S320 emphasize using statistical methods to analyze data. Such "methods" courses come in a variety of flavors. Most describe recipes for analyzing data and use a statistical software package in which these recipes have been implemented. S300 is a good example of such courses. Many other departments at IU offer an introductory statistics course of this type.

S320 provides greater emphasis on understanding fundamental principles of statistical inference than does S300. S320 differs from typical methods courses in the following respects:

* Greater emphasis on why a method works. Many courses explain how, but provide little explanation of why.

* Greater depth, less breadth. Many courses provide superficial coverage of a great many topics; S320 covers fewer topics, but in considerably more detail. Students desiring knowledge of procedures not covered in this course are strongly encouraged to take additional statistics courses. S320 is the gateway to majoring in statistics.

* More math. S320 is not a theoretical course (like STAT S420) and it does not use sophisticated mathematics. However, S320 does introduce a good deal of mathematical notation and it does assume that students are comfortable plugging numbers into formulas.

* Interactive computing. Rather than use a statistical computer package as a "black box," S320 relies on computer tools that simplify the computational burden but which require the student to understand how the analysis is to be performed.

In a nutshell: Students in the empirical sciences collect and analyze data, often using computer software that they don't understand. S320 was designed for students who really want to *understand* what they're doing when they perform such analyses.

S420 Introduction to Statistical Theory (3 cr.)
P: STAT S320 and MATH M463, or consent of instructor

Fundamental concepts and principles of data reduction and statistical inference, including the method of maximum likelihood, the method of least squares, and Bayesian inference. Theoretical justification of statistical procedures introduced in S320.

S425 Nonparametric Theory and Data Analysis (3 cr.)
P: STAT S420 and S432, or consent of instructor

Survey of methods for statistical inference that do not rely on parametric probability models. Statistical functionals, bootstrapping, empirical likelihood. Nonparametric density and curve estimation. Rank and permutation tests.

S426 Bayesian Theory and Data Analysis (3 cr.)
P: Consent of instructor

Introduction to the theory and practice of Bayesian inference. Prior and posterior probability distributions. Data collection, model formulation, computation, model checking, sensitivity analysis.

S431 Applied Linear Models I (3 cr.)
P: STAT S320 and MATH M301, or consent of instructor

Part I of a 2-semester sequence on linear models, emphasizing linear regression and the analysis of variance, including topics from the design of experiments and culminating in the general linear model.

S432 Applied Linear Models II (3 cr.)
P: STAT S431, or consent of instructor

Part II of a 2-semester sequence on linear models, emphasizing linear regression and the analysis of variance, including topics from the design of experiments and culminating in the general linear model.

S437 Categorical Data Analysis (3 cr.)
P: Consent of instructor

The analysis of crossclassified categorical data. Loglinear models; regression models in which the response variable is binary, ordinal, nominal, or discrete. Logit, probit, multinomial logit models; logistic and Poisson regression.

S439 Multilevel Models (3 cr.)
P: STAT S420 and S432, or consent of instructor

Introduction to the general multilevel model with an emphasis on applications. Discussion of hierarchical linear models, and generalizations to nonlinear models. How such models are conceptualized, parameters estimated and interpreted. Model fit via software. Major emphasis throughout the course will be on how to choose an appropriate model and computational techniques.

S440 Multivariate Data Analysis (3 cr.)
P: STAT S420 and STAT S432, or consent of instructor

Elementary treatment of multivariate normal distributions, classical inferential techniques for multivariate normal data, including Hotelling's T² and MANOVA. Discussion of analytic techniques such as principal component analysis, canonical correlation analysis, discriminant analysis, and factor analysis.

S445 Covariance Structure Analysis (3 cr.)
P: STAT S420 and S440, or consent of instructor

Path analysis. Introduction to multivariate multiple regression, confirmatory factor analysis, and latent variables. Structural equation models with and without latent variables. Mean-structure and multi-group analysis.

S450 Time Series Analysis (3 cr.)
P: Consent of instructor

Techniques for analyzing data collected at different points in time. Probability models, forecasting methods, analysis in both time and frequency domains, linear systems, state-space models, intervention analysis, transfer function models and the Kalman filter. Topics also include: Stationary processes, autocorrelations, partial autocorrelations, autoregressive, moving average, and ARMA processes, spectral density of stationary processes, periodograms and estimation of spectral density.

S455 Longitudinal Data Analysis (3 cr.)
P: STAT S420 and S432, or consent of instructor

Introduction to methods for longitudinal data analysis; repeated measures data. The analysis of change - models for one or more response variables, possibly censored. Association of measurements across time for both continuous and discrete responses.

S460 Sampling (3 cr.)
P: STAT S420 and S432, or consent of instructor

Design of surveys and analysis of sample survey data. Simple random sampling, ratio and regression estimation, stratified and cluster sampling, complex surveys, nonresponse bias.

S470 Exploratory Data Analysis (3 cr.)
P: STAT S420 and S432, or consent of instructor

How do you analyze data? When faced with data from various sources, of various types, what questions should one ask, and what clues can we find in the data to further our understanding?

Statistics, broadly defined, is the science and art of analyzing data. Many statistical procedures require formal probability model structures with parameters, and statistical methods offer tools for estimating those model parameters. Sometimes the assumptions governing those models hold, but often they do not. What analyses can provide insight into the data and the underlying mechanisms while being insensitive to model assumptions? Nonparametric methods are distribution-free, but some prior analysis is needed to understand the data.

Exploratory data analysis is a philosophy of analyzing data. The ubiquity of data and the emergence of "data mining" make this course essential for anyone who wants to analyze data. In this course, we will learn many different tools for data analysis as well as the commands and programs in R (free statistical software) for conducting these analyses. Some prior familiarity with statistical methods is assumed. Those who have had formal statistics courses can take the course at a higher level, where connections between EDA tools and mathematical statistical methods will be developed. This course is valuable to anyone who has data to analyze. It is also a lot of fun; students learn a lot.

Course objectives: Introduce philosophy of exploratory data analysis; Teach tools for the analysis of data; Provide opportunities for analyzing data (R/S-Plus); Demonstrate the value of oral/written communication skills; Offer experience in preparing oral and written reports of data analyses.

Topics: The philosophy of exploratory versus confirmatory data analysis; Summarizing batches of data: stem-and-leaf diagrams, boxplots, qq plots, data transformations (ladder of re-expressions), jackknife and bootstrap, two-way and three-way analyses (median polish), standardization, fitting robust-resistant lines (least absolute deviations), analyzing count data.

S475 Statistical Learning and High-Dimensional Data Analysis (3 cr.)
P: STAT S440 or consent of instructor

Data-analytic methods for exploring the structure of high-dimensional data. Graphical methods, linear and nonlinear dimension reduction techniques, manifold learning. Supervised, semisupervised, and unsupervised learning.

This course surveys various data-analytic approaches to detecting structure in multivariate data sets. Many of the topics covered are active areas of research in multivariate statistics and machine learning. High-dimensional data sets arise in many applications, e.g., gene expression levels from a microarray experiment. Techniques for high-dimensional data are useful in a wide variety of disciplines; I plan to emphasize applications to bioinformatics and text mining.

Here is a rough outline of the topics that I expect to cover:

1. Multivariate Data. Data matrices, proximity matrices and graphs. Labeled and unlabeled data.

2. Graphical Methods for Exploring Multivariate Data. Scatterplots in two and three dimensions, grand tours, projection pursuit. Parallel coordinates. Brushing.

3. Dimension Reduction. Linear techniques: principal component analysis, biplots and h-plots, principal coordinate analysis. Spectral techniques for manifold learning: Isomap, Locally Linear Embedding, Laplacian eigenmaps, diffusion maps. Nonspectral embedding techniques and their application to dimension reduction.

4. Supervised Learning. Linear/quadratic discriminant analysis, nearest neighbor methods, distance/metric learning, support vector machines. Multiple kernel learning.

5. Unsupervised Learning. K-means clustering, self-organizing maps, iterative denoising.

Text: I will rely on my own lecture notes and various talks, technical reports, and papers from the literature.

The essential prerequisite for this course is some familiarity with linear algebra (vectors, matrices, eigenvalues, etc.). We will use a high-level statistical programming language (R), so some previous experience with a computer programming language would be helpful. Previous exposure to classical multivariate statistical methods is helpful, but not essential.

For more information, please visit the course web page:

If you are uncertain whether or not you have the background to take this course, please contact me at .

Computer Science graduate students may count STAT S675 as an Area 5 (Artificial Intelligence) course for the purpose of fulfilling their area distribution requirements. Any student who intends to do so should notify Amr Sabry .

S481 Topics in Applied Statistics - Statistics in Legal Applications (3 cr.)
P: Consent of instructor

This course will cover applications of statistics in legal settings. Depending upon the background in statistics of the course participants, we will cover statistical concepts as needed. The focus will be on legal situations where statistical analyses were essential. Some examples include employment discrimination, disputed wills, and forensic science (including recent reports from the National Academies). Participants are invited to present their own cases for class discussion. Active participation will be expected, and interim and final projects will be required.

S481 Topics in Applied Statistics (3 cr.)
P: Consent of instructor

Network Science

Network science is concerned with the relationships between individuals, organizations, groups, and other "social" entities. This methodological and theoretical approach to the social world has gained interest in fields across the social, behavioral and political sciences - and shares much in terms of methods with network studies in the natural sciences. At the core of the field is attention to the interconnected nature of actors and their relationships. This type of approach requires a different set of assumptions and analytical tools than standard statistical methods. This course will primarily focus on statistical methodology for relational data measured on groups of social actors. Topics to be discussed include an introduction to graph theory and the use of directed graphs to study structural theories of actor interrelations; structural and locational properties of actors, such as centrality, prestige, and prominence; subgroups and cliques; equivalence of actors, including structural equivalence, blockmodels, and an introduction to role algebras; an introduction to local analyses, including dyadic and triad analysis; and statistical global analyses, using models such as p1, p*, and their relatives. The course will also introduce data collection and harvesting methods, egocentric analysis, and the use of popular network analysis and visualization software packages. This is not a course in network theory; it is a course in methodology, with emphasis on statistical approaches. Students are expected to attend lectures and register for lab sessions. Assignments will include regular lab exercises and a final network project. Students should have completed at least two upper level statistics courses or contact the instructors for permission to enroll in the course.

S481 Bayesian Modeling and Computation (3 cr.)
P: STAT S426 or consent of instructor

This course covers topics on stochastic simulation for Bayesian inference with a focus on Markov chain Monte Carlo (MCMC) techniques. The applications consider hierarchical models that range from regression analysis to more advanced settings such as mixture models, time series and spatial models. Free software programs such as R and BUGS will be used for data analysis. This is a follow-up class to S426-S626.

S481 Topics in Applied Statistics - Time Series II (3 cr.)
P: Consent of instructor

Time Series II

In the first Time Series I course in Fall 2008, we learned about time series from a dynamical systems and discrete time perspective. In this course, we will build on these skills by approaching time series analysis from a primarily spectral or frequency-based approach.

Topics will include:

Basic calculus review, complex numbers and variables, Fourier analysis, digital filters, spectral estimation, linear filtering in the frequency domain, noise models, sampling, aliasing, the discrete and fast Fourier transforms (DFT & FFT), Gibbs phenomenon, signal quantization, and the impact of these concepts on estimation and signal detection.

Advanced topics may include:

Wavelets, independent component analysis, image analysis (2D Fourier analysis), multivariate time series, nonlinear processes.

S481 Spatial Statistics (3 cr.)
P: Consent of instructor

This course aims to introduce a variety of statistical methods in the spatial domain. We will introduce and discuss methods for three types of spatial data including geostatistical data, regional data and spatial point patterns. Major topics to be covered include spatial covariance functions, variograms, kriging, spatial autoregressive models, K function, etc.

S481 Statistical Methods for Causal Inference (3 cr.)
P: STAT S431 or consent of instructor. Stata will be the main statistical software used in the class. Basic knowledge in R programming is essential.

Many statistical problems are driven by the need to identify and estimate causal effects. For example, whether a training program is effective in increasing trainees' productivity and subsequently their income, whether and how much advantage incumbent candidates have during elections, whether and to what degree peers influence each other's behaviors, whether and by how much a medicine or a surgical procedure helps patients reduce their suffering, etc. Based on the potential outcomes framework, this course presents a series of methods for causal inference that are distinct from the traditional regression framework and that can be used to analyze both experimental and observational data, including propensity score methods (e.g., propensity score regression, matching, stratification, and weighting), regression discontinuity design, difference-in-difference methods, instrumental variable estimation, natural experiments, methods for mediation and interaction, directed acyclic graphs (DAGs), etc. These concepts and methods will be illustrated by a variety of examples drawn from social and medical sciences.

S481 Functional Data Analysis (3 cr.)
P: Consent of instructor

This course will introduce students to methods for analyzing functional data, i.e., data that are entire curves, rather than single observations or vectors of several measurements. Such data objects involve numerous highly-related points per object, and the methods for analyzing them make explicit use of data objects as functions. In contrast to multivariate analysis (multiple values per object that have different degrees of association) and longitudinal analysis or "panel studies" (where measurements are repeated on an individual at only a few time points), the methods here treat the object as a function. We discuss these methods, which include graphical displays, summaries, and analogs of conventional analyses (analysis of variance, principal components), with an emphasis on applications. The two required books will address both these goals (methodology and applications). The course is valuable both for researchers whose data are entire functions (gait, spectra, etc.) and for students interested in participating in a relatively new and important area of statistical research. Required textbooks: James Ramsay and Bernard Silverman, Functional Data Analysis (FDA), Springer. James Ramsay and Bernard Silverman, Applied Functional Data Analysis (AFDA), Springer.

S481 Multivariate Methods II (3 cr.)

This course requires knowledge in basic statistical theory, in particular the basics in multivariate statistical analysis. The actual content of this course depends on the audience and the teaching method depends on the number of students in the class. The course material will be published on the web. Thus, no textbook is required. The topics will, as a rule, be taught in depth and detail. The content will be chosen from the following list of topics/subtopics:

(I) Advanced multivariate models: (i) Conditional models, (ii) Covariance analysis, (iii) Growth curve models, (iv) Symmetry normal models, (v) Graphical normal/discrete models, (vi) Missing data normal models, (vii) Mixture models.

(II) Eigenvalues: (i) Canonical correlations, (ii) Principal component analysis, (iii) Factor analysis models, (iv) Discriminant/classification analysis, (v) Testing and eigenvalues in multivariate normal models.

(III) Exponential families: (i) Generalized linear models, (ii) Multivariate logistic regression, (iii) Rasch measurement models, (iv) Conjugate priors, (v) EM/scoring algorithms, (vi) Testing in exponential families, (vii) Models in the Wishart distributions.

References: A.J. Izenman (2008) Modern Multivariate Statistical Techniques, Springer.

Brian S. Everitt (2005) An R and S-Plus Companion to Multivariate Analysis, Springer.

S482 Statistical Model Selection (3 cr.)
P: Consent of instructor

M-estimates are a broad class of statistical estimates obtained as the solution to an empirical optimization process. Typically, the population parameter is defined as the minimizer of a population risk function and its estimate is defined as the minimizer of the empirical risk. While M-estimates are known to enjoy many desirable properties, goodness of fit alone is not an adequate method for selecting the best among models of different "complexity." On the one hand, "simpler" models can be more revealing of the structure in the data. On the other hand, they are often restricted versions of more "complex" models and hence will never be preferred based on goodness of fit alone. In this course, we cover model selection techniques with an emphasis on variable selection in generalized linear models. We review classical variable selection methods such as AIC, BIC, and Mallows' Cp and discuss some of the computational issues involved. In addition, we introduce some alternative measures of the complexity of a model and review how they can be used for model selection purposes. Finally, we briefly review some of the issues specific to high-dimensional data sets and how they can be addressed.
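
As a small illustration of why goodness of fit alone cannot choose between nested models, the following Python sketch fits a constant-mean model and a straight-line model by least squares. It is not course material: the data and the helper names (fit_constant, fit_line, gaussian_aic) are invented for the example. The line can never have a larger residual sum of squares than the constant, yet a Gaussian AIC, computed here up to an additive constant as n log(RSS/n) + 2k, still favors the constant model for these data.

```python
import math


def fit_constant(y):
    """Least-squares fit of y = a; returns the residual sum of squares."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns the residual sum of squares."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n, k):
    """AIC up to an additive constant for a Gaussian model with k mean parameters."""
    return n * math.log(rss / n) + 2 * k


# Hypothetical data: an alternating pattern with essentially no linear trend.
x = list(range(10))
y = [1.0 if xi % 2 == 0 else -1.0 for xi in x]
n = len(y)

rss_const = fit_constant(y)   # simpler model, k = 1
rss_line = fit_line(x, y)     # more complex model, k = 2

aic_const = gaussian_aic(rss_const, n, k=1)
aic_line = gaussian_aic(rss_line, n, k=2)

print("RSS:", rss_const, rss_line)   # the line always fits at least as well
print("AIC:", aic_const, aic_line)   # the complexity penalty can reverse the ranking
```

Because the constant model is a restricted version of the line (slope fixed at zero), comparing residual sums of squares would always pick the line; the 2k penalty is one simple way of charging for the extra parameter.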

S490 Statistical Consulting (4 cr.)
P: S631 and S632, or consent of instructor

This class will cover necessary skills for effective statistical consulting with applications derived from real consulting situations. Students will have the opportunity to engage with clients, perform analyses, and present the results of their consultations in class. Along the way students will learn practical methods of data analysis, questions that need to be considered for future studies, and ways of presenting data for different purposes. Clients and projects will come from the Indiana Statistical Consulting Center.

S495 Readings in Statistics (3 cr.)
P: Consent of instructor

Supervised reading of a topic in statistics. May be repeated with different topics for a maximum of 12 credit hours.