Indiana University Bloomington

GRADUATE PROGRAM Courses

S501 Statistical Methods I: Introduction to Statistics (3 cr.)
P: One undergraduate course in statistics.

This course takes a systematic approach to the exposition of the general linear model for continuous dependent variables, including correlation, simple linear regression, and multiple regression. Students are introduced to the use of statistical analysis software. The course is divided into three sections. The first section covers fundamental concepts of quantitative data analysis, including measurement and presentation, and introduces the notation used throughout the semester. The second section focuses on the assumptions and mechanics of the classical linear regression model; by its end, students will have a good mechanical knowledge of regression analysis. The third section gives a practical exposition of the general linear model as the assumptions of the classical linear regression model are relaxed; by its end, students will have a deeper theoretical and applied understanding of the flexibility and limitations of the general linear regression model for social science data. At the end of this course students will be able to think creatively.

S502 Statistical Methods II: Experimental Design (3 cr.)
S503 Statistical Methods IIb: Generalized Linear Models and Categorical Data (3 cr.)
P: STAT S501, or one graduate course in statistics

This course takes a systematic approach to the exposition of generalized linear models, focusing on categorical data. Of primary concern will be models for which the response variable is categorical. Such models include probit, logit, ordered logit, and Poisson regression, among others. Students learn how to think creatively about the use of statistical methods in their own research.

This course introduces techniques for categorical data analysis, focusing on models in which the dependent variable is binary, ordinal, nominal, or a count. Such models include probit, logit, ordered logit and probit, multinomial logit, Poisson regression, negative binomial regression, and zero-inflated count models. Students learn how to apply these techniques in their own research.

S520 Introduction to Statistics (3 cr.)
P: MATH M212, M301, M303, or the equivalent.

Basic concepts of data analysis and statistical inference, applied to 1-sample and 2-sample location problems, the analysis of variance, and linear regression. Probability models and statistical methods applied to practical situations and actual data sets from various disciplines. Elementary statistical theory, including the plug-in principle, maximum likelihood, and the method of least squares.

S520 provides a strong introduction to elementary statistical methodology and a gentle introduction to elementary statistical theory. It meets concurrently with S320, but includes supplementary material not covered in that course. S520 introduces material that is covered in greater depth in S620 (Introduction to Statistical Theory), but less mathematically and in the context of actual experiments and data. It fulfills the theory requirement for the M.S. degree in Applied Statistics (currently under review).

S620 Introduction to Statistical Theory (3 cr.)
P: STAT S320 and MATH M463, or consent of instructor

Fundamental concepts and principles of data reduction and statistical inference, including the method of maximum likelihood, the method of least squares, and Bayesian inference. Theoretical justification of statistical procedures introduced in S320.

S625 Nonparametric Theory and Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Survey of methods for statistical inference that do not rely on parametric probability models. Statistical functionals, bootstrapping, empirical likelihood. Nonparametric density and curve estimation. Rank and permutation tests.

S626 Bayesian Theory and Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Introduction to the theory and practice of Bayesian inference. Prior and posterior probability distributions. Data collection, model formulation, computation, model checking, sensitivity analysis.

S631 Applied Linear Models I (3 cr.)
P: STAT S320 and MATH M301 or M303 or S303, or consent of instructor

Part I of a 2-semester sequence on linear models, emphasizing linear regression and the analysis of variance, including topics from the design of experiments and culminating in the general linear model.

S632 Applied Linear Models II (3 cr.)
P: STAT S631, or consent of instructor

Part II of a 2-semester sequence on linear models, emphasizing linear regression and the analysis of variance, including topics from the design of experiments and culminating in the general linear model.

S637 Categorical Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

The analysis of cross-classified categorical data. Loglinear models; regression models in which the response variable is binary, ordinal, nominal, or discrete. Logit, probit, multinomial logit models; logistic and Poisson regression.

S640 Multivariate Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Elementary treatment of multivariate normal distributions, classical inferential techniques for multivariate normal data, including Hotelling's $T^2$ and MANOVA. Discussion of analytic techniques such as principal component analysis, canonical correlation analysis, discriminant analysis, and factor analysis.

S645 Covariance Structure Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Path analysis. Introduction to multivariate multiple regression, confirmatory factor analysis, and latent variables. Structural equation models with and without latent variables. Mean-structure and multi-group analysis. Course is equivalent to EDUC Y645.

S650 Time Series Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Techniques for analyzing data collected at different points in time. Probability models, forecasting methods, analysis in both time and frequency domains, linear systems, state-space models, intervention analysis, transfer function models, and the Kalman filter. Topics also include stationary processes; autocorrelations and partial autocorrelations; autoregressive, moving average, and ARMA processes; spectral density of stationary processes; periodograms and estimation of spectral density. Course is equivalent to MATH M568.

S655 Longitudinal Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Introduction to methods for longitudinal data analysis; repeated measures data. The analysis of change: models for one or more response variables, possibly censored. Association of measurements across time for both continuous and discrete responses. Course is equivalent to EDUC Y655.

S660 Sampling (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Design of surveys and analysis of sample survey data. Simple random sampling, ratio and regression estimation, stratified and cluster sampling, complex surveys, nonresponse bias.

S670 Exploratory Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

How do you analyze data? When faced with data from various sources, of various types, what questions should one ask, and what clues can we find in the data to further our understanding?

Statistics, broadly defined, is the science and art of analyzing data. Many statistical procedures require formal probability model structures with parameters, and statistical methods offer tools for estimating those model parameters. Sometimes the assumptions governing those models hold, but often they do not. What analyses can provide insight into the data and the underlying mechanisms while being insensitive to model assumptions? Nonparametric methods are distribution-free, but some prior analysis is needed to understand the data.

Exploratory data analysis is a philosophy of analyzing data. The ubiquity of data and the emergence of "data mining" make this course essential for anyone who wants to analyze data. In this course, we will learn many different tools for data analysis as well as the commands and programs in R (free statistical software) for conducting these analyses. Some prior familiarity with statistical methods is assumed. Those who have had formal statistics courses can take the course at a higher level, where connections between EDA tools and mathematical statistical methods will be developed. This course is valuable to anyone who has data to analyze. It is also a lot of fun; students learn a lot.

Course objectives: Introduce philosophy of exploratory data analysis; Teach tools for the analysis of data; Provide opportunities for analyzing data (R/S-Plus); Demonstrate the value of oral/written communication skills; Offer experience in preparing oral and written reports of data analyses.

Topics: The philosophy of exploratory versus confirmatory data analysis; Summarizing batches of data (stem-and-leaf diagrams, boxplots, qq plots); Data transformations (ladder of re-expressions); Jackknife and bootstrap; Two-way and three-way analyses (median polish); Standardization; Fitting robust-resistant lines (least absolute deviations); Analyzing count data.

S675 Statistical Learning and High-Dimensional Data Analysis (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor

Data-analytic methods for exploring the structure of high-dimensional data. Graphical methods, linear and nonlinear dimension reduction techniques, manifold learning. Supervised, semisupervised, and unsupervised learning.

This course surveys various data-analytic approaches to detecting structure in multivariate data sets. Many of the topics covered are active areas of research in multivariate statistics and machine learning. High-dimensional data sets arise in many applications, e.g., gene expression levels from a microarray experiment. Techniques for high-dimensional data are useful in a wide variety of disciplines; I plan to emphasize applications to bioinformatics and text mining.

Here is a rough outline of the topics that I expect to cover:

1. Multivariate Data. Data matrices, proximity matrices and graphs. Labeled and unlabeled data.

2. Graphical Methods for Exploring Multivariate Data. Scatterplots in two and three dimensions, grand tours, projection pursuit. Parallel coordinates. Brushing.

3. Dimension Reduction. Linear techniques: principal component analysis, biplots and $h$-plots, principal coordinate analysis. Spectral techniques for manifold learning: Isomap, Locally Linear Embedding, Laplacian eigenmaps, diffusion maps. Nonspectral embedding techniques and their application to dimension reduction.

4. Supervised Learning. Linear/quadratic discriminant analysis, nearest neighbor methods, distance/metric learning, support vector machines. Multiple kernel learning.

5. Unsupervised Learning. K-means clustering, self-organizing maps, iterative denoising.

Text: I will rely on my own lecture notes and various talks, technical reports, and papers from the literature.

The essential prerequisite for this course is some familiarity with linear algebra (vectors, matrices, eigenvalues, etc.). We will use a high-level statistical programming language (R), so some previous experience with a computer programming language would be helpful. Previous exposure to classical multivariate statistical methods is helpful, but not essential.

For more information, please visit the course web page:

http://mypage.iu.edu/~mtrosset/675.html

If you are uncertain whether or not you have the background to take this course, please contact me.

Computer Science graduate students may count STAT S675 as an Area 5 (Artificial Intelligence) course for the purpose of fulfilling their area distribution requirements. Any student who intends to do so should notify Amr Sabry.

S681 Large-scale Inference (3 cr.)
P: Consent of instructor

Statistical inference no longer consists of single hypothesis tests or analyses of variance: today's huge data sets (e.g., fMRI pixel intensities or microarray gene expressions) encourage researchers to investigate hundreds, thousands, or even tens of thousands of comparisons in high-dimensional (and highly correlated) responses to various conditions. The consequence for proper inference of conducting multiple hypothesis tests was first noted by John Tukey in a highly cited "unpublished manuscript" entitled "The Problem of Multiple Comparisons" (1953), now published in its entirety in Volume VIII of "The Collected Works of John W. Tukey" (ed. Henry I. Braun: Chapman and Hall). The availability of computational resources enables an entirely different approach to multiple inference, one that takes advantage of empirical null distributions rather than resorting to theoretical formulas for probability distributions.

This course will study such approaches by going through the recently published book:

Bradley Efron (2010), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction (Cambridge University Press)

http://www.cambridge.org/gb/knowledge/isbn/item5010376/?site_locale=en_GB

The format of the course will be highly interactive, with participants taking lead roles in presenting topics and engaging class discussion on the chapters of the book:

1. Empirical Bayes and the James-Stein Estimator
2. Large-scale Hypothesis Testing
3. Significance Testing Algorithms
4. False Discovery Rate Control
5. Local False Discovery Rates
6. Theoretical, Permutation, and Empirical Null Distributions
7. Estimation Accuracy
8. Effects of Correlation
9. Sets of Cases (Enrichment)
10. Combination, Relevance, Comparison
11. Prediction and Effect Size Estimation

Requirements for registered participants will be class involvement and a final data analysis project.

Due to the advanced level of this course, Consent of Instructor is required, and the class size will be limited to 10 students. Auditors will be welcome (as space permits). Please contact Professor Kafadar for further information (kkafadar@indiana.edu).

S681 Topics in Applied Statistics - Statistics in Legal Applications (3 cr.)
P: Consent of instructor

This course will cover applications of statistics in legal settings. Depending upon the background in statistics of the course participants, we will cover statistical concepts as needed. The focus will be on legal situations where statistical analyses were essential. Some examples include employment discrimination, disputed wills, and forensic science (including recent reports from the National Academies). Participants are invited to present their own cases for class discussion. Active participation will be expected, and interim and final projects will be required.

S681 Spatial Statistics (3 cr.)
P: Two statistics courses at the graduate level, or consent of instructor.

This course aims to introduce a variety of statistical methods in the spatial domain. We will introduce and discuss methods for three types of spatial data including geostatistical data, regional data and spatial point patterns. Major topics to be covered include spatial covariance functions, variograms, kriging, spatial autoregressive models, K function, etc.

S681 Topics in Applied Statistics - Time Series II (3 cr.)
P: Consent of instructor

This course is cross-listed with PSY-P657.

In Time Series I, offered in Fall 2008, we learned about time series from a dynamical systems and discrete-time perspective. In this course, we will build on these skills by approaching time series analysis from a primarily spectral, or frequency-based, perspective.

Topics will include:

Basic calculus review, complex numbers and variables, Fourier analysis, digital filters, spectral estimation, linear filtering in the frequency domain, noise models, sampling, aliasing, the discrete and fast Fourier transforms (DFT & FFT), Gibbs phenomenon, signal quantization, and the impact of these concepts on estimation and signal detection.

Advanced topics may include:

Wavelets, independent component analysis, image analysis (2D Fourier analysis), multivariate time series, nonlinear processes.

S681 Bayesian Modeling and Computation (3 cr.)
P: STAT S626 or consent of instructor

This course covers topics on stochastic simulation for Bayesian inference with a focus on Markov chain Monte Carlo (MCMC) techniques. The applications consider hierarchical models that range from regression analysis to more advanced settings such as mixture models, time series and spatial models. Free software programs such as R and BUGS will be used for data analysis. This is a follow-up class to S426-S626.

S681 Statistical Methods for Causal Inference (3 cr.)
P: STAT S501, S631, SOC S554, or consent of instructor. Stata will be the main statistical software used in the class. Basic knowledge in R programming is essential.

Many statistical problems are motivated by the need to identify and estimate causal effects: for example, whether a training program is effective in increasing trainees' productivity and subsequently their income, whether and how much advantage incumbent candidates have during elections, whether and to what degree peers influence each other's behaviors, and whether and by how much a medicine or a surgical procedure reduces patients' suffering. Based on the potential outcomes framework, this course presents a series of methods for causal inference that are distinct from the traditional regression framework and that can be used to analyze both experimental and observational data. These include, among others, propensity score methods (e.g., propensity score regression, matching, stratification, and weighting), regression discontinuity design, difference-in-differences methods, instrumental variable estimation, natural experiments, methods for mediation and interaction, and directed acyclic graphs (DAGs). These concepts and methods will be illustrated by a variety of examples drawn from the social and medical sciences.

S681 Topics in Statistical Machine Learning (3 cr.)
P: Consent of instructor

This is a graduate level course in advanced machine learning. The term "statistical" in the title reflects the emphasis on statistical analysis and methodology, which is the predominant approach in modern machine learning.

The course combines methodology with theoretical foundations and computational aspects. It treats both the "art" of designing good learning algorithms and the "science" of analyzing an algorithm's statistical properties and performance guarantees. Theorems are presented together with practical aspects of methodology and intuition to help students develop tools for selecting appropriate methods and approaches to problems in their own research.

The course includes topics in statistical theory that are now becoming important for researchers in machine learning, including consistency, minimax estimation, and concentration of measure. It also presents topics in computation including elements of convex optimization, variational methods, randomized projection algorithms, and techniques for handling large data sets.

S681 Topics in Applied Statistics - Functional Data Analysis (3 cr.)
P: Consent of instructor

This course will introduce students to methods for analyzing functional data, i.e., data that are entire curves rather than single observations or vectors of several measurements. Such data objects involve numerous highly related points per object, and the methods for analyzing them make explicit use of data objects as functions. In contrast to multivariate analysis (multiple values per object that have different degrees of association) and longitudinal analysis or "panel studies" (where measurements are repeated on an individual at only a few time points), the methods here treat the object as a function. We discuss these methods, which include graphical displays, summaries, and analogs of conventional analyses (analysis of variance, principal components), with an emphasis on applications. The two required books address both of these goals (methodology and applications). The course is valuable both for researchers whose data are entire functions (gait, spectra, etc.) and for students interested in participating in a relatively new and important area of statistical research. Required textbooks: James Ramsay and Bernard Silverman, Functional Data Analysis (FDA), Springer; James Ramsay and Bernard Silverman, Applied Functional Data Analysis (AFDA), Springer.

S682 Topics in Mathematical Statistics - Introduction to Graphical Models (3 cr.)
P: Consent of instructor

INTRODUCTION TO GRAPHICAL MARKOV MODELS IN MULTIVARIATE STATISTICAL ANALYSIS. A central aspect of statistical science is the assessment of dependences among a set of stochastic variables. The familiar concepts of correlation, regression, and prediction are manifestations of this idea, and many aspects of causal relationships ultimately rest on representations of multivariate dependence.

Graphical Markov models (GMMs) use graphs, either undirected, directed, or mixed, to represent multivariate dependences in a visual and computationally efficient manner. Each variable is represented as a node in the graph, and a GMM is usually constructed by specifying local dependences for each node in terms of its immediate neighbors, parents, or both. A GMM can thus represent a highly varied and complex system of multivariate dependences by means of the global structure of the graph. The local specification permits efficiencies in modeling, inference, and probabilistic calculations.

For a fixed graph model, the classical methods of statistical inference may be utilized. In many applied domains, however, such as expert systems for medical diagnosis, weather forecasting, or the analysis of gene-expression data, the graph is unknown and is itself the first goal of the analysis. This poses numerous challenges, including the following:
  • The numbers of possible graphs and models grow superexponentially in the number of variables.
  • Distinct graphs G may be Markov equivalent, i.e., statistically indistinguishable.
  • Conversely, the same graph may possess different Markov interpretations.

    Furthermore, in applications, GMMs represent one of the most interdisciplinary topics of contemporary statistical science. Applications arise in a host of areas, e.g., computer science (expert systems, robotics, data-mining, machine learning), electrical engineering (automatic speech recognition systems, error-correcting codes), genetics (modelling gene-expression data), epidemiology (causal models), econometrics (structural equations), and behavioral science (modelling social networks).

    References:
    Cox, D.R. and Wermuth, N. (1996). Multivariate Dependencies: Models, Analysis, and Interpretation. Chapman and Hall, London.

    Edwards, D. (2000). Introduction to Graphical Modeling, 2nd ed. Springer, New York.

    Lauritzen, S.L. (1996). Graphical Models. Oxford University Press, Oxford.

    Whittaker, J.L. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, New York.

    S682 Statistical Model Selection (3 cr.)

    M-estimates are a broad class of statistical estimates obtained as the solution to an empirical optimization process. Typically, the population parameter is defined as the minimizer of a population risk function and its estimate is defined as the minimizer of the empirical risk. While M-estimates are known to enjoy many desirable properties, goodness of fit alone is not an adequate method for selecting the best among models of different "complexity." On the one hand, "simpler" models can be more revealing of the structure in the data. On the other hand, they are often restricted versions of more "complex" models and hence will never be preferred based on goodness of fit alone. In this course, we cover model selection techniques with an emphasis on variable selection in generalized linear models. We review classical variable selection methods such as AIC, BIC, and Mallows' Cp and discuss some of the computational issues involved. In addition, we introduce some alternative measures of the complexity of a model and review how they can be used for model selection purposes. Finally, we briefly review some of the issues specific to high-dimensional data sets and how they can be addressed.

    S682 Topics in Mathematical Statistics -- Statistical Theory I (3 cr.)
    P: S620 and some knowledge of elementary measure theory, and consent of the instructor

    Mathematically rigorous introduction to major areas of statistical theory and practice, including statistical models, sufficiency, likelihood inference, estimation and testing, Bayesian inference, decision theory, equivariance, and optimality of test statistics. The statistical program package "R" will be introduced and used.

    S682 Topics in Mathematical Statistics - Multivariate Statistical Analysis (3 cr.)
    P: STAT S721 and S722, or consent of instructor

    Multivariate normal distributions. Tensor notation. Multivariate linear normal models (MANOVA), estimation and testing. Wishart distributions and models. Inference for the covariance matrix, including multivariate Bartlett's test, test of block independence, and test of sphericity. Box approximations. Eigenvalues, including canonical correlations and principal components/factor analysis.

    S690 Statistical Consulting (4 cr.)
    P: S631 and S632, or consent of instructor

    This class will cover necessary skills for effective statistical consulting with applications derived from real consulting situations. Students will have the opportunity to engage with clients, perform analyses, and present the results of their consultations in class. Along the way students will learn practical methods of data analysis, questions that need to be considered for future studies, and ways of presenting data for different purposes. Clients and projects will come from the Indiana Statistical Consulting Center.

    S695 Readings in Statistics (1-3 cr.)
    P: Consent of instructor

    Supervised reading of a topic in statistics. May be repeated with different topics for a maximum of 12 credit hours.

    S710 Statistical Computing (3 cr.)
    P: STAT S620 or consent of instructor

    Survey of numerical methods in statistics. Matrix factorizations and algorithms for linear regression. Nonlinear optimization, maximum likelihood and nonlinear regression. Pseudorandom number generation and Monte Carlo methods.

    This course will cover two aspects of statistical computing. The first aspect will cover the use of R, a statistical computing software environment for performing statistical procedures and making graphical displays of data. Some previous exposure to R and to statistical procedures will be assumed (e.g., regression, analysis of variance, basic plotting); in this course we will focus instead on some less familiar but very useful methods (e.g., random number generation for simulations, diagnostic plots for validating model assumptions, robust methods of regression, bootstrapping for standard errors, generalized additive models, visualizing multivariate data). The second aspect will focus on some of the consequences of the computer's finite arithmetic for statistical results (e.g., periods of random number algorithms, matrix computations, expediting calculations for smoothing algorithms such as loess). The two books required for this course address these two aspects.

    Textbooks:

    (1) John Maindonald and John Braun, Data Analysis and Graphics Using R, Cambridge University Press

    (2) Ronald Thisted, Elements of Statistical Computing, Chapman and Hall

    S721 Advanced Statistical Theory I (3 cr.)
    P: S620, some knowledge of elementary measure theory, and consent of the instructor.

    Mathematical introduction to major areas of statistical theory and practice, including statistical models, sufficiency, likelihood inference, estimation and testing, Bayesian inference, decision theory, equivariance, and optimality of test statistics.

    S722 Statistical Theory II (3 cr.)
    P: S721 or consent of instructor.

    A continuation of S721. A mathematically rigorous introduction to major areas of statistical theory and practice including multinomial models, canonical linear models, exponential families, asymptotic theory, and general linear models.

    S730 Theory of Linear Models (3 cr.)
    P: STAT S620, or consent of instructor.

    Theory of the general linear model. Distribution theory, linear hypotheses, the Gauss-Markov theorem, testing and confidence regions. Application to regression and to analysis of variance.

    S740 Multivariate Statistical Theory (3 cr.)
    P: STAT S721 and S722, or consent of the instructor.

    Multivariate normal distributions. Multivariate linear normal models, estimation and testing. Wishart distributions and models. Inference for the covariance matrix. Eigenvalues, including canonical correlations and principal components/factor analysis.

    S781 Advanced Topics in Applied Statistics (3 cr.)
    P: Consent of the instructor.

    Careful study of an advanced statistical topic from an applied perspective. As topics vary, this course may be repeated for credit.

    S782 Advanced Topics in Mathematical Statistics (3 cr.)
    P: Consent of the instructor.

    Careful study of an advanced statistical topic from a mathematical or theoretical perspective. As topics vary, this course may be repeated for credit.