UMDNJ School of Public Health

Piscataway/New Brunswick Campus

 

Applied Categorical Data Analysis

 

Spring 2007

3 Credits

SPH, Room 2A, Wednesdays, 6:10-9:00 p.m.

 

 

Course Information

            Instructor:         Pamela Ohman Strickland, Ph.D.

                                    Division of Biometrics, UMDNJ-SPH

                                    School of Public Health, Room 218, 732-235-9721

                                    ohmanpa@umdnj.edu

            TA:                  Susan Huyck, PhD candidate in Biostatistics

 

            Textbooks

Required:

Agresti, A. (1996) An Introduction to Categorical Data Analysis.  John Wiley & Sons.

Highly Recommended:

Stokes, M.E., Davis, C.S. and Koch, G.G.  Categorical Data Analysis Using the SAS System, 2nd Edition.  SAS Publishing.

 

Prerequisites:   Introduction to Biostatistics, Biostatistics Computing, Regression Methods for Public Health Studies

 

Course Description

 

Public health studies, especially those involving questionnaires, contain large amounts of categorical data.  This class provides an introduction to descriptive and inferential statistics for univariate and multivariate categorical data, with applications to epidemiological and clinical studies.  For two- and three-way contingency tables, the class presents measures of association, tests for homogeneity between populations, and tests for independence of variables.  Related tests of trend for ordinal data are studied.  Loglinear and logistic regression analyses are investigated for data sets with both nominal and ordinal variables.

 

Computing Language: SAS

 

Course Objectives

 

By the end of this course, students will be able to:

1.      Formulate appropriate statistical hypotheses for examining cross-classified data from public health and clinical studies

2.      Justify the basic theoretical models for categorical data

3.      Create and/or actively participate in the design and analysis plan for a study involving categorical data, whether nominal or ordinal in nature

4.      Conduct and/or actively participate in the analysis of categorical data

5.      Interpret results from contingency tables or generalized linear models that evaluate relationships between categorical variables

 

Outline

 

January 17

Introduction, Distributions and Sampling

Examples to be Used in the Course of the Semester,  Definitions

Poisson, Binomial and Multinomial Distributions

Inference about univariate proportions

January 24

Two-way Contingency Tables

Probability Structures for 2-way tables, Chi-square tests of independence

Comparing proportions, risk ratios and odds ratios

January 31

Two-way Contingency Tables with Ordinal Data

Continuous or Categorical?

Testing independence for ordinal data

Measures of Association for ordinal data

February 7

Analyses for Matched Pairs of Categorical Data

Defining the hypothesis of interest, Testing independence

Three-way Tables

Justification for studying three-way relationships, partial association, Cochran-Mantel-Haenszel Methods, Common odds ratios

February 14

Generalized Linear Models

Components of a generalized linear model, linear regression, models for binary data, Poisson regression models for count and rate data, model inference and model fitting

February 21

Logistic Regression

Interpreting the logistic regression model, inference for logistic regression, model checking, logit models with qualitative and/or ordinal predictors, multiple logistic regression

Project Plan due.

February 28

Logistic Regression, Continued

March 7

Mid-term

March 21

Multi-category Logit Models

Logit models for nominal responses, cumulative logit models for ordinal responses, paired-category logits for ordinal responses

March 28

Loglinear Models for Contingency Tables

Loglinear models for two- and three-way tables, inference for loglinear models, loglinear models for higher dimensions, the loglinear-logit connection

April 4

Building and Applying Logit and Loglinear Models

Association graphs and collapsibility, modeling ordinal associations, tests of conditional independence, effects of sparse data

April 11

Models for Matched Pairs

Comparing dependent proportions, logistic regression for matched pairs, symmetry and quasi-symmetry models for square tables, comparing marginal distributions, analyzing rater agreement

April 18

Conditional Logistic Regression (Final Project Reports Due)

April 25

Final Exam

May 1

Revised Final Report Due

 

Evaluation

 

1.      Homework                              25%

2.      Class Attendance and Participation     5%

3.      Exams                                 45%

4.      Final Project Write-Up                15%

5.      Peer Review of Final Project          10%

 

Homework policies

 

  1. All homework must be turned in at the beginning of the class period on the day it is due.
  2. On all homework assignments/problem sets, students are encouraged to discuss the problems with one another, but the work should be carried out and written up independently. If two identical write-ups are found, both homework assignments will be given a grade of zero.
  3. The homework assignments will involve computer work using SAS.  Only the relevant portions of the output should be cut and pasted into the homework assignment at the appropriate spot, not attached at the end.  Failure to comply will result in a 50% reduction in points for that homework.

 

 

Exam policies

Exams will be given on March 7th and April 25th.  An unexcused absence from either exam will result in a grade of zero for that exam.

 


Final Project

 

There are two options for the final project.

 

1)      Data Analysis

Complete a full analysis of a set of data that contains an ordinal or multinomial response and at least one ordinal covariate.  This data set may not be one that you are using for a fieldwork project or dissertation.  Present two alternative analyses, one of which must be a generalized linear model.  Compare and contrast the two approaches.

 

2)      Report on a Statistical Paper

Report on a statistical paper addressing the analysis of categorical or ordinal data.

    1. Describe the problem it addresses
    2. Give an original example of data to which the methods would apply
    3. Summarize the solution proposed along with advantages and disadvantages of the solution
    4. Formulate two or three application or research questions prompted by the research paper.  (Try to be as original as possible.)

 

The emphasis in these final projects should be on the statistical methodology and how it is applied.  A “Project Plan” will be due on February 21st along with any regular homework assignment.  The final project report is due on April 18th.  After feedback and peer review, a revised final report may be submitted by May 1st.

 

Final Project Peer Review

 

Final drafts of the paper will be due on April 18th.  Each student will be randomly assigned three other students’ papers to review.  Using a template, each student will be asked to rate and comment on these three papers.  Statistical methods will be used to combine these ratings and create a final ranking of all papers.  10% of your final grade will be based on these peer reviews.  (Please note that if you do not hand in your own peer reviews of other students’ papers, you will receive 0 of this 10%.  If you do not have a paper to hand out to other students for review on April 18th, you will also receive 0 of this 10%.  No exceptions will be made!)

 

You may use these peer reviews to revise your own paper for a final submission of the project due on May 1st.