Statistics

Section Chairs:

Maribeth Johnson

Medical College of Georgia

Augusta, GA

Olivia Rud

Advanta Corporation

Horsham, PA

   

The Use of SAS to Construct Horvitz-Thompson Type Estimators for Safety Belt Compliance

   

Holmes Finch, University of South Carolina, Columbia, SC

   

In many survey applications, multi-stage sampling is used to reduce variance and to obtain parameter estimates for very specific subgroups. In conducting an observational survey of safety belt use across the state of South Carolina, such a sampling scheme was used at three levels: counties within strata, census tracts within counties, and sites within census tracts. Each of the state’s 46 counties was placed in one of six strata defined by a combination of geographic location and urban/rural classification. Within counties, census tracts were randomly selected with probability proportional to size (pps), as were sites within tracts. The statewide estimate of the percentage of occupants using safety belts, and its standard error, must take this complex sampling scheme into account. One such method for calculating compliance yields Horvitz-Thompson type estimates. It uses a recursive process that builds estimators at one stage from those at the stages below, employing a combination of weights and raw compliance estimates from individual sites. Because of the uniqueness of these estimates, standard software approaches were unavailable for their calculation, so a set of SAS programs was written to compute them and their standard errors.
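
For reference, the single-stage Horvitz-Thompson estimator on which such multi-stage estimators build weights each sampled unit by the inverse of its inclusion probability (this is the textbook form, not the paper’s full recursion):

   \hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i}, \qquad \widehat{V}(\hat{Y}_{HT}) = \sum_{i \in s} \sum_{j \in s} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \, \frac{y_i}{\pi_i} \, \frac{y_j}{\pi_j}

where \pi_i and \pi_{ij} are the first- and second-order inclusion probabilities (with \pi_{ii} = \pi_i).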

   

Holmes Finch is the manager of the Statistical Laboratory (Stat Lab) at the University of South Carolina, where he has been for 7 years. He has a Master’s degree in Educational Research, and has been a SAS user for 10 years. His primary responsibilities include consulting with researchers in a wide variety of disciplines, and conducting statistical analysis for research projects involving the Stat Lab.

   

PROC FACTOR: How to Interpret the Output of a Real-World Example

   

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Atlanta, GA

   

This paper summarizes a real-world example of a factor analysis with a VARIMAX rotation using the SAS System’s FACTOR procedure. Each step you must take to perform a factor analysis is described, from the initial programming code to the interpretation of the PROC FACTOR output. The paper begins by highlighting the major issues to consider when performing a factor analysis with PROC FACTOR. This is followed by an explanation of sample PROC FACTOR program code and then a detailed discussion of how to interpret the PROC FACTOR output. The main focus of the paper is to help beginning- and intermediate-level SAS software users learn how to interpret programming code and output from PROC FACTOR. Some knowledge of statistics and/or mathematics is helpful for understanding parts of the paper. All of the results discussed use Base SAS and SAS/STAT® software.
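
As a preview of the kind of program the paper walks through, a minimal PROC FACTOR call with a VARIMAX rotation might look like the following sketch; the data set name, item names, and the choice of three factors are invented for illustration:

   proc factor data=survey
               method=principal   /* principal components extraction          */
               rotate=varimax     /* orthogonal VARIMAX rotation              */
               scree              /* print a scree plot to help pick factors  */
               nfactors=3;        /* retain three factors (illustrative only) */
      var q1-q20;                 /* the items to be factor analyzed          */
   run;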

   

Rachel Goldberg, Project Director of Guideline Research/Atlanta, Inc., is responsible for overall project management, questionnaire design, budget estimation, proposal coordination, analysis, and interpretation of quantitative and qualitative marketing research projects and competitive intelligence studies. Rachel has been with Guideline Research for over two years; the majority of her work is focused in the energy services industry. She has a B.S. in Statistics from the University of Florida with extensive course work in psychology and research methods and is a member of the Society of Competitive Intelligence Professionals. Rachel has used SAS for the past seven years, the last two of which were in the Microsoft Windows environment.

   

Evaluation of Manatee Feeding Preferences from Partial Ranks

   

Jay Harrison, University of Florida, Gainesville, FL

   

A sequence of trials was conducted to measure the feeding preferences of captive manatees at Homosassa Springs State Park, Florida. During each trial, one manatee was allowed to browse through a completely randomized grid of five plant species with four replications per species. Responses consisted of arbitrary numbers of partial ranks indicating the order in which the plants were eaten. A method of describing food preferences based on partial ranks was developed by obtaining Mann-Whitney preference counts for each pair of plants and then fitting Bradley-Terry models to the corresponding contingency tables. PROC GENMOD in SAS was used to obtain the estimates.
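
A Bradley-Terry fit of this kind can be sketched in PROC GENMOD as a binomial logit model on the pairwise preference counts. The toy data set below (three species rather than five, with invented counts) codes each pair with +1/-1 indicators and drops one species as the reference:

   data pairs;                    /* one record per species pair            */
      input xA xB xC y n;         /* y of n trials in which the +1 species  */
      datalines;                  /* was preferred over the -1 species      */
    1 -1  0  14 20
    1  0 -1  11 20
    0  1 -1   9 20
   ;

   proc genmod data=pairs;
      /* Bradley-Terry: logit P(i preferred to j) = beta_i - beta_j; */
      /* xC is omitted, so beta_C = 0 serves as the reference        */
      model y/n = xA xB / noint dist=binomial link=logit;
   run;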

   

Jay Harrison is the senior statistician for the Institute of Food and Agricultural Sciences at the University of Florida. He has also worked for the National Agricultural Statistics Service and the Mayo Clinic. Jay has a bachelor’s degree in bioengineering from Texas A&M and a master’s degree in statistics from Ohio State, and he is currently working on a Ph.D. in education at Florida.

   

Williams’ Test: The Macro

   

Derek B. Janszen, Chemical Industry Institute of Toxicology, Research Triangle Park, NC, and William P. Hahn, North Carolina State University, Raleigh, NC

   

Williams’ test (Biometrics 27, 1971, 103-117; Biometrics 28, 1972, 519-531) is used in pharmacology and toxicology for determining the lowest dose group that is significantly different from the control group. It assumes a strictly increasing or decreasing response with increasing dose and is more powerful than Dunnett’s test. A procedure for Williams’ test was written by Stan Young and George Fraction for SAS version 5 (SUGI Supplemental Library User’s Guide, Version 5. 1986. Cary, NC: SAS Institute, Inc.) but this has not been available for later versions. In version 6.08, SAS introduced "WILLIAMS" as an option for the probmc() function. Given the maximum likelihood estimates (MLE) of the monotonic responses, probmc() computes the p-values. Frequently, the actual responses are not monotonic. A macro has been written (requiring only the BASE and STAT modules of SAS) that computes the MLEs of the monotonic responses for a set of data (regardless of the actual response pattern) and computes the corresponding p-values for Williams’ test. The macro is well-documented. In this presentation, the approach and algorithms used in the macro will be discussed. Several examples demonstrating the features of the macro will also be presented. Among the key features are the ability to accept data in either a columnar or stacked format, to perform several transformations on the data, and to perform Shirley’s test, a non-parametric analog of Williams’ test.
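
As a taste of the building block involved, a probmc() call of roughly the following shape yields the p-value once the MLEs of the monotonic responses are in hand; the numbers are illustrative, the call assumes the balanced case, and the exact parameterization should be checked against the PROBMC documentation:

   data pval;
      tbar = 2.24;   /* Williams’ statistic for the highest dose (invented) */
      df   = 25;     /* error degrees of freedom                            */
      k    = 5;      /* number of dose levels, including control            */
      p = 1 - probmc("WILLIAMS", tbar, ., df, k);   /* upper-tail p-value   */
      put p=;
   run;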

   

Derek Janszen, Ph.D., is a biostatistician at the Chemical Industry Institute of Toxicology, Research Triangle Park, NC. He has used SAS for about 10 years and is currently extolling the virtues of JMP.

Bill Hahn is a doctoral student in statistics at NC State University. In his spare time he programs the macros that Derek doesn’t have time to do himself.

   

Summarization of Logistic Regression Results When There Are Many Dependent Variables and a Common Shared Set of Explanatory Variables

   

John T. Jones, PharmaResearch Corp., Wilmington, NC

   

Side-by-side comparison of univariate and multivariate estimates reveals the significance and robustness of those estimates. Using the LOGISTIC procedure with the stepwise technique, the macro language, the DATA step, and the REPORT procedure, a program can be developed that succinctly summarizes univariate and multivariate logistic results for many dependent variables sharing a common set of explanatory variables, permitting side-by-side comparison in a clear and easily interpretable table format. This technique can easily be extended to other regression methods.
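
A skeleton of such a driver, with the stacking and PROC REPORT steps elided, might look like the following; the macro name, data set, and variable names are all hypothetical:

   %macro fitall(depvars);
      %local i dv;
      %let i = 1;
      %do %while(%length(%scan(&depvars, &i)) > 0);
         %let dv = %scan(&depvars, &i);
         /* multivariate (stepwise) model for this outcome */
         proc logistic data=study descending outest=est_&dv;
            model &dv = age sex dose site / selection=stepwise;
         run;
         %let i = %eval(&i + 1);
      %end;
      /* ...stack the EST_ data sets and feed them to PROC REPORT... */
   %mend fitall;

   %fitall(resp1 resp2 resp3);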

   

John Jones has been a professional SAS programmer for eleven years in a variety of business settings, including manufacturing of consumer products and semiconductor chips, software development, marketing, clinical trials, and pharmacoeconomics. His SAS skills include STAT, ETS, IML, GRAPH, SCL, FRAME, FSP, QC, the macro language, PROC REPORT, and PROC SQL. John has a Master of Science degree from North Carolina State University and a Master of Arts in Econometrics from the University of Akron. He is currently employed by PharmaResearch Corporation in Wilmington, NC, as a statistician.

   

PROC MIXED and the Output Delivery System (ODS): A Brief Introduction

   

Francis J. Kelley, University of Georgia, Athens, GA

   

It is often the case, particularly for macro writers, that some portion of procedural output will be needed for subsequent use. Although the SAS System provides extensive capabilities in this regard, there are still instances where there is no provision for writing the statistic(s), parameter(s), or values to a file, even though they may be present in the "printed" output. It is conventional in such circumstances to use PROC PRINTTO to redirect the output to a file, read that file back into SAS, parse the "printed format" output, and extract the desired data. This is, of course, tedious, time-consuming, and error-prone. With PROC MIXED, SAS introduced the "Output Delivery System", a feature that will be standard with the release of Version 7. This system allows all output to be sent to SAS data sets for use in later steps. This paper provides a quick introduction to some of these features and a short example of their use.
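
In current releases the feature takes the form of PROC MIXED’s MAKE statement; a minimal sketch (data set and variable names invented) is:

   proc mixed data=trial;
      class subject trt;
      model y = trt / solution;
      random subject;
      make 'SolutionF' out=fixedest;  /* fixed-effects solution to a data set */
   run;

   /* Version 7 replaces MAKE with the ODS OUTPUT statement, e.g.: */
   /*    ods output SolutionF=fixedest;                            */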

   

Joe Kelley is a senior consultant in the Host Systems and Statistical Software Support group at University Computing, The University of Georgia. He provides assistance for the IBM and CDC mainframes at UGA (operating systems, compilers, editors, and software packages) and provides SAS software support for all the systems on which it is available at the University: MVS/TSO, VM/CMS, Unix, OS/2, Windows, Macintosh, and DOS. He has used SAS extensively for well over 12 years.

   

Good Statistics or Good Programming - Can We Have Both?

   

Grace Lossman, Statcon, Inc.

   

Present in every industry is the need for timely results. The purpose of this paper is to show that overall efficiency can be greatly increased if we spend the time to produce reliable, easily modifiable, and efficient code. Producing appropriate statistical models is only half the package. Examples will be provided to show the effectiveness of this approach.

   

Grace Lossman has a Master’s degree in Physics and is completing her Master’s in Statistics. She is a consultant working for Smith Hanley Consulting Group and is CEO of Statcon, Inc. She has over 10 years of experience with the SAS System and is currently working on a contracting assignment at Pfizer Pharmaceuticals in Manhattan. Grace has two books in progress through the Institute’s Books by Users program: one dealing with statistics and graphics for scientists and engineers, the other dealing with problems inherent to the pharmaceutical industry. Grace was a co-chair of the 1996 SESUG conference in Atlanta.

   

Identifying Clinical Practice Improvement Opportunities Using Resampling Techniques in PROC MULTTEST

   

Gregory L. Pearce, Mission+St. Joseph’s Health System, Asheville, NC, and Peter H. Westfall, Texas Tech University, Lubbock, TX

   

Six surgeons perform more than 800 coronary artery bypass graft (CABG) surgeries annually at the Owen Heart Center, a program of the Mission+St. Joseph’s Health System. Surgeon-specific results are evaluated quarterly. Each surgeon is compared to the remainder of the group for seven adverse events with the intent of identifying continuous quality improvement (CQI) opportunities for clinical practice. In order to drive out fear in the CQI process, the probability of declaring a false significance must be controlled. Without adjustment, the probability of spuriously declaring a significant difference approaches 88%. Adjustment techniques that address the multiple comparison problem are available (e.g., Bonferroni) but may prove too conservative to identify CQI opportunities for surgeons. Therefore, a method that balances the risk of falsely declaring a significant result against the ability to detect clinically important differences is desirable. We have employed PROC MULTTEST to resample the data and make permutation adjustments. This method approximates the distribution of the minimum p-value over all tests, and that distribution is then used to adjust the individual raw p-values. The Cochran-Armitage linear trend test and linear contrasts are used to make surgeon-specific comparisons.
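
A call of roughly the following shape (data set, variable, and contrast names invented, and only three of the seven adverse events shown) performs such a permutation adjustment:

   proc multtest data=cabg permutation nsample=20000 seed=1234 out=adjp;
      class surgeon;                        /* six surgeons                 */
      test ca(death stroke reop);           /* 0/1 adverse-event indicators */
      contrast 'surgeon 1 vs rest'  5 -1 -1 -1 -1 -1;
      contrast 'surgeon 2 vs rest' -1  5 -1 -1 -1 -1;
   run;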

   

Gregory Pearce is employed as the Manager of Heart Services Research for the Mission+St. Joseph’s Health System in Asheville, North Carolina. Greg received a Master’s degree in Statistics from the University of Tennessee, Knoxville in 1989 and has been a SAS user since 1984. Collaborative investigations with clinicians have resulted in his co-authoring 16 articles on various aspects of cardiovascular and renal research.

Peter H. Westfall is a Professor of Statistics at Texas Tech University. He received his Ph.D. in Statistics from the University of California at Davis.

   

On the Analysis of Repeated Measures and Longitudinal Data using the SAS System

   

Jane Pendergast, University of Florida, Gainesville, FL

   

Repeated measures and longitudinal data require special attention because they involve correlated data. This correlation is induced when the primary sampling units are measured repeatedly over time or under different conditions. Examples of techniques applicable to non-normal data using the SAS System will be presented. The primary objectives are to investigate trends over time and how they relate to treatment groups or other covariates.
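
One technique likely in scope for the non-normal case is generalized estimating equations; a minimal GEE sketch using PROC GENMOD’s REPEATED statement (all names invented) is:

   proc genmod data=long;
      class id trt visit;
      model resp = trt visit trt*visit / dist=binomial link=logit;
      repeated subject=id / type=exch;   /* exchangeable working correlation */
   run;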

   

Jane Pendergast is a faculty member is the Division of Biostatistics, Dept. of Statistics, at the University of Florida. She has had extensive consulting and collaborative research experience within the health science center. She has co-authored two tutorials with Ramon Littell on using the SAS System to analyze repeated measures and longitudinal data. She has published on a wide variety of topics, including a recent review of methods applicable to correlated/longitudinal binary outcome data. She has been a SAS user for 20 years.

   

Logistic Regression for Polychotomous Outcomes

   

Maura Stokes, SAS Institute, Inc., Cary, NC

   

Usually, logistic regression is performed when you have a dichotomous response variable. However, logistic regression also applies when you have more than two outcomes. If your response variable is ordinal, then ordered logistic regression is appropriate. Also known as the proportional odds model, this strategy is implemented with the LOGISTIC procedure. If your response variable is nominal, then you can perform logistic regression based on the analysis of generalized logits. This strategy is implemented with the CATMOD procedure. Both strategies are demonstrated with examples.
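
In outline, the two strategies look like this (data set and variable names invented):

   /* ordinal response: proportional odds (cumulative logit) model */
   proc logistic data=study;
      model severity = dose age;   /* fitted by default when SEVERITY  */
   run;                            /* has more than two ordered levels */

   /* nominal response: generalized logits */
   proc catmod data=study;
      direct dose age;             /* treat the predictors as continuous */
      model brand = dose age;      /* generalized logits are the default */
   run;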

   

Maura Stokes is Manager of Statistical Applications R and D in the Applications Division at SAS Institute, where she has worked since 1985. She received her DrPH from the Department of Biostatistics at the University of North Carolina, Chapel Hill in 1986, where she is now an assistant adjunct professor. She is co-author of the book Categorical Data Analysis Using the SAS System.

   

Modeling Higher Education Processes with Multiple Logistic Regression: SAS and SPSS Examples

   

Daniel Teodorescu, College of Charleston, Charleston, SC

   

This paper is intended to familiarize SAS users, research analysts, and statisticians with the potential use of logistic regression in the study of higher education processes (e.g., college choice decisions, enrollment management, retention analysis). The paper first reviews concepts and theoretical assumptions employed in logistic regression and then discusses a series of applications in the field of higher education research and their implications for policy analysis, planning, and management across college campuses. Throughout the discussion, examples are offered in both SAS and SPSS code. Special emphasis is placed on the interpretation of the results of logistic regression as well as their dissemination to diverse audiences.
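
For instance, a retention analysis of the kind described might start from a model like the following in SAS (the data set and predictors are hypothetical, not drawn from the paper):

   proc logistic data=students descending;   /* DESCENDING models RETAINED=1 */
      model retained = hsgpa sat_total aid_amount;
   run;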

   

Daniel Teodorescu currently works as a statistician for the College of Charleston in the Office of Institutional Research. Prior to this, he held research positions at the New Jersey Institute of Technology and the Institute for Educational Sciences in Bucharest, Romania. He earned an M.S. degree and a Ph.D. degree in Educational Administration & Policy from the University at Albany, NY. He has been using SAS/BASE, SAS/STAT, SAS/OR, and SAS/GRAPH for more than two years for survey research, tabulations, and statistical modeling.

   

Identifying the Order of Integration of Time-series Data Using Customized SAS Macros: Dickey-Fuller and Augmented Dickey-Fuller Tests

   

Jun Zuo, STATPROBE, Inc., Ann Arbor, MI

   

Identifying the order of integration of each individual time-series variable is the first step in examining the potential presence of cointegration among multiple time-series variables, and cointegration has recently become a very important topic in time-series analysis. In this paper, customized SAS macros were developed to conveniently and efficiently identify the order of integration of time-series data. The order of integration of a single time-series variable can be determined from the results of testing the stationarity of the nth-order difference of the variable using either the Dickey-Fuller or the Augmented Dickey-Fuller (ADF) test, both of which are accommodated in the SAS macros. One advantage of using these macros to determine the order of integration is that different specifications (without drift, with drift, or with both drift and linear trend) and different lags (for the ADF test) can be easily selected using options defined within the macros.
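
At its core, the Dickey-Fuller test with drift is an ordinary regression of the differenced series on its lagged level; a stripped-down sketch (data set and variable names invented, not the author’s macros) is:

   data work1;
      set series;
      dy   = dif(y);   /* first difference y(t) - y(t-1) */
      ylag = lag(y);   /* lagged level y(t-1)            */
   run;

   proc reg data=work1;
      model dy = ylag;   /* compare the t statistic on YLAG to   */
   run;                  /* Dickey-Fuller (not Student t) tables */

For the ADF version, lagged values of DY are added as additional regressors.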

   

Jun Zuo is currently working as a statistician/statistical programmer at STATPROBE, Inc. in Ann Arbor, Michigan. He has been a SAS user for more than five years, with two years in the pharmaceutical industry. He obtained his Ph.D. degree in Applied Economics from The Ohio State University, Columbus, Ohio. His Ph.D. dissertation focused on applications of time-series data analysis with SAS software including SAS macro, SAS/STAT, and SAS/ETS.