Advanced Tutorials

Section Chairs:

Phil d'Almada

EDS and Centers for Disease Control and Prevention (CDC)

Atlanta, GA

David Harris

American Cancer Society

Atlanta, GA

 

 

   
   

Data Warehouse Implementation with the SAS System

   

Tony Brown, SAS Institute, Inc., Cary, NC

   

The successful implementation of a data ware-house can be an extremely complex process, requiring significant planning, patience, and persistence. The Professional Services Division of SAS Institute Inc. has found that a good understanding of data warehousing concepts, coupled with a solid development methodology, and a technical infrastructure built around the SAS System can enable practitioners to achieve their goals of effective data warehousing. This paper addresses many of the challenges and issues that The Professional Services Division has confronted during the development of data warehouses. It explores the definition and purpose of data warehouses, common myths and mistakes in warehouse development, and presents the SAS Data Warehouse Model. It also provides information on staffing data warehouse projects, and finally introduces the SAS Rapid Data Warehouse Methodology. This paper will be of interest to anyone who is planning or has recently initiated a data warehouse project. Several SAS products will be highlighted, but experience in their use is not necessary to derive benefit from this paper.

   

Tony Brown is a senior applications developer at SAS Institute. Tony has been with the Institute for only 1 year, but has 13 years of experience with SAS software. His area of expertise includes Data Warehousing, Decision Support, OLAP, and Data Mining; including the following SAS modules: base SAS, SAS/AF, FRAME, SCL, SAS/EIS, SAS/ASSIST, SAS/GRAPH, SAS/STAT, SAS/CONNECT, SAS/SHARE, SAS/MDDB Server, SAS/Warehouse Administrator software, and SPDS. Tony has a BS degree in information systems.

   

The Data Review Two-step: Macros for Comparison and Exception Reports

   

Ron Fehd, Centers for Disease Control and Prevention (CDC), Atlanta, GA

   

Data Review, consists of two basic steps:

1. %COMPARWS: compare two data sets, with summary. This section shows how to use PROC COMPARE output data set to write an updateable text file with the changes noted. This text file has the form:

* if ID = 1 then do;

* var1 = '3';*value from base data set;

* var1 = '8';*value from compare data set;

* end;

2. XRPT: %ID and %SHOW:

This section presents two macros to ease the writing of exception reports which show data out of range. This text file has the form:

* if ID = 1 then do;

* review: data out of range: gt 24 hours/day;

* var2 = 25;

* end;

   

Ronald Fehd received his B.S. in Computer Science from the University of Hawaii-Manoa in 1986. He has been a SAS user for more than ten years and has worked for the Division of Laboratory Systems, Public Health Practice Program Office, Centers for Disease Control and Prevention, Atlanta GA, for nine years. He has posted to SAS-L under the name the Macro Maven. He is a programmer/analyst, data manager, and data detective. His expertise is the macro language and advanced data step programming with an emphasis on character variable manipulation. These routines are part of his data manager and programmer's toolkit.

   

A Step-by-Step Illustration of Building a Data Analysis Tool with Macros

   

John Charles Gober, Beekeeper Enterprises, Alexandria, VA, and Diana Zhang Wobus, University of Maryland, Baltimore, MD

   

Grouping observations in a data set into subgroups according to their percentile based on a continuous variable (expense, age, or score) and then generating statistics for each subgroup can be tedious. Applying macros appropriately, however, can achieve greater program efficiency and error-proof results. This paper illustrates a rational process of building a data analysis tool that demonstrates the power and efficiency of macros. By describing three approaches we employed in a real-life project, the paper shows how each approach improves the tool from an earlier one. With a step-by-step illustration of solving the problem, the paper helps SAS programmers who have just begun tapping into "mysterious" macro programming to learn about some of the basics of macros and the rationales that led us to the results we desired.

 

John Charles Gober has been programming using the SAS system for over 18 years. He specializes in the cleaning, manipulation, and mining of large datasets, program efficiency techniques, multi-platform migration, database design, and year 1999/2000 problems. He has worked either in a full time capacity or as a contractor at various companies such as Claims Administration Corporation, Department of Energy, Boehringer Manheim, and Atlantic Research Corporation. He is currently employed by Princeton Computer where for the last nine months he has been contracted out to the Federal National Mortgage Association (Fannie Mae). John is also Princeton's Technical Advisor for hiring and placing SAS proficient programmers.

Diana Zhang Wobus is unable to attend SESUG this year. She relocated in July temporarily to help run an international school in Xiamen, China. Prior to that she was a senior analyst with the Center for Health Program Development and Management in Maryland. She was working on capitation projects helping Maryland State Department of Health and Mental Hygiene to determine payment rates for managed care organizations in Maryland's Medicaid reform. She began SAS programming in 1988 and has been using SAS to process and analyze large data sets on a daily basis since 1993. Diana has a Ph.D. in education and research and is active in her spare time in several educational programs and research projects. Diana's spare time is also taken up by SCUBA diving, Tai Chi, travel, and Sichuan cooking. Diana is originally from Beijing, China.

   

Data Warehouse: How to use SAS from Beginning to Almost Ending for Your Warehouse

   

Akbar Golmirzaie, University of Arkansas, Fayetteville, AR

   

This presentation shows how to cross different platforms and put the data on Data Warehouse server and let the user use SAS or other software by using ODBC driver to get access to Data Warehouse.

   

Akbar Golmirzaie: I have been at University of Arkansas since 1980. I have two masters(Computer and Statistic) and working on my PHD. I have been full time employed here since September, 1988 as statistician and rep for all statistical software. I have been using SAS since 1981. Almost a little over a year ago, I started this project called Data Warehousing and had great progress with SAS. I developed the whole thing by myself and using SAS from beginning until almost ending. I have been using SAS on MVS, Unix, and .... And there is more to say but I hope this is enough.

   

Moving a SAS/AF Application to the World Wide Web

   

Keith Humphrey and Ray L Ransom, Centers for Disease Control and Prevention (CDC), Atlanta, GA

   

By itself, Version 6.12 of the SAS system does not provide the ability to develop an application that will run from the World Wide Web. The Division of Sexually Transmitted Disease Prevention (DSTDP) of the Centers for Disease Control and Prevention (CDC) needed to provided access to a STD surveillance application developed using SAS/AF, for state and local health officers via the Web. This paper will clarify components of SAS that are needed for Web application development including SAS/AF, SAS/Connect, SAS/Share and SAS/IntrNet software and other resources required. We will attempt to answer the question, "What is required to move an existing SAS/AF application to the Web?"

   

Keith Humphrey is a network specialist with the Centers for Disease Control and Prevention in the Division of Sexually Transmitted Disease Prevention. He is a member of a project team responsible for designing a SAS/AF application that will provide easy access to surveillance data. His responsibilities include maintaining connectivity between the Unix, Netware and NT servers as well as help system programmers write code on these different operating systems platforms.

Ray L Ransom is a systems analyst with the Centers for Disease Control and Prevention in Atlanta, Ga. He has been using the SAS System for 11 years and is currently leading a development team charged with the development of a GUI application to analyze and disseminate data regarding national surveillance of sexually transmitted diseases. Current challenges include Web publishing and programming using the SAS System.

   

Ten Great Reasons to Learn SAS Software's SQL Procedure

   

Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, CA

   

The SQL Procedure has so many great features for both end-users and programmers. It's fun, easy to learn and use, and can often result in fewer and shorter lines of code than using conventional DATA and PROC step methods. This presentation will demonstrate ten great reasons for learning the SQL procedure. Several examples will illustrate how PROC SQL can be used to retrieve data, subset and query data, create and modify tables, perform statistical computations, produce great looking reports, construct views, join two or more tables of data, and exchange data between your favorite data base and the SAS System by using the SQL Passthru facility.

   

Kirk Paul Lafler is senior consultant and founder of Software Intelligence Corporation with twenty years of experience using the SAS System. His areas of expertise includes information-based systems analysis, design, and development; year 2000; computer training; course design and development; and custom programming using base-SAS, SAS/SQL, SAS/FSP, SAS/AF, Screen Control Language, FRAME, and SAS/EIS software.

   

Beyond Field Validation: Incorporating the Batch Edit Into the Total Data System

   

John Quarantillo and Judy Rayner, Westat Inc., Rockville, MD

   

Computer Systems professionals have always been faced with the problem of ‘dirty data’. In the past, errors were detected using batch edit programs to identify errors. The output from these batch edits was then reviewed, corrections were transcribed onto coding sheets, then keyed, and finally applied to the data using a batch update program. With the advent of custom interactive data entry systems, many edit checks were built into the data entry system, replacing the batch edit process. There are clear advantages to this, but in the interest of speed and simplicity, the data entry edits were not as robust or thorough as the batch edits of the past. Many complex checks were left undone. The approach that we describe uses the SAS system, and SAS/AF® to combine a customized interactive data entry system with improved batch systems, taking advantage of the strengths of each to produce an integrated system for data entry and cleaning that is both thorough and easy to use. Methodologies such as those outlined here are important additions to the front end of any data warehousing effort. Our system is used under, but not limited to the MVS and Windows environments.

   

John Quarantillo is a Systems Analyst with Westat, Incorporated in Rockville, Maryland. He has worked with SAS for ten years, using SAS for data manipulation, and data collection. In his spare time, John enjoys web page design, 'net surfing and 'real' surfing.

Judy Rayner is a Senior Systems Analyst and Group Manager with Westat, Inc. in Rockville, Maryland. She has used SAS for over 20 years and is quite appreciative of the progress it has made over this time. She has used SAS for data manipulation, data management, data collection, and statistical analysis.

   

Self-Modifying SAS Programs: A DATA Step Interface

   

S. David Riba, JADE Tech, Inc., Clearwater, FL

   

There are many situations where it is useful to have a SAS program modify itself during execution. Some examples include:

  • Conditionally execute a Macro based on the data in a SAS dataset
  • Conditionally execute a Procedure when there is no data in a SAS dataset
  • Generate in-line SAS program code based on the contents in a SAS dataset
  • Write SQL statements for the SQL Pass-Through Facility "on-the-fly"

This paper explores a little-known routine in the SAS System that allows a SAS program to write SAS code "on-the-fly" during execution. Since CALL EXECUTE is a DATA step routine, the full capabilities of the DATA step can be applied to conditionally generate SAS program statements while the program is executing. If you have ever written program code to a temporary file and then %INCLUDEd it back into your program, you need to take a look at CALL EXECUTE. The paper will discuss the uses of CALL EXECUTE with specific examples based on real world situations.

   

S. David Riba is CEO of JADE Tech, Inc., a SAS Institute Quality Partner who specializes entirely in applications development, consulting, and training in the SAS System. He is the founder and President of the Florida Gulf Coast SAS Users Group. He chartered and served as Co-Chair of the first SouthEast SAS Users Group conference, SESUG '93, and serves as President of the Executive Council of the SouthEast SAS Users Group. His first SUGI was in 1983, and he has been actively involved in both SUGI and the Regional SAS Users Group community since then. He has presented papers and assisted in various capacities at SUGI, SESUG, NESUG, MWSUG, SCSUG, and PharmaSUG. Dave is an unrepentant SAS bigot. His major areas of interest are efficient programming techniques and applications development using the SAS System, particularly using Screen Control Language with SAS/AF and FRAME technology.

   

Usage of SYSDATE for Dynamic and Internal Updates during the File Maintenance Run in the Macro-Based Information System

   

Jeff F. Sun, Blue Cross Blue Shield of North Carolina, Durham, NC

   

The file maintenance for a large macro-based information system is a time-consuming and complex process. This process generally involves updating a lot of programs for each run. These updates should be logically and systematically organized so that the whole system can be smoothly and timely updated. Since most file maintenance runs are periodic and routine processes, the updates on these programs can be done internally and dynamically by the host system using the system information to gain in efficiency and accuracy. This paper provides a general approach on how to use the automatic macro variable SYSDATE in the internal and dynamic updates, i.e. file names, time periods, or text, in the macro-based information system during the file maintenance run. The logical design, information retrieve, macro coding, installation, and validation checking will be addressed in details.

   

I completed a B.S. in Electronic Engineering from Xi’an Jiaotong University, P.R.China, in 1984, a M.A. in Applied Economics (1993), and a M.A. in Applied Statistics (1995) at the Ohio State University. I have extensively used SAS system for eight years. My experience includes clinic trails, statistical programming, macro processing, statistical graphics, statistical modeling, and file management in the academic, financial, pharmaceutical, and health care units. Currently I am working primarily on MVS and Windows platforms in the Blue Cross Blue Shield of North Carolina as a health service analyst.

   

How MERGE Really Works

   

Bob Virgile, Robert Virgile Associates, Inc., Woburn, MA

   

Do your MERGEs produce unexpected results? Three basic DATA step concepts resolve the complexities of MERGE: compile and execute, the Program Data Vector, and data movement among various storage locations. These concepts apply to all DATA steps, not just those using MERGE. This paper examines these key issues in general and then applies them specifically to the MERGE process. Examples illustrate trickier aspects of MERGE as well as techniques to code around the pitfalls.

   

Bob Virgile is an independent consultant. He has written numerous papers and books, and creates the problem-solving contests for SUGI and NESUG. He also develops and teaches an exceptional series of SAS classes.

   

Tables and Views and Forms: Oh My!

   

Alan Wilson, Robert Wood Johnson Medical School, New Brunswick, NJ, and Marge Scerbo, University of Maryland, Baltimore, MD

   

New in Release 6.12, the VIEWTABLE window provides a spreadsheet-like interface for viewing and editing SAS data files. This is a FRAME entry that displays a data set in a Data Table Object. Version 6.11 of the SAS System introduced Data Tables and Data Forms as new data entry classes in the SAS/AF FRAME product. Major enhancements in Release 6.12 have made VIEWTABLE a very acceptable alternative to PROC FSVIEW for those users with graphics-capable interfaces. Two viewing modes are easily accessed, table view and form view. Moving between these modes is easy, as is customizing the appearance of columns. The Import Wizard enables creation of SAS tables from dBase, tab-delimited, or comma-separated files. The table created by VTIMPORT can be displayed in the VIEWTABLE window for further processing. VTQUERY will display the result of an interactive SQL interface. SAS data sets can be searched, subsetted, sorted and saved to a new table using VIEWTABLE. This paper will provide insight into this new product and display in online format some useful features.

   

Alan Wilson is an Associate Professor of Clinical Medicine at Robert Wood Johnson Medical School in New Brunswick, NJ. A biochemist by training, he directs the Lipid Laboratory for the Cardiology division. He has used SAS since 1984 to analyze clinical trial and disease registry data sets and he appreciates being able to see what the data looks like.

Marge Scerbo is Data Manager at the Center for Health Program Development and Management at the University of Maryland at Baltimore. Her respobsibilities include design and analysis of Medicaid and clinical data in the pursuit of better health care access for the masses. She has been a frequent presenter at regional and SUGI meetings.

   

One Trainer's Method of Workshop Design

   

Teresa Wilson, Clemson University, Clemson, SC

   

The freshwater fisheries biologists in the South Carolina Department of Natural Resources desired to learn programming SAS software in order to complete some of their basic data analyses independently. The author designed a series of workshops involving a hands-on programming experience for the biologists, as well as workbooks with examples and exercises relevant to fisheries research and management.

   

Teresa currently has a grant to provide SAS consultation, programming and training for the freshwater fisheries biologists in the South Carolina Department of Natural Resources. She is a fisheries biologist with Clemson University and assists the faculty and graduate students with SAS programming and data analysis. She also teaches a graduate level course in SAS programming.