SESUG 2014 Conference Abstracts

Application Development

Using SAS® software to shrink the Data used in Apache Flex® Application
Ahmed Al-Attar
AD-44

This paper discusses the techniques I used at the Census Bureau to overcome the issue of dealing with large amounts of data while modernizing some of their public-facing web applications by using Service Oriented Architecture (SOA) to deploy SAS-powered Flex web applications.  These techniques reduced 142,293 XML lines (3.6 MB) to 15,813 XML lines (1.8 MB), a 50% size reduction, on the server side (HTTP response), and 196,167 observations to 283 observations, a 99.8% reduction in summarized data, on the client side (XML lookup file).


%Destroy() a Macro With Permutations
Brandon Welch and James Vaughan
AD-103

The SAS® macro facility is a powerful tool.  It minimizes repetitive tasks and provides portable tools for users.  These tools are sometimes delivered to clients, and a quality macro is necessary.  For example, when a macro is developed to perform a complicated statistical test, we want it to produce accurate results and a clean log.  To accomplish this, we insert parameter checks.  Depending on the complexity of the macro, it is sometimes difficult to perform a thorough check.  We introduce the %Destroy() macro, which uses the CALL RANPERK routine to permute a list of arguments.  These arguments are then passed to the macro you are testing.  We show how to add appropriate parameter checks to ensure that on subsequent runs of %Destroy() the macro under test produces the desired results.  While this article targets a clinical computing audience, the techniques we present offer a good overview of macro processing that will educate SAS programmers of all levels across various disciplines.


This is the Modern World: Simple, Overlooked SAS® Enhancements
Bruce Gilsen
AD-18

At my job as a SAS® consultant at the Federal Reserve Board, on the SAS-L internet newsgroup, and at SAS conferences, I’ve noticed that some smaller, less dramatic SAS enhancements seem to have fallen through the cracks.  Users continue to use older, more cumbersome methods when simpler solutions are available.  Some of these enhancements were introduced in Version 9.2, but others were introduced in Version 9, Version 8, or even Version 6!  This paper reviews underutilized enhancements that allow you to more easily do the following (a brief sketch of a few of these items appears after the list).
  1. Write date values in the form yyyymmdd
  2. Increment date values with the INTNX function
  3. Create transport files: PROC CPORT/CIMPORT versus PROC COPY with the XPORT engine
  4. Count the number of times a character or substring occurs in a character string or the number of words in a character string
  5. Concatenate character strings
  6. Check if any of a list of variables contains a value
  7. Sort by the numeric portion of character values
  8. Retrieve DB2 data on z/OS mainframes
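
As a quick, illustrative sketch of a few of these items (it is not taken from the paper and simply assumes SAS 9.2 or later, so all of the functions shown are available):

  data _null_;
     d = '15MAR2014'd;
     /* 1. Write a date value as yyyymmdd with the YYMMDDN8. format */
     put d yymmddn8.;
     /* 2. Advance a date by one month, holding the same day, with INTNX */
     next = intnx('month', d, 1, 'same');
     put next date9.;
     /* 4. Count characters, substrings, and words */
     s = 'the quick brown fox';
     n_o   = countc(s, 'o');          /* occurrences of the letter o   */
     n_the = count(s, 'the');         /* occurrences of the substring  */
     n_w   = countw(s);               /* number of words               */
     /* 5. Concatenate strings with CATX (adds a separator, trims blanks) */
     full = catx(' ', 'John', 'Q', 'Public');
     put n_o= n_the= n_w= full=;
  run;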

DFAST & CCAR: One size does not fit all
Charyn Faenza
AD-127

In 2014, for the first time, mid-market banks (consisting of banks and bank holding companies with $10-50 billion in consolidated assets) were required to submit capital stress tests to the federal regulators under the Dodd-Frank Wall Street Reform and Consumer Protection Act (DFAST).  This is a process large banks have been going through since 2011; however, mid-market banks are not positioned to commit as many resources to their annual stress tests as their largest peers.  Limited human and technical resources, incomplete or non-existent detailed historical data, lack of enterprise-wide cross-functional analytics teams, and limited exposure to rigorous model validations are all challenges mid-market banks face.  While there are fewer deliverables required from the DFAST banks, the scrutiny the regulators are placing on the analytical models is just as high as their expectations for CCAR banks.  This session is designed to discuss the differences in how DFAST and CCAR banks execute their stress tests, the challenges facing DFAST banks, and potential ways DFAST banks can leverage the analytics behind this exercise.


PROC RANK, PROC SUMMARY and PROC FORMAT Team Up and a Legend is Born!
Christianna Williams
AD-73

The task was to produce a figure legend that gave the quintile ranges of a continuous measure corresponding to each color on a five-color choropleth map.  Actually, figure legends for several dozen maps for several dozen different continuous measures and time periods…so, the process needed to be automated.  A method was devised using PROC RANK to generate the quintiles, PROC SUMMARY to get the data value ranges within each quintile, and PROC FORMAT (with the CNTLIN= option) to generate and store the legend labels.  And then, of course, these were rolled into a few macros to apply the method for the many different figure legends.  Each part of the method is quite simple – even mundane – but together these techniques allowed us to standardize and automate an otherwise very tedious process.  The same basic strategy could be used whenever one needs to dynamically generate data “buckets” but then keep track of the bucket boundaries – whether for producing labels or legends or so that future data can be benchmarked against the stored categories.
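
A minimal sketch of the kind of pipeline the abstract describes, with invented data set and variable names (the paper's own macros are more general):

  /* 1. Assign quintile ranks (0-4) of the measure */
  proc rank data=measures out=ranked groups=5;
     var rate;
     ranks rate_q;
  run;

  /* 2. Find the data-value range within each quintile */
  proc summary data=ranked nway;
     class rate_q;
     var rate;
     output out=qranges (drop=_type_ _freq_) min=min_rate max=max_rate;
  run;

  /* 3. Turn quintile -> "min - max" into a stored format via CNTLIN= */
  data qfmt;
     set qranges;
     retain fmtname 'legend' type 'n';
     start = rate_q;
     label = catx(' - ', put(min_rate, 8.1), put(max_rate, 8.1));
  run;

  proc format cntlin=qfmt;
  run;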


Useful Tips for Building Your Own SAS® Cloud
Danny Hamrick
AD-146

Everyone has heard about SAS® Cloud.  Now come learn how you can build and manage your own cloud using the same SAS® virtual application (vApp) technology.


More Hash: Some Unusual Uses of the SAS Hash Object
Haikuo Bian, Carlos Jimenez and David Maddox
AD-102

Since the introduction of the SAS Hash Object in SAS 9.0, and with recent enhancements, the popularity of the methodology has grown.  The significant effects of the technique, in conjunction with the large memory capacity of modern computing devices, have brought new and exciting capabilities to the data step.  The most often cited application of the SAS Hash Object is table lookup.  This paper will highlight several unusual applications of the methodology including random sampling, “sledge-hammer matching”, anagram searching, dynamic data splitting, matrix computation, and unconventional transposing.
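
For readers new to the object, the often-cited table-lookup pattern looks roughly like this (data set and key names are illustrative only, not from the paper):

  data matched;
     if 0 then set work.lookup;                    /* defines ID and DESCRIPTION in the PDV */
     if _n_ = 1 then do;
        declare hash h(dataset: 'work.lookup');    /* load the lookup table into memory     */
        h.defineKey('id');
        h.defineData('description');
        h.defineDone();
     end;
     set work.transactions;
     if h.find() = 0 then output;                  /* key found: DESCRIPTION is filled in   */
  run;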


Your Database can do SAS too!
Harry Droogendyk
AD-57

How often have you pulled oodles of data out of the corporate data warehouse down into SAS for additional processing?  Additional processing, sometimes thought to be uniquely SAS's, such as FIRST. logic, cumulative totals, lag functionality, specialized summarization, or advanced date manipulation?  Using the analytical/OLAP and windowing functionality available in many databases (e.g., Teradata, Netezza), all of this processing can be performed directly in the database without moving and reprocessing detail data unnecessarily.

This presentation will illustrate how to increase your coding and execution efficiency by utilizing the database's power through your SAS environment.
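
The flavor of the idea, sketched with explicit SQL pass-through (the Teradata connection options, library, and column names below are placeholders, and the window-function syntax is the database's own, executed in-database):

  proc sql;
     connect to teradata (server=dwprod user=&user password=&pw);
     /* the inner query, including the windowed SUM, runs inside the database */
     create table work.cum_spend as
     select * from connection to teradata (
        select cust_id,
               tx_date,
               amount,
               sum(amount) over (partition by cust_id
                                 order by tx_date
                                 rows unbounded preceding) as cum_amount
        from dw.transactions
     );
     disconnect from teradata;
  quit;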


Moving Data and Results Between SAS® and Microsoft Excel
Harry Droogendyk
AD-58

Microsoft Excel spreadsheets are often the format of choice for our users, both when supplying data to our processes and as a preferred means for receiving processing results and data.  SAS® offers a number of ways to import Excel data quickly and efficiently.  There are equally flexible methods to move data and results from SAS to Excel.  This paper will outline the many techniques available and identify useful tips for moving data and results between SAS and Excel efficiently and painlessly.
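
Two of the simplest routes, sketched with hypothetical file paths (DBMS=XLSX assumes the SAS/ACCESS Interface to PC Files is licensed):

  /* Read one worksheet into a SAS data set */
  proc import datafile="C:\data\accounts.xlsx"
       out=work.accounts dbms=xlsx replace;
     sheet="Q1";
  run;

  /* Send a SAS data set back out to Excel */
  proc export data=work.summary
       outfile="C:\data\summary.xlsx" dbms=xlsx replace;
     sheet="Totals";
  run;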


Before and After: Implementing a Robust Outlier Identification Routine using SAS®
Jack Shoemaker
AD-126

Due to the long memory of the Internet, the author still receives frequent questions about a paper presented in the early 1990s that described a set of SAS® macros to implement Tukey’s robust (non-parametric) outlier methods.  The UNIVARIATE procedure formed the core of these macros.  The paper was written prior to the advent of the Output Delivery System (ODS) and SAS Enterprise Guide™ (SAS/EG).  As a way of demonstrating how SAS technologies have evolved and improved over time, this paper starts with that original 1990s implementation and then implements the same methods taking advantage first of ODS and then the data-analysis features built into SAS/EG.
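
For orientation, the underlying Tukey-fence idea can be sketched in a few lines, independent of the macros described in the paper (data set and variable names are invented):

  /* Q1, Q3, and the interquartile range of the analysis variable */
  proc univariate data=claims noprint;
     var paid_amount;
     output out=stats q1=q1 q3=q3 qrange=iqr;
  run;

  /* Flag values beyond the 1.5*IQR fences */
  data flagged;
     if _n_ = 1 then set stats;          /* one-row stats data set; values are retained */
     set claims;
     outlier = (paid_amount < q1 - 1.5*iqr) or (paid_amount > q3 + 1.5*iqr);
  run;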


SAS® Debugging 101
Kirk Paul Lafler
AD-38

SAS® users are almost always surprised to discover their programs contain bugs.  In fact, when asked, users will emphatically stand by their programs and logic by saying they are bug free.  But the vast body of experience, along with the realities of writing code, says otherwise.  Bugs in software can appear anywhere, whether accidentally built into the software by developers or introduced by programmers when writing code.  No matter where bugs originate, the one thing that all SAS users know is that debugging SAS program errors and warnings can be a daunting, and humbling, task.  This presentation explores the world of SAS bugs, providing essential information about the types of bugs, how bugs are created, the symptoms of bugs, and how to locate bugs.  Attendees learn how to apply effective techniques to better understand, identify, and repair bugs and enable program code to work as intended.


Top Ten SAS® Performance Tuning Techniques
Kirk Paul Lafler
AD-39

The Base-SAS® software provides users with many choices for accessing, manipulating, analyzing, and processing data and results.  Partly due to the power offered by the SAS software and the size of data sources, many application developers and end-users are in need of guidelines for more efficient use.  This presentation highlights my personal top ten list of performance tuning techniques for SAS users to apply in their applications.  Attendees learn DATA and PROC step language statements and options that can help conserve CPU, I/O, data storage, and memory resources while accomplishing tasks involving processing, sorting, grouping, joining (merging), and summarizing data.


The Power of SAS® Macro Programming – One Example
Milorad Stojanovic
AD-114

When we are using SAS macro programming, macro tools and features can make our life more difficult at the beginning and a lot easier at the end.  We should envision how macros will behave across different combinations of input data and relationships between variables.  Macro code should also prevent processing data if ‘critical’ files are missing.  In this paper we present examples of creating macro variables, using one or more ampersands (&), creating dynamic SAS code, using %SYSFUNC, %IF/%THEN/%ELSE, and delivering flexible reports.  The code is data driven, by means of macro programming tools.
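
Two of the constructs mentioned, sketched with made-up names:

  %let grp1 = east;
  %let grp2 = west;
  %let i    = 2;
  %put Resolved value: &&grp&i;              /* two ampersands: &&grp&i -> &grp2 -> west */

  %macro run_report(ds=);
     %if %sysfunc(exist(&ds)) %then %do;     /* stop early when a 'critical' file is missing */
        proc print data=&ds;
        run;
     %end;
     %else %put ERROR: Critical file &ds is missing - report skipped.;
  %mend run_report;

  %run_report(ds=work.monthly)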


The New Tradition: SAS® Clinical Data Integration
Vincent Amoruccio
AD-138

Base SAS® programming has been around for a very long time.  And over that time, there have been many changes.  New and enhanced procedures, new features, new functions, and even operating systems have been added.  Over time, there have been many windows and wizards that help to more easily generate code that can be used in programs.  Through it all, programmers always come back to their SAS roots, the basic programs with which they started.  But, as we move into the future, is this the best use of time, to sit and manually code everything?  Or can we take advantage of the new tools and solutions that generate code and use metadata to describe data, validate output, and document exactly what the programmer has done?  This paper will show you how we can change the current process using the graphical user interface of SAS Clinical Data Integration to integrate data from disparate data sources and transform that data into industry standards in a methodical, repeatable, more automated fashion.



Building Blocks

Hidden in Plain Sight: My Top Ten Underpublicized Enhancements in SAS® Versions 9.2 and 9.3
Bruce Gilsen
BB-17

SAS® Versions 9.2 and 9.3 contain many interesting enhancements.  While the most significant enhancements have been widely publicized in online documentation, conference papers, the SAS-L internet newsgroup/listserv, and elsewhere, some smaller enhancements have received little attention.  This paper reviews my ten favorite underpublicized features.

SAS®, Excel®, and JMP® Connectivity — HOW
Charlie Shipp and Kirk Paul Lafler
BB-123

Microsoft Excel is the most used software on planet Earth, and connectivity with Excel is increasingly important to everyone.  JMP is the best in the world for statistical graphics and data discovery, and SAS software is the gold standard for robust and reliable statistical analysis!  Combine these three heavyweight software products with easy connectivity and you have a profound competitive edge.  Depending on requirements, your (1) input, (2) discovery and analysis, and (3) final display and reporting can begin with any of the three and end with any of the three.  We demonstrate the most likely paths that emphasize SAS and JMP capabilities.  You will leave the workshop appreciating the many possibilities to utilize Excel with SAS and JMP, including using the powerful Output Delivery System.


PROC SQL for PROC SUMMARY Stalwarts
Christianna Williams
BB-69

One of the endlessly fascinating features of SAS is that the software often provides multiple ways to accomplish the same task.  A perfect example of this is the aggregation and summarization of data across multiple rows (“BY groups”) of interest.

These groupings can be study participants, time periods, geographical areas, or really just about any type of discrete classification that one desires.  While many SAS programmers may be accustomed to accomplishing these aggregation tasks with PROC SUMMARY (or equivalently, PROC MEANS), PROC SQL can also do a bang-up job of aggregation – often with less code and fewer steps.  This step-by-step paper explains how to use PROC SQL for a variety of summarization and aggregation tasks, using a series of concrete, task-oriented examples.  For each example, both the PROC SUMMARY method and the PROC SQL method will be presented, along with discussion of the pros and cons of each approach.  Thus, the reader familiar with either technique can learn a new strategy that may have benefits in certain circumstances.  The presentation style will be similar to that used in the author’s previous paper, “PROC SQL for DATA Step Die-Hards”.
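
To give the flavor of the comparison, here is one aggregation done both ways on a hypothetical claims data set (not an example from the paper):

  /* PROC SUMMARY version */
  proc summary data=claims nway;
     class state;
     var paid;
     output out=state_totals (drop=_type_ _freq_) sum=total_paid mean=avg_paid;
  run;

  /* PROC SQL version of the same aggregation */
  proc sql;
     create table state_totals as
     select state,
            sum(paid)  as total_paid,
            mean(paid) as avg_paid
     from claims
     group by state;
  quit;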


FORMATs Top Ten
Christianna Williams
BB-70

SAS FORMATs can be used in so many different ways!  Even the most basic FORMAT use of modifying the way a SAS data value is displayed (without changing the underlying data value) holds a variety of nifty tricks, such as nesting formats, formats that affect various style attributes (such as color, font, etc.), and conditional formatting.  Add in PICTURE formats, multi-label FORMATs, using FORMATs for data cleaning, and FORMATs for joins and table look-ups, and we have quite a bag of tricks for the humble SAS FORMAT and the PROC FORMAT used to generate them.  The purpose of this paper is to describe a few handfuls of very useful programming techniques that employ SAS FORMATs.  While this paper will be appropriate for the newest SAS user, it will also focus on some of the lesser-known features of FORMATs and PROC FORMAT and so should be useful for even quite experienced users of SAS.
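
One of the simpler tricks mentioned, a format used as a table lookup through the PUT function, might look like this (codes and labels are invented):

  proc format;
     value $regionf
        'NC', 'SC', 'GA', 'FL' = 'Southeast'
        'NY', 'NJ', 'PA'       = 'Northeast'
        other                  = 'Other';
  run;

  data scored;
     set members;
     region = put(state, $regionf.);   /* table lookup with no merge or join */
  run;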


A Non-Standard Report Card - Informing Parents About What Their Children Know
Daniel Ralyea
BB-122

A non-traditional use of the power of SAS.  A typical report card lists the subject and a letter or number grade.  It does not identify the skills that led to that grade.  SAS allows us to read the grades from PowerSchool's Oracle database, combine them with test scores from an outside vendor, and summarize multiple grades into more generalized standards.  Using SAS ODS, individual report cards are sorted by school and teacher and printed for each student.


A Quick View of SAS Views
Elizabeth Axelrod
BB-63

Looking for a handy technique to have in your toolkit? Consider SAS® Views, especially if you work with large datasets.  After a brief introduction to Views, I’ll show you several cool ways to use them that will streamline your code and save workspace.
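
A minimal sketch of each kind of view, with invented table names:

  /* DATA step view: the subset is not materialized until it is read */
  data work.recent_v / view=work.recent_v;
     set warehouse.transactions;
     where tx_date >= '01JAN2014'd;
  run;

  /* PROC SQL view of the same thing */
  proc sql;
     create view work.recent_sqlv as
        select *
        from warehouse.transactions
        where tx_date >= '01JAN2014'd;
  quit;

  proc means data=work.recent_v sum;   /* the view executes here, at read time */
     var amount;
  run;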


Combining Multiple Date-Ranged Historical Data Sets with Dissimilar Date Ranges into a Single Change History Data Set
Jim Moon
BB-66

This paper describes a method that uses some simple SAS® macros and SQL to merge data sets containing related data in which rows have varying effective date ranges.  The data sets are merged into a single data set that represents a serial list of snapshots of the merged data, as of a change in any of the effective dates.  While simple conceptually, this type of merge is often problematic when the effective date ranges are not consecutive or consistent, when the ranges overlap, or when ranges are missing from one or more of the merged data sets.  The technique described is used by the Fairfax County Human Resources Department to combine various employee data sets (Employee Name and Personal Data, Personnel Assignment and Job Classification, Personnel Actions, Position-Related Data, Pay Plan and Grade, Work Schedule, Organizational Assignment, and so on) from the County's SAP-HCM ERP system into a single Employee Action History/Change Activity file for historical reporting purposes.  The technique is currently used to combine nineteen data sets, but it is easily expandable by inserting a few lines of code using the existing macros.


PROC TRANSPOSE® For Fun And Profit
John Cohen
BB-59

Occasionally we are called upon to transform data from one format into a “flipped,” sort-of mirror image.  Namely, if the data were organized in rows and columns, we need to transpose these same data to be arranged instead in columns and rows.  A perfectly reasonable view of incoming lab data, ATM transactions, or web “click” streams may look “wrong” to us.  Alternatively, extracts from external databases and production systems may need massaging prior to proceeding in SAS®.  Finally, certain SAS procedures may require a precise data structure, there may be particular requirements for data visualization and graphing (such as dates or times being organized horizontally along the row rather than as values in a date/time variable), or the end user/customer may have specific deliverable requirements.

Traditionalists prefer using the DATA step and combinations of Array, Retain, and Output statements.  This approach works well but for simple applications may require more effort than is necessary.  For folks who intend to do much of the project work in, say, MS/Excel®, the resident transpose option when pasting data is a handy short cut.  However, if we want a simple, reliable method in SAS which once understood will require little on-going validation with each new run, then PROC TRANSPOSE is a worthy candidate.  We will step through a series of examples, elucidating some of the internal logic of this procedure and its options.  We will also touch on some of the issues which cause folks to shy away and rely on other approaches.
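
A bare-bones example of the procedure in the long-to-wide direction (data set and variable names are invented, not from the paper):

  proc sort data=lab_long;
     by patient_id;
  run;

  proc transpose data=lab_long
                 out=lab_wide (drop=_name_)
                 prefix=visit_;
     by patient_id;
     id visit_num;          /* values become column suffixes: visit_1, visit_2, ... */
     var lab_result;
  run;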


The Nuances of Combining Hospital Data
Jontae Sanders, Charlotte Baker and Perry Brown
BB-100

Hospital data can be used for the surveillance of various health conditions in a population.  To maximize our ability to tell the story of a population's health, it is often necessary to combine multiple years of data.  This step can be tedious as there are many factors to take into account such as changes in variable names or data formats between years.  Once you have resolved these issues, the data can be successfully combined for analysis.  This paper will demonstrate many factors to look for and how to handle them when combining data from hospitals.


Move over MERGE, SQL and SORT. There is a faster game in town! #Hash Table
Karen Price
BB-121

The purpose of this paper and presentation is to introduce the basics of what a hash table is and to illustrate practical applications of this powerful Base SAS® DATA step construct.  We will highlight features of the hash object and show examples of how these features can improve programmer productivity and system performance of table lookup and sort operations.  We will show relatively simple code to perform typical “look-up” match-merge usage as well as a method of sorting data through hashing as an alternative to the SORT procedure.
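
A sketch of the sort-by-hash idea the abstract mentions, using an ordered hash object and its OUTPUT method (data set and key names are invented):

  data _null_;
     if 0 then set work.big;                       /* defines the PDV variables     */
     declare hash h(dataset: 'work.big', ordered: 'ascending', multidata: 'yes');
     h.defineKey('account_id');
     h.defineData(all: 'yes');
     h.defineDone();
     h.output(dataset: 'work.big_sorted');         /* written back out in key order */
     stop;
  run;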


Point-and-Click Programming Using SAS® Enterprise Guide®
Kirk Paul Lafler and Mira Shapiro
BB-34

SAS® Enterprise Guide® (EG) empowers organizations with all the capabilities that SAS has to offer.  Programmers, business analysts, statisticians, and end-users have a powerful graphical user interface (GUI) with built-in wizards to perform reporting and analytical tasks, access multi-platform enterprise data sources, deliver data and results to a variety of media and outlets, construct data manipulations without the need to learn complex coding constructs, and support data management and documentation requirements.  Attendees learn how to use the GUI to access tab-delimited and Excel input files; subset and summarize data; join two or more tables together; flexibly export results to HTML, PDF, and Excel; and visually manage projects using flowcharts and diagrams.


Formatting Data with Metadata – Easy and Powerful
Leanne Tang
BB-113

In our organization a lot of effort is put into building and maintaining our organizational-level metadata databases.  Metadata databases are used to store the information, or metadata, about our data.  Many times we have to “decipher” the keys and codes associated with our data so that they can be presented to our data users for data analysis.  One option for interpreting our data is to generate user-defined formats from our metadata using PROC FORMAT.  One advantage of using formats is that we do not have to create a new SAS dataset for data lookup.  The second advantage is that the formats generated can be used in any program in need of data interpretation.  The best advantage is that, without maintaining the metadata myself, I can generate the formats with the most up-to-date information available in the metadata database with a simple PROC FORMAT execution.  In this paper we are going to explore some of the powerful options available in the FORMAT procedure and how we apply the formats generated from our metadata to our data.
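
The core CNTLIN= idea, sketched against a hypothetical metadata table with CODE and DESCRIPTION columns:

  /* Reshape metadata rows into the structure PROC FORMAT expects */
  data cntlin;
     set meta.commodity_codes (rename=(code=start description=label));
     retain fmtname '$commf' type 'C';
  run;

  proc format cntlin=cntlin;
  run;

  /* Keys are interpreted on the fly - no lookup data set, no merge */
  proc freq data=shipments;
     tables commodity;
     format commodity $commf.;
  run;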


SAS® Macro Magic is not Cheesy Magic! Let the SAS Macros Do the Magic of Rewriting Your SAS Code
Robert Williams
BB-105

Many times, we need to rewrite weekly or monthly SAS programs to change certain key statements, such as the conditions inside WHERE statements, import/export file paths, reporting dates, and SAS data set names.  If these statements are hard coded, it is cumbersome and a chore to read through the SAS code to rewrite them.  Sometimes, we might miss an important statement that needs to be re-coded, resulting in inaccurate data extracts and reports.  This paper will show how the SAS Magic Macros can streamline and eliminate the process of rewriting the SAS code.  Two types of SAS macros will be reviewed with examples:
  1. Defining SAS macro variables with values using %LET, to be referenced in SAS statements with the & sign.
  2. Creating SAS macro programs using %MACRO and %MEND to write a series of SAS statements to be generated in the SAS code.
You will be amazed how useful the SAS Magic Macro is for many of your routine weekly and monthly reports.  Let the SAS Magic Macro relieve you of the tedious task of rewriting many of the SAS statements!
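
A tiny sketch of the first technique (the report period, path, and column name are invented):

  %let rpt_month = 2014-06;
  %let in_path   = C:\data\extracts;

  proc import datafile="&in_path\claims_&rpt_month..csv"
       out=work.claims dbms=csv replace;
  run;

  title "Claims report for &rpt_month";
  proc print data=work.claims;
     where service_month = "&rpt_month";   /* hypothetical character column */
  run;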


Flat Pack Data: Converting and ZIPping SAS® Data for Delivery
Sarah Woodruff
BB-25

Clients or collaborators often need SAS data converted to a different format.  Delivery or even storage of individual data sets can become cumbersome, especially as the number of records or variables grows.  The process of converting SAS data sets into other forms of data and saving files into compressed ZIP storage has become not only more efficient, but easier to integrate into new or existing programs.  This paper describes and explores various methods to convert SAS data as well as effective strategies to ZIP data sets along with any other files that might need to accompany them.

PROC IMPORT and PROC EXPORT have been long standing components of the SAS toolbox, so much so that they have their own wizards, but understanding their syntax is important to effectively use them in code being run in batch or to include them in programs that may be run interactively but “hands free”.  The syntax of each is described with a particular focus on moving between SAS, STATA and SPSS, though some attention is also given to Excel.  Once data sets and their attendant files are ready for delivery or need to be put into storage, compressing them into ZIP files becomes helpful.  The process of using ODS PACKAGE to create such ZIP files is laid out and can be connected programmatically to the creation of the data sets or documents in the first place.
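
The ZIP step alone looks roughly like this (file names and paths are placeholders):

  ods package open nopf;
  ods package add file="C:\deliver\study_a.dta";       /* converted data file   */
  ods package add file="C:\deliver\codebook.pdf";      /* accompanying document */
  ods package publish archive
      properties(archive_name="study_a.zip" archive_path="C:\deliver\");
  ods package close;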


The Power of PROC APPEND
Ted Logothetti
BB-33

PROC APPEND is the fastest way to concatenate SAS® data sets.  This paper discusses some of the features of this procedure, including how much it lessens processing time, some tips and tricks, and a correction to the online SAS® documentation.  It also lists some limitations of the procedure.
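
For readers who have not used the procedure, the basic call is a single statement (data set names invented):

  /* Add this month's records to the master table without re-reading the master */
  proc append base=perm.claims_master data=work.claims_new force;
  run;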



Coder's Corner

Creating a Hyperbolic Graph Using the SAS® Annotate Facility
Bill Bland and Liza Thompson
CC-129

In order to optimize their rate design, electric utilities analyze their customers’ bills and costs for electricity by looking at each hour’s use of demand.  These graphs of kWh energy usage versus hours of use are produced on a monthly basis (from 0 to 730 hours).  The resulting graphs are typically curvilinear.  To allow for an easier rate and cost comparison, we wanted the ability to plot hyperbolic hours, as well as cost curves, on the same graph.  To do this, we applied a hyperbolic transformation to the hours-use axis.  This linearized the graph and made it easier to interpret.  For our analysis, it is necessary to graph cost and price versus the hyperbolic axis, but at the same time show the original hours-use axis.  PROC GPLOT does not allow multiple X axes, so we solve the problem using the Annotate facility.  In this presentation, we will show you a step-by-step example of how we changed cost graphs for easier analysis and explain the code we used.


Debugging SAS ® code in a macro
Bruce Gilsen
CC-19

Debugging SAS ® code contained in a macro can be frustrating because the SAS error messages refer only to the line in the SAS log where the macro was invoked.  This can make it difficult to pinpoint the problem when the macro contains a large amount of SAS code.

Using a macro that contains one small DATA step, this paper shows how to use the MPRINT and MFILE options along with the fileref MPRINT to write just the SAS code generated by a macro to a file.  The “de-macroified” SAS code can be easily executed and debugged.
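
The technique itself is only a few lines (the output path and macro call are placeholders):

  filename mprint 'C:\temp\demacro.sas';   /* the fileref must be named MPRINT      */
  options mprint mfile;                    /* MFILE routes generated code to MPRINT */
  %mymacro(data=work.test)                 /* hypothetical macro call               */
  options nomfile;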


Using PROC FCMP to Do Fuzzy Name and Address Matching
Christy Warner
CC-119

This paper discusses how to utilize PROC FCMP to create your own fuzzy-matching functions.  Name and address matching are common tasks performed by SAS programmers, and this presentation will provide some code and guidance on how to handle tough name-matching exercises.  The code shared in this presentation has been utilized to cross-check with the List of Excluded Individuals and Entities (LEIE) file, maintained by the Department of Health and Human Services' OIG.  It has also been incorporated to match against the Specially Designated Nationals (SDN) Terrorist Watch List and in name-matching for Customs and Border Protection (CBP).  * Christy Warner is a Senior Associate with Integrity Management Services, LLC (IMS) and has 22 years of SAS programming experience, as well as a degree in Math and a minor in Statistics.  She has served as the Deputy Project Director of a Medicaid Integrity Contract (MIC) Audit, and has spent the last 14 years developing algorithms to identify healthcare fraud, waste, and abuse.
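
To give a flavor of what a user-written fuzzy-match function can look like, here is a simple generalized-edit-distance wrapper; it is illustrative only and is not the author's algorithm:

  proc fcmp outlib=work.funcs.match;
     function name_dist(a $, b $);
        /* smaller is closer; COMPGED penalizes insertions, deletions, swaps, etc. */
        return (compged(upcase(strip(a)), upcase(strip(b))));
     endsub;
  run;

  options cmplib=work.funcs;

  data candidate_matches;
     set pairs;
     dist = name_dist(name_file1, name_file2);
     if dist <= 200 then output;    /* arbitrary illustrative cutoff */
  run;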


Manage Variable Lists for Improved Readability and Modifiability
David Abbott
CC-76

Lists of variables occur frequently in statistical analysis code: for example, lists of explanatory variables, variables used as rows in demographics tables, and so forth.  These lists may be long, say 10-30 or more variables, and the same list, or a major portion of it, may occur in multiple places in the code.  The lists are often replicated using cut and paste by the programmer during program composition.  Readers of the code may find themselves doing repeated “stare and compare” to determine if the list in location A is really the same list as in location B or location C.  Simply adding a variable to the list may require changing numerous lines of code since the list occurs in the code numerous times.  If managed naively, variable lists can impair code readability and modifiability.

The SAS macro facility provides the tools needed to eliminate repeated entry of lengthy variable lists.  Related groups of variables can be assigned to macro variables, and the macro variables concatenated as needed to generate the list of variables needed at different points in the code.  Certain SAS macros can be used to programmatically alter the list, for example, to remove specific variables from the list (not needed for a given regression) or to change the delimiter character to a comma (when the list is used with PROC SQL).  The macro variable names can express the purpose of the groups of variables, e.g., ExplanVars, OutcomeVars, DemographicOnlyVars, etc.  Employing this approach makes data analysis code easier to read and modify.
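
A small sketch of the approach, with invented variable-group names:

  %let DemogVars   = age sex race ethnicity;
  %let OutcomeVars = sbp dbp ldl hdl;
  %let ExplanVars  = &DemogVars bmi smoker;

  proc means data=analysis n mean std;
     var &OutcomeVars;
     class &DemogVars;
  run;

  /* Comma-delimit a list when it is needed in PROC SQL */
  %let DemogCommas = %sysfunc(translate(&DemogVars, %str(,), %str( )));
  proc sql;
     select &DemogCommas, count(*) as n
     from analysis
     group by &DemogCommas;
  quit;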


Integrating Data and Variables between SAS® and R via PROC IML: Enable Powerful Functionalities for Coding in SAS®
Feng Liu
CC-106

Programming in R provides additional features and functions that augment SAS procedures for many statisticians and scientists in areas like bioinformatics, finance, and education.  The ability to call R from within their SAS programs is in high demand among SAS users.  However, while existing papers show how to import an R data set into SAS or vice versa, they lack a comprehensive solution for transferring variables in forms other than data sets.  In this paper, we present solutions that use PROC IML to interface with the R software, enabling variables and data sets to be transferred transparently between SAS and R.  We also provide examples of calling R functions directly in SAS, which offers much flexibility for coding in SAS, especially for big projects involving intensive coding.  This can also be used to pass parameters from SAS to R.  In this paper, you will see a step-by-step demonstration of a SAS project that integrates calling R functions via PROC IML.
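
The general shape of the interface, assuming SAS/IML is licensed and the session was started with the RLANG system option (data set and R object names are placeholders):

  proc iml;
     /* Send a SAS data set to R as a data frame */
     call ExportDataSetToR("work.have", "df");

     submit / R;
        fit <- lm(y ~ x1 + x2, data = df)     # any R code can run here
        res <- data.frame(coef = coef(fit))
     endsubmit;

     /* Bring an R data frame back as a SAS data set */
     call ImportDataSetFromR("work.results", "res");
  quit;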


How to Build a Data Dictionary – In One Easy Lesson
Gary Schlegelmilch
CC-32

In the wonderful world of programming, the Child Left Behind is usually documentation.  The requirements may be thoroughly analyzed (usually from a combination of phoned-in notes, e-mails, draft documents, and the occasional cocktail napkin).  Design is often on the fly, due to various constraints, deadlines, and in-process modifications.  And doing documentation after the fact, once the program is running, is, well, a great idea – but it often doesn’t happen.

Some software tools allow you to build flow diagrams and descriptions from existing code and/or comments embedded in the program.  But in a recent situation, there was a lament that a system that had been running in the field for quite a while had no Data Dictionary – and one would be really handy for data standardization and data flow.  SAS to the rescue!
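
The heart of such a data dictionary is usually a query against the DICTIONARY tables, along these lines (the library name is a placeholder):

  proc sql;
     create table work.data_dictionary as
     select memname as dataset,
            name    as variable,
            type,
            length,
            format,
            label
     from dictionary.columns
     where libname = 'MYLIB'        /* library names are stored in upper case */
     order by memname, varnum;
  quit;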


Hands Free: Automating Variable Name Re-Naming Prior to Export
John Cohen
CC-61

Often production datasets come to us with data in the form of rolling 52 weeks, 12 or 24 months, or the like.  For ease of use, the variable names may be generic (something like VAR01, VAR02, etc., through VAR52 or VAR01 through VAR12), with the actual dates corresponding to each column being maintained in some other fashion – often in the variable labels, a dataset label, or some other construct.  Not having to re-write your program each week or month to properly use these data is a huge benefit.

That is, until you need to capture the date information in the variable names themselves (so far VAR01, VAR02, etc.) – prior to, say, exporting to MS/Excel® (where the new column names may instead need to be JAN2011, FEB2011, etc.).  If the task of creating the correct corresponding variable names/column names each week or month were a manual one, the toll on efficiency and accuracy could be substantial.

As an alternative, we will use a “program-to-write-a-program” approach to capture the date information in the incoming SAS® dataset (from two likely alternate sources) and have our program complete the rest of the task seamlessly, week after week (or month after month).  By employing this approach we can continue to use incoming data with generic variable names.
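
One way to sketch the "program-to-write-a-program" idea, assuming the dates live in the variable labels (all names here are hypothetical):

  /* Build VARxx=LABEL rename pairs from the stored metadata ...       */
  proc sql noprint;
     select catx('=', name, compress(label))
        into :renames separated by ' '
     from dictionary.columns
     where libname = 'WORK' and memname = 'MONTHLY'
           and upcase(name) like 'VAR%';
  quit;

  /* ... and apply them, so the program never needs editing by hand    */
  proc datasets lib=work nolist;
     modify monthly;
     rename &renames;
  quit;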


Simple Rules to Remember When Working with Indexes
Kirk Paul Lafler
CC-37

SAS® users are always interested in learning techniques related to improving data access.  One way of improving information retrieval is by defining an index consisting of one or more columns that are used to uniquely identify each row within a table.  Functioning as a SAS object, an index can be defined as numeric, character, or a combination of both.  This presentation emphasizes the rules associated with creating effective indexes and using indexes to make information retrieval more efficient.
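
For reference, creating a simple and a composite index looks like this (table and column names are invented):

  proc datasets library=work nolist;
     modify claims;
     index create member_id;                         /* simple index            */
     index create svc_key = (member_id svc_date);    /* composite index SVC_KEY */
  quit;

  /* A WHERE clause on the key variable can now use the index */
  data one_member;
     set work.claims;
     where member_id = 10243;
  run;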


Interacting with SAS using Windows PowerShell ISE
Mayank Nautiyal
CC-135

The most conventional method of using SAS in a Windows environment is via a GUI application.  There are numerous SAS users who have a UNIX background and can definitely take advantage of the Windows PowerShell ISE to gain job efficiency.  The Windows PowerShell Integrated Scripting Environment (ISE) is a host application for Windows PowerShell.  One can run commands and write, test, and debug scripts in a single Windows-based graphical user interface with multiline editing.  This paper will demonstrate how frequently used SAS procedures can be scripted and submitted at the PowerShell command prompt.  Job scheduling and submission for batch processing will also be illustrated.


Let SAS® Do the Coding for You
Robert Williams
CC-104

Many times, we need to create the same reports for different groups based on each group’s subset of queried data, or we have to develop repetitive SAS code such as a series of IF-THEN/ELSE statements or a long list of different conditions in a WHERE statement.  It is cumbersome and a chore to manually write and change these statements, especially if the reporting requirements change frequently.  This paper will suggest methods to streamline and eliminate the process of writing and copying/pasting your SAS code to be modified for each requirement change.  Two techniques will be reviewed, along with a listing of key words in a SAS dataset or an Excel® file:
  1. Create code using the DATA _NULL_ and PUT statements to write an external SAS code file that is executed with the %INCLUDE statement.
  2. Create code using the DATA _NULL_ and CALL SYMPUT to write SAS code to a macro variable.
You will be amazed how useful this process is for hundreds of routine reports especially on a weekly or monthly basis.  RoboCoding is not just limited to reports; this technique can be expanded to include other procedures and data steps.  Let the RoboCoder do the repetitive SAS coding work for you!
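
A compact sketch of the first technique (the driver data set of regions is invented):

  filename robocode temp;

  /* Write one PROC PRINT step per region listed in the driver data set */
  data _null_;
     set work.regions;                     /* one row per REGION value */
     file robocode;
     put 'proc print data=work.sales;';
     put '   where region = "' region +(-1) '";';
     put '   title "Sales report for ' region +(-1) '";';
     put 'run;';
  run;

  %include robocode / source2;             /* execute the generated code */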


@/@@ This Text file. Importing Non-Standard Text Files using @,@@ and / Operators
Russell Woods
CC-45

SAS in recent years has done a fantastic job of releasing newer and more powerful tools to the analyst and developer tool boxes, but these tools are only as effective as the data that is available to them.  Most of you would have no trouble dealing with a simple delimited or column-formatted text file.  However, data can often be provided in a non-standard format that must be parsed before analysis can be performed, or the final results can have very specific non-standard formatting rules that must be adhered to when delivering.  In these cases we can use some simple SAS operators, ‘@’, ‘@@’ and ‘/’, in conjunction with conditional statements to interact with any flat file that has some kind of distinguishable pattern.  In this presentation I will demonstrate the step-by-step process I used to analyze several non-standard text files and create import specifications capable of importing them into SAS.  It is my hope that once you master these techniques, it will not matter whether you are preparing an audit report for the federal government or a side-effects analysis for the FDA; you will easily be able to accommodate any specifications they may have.
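
Three tiny illustrations of the operators in question (data values are invented):

  /* @@ : hold the line across iterations - several observations per record */
  data pairs;
     input x y @@;
     datalines;
  1 2  3 4  5 6
  ;

  /* @  : hold the line within an iteration - read the rest conditionally */
  data mixed;
     input rectype $ @;
     if rectype = 'H' then input provider $;
     else input amount;
     datalines;
  H PROV0001
  D 125.50
  ;

  /* /  : advance to the next record - one observation spans two records */
  data people;
     input name & $20. /
           age;
     datalines;
  Jane Smith
  34
  Ramon Ortiz
  29
  ;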


VBScript Driven Automation in SAS®: A Macro to Update the Text in a Microsoft® Word Document Template at Preset Bookmarks
Shayala Gibbs
CC-101

SAS® can harness the power of Visual Basic Scripting Edition (VBScript) to programmatically update Microsoft Office® documents.  This paper presents a macro developed in SAS to automate updates to a Microsoft Word document.  The useful macro invokes VBScript to pass text directly from a SAS data set into a predefined bookmark in an existing template Word document.


Searching for (and Finding) a Needle in a Haystack: A Base Macro-Based SAS Search Tool to Facilitate Text Mining and Content Analysis through the Production of Color-Coded HTML Reports
Troy Hughes
CC-94

Text mining describes the discovery and understanding of unstructured, semi-structured, or structured textual data.  While SAS® Text Miner presents a comprehensive solution to text mining and content analysis, simpler business questions may warrant a more straightforward solution.  When first triaging a new data set or database of unknown content, a profile and categorization of the data may be a substantial undertaking.  An initial analytic question may include a request to determine if a word or phrase is found “somewhere” within the data, with what frequency, and in what fields and data sets.   This text describes an automated text parsing Base SAS tool that iteratively parses SAS libraries and data sets in search of a single word, phrase, or a list of words or phrases.  Results are saved to an HTML file that displays the frequency, location, and search criteria highlighted in context.



Hands On Workshop

SAS Enterprise Guide for Institutional Research and Other Data Scientists
Claudia McCann
How-82

Data requests can range from on-the-fly, need-it-yesterday items to extended projects taking several weeks or months to complete.  Often institutional researchers and other data scientists are juggling several of these analytic needs on a daily basis, e.g., taking a break from the longitudinal report on retention and graduation to work on a response to a USN&WR survey, then answering a simple five-minute data query from an administrator.  SAS Enterprise Guide is a terrific tool for handling multiple projects simultaneously.  This Hands On Workshop is designed to walk the data analyst through the process of setting up a project, accessing data from several sources, merging the datasets, and running the analyses to generate the data needed for the particular project.  Specific tasks covered are pulling SAS datasets and Excel files into the project, exploring several facets of the ever-so-powerful Query Builder, and utilizing several quick and easy descriptive statistical techniques in order to get the desired results.


A Tutorial on the SAS® Macro Language
John Cohen
How-60

The SAS Macro language is another language that rests on top of regular SAS code.  If used properly, it can make programming easier and more fun.  However, not every program is improved by using macros.  Furthermore, it is another language syntax to learn, and can create problems in debugging programs that are even more entertaining than those offered by regular SAS.

We will discuss using macros as code generators to save repetitive and tedious effort, to pass parameters through a program to avoid hard-coding values, and to pass code fragments, thereby making certain tasks easier than using regular SAS alone.  Macros facilitate conditional execution and can be used to create program modules that can be standardized and re-used throughout your organization.  Finally, macros can help us create interactive systems in the absence of SAS/AF® or SAS/IntrNet®.

When we are done, you will know the difference between a macro, a macro variable, a macro statement, and a macro function.  We will introduce interaction between macros and regular SAS language, offer tips on debugging macros, and discuss SAS macro options.


Store and Recall Macros with SAS Macro Libraries
John Myers
How-71

When you store your macros in a SAS macro library, you can recall the macros from your SAS programs and share them with other SAS programmers.  SAS macro libraries reduce the time it takes to develop new programs by letting you reuse code that has been previously tested and verified.  Macro libraries help you to organize your work by saving sections of code that you can reuse in other programs.  Macro libraries improve your macro writing skills by focusing on a specific task for each macro.  Macro libraries are not complicated – they are just a way to store macros in a central location.  This presentation will give examples of how you can build macro libraries using %INCLUDE files, the AUTOCALL library, and the STORED COMPILED MACRO library.
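
For orientation, pointing a session at an AUTOCALL library takes one OPTIONS statement (the directory and macro name below are placeholders):

  /* Any member.sas file in this directory that defines a macro of the same
     name becomes callable on demand */
  options mautosource sasautos=('/shared/sas/macros', sasautos);

  %monthly_report(month=201406)   /* hypothetical macro stored as monthly_report.sas */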


Application Development Techniques Using PROC SQL
Kirk Paul Lafler
How-35

Structured Query Language (SQL) is a database language found in the base-SAS software.  It permits access to data stored in data sets or tables using an assortment of statements, clauses, options, functions, and other language constructs.  This Hands On Workshop illustrates core concepts as well as SQL’s many applications, and is intended for SAS users who desire an overview of this exciting procedure’s capabilities.  Attendees learn how to construct SQL queries, create complex queries including inner and outer joins, apply conditional logic with case expressions, create and use views, and construct simple and composite indexes.


The DoW-Loop
Paul Dorfman and Lessia Shajenko
How-115

The DoW-loop is a nested, repetitive DATA step structure enabling you to isolate instructions related to a certain break event before, after, and during a DO-loop cycle in a naturally logical manner.  Readily recognizable in its most ubiquitous form by the DO UNTIL(LAST.ID) construct, which naturally lends itself to control-break processing of BY-group data, the DoW-loop is more morphologically diverse and generic in nature.  In this workshop, the DoW-loop's logic is examined via the power of example to reveal its aesthetic beauty and pragmatic utility.  In some industries like pharma, where flagging BY-group observations based on in-group conditions is standard fare, the DoW-loop is an ideal vehicle, greatly simplifying the alignment of business logic and SAS code.  In this Hands On Workshop, the attendees will have an opportunity to investigate the program control of the DoW-loop step by step using the SAS DATA step debugger and learn of a range of nifty practical applications of the DoW-loop.
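
Its canonical form, for readers who have not seen it, is only a few lines (data set and variable names are invented):

  /* One trip through the loop per ID: accumulate, then output one summary row.
     TOTAL is reset automatically at the top of each implicit DATA step iteration,
     so no RETAIN or explicit re-initialization is needed.  Assumes TRANSACTIONS
     is sorted by ID. */
  data totals;
     do until (last.id);
        set transactions;
        by id;
        total = sum(total, amount);
     end;
     output;
  run;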


Reliably Robust: Best Practices for Automating Quality Assurance and Quality Control Methods into Software Design
Troy Hughes
How-95

A frequent objective of SAS development is the generation of autonomous, automated processes that can be scheduled for recurring execution or confidently run with push-button simplicity.  While the adoption of SAS software development best practices most confidently predicts programmatic success, robust applications nevertheless require a quality management strategy that incorporates both quality assurance (QA) and quality control (QC) methods.  To the extent possible, these methods both ensure and demonstrate process success and product validation while minimizing the occurrence and impact of environmental and other exceptions that can cause process failure.  QA methods include event handling that drives program control under normal functioning, exception handling (e.g., error trapping) that identifies process failure and may initiate a remedy or graceful termination, and post hoc analysis of program logs and performance metrics.  QC methods conversely identify deficits in product (e.g., data set) availability, validity, completeness, and accuracy, and are implemented on input and output data sets as well as reports and other output.  QC methods can include data constraints, data structure validation, statistical testing for outliers and aberrant data, and comparison of transactional data sets against established norms and historical data stores.  The culmination of any quality management strategy prescribes the timely communication of failures (or successes) to stakeholders through alerts, report generation, or a real-time dashboard.  This text describes the advantages and best practices of incorporating a comprehensive quality management strategy into SAS development, as well as the more challenging transformation of error-prone legacy code into a robust, reliable application.



Pharma & Healthcare

Time Series Mapping with SAS®: Visualizing Geographic Change over Time in the Health Insurance Industry
Barbara Okerson
PH-22

Changes in health insurance and other industries often have a spatial component.  Maps can be used to convey this type of information to the user more quickly than tabular reports and other non-graphical formats.  SAS® provides programmers and analysts with the tools to not only create professional and colorful maps, but also the ability to display spatial data on these maps in a meaningful manner that aids in the understanding of the changes that have transpired.  This paper illustrates the creation of a number of different maps for displaying change over time with examples from the health insurance arena.


You've used FREQ, but have you used SURVEYFREQ?
Charlotte Baker
PH-86

PROC FREQ is a well-utilized procedure for descriptive statistics.  If the data being analyzed come from a complex survey sample, it is best to use PROC SURVEYFREQ instead.  Other than the SAS documentation on PROC SURVEYFREQ, few user examples exist for how to perform analyses using this procedure.  This paper will demonstrate why PROC SURVEYFREQ should be used and how to implement it in your research.
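
A skeletal call, assuming the data set carries the survey design variables (the stratum, cluster, and weight variable names below are placeholders):

  proc surveyfreq data=survey;
     strata  stratum;           /* sampling strata        */
     cluster psu;               /* primary sampling units */
     weight  samp_weight;       /* analysis weight        */
     tables  sex * diabetes / row cl chisq;
  run;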


A Comprehensive Automated Data Management System for Clinical Trials
Heather Eng, Jason Lyons and Theresa Sax
PH-125

A successful data coordinating center for multicenter clinical trials and registries must provide timely, individualized, and frequent feedback to investigators and study coordinators over the course of data collection.

Investigators require up-to-date reports to help them monitor subject accrual and retention, randomization balance, and patient safety markers.  Study coordinators need to know what data are expected or are delinquent, and they need to know about errors to be corrected, such as missing data or data that don’t pass validity and logical consistency tests.  Frequent monitoring can reveal systemic issues in procedures that require remedial adjustments to keep the project from being at risk.

Data managers at the University of Pittsburgh’s Epidemiology Data Center in the Graduate School of Public Health have developed an integrated system to collect and import data from multiple and disparate sources into a central relational database, subject it to comprehensive quality control procedures, create reports accessible on the web, and email individualized reports to investigators and study coordinators, all on an automated and scheduled basis.

Post-hoc routines monitor execution logs, so that unexpected errors and warnings are automatically emailed to data managers for same-day review and resolution.

The system is developed almost exclusively using SAS® software.  While SAS® is best known among clinical trialists as statistical software, its strength as a data management tool should not be overlooked.  With its strong and flexible programming capabilities for data manipulation, reporting and graphics, web interfacing, and emailing, it provides the necessary infrastructure to serve as a single platform for the management of data collected in clinical trials and registries.

This paper will describe the modules of the system and their component programs as they were developed for the Computer-Based Cognitive-Behavioral Therapy (CCBT) Trial currently underway at the University of Louisville and the University of Pennsylvania, with data coordination at the University of Pittsburgh.


Using SAS® to Analyze the Impact of the Affordable Care Act
John Cohen and Meenal (Mona) Sinha
PH-28

The Affordable Care Act being implemented in 2014 is expected to fundamentally reshape the health care industry.  All current participants--providers, subscribers, and payers--will operate differently under a new set of key performance indicators (KPIs).  This paper uses public data and SAS® software to illustrate an approach to creating a baseline for the health care industry today so that structural changes can be measured in the future to assess the impact of the new law.


Using SAS/STAT to Implement A Multivariate Adaptive Outlier Detection Approach to Distinguish Outliers From Extreme Values
Paulo Macedo
PH-89

A standard definition of an outlier states that “an outlier is an observation that deviates so much from other observations as to arouse the suspicion that it was generated by a different mechanism” (Hawkins, 1980).  To identify outliers in the data, a classic multivariate outlier detection approach implements the Robust Mahalanobis Distance Method by splitting the distribution of distance values into two subsets (within-the-norm and out-of-the-norm): the threshold value is usually set to the 97.5% quantile of the chi-square distribution with p (number of variables) degrees of freedom, and items whose distance values are beyond it are labeled out-of-the-norm.  This threshold value is an arbitrary number, though, and it may flag as out-of-the-norm a number of items that are indeed extreme values of the baseline distribution rather than outliers coming from a “contaminating” distribution.  Therefore, it is desirable to identify an additional threshold, a cutoff point that divides the set of out-of-norm points into two subsets - extreme values and outliers.

One way around the issue, in particular for large databases, is to increase the threshold value to another arbitrary number, but this approach requires taking into consideration the size of the dataset, as that size is expected to affect the threshold separating outliers from extreme values.  As an alternative, a 2003 article by D. Gervini (Journal of Multivariate Analysis) proposes “an adaptive threshold that increases with the number of items N if the data is clean but it remains bounded if there are outliers in the data”.

This paper implements Gervini’s adaptive threshold value estimator using PROC ROBUSTREG and the SAS Chi-Square functions CINV and PROBCHI, available in the SAS/STAT environment.  It also provides data simulations to illustrate the reliability and the flexibility of the method in distinguishing true outliers from extreme values.
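
For orientation only, the classic non-adaptive chi-square cutoff that the paper starts from can be computed directly with CINV; Gervini's adaptive correction is what the paper adds on top of this (the distance data set and variable names are invented):

  %let p = 6;      /* number of variables used in the Mahalanobis distances */

  data flagged;
     set distances;                      /* robust squared distances, one per item */
     cutoff      = cinv(0.975, &p);      /* 97.5% quantile of chi-square, p d.f.   */
     out_of_norm = (rd_squared > cutoff);
     tail_prob   = 1 - probchi(rd_squared, &p);
  run;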


Impact of Affordable Care Act on Pharmaceutical and Biotech Industry
Salil Parab
PH-80

On March 23, 2010, President Obama signed The Patient Protection and Affordable Care Act (PPACA), commonly called the Affordable Care Act (ACA).  The law has made historic changes to the health care system in terms of coverage, cost, and quality of care.  This paper will discuss the impact the law will have on the pharmaceutical and biotech industry.

The ACA imposes several costs on pharmaceutical, biotech, and related businesses.  A fee will be imposed on each covered entity that manufactures or imports branded prescription drugs with sales of over $5 million to specified government programs.  In addition to the fees on branded prescription drug sales, the ACA imposes a 2.3% excise tax on sales of medical devices.  The tax is levied on the manufacturer or importer before a medical device is sent to the wholesaler or hospital.  Prior to the ACA, prescription drug manufacturers had to pay a rebate under Medicaid coverage equal to the greater of 15.1% of the average manufacturer price (“AMP”) or the difference between the AMP and the best price of the drug.  The ACA increased the rebate percentage to 23.1%.  It also modified the definition of AMP and the calculation of the additional rebate for price increases of line-extension drugs, and expanded the rebate program to additional drug sales.  Under the ACA, manufacturers that wish to sell drugs covered under Medicare Part D must participate in the coverage gap discount program.

Along with imposing additional costs on the industry, the ACA will bring positive changes.  The ACA is anticipated to add 35 million uninsured citizens as new customers, who will directly impact the industry’s bottom line by $115 billion over the next 10 years or so.  Under the Qualifying Therapeutic Discovery Project program that is part of the ACA, a tax credit will be given to companies that treat unmet medical needs or chronic diseases.  This will significantly boost innovation, particularly for small to mid-size enterprises, and benefit the overall industry.  The Biologics Price Competition and Innovation Act (BPCIA), which is part of the ACA, includes guidelines for market approval of “biosimilar” products, patent provisions, data and market exclusivity, and incentives for innovation.


Evaluating and Mapping Stroke Hospitalization Costs in Florida
Shamarial Roberson and Charlotte Baker
PH-108

Stroke is the fourth leading cause of death and the leading cause of disability in Florida.  Hospitalization charges related to stroke events have increased over the past ten years, even while the number of hospitalizations has remained steady.  North Florida lies in the Stroke Belt, the region of the United States with the highest stroke morbidity and mortality.  This paper will demonstrate the use of SAS to evaluate the influence of socio-economic status, sex, and race on total hospitalization charges by payer type in North Florida, using data from the State of Florida Agency for Health Care Administration and the Florida Department of Health Office of Vital Statistics.


SDTM What? ADaM Who? A Programmer's Introduction to CDISC
Venita DePuy
PH-90

Most programmers in the pharmaceutical industry have at least heard of CDISC, but may not be familiar with the overall data structure, naming conventions, and variable requirements for SDTM and ADaM datasets.  This overview will provide a general introduction to CDISC from a programming standpoint, including the creation of the standard SDTM domains and supplemental datasets, and the subsequent creation of ADaM datasets.  Time permitting, we will also discuss when it might be preferable to create a “CDISC-like” dataset instead of a dataset that fully conforms to CDISC standards.



Planning, Support, and Administration

Configurable SAS® Framework for managing SAS® OLAP Cube based Reporting System
Ahmed Al-Attar and Shadana Myers
PA-42

This paper presents a high-level infrastructure discussion with some explanation of the SAS code used to implement a configurable batch framework for managing and updating SAS® OLAP cubes.  The framework contains a collection of reusable, parameter-driven Base SAS macros, custom Base SAS programs, and UNIX/Linux shell scripts.

This collection manages typical steps and processes used for manipulating SAS files and executing SAS statements.

The Base SAS macro collection contains a group of utility macros and a group of OLAP-related macros.

Case Studies in Preparing Hadoop Big Data for Analytics
Doug Liming
PA-143

Before you can analyze your big data, you need to prepare the data for analysis.  This paper discusses capabilities and techniques for using the power of SAS® to prepare big data for analytics.  It focuses on how a SAS user can write code that will run in a Hadoop cluster and take advantage of the massive parallel processing power of Hadoop.


SAS Metadata Querying and Reporting Made Easy: Using SAS Autocall Macros
Jiangtang Hu
PA-41

Metadata is the core of the modern SAS system (a.k.a. the SAS Business Analysis Platform), and SAS offers various techniques to access it via SAS DATA step functions, procedures, libname engines, and a Java interface.  Furthermore, SAS also provides a set of autocall macros that package these techniques for metadata querying and reporting.

In this paper, I will go through these metadata autocall macros to get quick results against SAS metadata, such as users, libraries, datasets, jobs and, most important, permissions.  For the best display of SAS metadata, the SAS ODS Report Writing Interface is also used in this demo (again, it's not new; it ships in the SAS sample-code folders that most SAS programmers overlook).  All demo code (which can be submitted through SAS Display Manager, SAS Enterprise Guide, and SAS Data Integration Studio) can be found on GitHub: https://github.com/Jiangtang/SESUG.

Metadata browsing configurations are also supplied for users of SAS Display Manager, SAS Enterprise Guide and SAS Data Integration Studio respectively.


A Review of "Free" Massive Open Online Content (MOOC) for SAS Learners
Kirk Paul Lafler
PA-54

Leading online providers now offer SAS users “free” access to content for learning how to use and program in SAS.  This content is available to anyone in the form of massive open online content (or courses) (MOOC).  Not only is all the content offered for “free”, but it is also designed with the distance learner in mind, empowering users to learn using a flexible and self-directed approach.  As noted on Wikipedia.org, “A MOOC is an online course or content aimed at unlimited participation and made available in an open access forum using the web.”  This presentation illustrates how anyone can access a wealth of learning technologies including comprehensive student notes, instructor lesson plans, hands-on exercises, PowerPoint slides, audio, webinars, and videos.


Google® Search Tips and Techniques for SAS® and JMP® Users
Kirk Paul Lafler and Charlie Shipp
PA-40

Google (www.google.com) is the world’s most popular and widely used search engine.  Google is the premier search tool on the Internet today, and SAS® and JMP® users frequently need to identify and locate SAS and JMP content wherever and in whatever form it resides.  This paper provides insights into how Google works and illustrates numerous search tips and techniques for finding articles of interest, reference works, information tools, directories, PDFs, images, current news stories, user groups, and more to get search results quickly and easily.


Stretching Data Training Methods: A Case Study in Expanding SDTM Skills
Richard Addy
PA-48

With CDISC moving towards becoming an explicit standard for new drug submissions, it is important to expand the number of people who can implement those standards efficiently and proficiently.  However, the CDISC models are complex, and increasing expertise with them across a large group of people is a non-trivial task.

This paper describes a case study focusing on increasing the number of people responsible for creating the SDTM portion of the submissions package (data set specifications, annotated CRF, metadata, and define file).  In moving these tasks from a small dedicated group who handled all submission-related activities to a larger pool of programmers, we encountered several challenges: ensuring quality and compliance across studies; developing necessary skills (often, non-programmatic skills); and managing a steep learning curve (even for programmers with previous SDTM experience).

We developed several strategies to address these concerns, including training focused on familiarizing new team members with where to look for details, a mentor system to help prevent people from getting stuck, extra attention on domains that consistently caused problems, and flexible, robust internal tools to assist in creating the submission.


Managing and Measuring the Value of Big Data and Analytics Focused Projects
Rob Phelps
PA-31

Big data and analytics-focused projects have undetermined scope and changing requirements at their core.  There is a high risk of losing business value if the project is managed with an IT-centric waterfall approach and classical project management methods.  Simply deploying technology on time, to plan, and within budget does not produce business value for big data projects.  A different approach to managing projects and stakeholders is required to execute and deliver business value for big data and analytically focused initiatives.

Introduction:   Projects that are designed to drive better decisions in an organization can deploy technology on time, to plan, and within budget and still completely fail to deliver business value.  In the race to extract insights from the massive amounts of data now available, many companies are spending heavily on IT tools and hiring data scientists.  Most are struggling to achieve a worthwhile return.  Big data and analytically focused projects that are treated the same way as IT projects for the most part fail to demonstrate the hoped-for business value.  This paper will discuss why big data and analytics projects must be treated differently to achieve business-changing outcomes.

Discovery-Driven Projects:   To obtain value from analysis projects, the focus must be on solving business problems rather than on managing the risk of deploying technology.  The desire to move to more scientific management practices, based on the analysis of large and disparate data, opens the possibility of change for business processes and the way information is used.  This is in contrast to simply optimizing technical processes, which is a historical IT strong suit.  Organizational learning and organizational change are the outcomes that show value from analysis projects.  Standard project management tools and measures are not sufficient to track the delivery or ensure the value of analytical efforts.  Tools focused on mission- and vision-driven measures are well suited and can be tied to business needs to show applicability.  These include concepts from formal program evaluation and the creation of logic models, which are methods for framing change measures.


Calculating the Most Expensive Printing Jobs
Roger Goodwin, PMP
PA-51

As the SAS manual states, the macro facility is a tool for extending and customizing SAS and for reducing the amount of text that the programmer must enter to do common tasks.  Programmers use SAS macros for mundane, repetitive tasks.  In this application, we present a SAS macro that calculates the top twenty most expensive printing jobs for each Federal agency.

Given the request for the top twenty most expensive printing jobs, it became apparent that this request would become repetitive [H. Paulson 2008].  The US Government Printing Office anticipated an increase in requests for financial summaries from government agencies and developed the following SAS macro specifically for Treasury's request.  GPO has contracts with most Federal government agencies to procure print.  GPO can produce a report of the top twenty most expensive printing jobs for each Federal agency.  This, of course, assumes the agency does business with GPO.
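As a flavor of the approach, a minimal top-20-by-agency sketch (data set and variable names are hypothetical, not GPO's actual code):

  %macro top20(agency);
     /* keep one agency's jobs, most expensive first */
     proc sort data=printjobs out=onejob;
        where agency = "&agency";
        by descending cost;
     run;

     /* report the twenty most expensive jobs */
     proc print data=onejob(obs=20) noobs;
        title "Top 20 most expensive printing jobs: &agency";
        var jobno description cost;
     run;
  %mend top20;

  %top20(Treasury)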


Securing SAS OLAP Cubes with Authorization Permissions and Member-Level Security
Stephen Overton
PA-62

SAS OLAP technology is used to organize and present summarized data for business intelligence applications.  It features flexible options for creating and storing aggregations to improve performance and brings a powerful multi-dimensional approach to querying data.  This paper focuses on managing security features available to OLAP cubes through the combination of SAS metadata and MDX logic.


Debugging and Tuning SAS Stored Processes
Tricia Aanderud
PA-116

You don't have to be with the CIA to discover why your SAS® stored process is producing clandestine results.  In this talk, you will learn how to use prompts to get the results you want, work with the metadata to ensure correct results, and even pick up simple coding tricks to improve performance.  You will walk away with a new decoder ring that allows you to discover the secrets of the SAS logs!


Teaching SAS Using SAS OnDemand Web Editor and Enterprise Guide
Charlotte Baker and Perry Brown
PA-85

The server based SAS OnDemand offerings are excellent tools for teaching SAS coding to graduate students.  SAS OnDemand Enterprise Guide and SAS OnDemand Web Editor can be used to accomplish similar educational objectives but the resources required to use each program can be different.  This paper will discuss why one might use a SAS OnDemand program for education and the pros and cons of using each program for instruction.



Posters

Overview of Analysis of Covariance (ANCOVA) Using GLM in SAS
Abbas Tavakoli
PO-15

Analysis of covariance (ANCOVA) is a more sophisticated form of analysis of variance.  It is used to compare response means among two or more groups (categorical variables) adjusted for a quantitative variable (covariate) thought to influence the outcome (dependent variable).  A covariate is a continuous variable that can be used to reduce the sum of squares for error (SSE) and subsequently increase the statistical power of an ANOVA design.  There may be more than one covariate.  The purpose of this paper is to provide an overview of ANCOVA using PROC GLM, with two examples in SAS and interpretation suitable for publication.
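For readers new to the syntax, a minimal one-covariate PROC GLM sketch (variable names are hypothetical):

  proc glm data=study;
     class group;
     model outcome = group pretest;   /* pretest is the covariate */
     lsmeans group / pdiff cl;        /* covariate-adjusted group means */
  run;
  quit;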


Trash to Treasures: Salvaging Variables of Extremely Low Coverage for Modeling
Alec Zhixiao Lin
PO-88

Variables with extremely low occurrence rates either exhibit very low Information Values in scorecard development or fail to be selected by a regression model, and hence are usually discarded at the data-cleaning stage.  However, some of these variables could contain valuable information and are worth retaining.  We can aggregate different rare occurrences into a single predictor that can be used in a subsequent regression or analysis.  This paper introduces a SAS macro that tries to discover and salvage these variables in the hope of turning them into potentially useful predictors.
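A minimal sketch of the aggregation idea (flag names are hypothetical; the macro automates the discovery and rollup):

  data modelready;
     set cleaned;                              /* hypothetical input */
     /* roll several rare-occurrence flags into one predictor */
     rare_count = sum(of rareflag1-rareflag5);
     rare_any   = (rare_count > 0);
  run;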


Design of Experiments (DOE) Using JMP®
Charlie Shipp
PO-47

JMP has provided some of the best design of experiment software for years.  The JMP team continues the tradition of providing state-of-the-art DOE support.  In addition to the full range of classical and modern design of experiment approaches, JMP provides a template for Custom Design for specific requirements.  The other choices include: Screening Design; Response Surface Design; Choice Design; Accelerated Life Test Design; Nonlinear Design; Space Filling Design; Full Factorial Design; Taguchi Arrays; Mixture Design; and Augmented Design.  Further, sample size and power plots are available.

We show an interactive tool for n-Factor Optimization in a single plot.


Analysis of Zero Inflated Longitudinal Data Using PROC NLMIXED
Delia Voronca and Mulugeta Gebregziabher
PO-147

Background:  Commonly used parametric models may lead to erroneous inference when analyzing count or continuous data with an excess of zeros.  For non-clustered data, the most common models used to address the issue for count outcomes are zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB); for continuous outcomes, gamma hurdle (HGamma), truncated normal hurdle (HTGauss), hurdle Weibull (HWeibull), and zero-inflated Gaussian (ZIGauss) models are used.

Objective:  Our goal is to expand these for modeling clustered data by developing a unified SAS macro based on PROC NLMIXED.

Data and Methods:  The motivating data set comes from a longitudinal study in an African American population with poorly controlled type 2 diabetes conducted at VA and MUSC centers in SC between 2008 and 2011.  A total of 256 subjects were followed for one year, and measures were taken at baseline and at months 3, 6, and 12 post baseline after the subjects were randomly assigned to four treatment groups: Telephone-delivered diabetes knowledge/information, Telephone-delivered motivation/behavioral skills training intervention, Telephone-delivered diabetes knowledge/information and motivation/behavioral intervention, and Usual Care.  The main goal of the study was to determine the efficacy of the treatment groups in relation to the usual care group in reducing the levels of hemoglobin A1C at 12 months.  We use these data to demonstrate the application of the unified SAS macro.

Results:  We show that using the unified SAS macro improves the efficiency of analyzing multiple outcomes with zero-inflation and facilitates model comparison.
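As an illustration of the kind of model the macro unifies, a minimal sketch of a zero-inflated Poisson with a random intercept in PROC NLMIXED (variable names are hypothetical; the actual macro is more general):

  proc nlmixed data=clustered;
     parms b0=0 b1=0 a0=0 a1=0 s2u=1;
     /* zero-inflation part (logit scale) and count part (log scale) */
     p0     = 1 / (1 + exp(-(a0 + a1*trt)));
     lambda = exp(b0 + b1*trt + u);
     if y = 0 then ll = log(p0 + (1 - p0)*exp(-lambda));
     else ll = log(1 - p0) + y*log(lambda) - lambda - lgamma(y + 1);
     model y ~ general(ll);
     random u ~ normal(0, s2u) subject=id;
  run;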


Using Regression Model to Predict Earthquake Magnitude and Ground Acceleration at South Carolina Coastal Plain (SCCP)
Emad Gheibi, Sarah Gassman and Abbas Tavakoli
PO-65

Seismically induced liquefaction is one of the most hazardous geotechnical phenomena caused by earthquakes and can cause loss of life and devastating damage to infrastructure.  In 1964, a magnitude 7.5 earthquake in Niigata, Japan, destroyed numerous buildings and structures and initiated studies to understand soil liquefaction.  One major outcome of these studies has been the development of correlations that are used to determine the liquefaction resistance of soil deposits from in-situ soil indices.  These relations are based on Holocene soils (<10,000 years old), while the sand deposits encountered in the South Carolina Coastal Plain (SCCP) are older than 100,000 years; thus the current empirical correlations are not valid for measuring soil resistance against liquefaction.  Researchers have developed a methodology that considers the effect of aging on the liquefaction potential of sands.  In-situ and geotechnical laboratory tests have been performed in the vicinity of sand blows, dating back to 6,000 years ago, at the Fort Dorchester, Sampit, Gapway, Hollywood, and Four Hole Swamp sites in the SCCP.  Paleoliquefaction studies have been performed to back-analyze the earthquake magnitude and the maximum acceleration required to initiate liquefaction at the time of the prehistoric earthquake at these five sites.  In this paper, descriptive statistics, including frequency distributions for categorical variables and summary statistics for continuous variables, are presented.  Statistical analyses using regression models are performed for selected variables on the calculated values of earthquake magnitude and maximum acceleration (dependent variables).  SAS 9.4 was used to analyze the data.


PROC MEANS for Disaggregating Statistics in SAS: One Input Data Set and One Output Data Set with Everything You Need
Imelda Go and Abbas Tavakoli
PO-133

The need to calculate statistics for various groups or classifications is ever present.  Calculating such statistics may involve different strategies with some being less efficient than others.  A common approach by new SAS programmers who are not familiar with PROC MEANS is to create a SAS data set for each group of interest and to execute PROC MEANS for each group.  This strategy can be resource-intensive when large data sets are involved.  It requires multiple PROC MEANS statements due to multiple input data sets and involves multiple output data sets (one per group of interest).  In lieu of this, an economy of programming code can be achieved using a simple coding strategy in the DATA step to take advantage of PROC MEANS capabilities.  Variables that indicate group membership (1 for group membership, blank for non-group membership) can be created for each group of interest in a master data set.  The master data set with these blank/1 indicator variables can then be processed with PROC MEANS and its different statements (i.e., CLASS and TYPES) to produce one data set with all the statistics generated for each group of interest.
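A minimal sketch of the indicator-variable approach described above (data and group definitions are hypothetical; the MISSING option keeps observations whose indicators are blank):

  data master;
     set alldata;                            /* hypothetical input          */
     if gender = 'F' then grp_female = 1;    /* 1 = member, blank = not     */
     if grade  = 3   then grp_grade3 = 1;
  run;

  proc means data=master noprint missing;
     class grp_female grp_grade3;
     types grp_female grp_grade3;            /* one summary per indicator   */
     var score;
     output out=stats n=n mean=mean std=std;
  run;

  /* keep only the output rows where an indicator equals 1 */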


Exploring the Use of Negative Binomial Regression Modeling for Pediatric Peripheral Intravenous Catheterization
Jennifer Mann, Jason Brinkley and Pamela Larsen
PO-109

A large study conducted at two southeastern US hospitals from October 2007 through October 2008 sought to identify predictive variables for successful intravenous catheter (IV) insertion, a crucial procedure that is potentially difficult and time-consuming in young children.  Data were collected on a sample of 592 children who received a total of 1,195 attempts to start peripheral IV catheters in the inpatient setting.  The median age of the children was 2.25 years, with an age range of 2 days to 18 years.  The outcome is the number of attempts to successful IV placement, for which the underlying data appear to have a negative binomial structure.  The goal is to illustrate the appropriateness of a negative binomial assumption using visuals obtained from PROC SGPLOT and to determine the goodness of fit for a negative binomial model.

Negative binomial regression output from PROC GENMOD will be contrasted with traditional ordinary least squares output.  Akaike’s Information Criterion (AIC) illustrates that the negative binomial model has a better fit, and comparisons are made in the inferences of covariate impact.  Many scenarios of negative binomial regression follow from an application to overdispersed Poisson data; however, this project demonstrates a dataset that fits well under the traditional ideology and purpose of a negative binomial model.
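For reference, a minimal negative binomial fit in PROC GENMOD (covariate names are hypothetical placeholders):

  proc genmod data=ivdata;
     class hospital;
     model attempts = age hospital / dist=negbin link=log;
  run;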


To Foam or not to Foam: A Survival Analysis of the Foam Head that Forms when a Soda is Poured
Kate Phillips
PO-74

The goal of this study is to determine which factors influence the dissolve time of the foam head that forms after a soda is poured.  This study proposes a hierarchical logistic model in order to estimate a particular soda’s probability of being a “small fizzer” (the desired outcome) as opposed to a “big fizzer,” with the median dissolve time of 12 seconds serving as the cut point for the binary outcome.  A standard procedure for testing foam head dissolve time was developed in order to collect the study data.  A sample of 80 Coke products was then tested during fall 2013; characteristics of each product sampled were also recorded.  All analyses were then conducted using Base SAS 9.3.  After conducting a univariate analysis for each factor of interest, the continuous response variable was then dichotomized into the binary outcome of interest.  A bivariate analysis was then conducted; odds ratios with their confidence intervals were examined in order to determine a predictor’s significance with respect to the binary outcome.  Table row percentages were examined for factors where odds ratios were not given by SAS.  It was discovered that the most significant factors were sweetener type and a previously undiscovered (to the author’s best knowledge) interaction between test container material and the presence/absence of caffeine (“test container material” refers to the material that the beverage was poured into for testing).  According to the study results, this interaction was the most influential factor with respect to foam head dissolve time.  The odds ratio for sweetener type was 2.25 (95% CI: 0.91, 5.54).  With caffeine present, the odds ratio for test container material was 0.76 (95% CI: 0.23, 2.53).  With caffeine absent, the odds ratio for test container material jumped to 11.70 (95% CI: 1.85, 74.19).  The final hierarchical logistic model retains the factors “sweetener type,” “test container material,” and the interaction between test container material and the presence/absence of caffeine.


Connect with SAS® Professionals Around the World with LinkedIn and sasCommunity.org
Kirk Paul Lafler and Charles Edwin Shipp
PO-55

Accelerate your career and professional development with LinkedIn and sasCommunity.org.  Establish and manage a professional network of trusted contacts, colleagues and experts.  These exciting social networking and collaborative online communities enable users to connect with millions of SAS users worldwide, anytime and anywhere.  This presentation explores how to create a LinkedIn profile and social networking content, develop a professional network of friends and colleagues, join special-interest groups, access a Wiki-based web site where anyone can add or change content on any page on the web site, share biographical information between both communities using a built-in widget, exchange ideas in Bloggers Corner, view scheduled and unscheduled events, use a built-in search facility to search for desired wiki-content, collaborate on projects and file sharing, read and respond to specific forum topics, and more.


Evaluating Additivity of Health Effects of Exposure to Multiple Air Pollutants Given Only Summary Data
Laura Williams, Elizabeth Oesterling Owens and Jean-Jacques Dubois
PO-139

A research team is interested in determining whether the health effects of exposure to a mixture of air pollutants are additive, based on data provided by toxicology studies.  Additivity is defined as the effects of exposure to the mixture being statistically equal to the sum of the effects of exposure to each individual component of that mixture.  The studies of interest typically did not explicitly test for differences between the effects of the mixture and the sum of effects of each component of that mixture; however, many did provide summary data for the observed effects.  The summary data from individual studies (e.g., number of subjects [n], mean response, standard deviation) were extracted.  SAS was used to reconstruct representative datasets for each study by randomly generating n values, which were then normalized to the mean and standard deviation given.  The effect of the mixture of pollutants was tested against the sum of the effects of each component of the mixture using PROC GLM.  A relative difference between the mixture and the sum was calculated so results could be compared even if the endpoints were different.  Confidence intervals were calculated using PROC IML.  A forest plot of all the results that were not simply additive was created using PROC SGPLOT.  The study details can also be added to the plot as data points.  The method described here allowed us to test for the effect of interest as if we had the primary data generated by the original authors.  The views expressed in this abstract are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
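A minimal sketch of the data-reconstruction step (the summary values and names are hypothetical): generate n draws, then rescale them so the sample mean and standard deviation match the reported values exactly.

  %let n = 10;  %let mean = 5.2;  %let sd = 1.3;   /* reported summary data */

  data arm;
     call streaminit(2014);
     do i = 1 to &n;
        y = rand('normal');
        output;
     end;
  run;

  /* force the generated values to the reported mean and SD */
  proc standard data=arm mean=&mean std=&sd out=arm_std;
     var y;
  run;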


Build your Metadata with PROC CONTENTS and ODS OUTPUT
Louise Hadden
PO-29

Simply using an ODS destination to replay PROC CONTENTS output does not provide the user with attractive, usable metadata.  Harness the power of SAS® and ODS output objects to create designer multi-tab metadata workbooks with the click of a mouse!
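A minimal sketch of the underlying idea (the paper's workbooks are considerably more polished): capture PROC CONTENTS output objects as data sets, then send them to a multi-tab workbook.

  ods output Variables=work.varmeta Attributes=work.dsmeta;
  proc contents data=sashelp.class;
  run;

  ods tagsets.excelxp file='metadata.xml' options(sheet_interval='proc');
  proc print data=work.dsmeta noobs;  run;
  proc print data=work.varmeta noobs; run;
  ods tagsets.excelxp close;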


A National Study of Health Services Utilization and Cost of Care with SAS: Analyses from the 2011 Medical Expenditure Panel Survey
Seungyoung Hwang
PO-68

Objective:  To show how to examine the health services utilization and cost of care associated with mood disorders among adults aged 65 or older in the United States.

Research Design and Methods:  A cross-sectional study design was used to identify two groups of elders, with mood disorders (n = 441) and without mood disorders (n = 3,822), using the 2011 Medical Expenditure Panel Survey (MEPS).  A multivariate regression analysis using the SURVEYREG procedure in SAS was conducted to estimate the incremental health services and direct medical costs (inpatient, outpatient, emergency room, prescription drugs, and other) attributable to mood disorders.

Measures:  Clinical Classification code aggregating ICD-9-CM codes for depression and bipolar disorders.

Results:  The prevalence of mood disorders among individuals aged 65 or older in 2011 was estimated at 11.38% (5.17 million persons) and their total direct medical costs were estimated at approximately $81.82 billion in 2011 U.S. dollars.  After adjustment for demographic, socioeconomic, and clinical characteristics, the additional incremental health services utilization associated with mood disorders for hospital discharges, number of prescriptions filled, and office-based visits were 0.14 ± 0.04, 4.76 ± 1.04, and 17.29 ± 2.07, respectively (all p<0.001).  The annual adjusted mean incremental total cost associated with mood disorders was $5,957 (SE: $1,294; p<0.0001) per person.  Inpatient, prescription medications, and office-based visits together accounted for approximately 78% of the total incremental cost.

Conclusion:  The presence of mood disorders among older adults has a substantial influence on health services utilization and cost of care in the U.S.  Significant savings associated with mood disorders could be realized through cost-effective prescription medications, which might reduce the need for subsequent inpatient or office-based visits.

Key words:  SAS; health services utilization; healthcare costs; mood disorders; older adults.
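As a flavor of the design-adjusted analysis described in the methods, a minimal PROC SURVEYREG sketch (the design and cost variable names below are hypothetical placeholders, not the actual MEPS variables):

  proc surveyreg data=meps2011;
     strata  varstr;           /* survey design strata       */
     cluster varpsu;           /* primary sampling units     */
     weight  perwt;            /* person-level survey weight */
     class   mood_disorder;
     model   totalexp = mood_disorder age sex race income insurance / solution;
  run;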



Reporting and Information Visualization

Design of Experiments (DOE) Using JMP®
Charlie Shipp
RIV-47

JMP has provided some of the best design of experiment software for years.  The JMP team continues the tradition of providing state-of-the-art DOE support.  In addition to the full range of classical and modern design of experiment approaches, JMP provides a template for Custom Design for specific requirements.  The other choices include: Screening Design; Response Surface Design; Choice Design; Accelerated Life Test Design; Nonlinear Design; Space Filling Design; Full Factorial Design; Taguchi Arrays; Mixture Design; and Augmented Design.  Further, sample size and power plots are available.

We give an introduction to these methods followed by a few examples with factors.


Secrets from a SAS® Technical Support Guy: Combining the Power of the Output Delivery System with Microsoft Excel Worksheets
Chevell Parker
RIV-144

Business analysts commonly use Microsoft Excel with the SAS® System to answer difficult business questions.  While you can use these applications independently of each other to obtain the information you need, you can also combine the power of those applications, using the SAS Output Delivery System (ODS) tagsets, to completely automate the process.  This combination delivers a more efficient process that enables you to create fully functional and highly customized Excel worksheets within SAS.  This paper starts by discussing common questions and problems that SAS Technical Support receives from users when they try to generate Excel worksheets.  The discussion continues with methods for automating Excel worksheets using ODS tagsets and customizing your worksheets using the CSS style engine and extended tagsets.  In addition, the paper discusses tips and techniques for moving from the current MSOffice2K and ExcelXP tagsets to the new Excel destination, which generates output in the native Excel 2010 format.
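A minimal sketch using the newer ODS Excel destination mentioned above (the options shown are a small, assumed subset; the file name is hypothetical):

  ods excel file="report.xlsx"
      options(sheet_name="Summary" embedded_titles="yes");
  title "MSRP by Origin";
  proc means data=sashelp.cars n mean std;
     class origin;
     var msrp;
  run;
  ods excel close;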


Tricks and Tips for Using the Bootstrap in JMP Pro 11
Jason Brinkley and Jennifer Mann
RIV-49

The bootstrap has become a very popular technique for assessing the variability of many different unusual estimators.  Starting in JMP Pro 10 the bootstrap feature was added to a wide variety of output options, however there has not been much development as to the possible uses of this somewhat hidden feature.  This paper will discuss a handful of uses that can be added to routine analyses.  Examples include confidence interval estimates of the 5% trimmed mean and median survival, validation of covariates in regression analysis, comparing the differences in Spearman correlation estimates across two groups, and eigenvalues in principal components analysis.  The examples will show the extra depth that can be easily added to routine analyses.


Build your Metadata with PROC CONTENTS and ODS OUTPUT
Louise Hadden
RIV-29

Simply using an ODS destination to replay PROC CONTENTS output does not provide the user with attractive, usable metadata.  Harness the power of SAS® and ODS output objects to create designer multi-tab metadata workbooks with the click of a mouse!


Where in the World Are SAS/GRAPH® Maps? An Exploration of the Old and New SAS® Mapping Capacities
Louise Hadden
RIV-30

SAS® has an amazing arsenal of tools for using and displaying geographic information that is relatively unknown and underutilized.  This presentation highlights both new and existing capacities for creating stunning, informative maps as well as for using geographic data in other ways.  SAS-provided map data files, functions, format libraries, and other geographic data files are explored in detail.  Custom mapping of geographic areas is discussed.  Maps produced include use of both the annotate facility (including some new functions) and PROC GREPLAY.  Products used are Base SAS® and SAS/GRAPH®.  SAS programmers of any skill level will benefit from this presentation.


Integrating SAS with JMP to Build an Interactive Application
Merve Gurlu
RIV-50

This presentation will demonstrate how to bring various JMP visuals into one platform to build an appealing, informative, and interactive dashboard using JMP Application Builder and make the application more effective by adding data filters to analyze subgroups of your population with a simple click.  Even though all the data visualizations are done in JMP, importing and merging large data files, data manipulations, creating new variables and all other data processing steps are performed by connecting JMP to SAS.  This presentation will demo connecting to SAS to create a data file ready for visualization using SAS data manipulation capabilities and macros; building interactive visuals using JMP; and, building an application using JMP application builder.  For attendees who would like to be able to print the visuals in the application, a few tips and tricks for building PowerPoint presentations will be provided at the end of the presentation.


Penalizing your Models: An Overview of the Generalized Regression Platform
Michael Crotty and Clay Barker
RIV-151

We will provide an overview of the Generalized Regression personality of the Fit Model platform, added in JMP Pro version 11.  The motivation for using penalized regression will be discussed, and multiple examples will show how the platform can be used for variable selection on continuous or count data.


Web Scraping with JMP for Fun and Profit
Michael Hecht
RIV-150

JMP includes powerful tools for importing data from web pages.  This talk walks through a case study that retrieves OS usage share data from the web, and transforms it into a JMP graph showing usage changes over time.  When combined with JMP’s built-in formulas, value labels, and summarization methods, the end result is a tool that can be used to quickly evaluate and make decisions based on OS usage trends.


Enhancements to Basic Patient Profiles
Scott Burroughs
RIV-97

Patient Data Viewers are becoming more prevalent in the pharmaceutical industry, but not all companies use them, nor are they needed in every situation.  Old-fashioned patient profiles still have a place in today’s industry, but how can they be enhanced?

Missing data, bad data, and outliers can affect the output and/or the running of the program.  Also, relying on analysis data sets that need to be run first by others can affect timing (vacations, out-of-office, busy, etc.).  As always, there are things you can do to make them look prettier in general.  This paper will show how to solve these issues and make the program more robust.


Creating Health Maps Using SAS
Shamarial Roberson and Charlotte Baker
RIV-107

There are many different programs that have been developed to map data.  However, SAS users do not need to always go outside of their SAS installation to map data.  SAS has many built-in options for mapping that, with a bit of knowledge, can be just as good as advanced external programs.  This paper will give an introduction to how to create health maps using PROC GMAP and compare the results to maps created in ArcGIS.
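A minimal PROC GMAP choropleth sketch (the response data set and rate variable are hypothetical; MAPS.US ships with SAS/GRAPH):

  proc gmap data=stroke_rates map=maps.us;
     id state;                 /* must match the map data set's ID variable */
     choro rate / levels=5;    /* shade states by hospitalization rate      */
  run;
  quit;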


A Strip Plot Gets Jittered into a Beeswarm
Shane Rosanbalm
RIV-52

The beeswarm is a relatively new type of plot and one that SAS does not yet produce automatically (as of version 9.4).  For those unfamiliar with beeswarm plots, they are related to strip plots and jittered strip plots.  Strip plots are scatter plots with a continuous variable on the vertical axis and a categorical variable on the horizontal axis (e.g., systolic blood pressure vs. treatment group).  The strip plot is hamstrung by the fact that tightly packed data points start overlaying one another, obscuring the story that the data are trying to tell.  A jittered strip plot seeks to remedy this problem by randomly moving data points off of the categorical center line.  Depending on the volume of data and the particular sequence of random jitters, this technique does not always eliminate all overlays.  In order to guarantee no overlays we must adopt a non-random approach.  This is where the beeswarm comes in.  The beeswarm approach is to plot data points one at a time, testing candidate locations for each new data point until one is found that does not conflict with any previously plotted data points.  The macro presented in this paper performs the preliminary calculations necessary to avoid overlays and thereby produce a beeswarm plot.
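As a point of reference, a minimal sketch of the random-jitter approach the beeswarm improves on (variables are hypothetical; the beeswarm macro replaces the random offsets with computed, collision-free ones):

  data jittered;
     call streaminit(2014);
     set trial;                                   /* hypothetical input */
     /* random horizontal offset around a numeric group code (1, 2, ...) */
     xjit = groupnum + 0.6*(rand('uniform') - 0.5);
  run;

  proc sgplot data=jittered;
     scatter x=xjit y=sysbp;
  run;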


How To Make An Impressive Map of the United States with SAS/Graph® for Beginners
Sharon Avrunin-Becker
RIV-27

Have you ever been given a map downloaded from the internet and asked to reproduce the same thing using SAS complete with labels and annotations?  As you stare at all the previously written, brilliant SAS/Graph conference papers, you start feeling completely overwhelmed.  The papers assume you already know how to get started and you feel like a clueless chimpanzee not understanding what you are missing.  This paper will walk you through the steps to getting started with your map and how to add colors and annotations that will not only impress your manager, but most importantly yourself that you could do it too!


Dashboards with SAS Visual Analytics
Tricia Aanderud
RIV-118

Learn the simple steps for creating a dashboard for your company and then see how SAS Visual Analytics makes it a simple process.



Statistics and Data Analysis

Using SAS to Examine Mediator, Direct and Indirect Effects of Isolation and Fear on Social Support Using Baron & Kenny Combined with Bootstrapping Methods
Abbas Tavakoli and Sue Heiney
SD-64

This presentation examines mediator, direct, and indirect effects of isolation and fear on social support using two methods: Baron & Kenny and bootstrapping.  The paper used cross-sectional data from a longitudinal randomized trial in which 185 participants were assigned to either a therapeutic group (n=93), who received the intervention by teleconference with participants interacting in real time with each other, or a control group (n=92), who received usual psychosocial care (any support used by the patient in the course of cancer treatment).  The Baron and Kenny (1986) steps and Hayes (2004) were used to examine direct and indirect effects.  Results of the Baron and Kenny approach indicated that the relationship between fear and social support was significant (c = -1.151, total effect, p=.0001) and that there was a significant relationship between isolation and fear (α = 1.22, p=.0001).  The previously significant relationship between fear and social support was no longer significant (c' = -.40, direct effect, p=.1876) when both fear and isolation were in the model.  The indirect effect was -1.11 and the Sobel test was significant (p=.0001).  The bootstrapping results indicated the direct effect was -.41 (95% CI: -.42, -.40 for normal theory; -.99, .14 for percentile) and the indirect effect was -1.06 (95% CI: -1.09, -1.08 for normal theory; -1.09, -1.55 for percentile).  Both methods showed a significant indirect effect.
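For orientation, a minimal sketch of the Baron and Kenny regression steps (variable names are hypothetical stand-ins for the study measures; the bootstrapping stage is not shown):

  proc reg data=study;
     model support = fear;              /* step 1: total effect (c)        */
  run;
  proc reg data=study;
     model isolation = fear;            /* step 2: predictor to mediator   */
  run;
  proc reg data=study;
     model support = fear isolation;    /* step 3: direct effect (c')      */
  run;
  quit;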


Don't be binary! Tales of Non-binary Categorical Regression
Charlotte Baker
SD-87

It is not always optimal to reorganize your data into two levels for regression.  To prevent the loss of information that occurs when categories are collapsed, polytomous regression can be used.  This paper will discuss situations in which polytomous regression can be used and how you can write the code.
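A minimal sketch of a polytomous (generalized logit) model in PROC LOGISTIC, with hypothetical variable names; for an ordinal outcome, PROC LOGISTIC's default cumulative logit model applies instead:

  proc logistic data=mydata;
     class exposure / param=ref;
     model outcome(ref='Low') = exposure age / link=glogit;   /* outcome has 3+ levels */
  run;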


Maximizing Confidence and Coverage for a Nonparametric Upper Tolerance Limit on the Second Largest Order Statistic for a Fixed Number of Samples
Dennis Beal
SD-93

A nonparametric upper tolerance limit (UTL) bounds a specified percentage of the population distribution with specified confidence.  The confidence and coverage of a UTL based on the second largest order statistic is evaluated for an infinitely large population.  This relationship can be used to determine the number of samples prior to sampling to achieve a given confidence and coverage.  However, often statisticians are given a data set and asked to calculate a UTL for the second largest order statistic for the number of samples provided.  Since the number of samples usually cannot be increased to increase confidence or coverage for the UTL, the maximum confidence and coverage for the given number of samples is desired.  This paper derives the maximum confidence and coverage for the second largest order statistic for a fixed number of samples.  This relationship is demonstrated both graphically and in tabular form.  The maximum confidence and coverage are calculated for several sample sizes using results from the maximization.  This paper is for intermediate SAS® users of Base SAS® who understand statistical intervals.


Strimmed_t: A SAS® Macro for the Symmetric Trimmed t Test
Diep Nguyen, Anh Kellermann, Patricia Rodríguez de Gil, Eun Sook Kim and Jeffrey Kromrey
SD-79

It is common to use the independent means t-test to test the equality of two population means.  However, this test is very sensitive to violations of the population normality and homogeneity of variance assumptions.  In such situations, Yuen’s (1974) trimmed t-test is recommended as a robust alternative.  The aim of this paper is to provide a SAS macro that allows easy computation of Yuen’s symmetric trimmed t-test.  The macro output includes a table with trimmed means for each of two groups, Winsorized variance estimates, degrees of freedom, and obtained value of t (with two-tailed p-value).

In addition, the results of a simulation study are presented and provide empirical comparisons of the Type I error rates and statistical power of the independent samples t-test, Satterthwaite’s approximate t-test and the trimmed t-test when the assumptions of normality and homogeneity of variance are violated.
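As background, PROC UNIVARIATE can display per-group trimmed means and Winsorized statistics, the kinds of quantities Yuen's test is built from; a minimal sketch with hypothetical variables (this is not the %Strimmed_t macro itself):

  proc univariate data=mydata trimmed=0.2 winsorized=0.2;
     class group;      /* the two groups being compared */
     var y;
  run;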


ANOVA_HOV: A SAS® Macro for Testing Homogeneity of Variance in One-Factor ANOVA Models
Diep Nguyen, Thanh Pham, Patricia Rodríguez de Gil, Tyler Hicks, Yan Wang, Isaac Li, Aarti Bellara, Jeanine Romano, Eun Sook Kim, Harold Holmes, Yi-Hsin Chen and Jeffrey Kromrey
SD-81

Variance homogeneity is one of the critical assumptions when conducting ANOVA as violations may lead to perturbations in Type I error rates.  Previous empirical research suggests minimal consensus among studies as to which test is appropriate for a particular analysis.  This paper provides a SAS macro for testing the homogeneity of variance assumption in one-way ANOVA models using ten different approaches.  In addition, this paper describes the rationale associated with examining the variance assumption in ANOVA and whether the results could inform decisions regarding the selection of a valid test for mean differences.  Using simulation methods, the ten tests evaluating the variance homogeneity assumption were compared in terms of their Type I error rate and statistical power.
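For orientation, some of the variance-homogeneity tests compared are available directly in PROC GLM; a minimal sketch with hypothetical variables (the macro itself covers ten approaches):

  proc glm data=mydata;
     class group;
     model y = group;
     means group / hovtest=levene welch;   /* other choices: hovtest=bartlett, bf, obrien */
  run;
  quit;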


Text Analytics using High Performance SAS Text Miner
Edward Jones
SD-112

The latest release of SAS Enterprise Miner, version 12.3, contains high-performance modules, including a new module for text mining.  This paper compares the new module to the existing SAS Text Miner modules for text mining.  The advantages and disadvantages of HP Text Miner are discussed.  This is illustrated using customer survey data.


How does Q-matrix Misspecification Affect the Linear Logistic Test Model’s Parameter Estimates?
George MacDonald and Jeffrey Kromrey
SD-99

Cognitive diagnostic assessment (CDA) is an important thrust in measurement designed to assess students’ cognitive knowledge structures and processing skills in relation to item difficulty (Leighton & Gierl, 2007).  If the goal of assessing students’ strengths and weaknesses is to be accomplished, it will be important to develop standardized assessments that measure the psychological processes involved in conceptual understanding.  The field of CDA in general and the linear logistic test model in particular can be thought of as a response to these emerging educational needs (MacDonald, G., 2014).  A simulation study was conducted to explore the performance of the linear logistic test model (LLTM) when the relationships between items and cognitive components were misspecified.  Factors manipulated included percent of misspecification (0%, 1%, 5%, 10%, and 15%), form of misspecification (under-specification, balanced misspecification, and over-specification), sample size (20, 40, 80, 160, 320, 640, and 1280), Q-matrix density (60% and 46%), number of items (20, 40, and 60 items), and skewness of person ability distribution (-0.5, 0, and 0.5).  Statistical bias, root mean squared error, confidence interval coverage, and confidence interval width were computed to interpret the impact of the design factors on the cognitive components, item difficulty, and person ability parameter estimates.  The simulation provided rich results and selected key conclusions include (a) SAS works superbly when estimating LLTM using a marginal maximum likelihood approach for cognitive components and an empirical Bayes estimation for person ability, (b) parameter estimates are sensitive to misspecification, (c) under-specification is preferred to over-specification of the Q-matrix, (d) when properly specified the cognitive components parameter estimates often have tolerable amounts of root mean squared error when the sample size is greater than 80, (e) LLTM is robust to the density of Q-matrix specification, (f) the LLTM works well when the number of items is 40 or greater, and (g) LLTM is robust to a slight skewness of the person ability distribution.  In sum, the LLTM is capable of identifying conceptual knowledge when the Q-matrix is properly specified, which is a rich area for applied empirical research (MacDonald, 2014).


Modeling Cognitive Processes of Learning with SAS® Procedures
Isaac Li, Yi-Hsin Chen, Chunhua Cao and Yan Wang
SD-83

Traditionally, the primary goal of educational assessments has been to evaluate students’ academic achievement or proficiency in comparison to their peers or against promulgated standards. Both classical test theory (CTT) and item response theory (IRT) modeling frameworks provide measurement in the form of a summative estimate of the outcome variable.  In recent years, understanding and exploring the complex cognitive processes that contribute to the learning outcome received growing interest in the field of psychometrics, where various item response modeling approaches have been devised to describe the relationship between the outcome and its componential attributes.  Such approaches include the linear logistic test model (LLTM), the crossed random-effects linear logistic test model (CRELLTM), and the two-stage multiple regression method (MR).  This paper will not only introduce these statistical models but also demonstrate how to obtain parameter estimates and model-data fit indices for cognitive processes under each model by employing the GLM, NLMIXED, and GLIMMIX procedures.


Power and Sample Size Computations
John Castellon
SD-145

Power determination and sample size computations are an important aspect of study planning and help produce studies with useful results for minimum resources.  This tutorial reviews basic methodology for power and sample size computations for a number of analyses including proportion tests, t tests, confidence intervals, equivalence and noninferiority tests, survival analyses, correlation, regression, ANOVA, and more complex linear models.  The tutorial illustrates these methods with numerous examples using the POWER and GLMPOWER procedures in SAS/STAT® software as well as the Power and Sample Size Application.  Learn how to compute power and sample size, perform sensitivity analyses for other factors such as variability and type I error rate, and produce customized tables, graphs, and narratives.  Special attention will be given to the newer power and sample size analysis features in SAS/STAT software for logistic regression and the Wilcoxon-Mann-Whitney (rank-sum) test.

Prior exposure to power and sample size computations is assumed.
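For a taste of the syntax, a minimal PROC POWER sketch (the numbers are arbitrary illustrations, not values from the tutorial):

  proc power;
     twosamplemeans test=diff
        meandiff  = 5
        stddev    = 10
        power     = 0.9
        npergroup = .;        /* solve for the per-group sample size */
  run;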


Gestational Diabetes Mellitus and changes in offspring’s weight during infancy: A longitudinal analysis
Marsha Samson, Olubunmi Orekoya, Dumbiri Onyeajam and Tushar Trivedi
SD-128

Background: Gestational Diabetes Mellitus (GDM) is the most common metabolic disorder during pregnancy in the United States, with an incidence ranging from 7% to 14% of all pregnancies.  GDM has been associated with various adverse effects, including macrosomia and a heightened risk of type 2 diabetes mellitus (T2DM) in mothers.  Many studies have shown the association between GDM and offspring birth weight, but very few studies have assessed the longitudinal relationship between GDM and offspring's weight at different time points.

Objectives: The purpose of this study is to determine the association between GDM and changes in infant’s weight at 0, 3, 5, 7 and 12 months.

Methods: We used data from the Infant Feeding Practices Survey II, a large prospective study of pregnant women living in the United States.  We examined GDM and babies' weight at 0, 3, 5, 7, and 12 months among 1,072 mothers.  We used general linear models to assess the impact of GDM on infant weight, adjusted for socio-demographic variables and other potential confounders.

Results: The mean age of our sample was 30.5 years.  Infants of mothers with GDM had no significant difference in mean weight during infancy compared with infants whose mothers did not have GDM (adjusted coefficient: 0.1870; 95% confidence interval: -0.0431, 0.4171).  Mean weight during infancy for those born to non-Hispanic Black mothers was significantly higher than for those born to non-Hispanic Whites (adjusted coefficient: -0.4787; 95% CI: -0.8533, -0.1040).  During infancy, mean weight for boys was significantly higher than for girls (adjusted coefficient: 0.4698; 95% CI: 0.3491, 0.5906).  With each additional cigarette smoked per day during pregnancy, mean infant weight decreased by 4% (95% CI: -0.06145, -0.0166).

Conclusion: Our study did not show a significant association between GDM and mean weight during infancy.  However, this result should be treated with caution because of the small number of GDM cases in our sample and because of the unique demographic composition of our study sample, which consisted mostly of white women.


Multilevel Models for Categorical Data using SAS® PROC GLIMMIX: The Basics
Mihaela Ene, Elizabeth Leighton, Genine Blue and Bethany Bell
SD-134

Multilevel models (MLMs) are frequently used in social and health sciences where data are typically hierarchical in nature.  However, the commonly used hierarchical linear models (HLMs) are only appropriate when the outcome of interest is continuous; when dealing with categorical outcomes, a transformation and an appropriate error distribution for the response variable need to be incorporated into the model and therefore, hierarchical generalized linear models (HGLMs) need to be used.  This paper provides an introduction to specifying hierarchical generalized linear models using PROC GLIMMIX, following the structure of the primer for hierarchical linear models previously presented by Bell, Ene, Smiley, and Schoeneberger (2013).  A brief introduction into the field of multilevel modeling and HGLMs with both dichotomous and polytomous outcomes is followed by a discussion of the model building process and appropriate ways to assess the fit of these models.  Next, the paper provides a discussion of PROC GLIMMIX statements and options as well as concrete examples of how PROC GLIMMIX can be used to estimate (a) two-level organizational models with dichotomous outcomes and (b) two-level organizational models with polytomous outcomes.  These examples use data from High School and Beyond (HS&B), a nationally-representative longitudinal study of American youth.  For each example, narrative explanations accompany annotated examples of the GLIMMIX code and corresponding output.
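A minimal sketch of a two-level organizational model with a dichotomous outcome in PROC GLIMMIX (variable names are hypothetical, loosely in the spirit of the HS&B examples):

  proc glimmix data=hsb method=laplace;
     class schoolid;
     model passed(event='1') = ses / dist=binary link=logit solution;
     random intercept / subject=schoolid;   /* school-level random effect */
  run;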


Analyzing Multilevel Models with the GLIMMIX Procedure
Min Zhu
SD-141

Hierarchical data are common in many fields, from pharmaceuticals to agriculture to sociology.  As data sizes and sources grow, information is likely to be observed on nested units at multiple levels, calling for the multilevel modeling approach.  This paper describes how to use the GLIMMIX procedure in SAS/STAT® to analyze hierarchical data that have a wide variety of distributions.  Examples are included to illustrate the flexibility that PROC GLIMMIX offers for modeling within-unit correlation, disentangling explanatory variables at different levels, and handling unbalanced data.  Also discussed are enhanced weighting options, new in SAS/STAT 13.1, for both the MODEL and RANDOM statements.  These weighting options enable PROC GLIMMIX to handle weights at different levels.  PROC GLIMMIX uses a pseudolikelihood approach to estimate parameters, and it computes robust standard error estimators.  This new feature is applied to an example of complex survey data that are collected from multistage sampling and have unequal sampling probabilities.


%DISCIT Macro: Pre-screening continuous variables for subsequent binary logistic regression analysis through visualization
Mohamed Anany
SD-92

Prescreening variables to identify potential predictors to enter into the model is an important stage in any modeling process.  The goal is to select the variables that will result in the “best” model.  In binary logistic regression, when there are many independent variables that could potentially be included in the model, it is always good practice to perform bivariate analysis between the dichotomous (dependent) variable and the independent variables.  The independent variables come in many forms: binary, continuous, nominal categorical, and/or ordinal categorical.  This presentation is concerned with identifying candidate continuous variables by performing a bivariate analysis.  The analysis is based on a two-sample t-test; a graphical panel to visualize the relationship of the continuous variable with the dichotomous dependent variable; recoding the continuous variable into two different ordinal forms and adjusting their scale through odds and log-odds transformations if needed; and collapsing similar groups of the recoded form of the continuous variable to ensure (or improve) a linear relationship if one exists.  We also make use of the information value, which gives an indication of the predictive power of an independent variable in capturing the dichotomous variable.  The information value is used primarily in the credit and financial industries.  The SAS DISCIT macro was developed to make this prescreening process easier for analysts.
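As a flavor of the bivariate screening step, a minimal two-sample t-test sketch with hypothetical variables (the %DISCIT macro automates this and the remaining steps):

  proc ttest data=develop;
     class bad;        /* dichotomous dependent variable */
     var balance;      /* candidate continuous predictor */
  run;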


Using SAS to Create a p-value Resampling Distribution for a Statistical Test
Peter Wludyka and Carmen Smotherman
SD-91

One starts with data to perform a statistical test of a hypothesis.  A p-value is associated with a particular test, and this p-value can be used to decide whether to reject a null hypothesis in favor of some alternative.  Since the data (the sample) are usually all a researcher knows factually regarding the phenomenon under study, one can imagine that by sampling (resampling) with replacement from the original data, additional information about the hypothesis and phenomenon can be acquired.  One way to acquire such information is to repeatedly resample from the original data set (using, for example, PROC SURVEYSELECT) and at each iteration (replication of the data set) perform the statistical test of interest and calculate the corresponding p-value.  At the end of this stage one has r p-values (r is typically greater than 1,000), one for each performance of the statistical test.  Thinking of the original p-value as a quantile of this distribution of p-values allows one to assess the likelihood that the original hypothesis would have been rejected, which helps put the actual decision in perspective.  The resampling distribution of p-values also allows one to retrospectively assess the power of the test by finding the proportion of the p-values that are less than a specified level of significance (alpha).  By creating a p-value resampling distribution for a selection of sample sizes, one can create a power curve that can be used prospectively to gather sample size information for follow-up studies.
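A minimal sketch of the resampling loop, using a two-sample t-test as the test of interest (data set and variable names are hypothetical):

  /* 1,000 with-replacement replicates of the original data */
  proc surveyselect data=original out=boot seed=2014
       method=urs samprate=1 outhits reps=1000;
  run;

  /* run the test within each replicate and keep the p-values */
  ods exclude all;
  ods output TTests=pvals;
  proc ttest data=boot;
     by replicate;
     class group;
     var y;
  run;
  ods exclude none;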


Making Comparisons Fair: How LS-Means Unify the Analysis of Linear Models
Weijie Cai
SD-142

How do you compare group responses when the data are unbalanced or when covariates come into play?  Simple averages will not do, but LS-means are just the ticket.  Central to postfitting analysis in SAS/STAT® linear modeling procedures, LS-means generalize the simple average for unbalanced data and complicated models.  They play a key role both in standard treatment comparisons and Type III tests and in newer techniques such as sliced interaction effects and diffograms.  This paper reviews the definition of LS-means, focusing on their interpretation as predicted population marginal means, and it illustrates their broad range of use with numerous examples.
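A minimal example of requesting LS-means after fitting an unbalanced model with a covariate (hypothetical variables):

  proc glm data=clinic;
     class trt center;
     model y = trt center x;
     lsmeans trt / pdiff cl adjust=tukey;   /* covariate-adjusted treatment means */
  run;
  quit;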


Annotation utilizations in customized SAS/Graph Bar Charts
Yong Liu, Hua Lu, Liang Wei, Xingyou Zhang, Paul Eke and James Holt
SD-16

Bar charts are generated with the GCHART procedure in SAS/GRAPH to present the distribution of health behaviors or health outcomes among adults aged ≥18 years, by selected characteristics, for each of the 50 states using the 2011 Behavioral Risk Factor Surveillance System (BRFSS).  Because of missing data or unreliable estimates, the annotate facility is used to make the charts more presentable by adding data labels and footnotes.  Further, incorporating a SAS macro variable into the program makes the development of 50 charts for 50 states more achievable and efficient.
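A minimal sketch of the annotate idea (data and text are hypothetical; the production charts use additional annotate functions and macro-driven processing across states):

  /* annotate data set: add a footnote-style note to the chart */
  data anno;
     length function $8 text $60;
     function = 'label'; xsys = '3'; ysys = '3';   /* percent-of-graph coordinates */
     x = 5; y = 2; position = '6';
     text = '* Estimate suppressed: relative standard error > 30%';
     output;
  run;

  proc gchart data=state_est;
     vbar agegroup / sumvar=pct annotate=anno;
  run;
  quit;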