Using Principal Components Analysis (PCA) to Analyze Latino Stress by Agricultural Season and Occupation
Principal Components Analysis (PCA) is a commonly used unsupervised machine learning technique. In this presentation, I give a general description of the PCA method and its geometric interpretation, using simulated two- and three-dimensional data. The description of the methodology is followed by an application of PCA to analyze Latino stress by agricultural season and occupation in a majority-minority agricultural area of eastern Washington State.
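As a rough illustration of the geometric idea, the sketch below runs PCA on simulated two-dimensional data by taking the eigendecomposition of the sample covariance matrix. It is written in Python with NumPy and is not the code from the presentation; the simulated data, the seed, and the helper name `pca` are assumptions made only for this example.

```python
import numpy as np

# Simulated, correlated two-dimensional data (hypothetical example data).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.3, size=n)
X = np.column_stack([x, y])


def pca(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns the eigenvalues (component variances, largest first), the
    eigenvectors (principal directions, one per column), and the scores
    (the centered data expressed in the principal-axis coordinates).
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    scores = Xc @ eigvecs                   # rotate data onto the new axes
    return eigvals, eigvecs, scores


eigvals, eigvecs, scores = pca(X)
explained = eigvals / eigvals.sum()         # proportion of variance explained

print("Principal directions (columns):")
print(eigvecs)
print("Proportion of variance explained:", explained)
```

Geometrically, the columns of `eigvecs` are the orthogonal axes of the rotated coordinate system, and the scores are uncorrelated with variances equal to the eigenvalues.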
Inference and Estimation of the Treatment Effect in a Two-Arm Parallel Randomized Controlled Trial that is Marginalized Over Time
In a 2-arm parallel randomized controlled trial (RCT), an outcome of interest is measured at least twice: once before the treatment is administered and once after. However, to measure the stability of an effect over time, additional time points can be added after the first follow-up. In this presentation, I consider the case where an outcome of interest is measured at a baseline, a second follow-up, and a third follow-up. In this design, it might be of interest to know the treatment effect at the third follow-up that is unconditional on the treatment effect at the second follow-up. I derive such an effect and its standard error and apply the theory to a simulated outcome that is correlated over time.
Many investigators, project managers, and data managers have turned to REDCap to manage their data. Eventually, it falls to the statistician to take the REDCap data and load it into their statistical analysis program of choice. In this presentation, I show how to use the CSV and R script file downloaded from REDCap to create a clean R data set.
Average Marginal Effects in a 2-Arm Parallel Randomized Controlled Trial with Heterogeneity of Effects by Strata
The utility of the method of Average Marginal Effects (AME) in many contexts of statistical modeling makes the lack of accessible resources on it in the literature a tragedy for both statisticians and those who consume statistics. In this presentation, I attempt to solve this problem by deriving the estimate of the AME and its standard error in the context of a common experimental design; namely, the 2-arm parallel randomized controlled trial (RCT) with heterogeneity of effects by site. I follow each section with straightforward programming techniques to apply this method to real data.
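One common way to form such an estimate, sketched here under stated assumptions rather than as the derivation given in the presentation, is to fit a linear model with site-specific intercepts and site-specific treatment effects, then average the site effects with weights equal to each site's share of the sample. The standard error below treats those weights as fixed and uses the usual homoskedastic OLS covariance matrix. The example is in Python with NumPy, and the simulated data, effect sizes, and variable names are all hypothetical.

```python
import numpy as np

# Simulated 2-arm trial with three sites of unequal size (hypothetical data).
rng = np.random.default_rng(1)
site_sizes = [300, 200, 100]
site = np.repeat([0, 1, 2], site_sizes)      # site label for each subject
n = site.size
treat = np.tile([0, 1], n // 2)              # alternating arms, balanced by site
true_effect = np.array([1.0, 2.0, 4.0])      # assumed site-specific effects
y = 0.5 * site + true_effect[site] * treat + rng.normal(size=n)

# Design matrix: K site intercepts, then K treatment-by-site columns.
K = 3
S = (site[:, None] == np.arange(K)).astype(float)
X = np.hstack([S, S * treat[:, None]])

# Ordinary least squares fit and its homoskedastic covariance matrix.
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
V = sigma2 * XtX_inv

# AME: site effects averaged with weights n_s / n (weights treated as fixed).
w = S.mean(axis=0)
site_effects = beta[K:]
ame = w @ site_effects
ame_se = np.sqrt(w @ V[K:, K:] @ w)

print("Site-specific effects:", site_effects)
print("AME:", ame, "SE:", ame_se)
```

Because the model is saturated in site and treatment, each site effect equals the difference in arm means within that site, so the AME here is a sample-size-weighted average of those differences.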
The foreign R package is useful for exporting data sets to all kinds of formats including files for the proprietary SPSS program from IBM. However, the default method for writing to SPSS doesn’t allow for variable labels. Instead, it defaults to labeling all of the variables by their name from the column headings. In this presentation, we will use the Hmisc R package for its variable labeling functionality and write a modification to the original SPSS export function from the foreign R package.
The general linear model (GLM), more commonly called linear regression, is the most common statistical modeling tool a statistician or data scientist will use. As such, it is crucial to know how to present results from a GLM in a way that is understandable to your audience.
An essential part of any data analysis project is to understand the data at hand.
Creating an analytic data set is an important step in any data analysis, because that data set is what will be used to reproduce the results.