As the volume and complexity of public health data rapidly increase, more advanced methods in statistical modeling and computation are needed. For example, computer-intensive processing and analysis of high-frequency data require efficient programming approaches; Bayesian methods are becoming increasingly popular in infectious disease modeling and inference; and novel computational methods based on object-oriented and topological data analysis are being developed for biological signal and network analysis.
In disease mapping, data tend to be correlated because of spatial and temporal proximity, while in social and neuroimaging networks, sites or regions of interest typically exhibit strong spatial and temporal correlation. Multivariate statistics provide powerful tools for analyzing multiple interdependent outcomes simultaneously.
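A standard way to quantify the spatial correlation described above is global Moran's I. The sketch below is a minimal pure-Python illustration, assuming a binary adjacency matrix as the spatial weights; the function name `morans_i` and the four-region example are hypothetical, not a method taken from this program. Values above the null expectation of -1/(N-1) suggest that neighboring regions have similar rates, while values below it suggest that neighbors tend to differ.

```python
def morans_i(values, weights):
    """Global Moran's I for areal data (hypothetical helper).

    values:  one observation per region (e.g., a disease rate)
    weights: square spatial-weights matrix w[i][j], 0 on the diagonal
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between pairs of regions
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_term)


# Hypothetical example: four regions in a row (1-2-3-4), each adjacent
# only to its immediate neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_rates = [1.0, 2.0, 3.0, 4.0]       # neighbors are similar
alternating_rates = [1.0, 2.0, 1.0, 2.0]  # neighbors differ
print(round(morans_i(smooth_rates, adjacency), 3))
print(round(morans_i(alternating_rates, adjacency), 3))
```

For this layout the null expectation is -1/3, so the smooth gradient (I = 1/3) indicates positive spatial autocorrelation and the alternating pattern (I = -1) indicates negative autocorrelation. Real analyses would also assess significance, for example by permutation.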
Longitudinal data consist of measurements collected repeatedly over time on the same subjects, or on different subjects from the same cluster (e.g., a family). Such data are very common in public health studies, including cohort studies and surveys.
Survival analysis is designed for time-to-event data (e.g., time to disease onset, time to death, time in treatment) in which the outcome is only partially observed (censored) because of loss to follow-up or the end of the study. It is a powerful tool for examining mortality risk over time.
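To make the role of censoring concrete, here is a minimal sketch of the Kaplan-Meier estimator, a standard non-parametric estimate of the survival function. The function name and the six-subject data set are hypothetical. Censored subjects contribute to the risk set up to their censoring time but never count as events, which is how partially observed outcomes are used without being discarded.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t) (hypothetical helper).

    times:  follow-up time for each subject
    events: 1 if the event (e.g., death) was observed, 0 if censored
    Returns a list of (event_time, S(event_time)) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t are still counted as at risk at t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (e.g., months); 0 marks a censored subject
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Here the estimated survival drops to 5/6 at t = 2, 2/3 at t = 3, 4/9 at t = 5, and 0 at t = 8; the subject censored at t = 6 only shrinks the risk set. Established implementations (e.g., R's `survival` package) add confidence intervals and handle many more cases.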
Missing data are data that were supposed to be collected but, for some reason, were not. Missingness is a common statistical problem in the social, behavioral, epidemiological, and medical sciences; it can bias the analysis and increase the uncertainty of the estimates.
Limited dependent outcomes (e.g., binary, count, categorical), bounded outcomes (e.g., clinical scores, academic grades), and circular outcomes (e.g., protein structures, circadian rhythms, sleep cycles) require dedicated analytic approaches.
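A small example shows why circular outcomes such as sleep timing need their own methods. Averaging clock times arithmetically ignores that 24:00 wraps around to 0:00, whereas the circular mean maps each time to an angle and averages the corresponding unit vectors. The function name and the two bedtimes below are hypothetical illustration values.

```python
import math


def circular_mean_hours(hours):
    """Mean clock time on a 24-hour circle (hypothetical helper).

    Each time is mapped to an angle, the unit vectors are summed,
    and the direction of the resultant is converted back to hours.
    """
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24


bedtimes = [20, 2]  # 8 PM and 2 AM
print(sum(bedtimes) / len(bedtimes))             # arithmetic mean: 11.0 (11 AM)
print(round(circular_mean_hours(bedtimes), 2))   # circular mean: 23.0 (11 PM)
```

The arithmetic mean places the "average" bedtime at 11 AM, which is clearly wrong; the circular mean gives 11 PM, midway between the two times along the clock.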
Non- and semi-parametric methods relax modeling assumptions that often limit the applicability of parametric models. For this reason, they provide flexible analytical tools for discovering patterns and associations in complex data.
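As a simple illustration of a non-parametric approach, the Nadaraya-Watson kernel smoother estimates a regression curve as a locally weighted average of the outcomes, with no assumed functional form. The sketch below assumes a Gaussian kernel and a hand-picked bandwidth; the function name and the sine-curve data are hypothetical, and in practice the bandwidth would be chosen by a method such as cross-validation.

```python
import math


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted local average of y at the point x0 (hypothetical helper).

    Uses a Gaussian kernel; the bandwidth controls how quickly the
    weights decay with distance from x0 (larger = smoother fit).
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total


# Samples of a nonlinear curve; the smoother is never told it is a sine.
xs = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
ys = [math.sin(x) for x in xs]
for x0 in (2.0, 5.0, 8.0):
    print(x0, round(nadaraya_watson(xs, ys, x0, bandwidth=0.5), 3))
```

Shrinking the bandwidth makes the fit follow individual points more closely (lower bias, higher variance), while widening it averages over more points; this trade-off is the main tuning decision in kernel smoothing.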
Modern biological technologies (such as microarrays and next-generation sequencing) call for powerful and efficient statistical methods for drawing inferences from the high-throughput genetic and genomic data generated in public health and biomedical research. The analysis covers family- and population-based human genetic and genomic data, such as single nucleotide polymorphism (SNP), gene expression, and protein expression data.
These methods are widely applied in the following areas:
• Infectious disease mapping
• Disease prevalence forecasting
• Disease cluster detection
Novel statistical frameworks based on object-oriented and topological data analysis tools are being developed for electroencephalography (EEG) and diffusion and functional magnetic resonance imaging (dMRI and fMRI) data in post-stroke aphasia and epilepsy, to better understand the neural deficits in these brain network disorders.
In collaboration with medical and scientific researchers at USC, as well as at other national and international institutions, faculty are developing procedures for analyzing genetic and genomic data, including microarray analysis, functional genomics, next-generation sequencing analysis, integrative analysis of omics data, epigenetics, and complex bioinformatic approaches (RNA-seq analysis, pathway analysis, and differential expression analysis).
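Differential expression analysis typically tests thousands of genes at once, so the resulting p-values are usually adjusted for multiple testing. One widely used adjustment, assumed here for illustration rather than stated as this group's method, is the Benjamini-Hochberg false discovery rate procedure. The sketch below mirrors the adjustment performed by R's `p.adjust(p, method = "BH")`; the function name and the four p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the original order (hypothetical helper)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])
```

Genes whose adjusted value falls below a chosen threshold (say 0.05) are declared differentially expressed, which controls the expected proportion of false discoveries among them rather than the chance of any single false positive.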
• Accelerometry for physical activity and sleep monitoring
• NHANES, UK cohort studies
• SEER, South Carolina Cancer Registry data
• Data from the SC-RFA Office
• ICD-9/ICD-10 codes