I am a data analyst focused on building interpretable, uncertainty-aware analytics systems using statistical modeling and machine learning.
My work involves transforming real-world data into structured, analysis-ready datasets and developing models that remain reliable under uncertainty and dataset variability. I primarily work in healthcare and operational analytics, with experience in real-world data (RWD), EHR-based modeling, and decision-oriented analytics.
Most data systems optimize for predictive performance on static datasets. My work instead focuses on reliability in real-world settings, where data is sparse and noisy and where operational decisions carry asymmetric risks.
Across projects, I focus on:
Manuscript under peer review
Developed an interpretable ICU mortality risk modeling framework using MIMIC-IV and eICU data, with cohorts and analysis-ready datasets built from longitudinal clinical measurements including vitals, labs, and GCS. The project uses likelihood-based feature transformations to standardize heterogeneous clinical variables against population baselines, enabling grouped evidence aggregation and interpretable risk characterization. This technique is described in more detail in Project 2.
Internal and external validation across hospitals showed stable mortality risk stratification, including consistent identification of low-risk patient populations without retraining. The framework also supports scalable survival analysis across 300+ clinical features, with Kaplan–Meier patterns preserved across datasets.
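Comparing Kaplan–Meier patterns across datasets starts from the product-limit estimator applied to each cohort. The sketch below is a minimal Python version, assuming one follow-up duration and one death/censoring indicator per patient; the function name and toy data are hypothetical, and this is an illustration of the estimator rather than the project's R pipeline.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    durations: follow-up time per patient (e.g. days from ICU admission)
    events:    1 if death was observed at that time, 0 if censored
    Returns the distinct event times and the survival probability after each.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])  # sorted distinct death times

    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)  # patients still observed just before t
        deaths = np.sum((durations == t) & (events == 1))
        s *= 1.0 - deaths / at_risk  # multiply by conditional survival at t
        survival.append(s)
    return event_times, np.array(survival)


# Toy cohort: six patients, two of them censored.
times, surv = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1])
```

Running the same function on each dataset (or on strata defined by a feature) gives curves that can be overlaid to check whether the survival pattern is preserved.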
The transformation placed disparate features on a common scale, so they could be aggregated into comparable groups and the strength of evidence from each measurement could be read relative to the population baseline. This made it possible to represent how atypical each measurement is for a given patient, in addition to the patient's overall mortality risk relative to the population.
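Grouped evidence aggregation can be illustrated with per-feature log-likelihood ratios summed within clinical domains and added to the population prior log-odds. This is a minimal sketch under stated assumptions: the feature names, groups, LLR values, and 10% baseline mortality rate are hypothetical, and adding LLRs to the prior log-odds assumes the features are conditionally independent given the outcome (a naive-Bayes simplification that may differ from the framework's actual aggregation).

```python
import math

# Hypothetical per-patient LLRs, one per feature, each computed as
# log p(x_i | died) - log p(x_i | survived); positive values favor mortality.
patient_llr = {
    "heart_rate": 0.8,
    "resp_rate": 0.3,
    "lactate": 1.2,
    "creatinine": -0.4,
    "gcs_total": 0.9,
}

# Hypothetical grouping of features into clinical domains.
feature_groups = {
    "vitals": ["heart_rate", "resp_rate"],
    "labs": ["lactate", "creatinine"],
    "neuro": ["gcs_total"],
}


def aggregate_evidence(llr, groups, baseline_rate):
    """Sum LLR evidence within each group and convert the total to a risk.

    baseline_rate is the population mortality rate; its log-odds is the prior.
    Summing LLRs assumes conditional independence of features given outcome.
    """
    group_evidence = {g: sum(llr[f] for f in feats) for g, feats in groups.items()}
    prior_log_odds = math.log(baseline_rate / (1.0 - baseline_rate))
    posterior_log_odds = prior_log_odds + sum(group_evidence.values())
    risk = 1.0 / (1.0 + math.exp(-posterior_log_odds))  # logistic transform
    return group_evidence, risk


groups, risk = aggregate_evidence(patient_llr, feature_groups, baseline_rate=0.10)
```

The per-group totals show which clinical domain moves the patient away from the baseline, while the final risk summarizes all evidence relative to the population rate.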
Keywords: R, SQL, survival analysis, model evaluation, external validation, MIMIC-IV, eICU, interpretable clinical risk modeling.
This project explores transforming heterogeneous features into a common, comparable representation using log-likelihood ratios.
Instead of relying solely on model outputs, the approach enables analysis of feature-level contributions and data structure in a standardized space.
Each feature contributes log-likelihood ratio evidence:
log p(x_i | positive class) − log p(x_i | negative class)
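For a single feature, the LLR above needs an estimate of each class-conditional density. The sketch below assumes each density is approximated by a normal distribution fitted to that class's observed values; this choice, and the helper names `gaussian_logpdf` and `feature_llr`, are illustrative assumptions, and skewed clinical variables may call for a histogram or kernel density estimate instead.

```python
import math
import statistics


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def feature_llr(x, positive_values, negative_values):
    """LLR evidence for one feature value x.

    p(x | class) is approximated by a normal distribution fitted to the
    observed values of that class (an assumption, not the only option).
    """
    mu_pos, sd_pos = statistics.mean(positive_values), statistics.stdev(positive_values)
    mu_neg, sd_neg = statistics.mean(negative_values), statistics.stdev(negative_values)
    return gaussian_logpdf(x, mu_pos, sd_pos) - gaussian_logpdf(x, mu_neg, sd_neg)


positive = [3, 4, 5]  # toy values observed in the positive class
negative = [1, 2, 3]  # toy values observed in the negative class
```

With these toy classes (both with standard deviation 1), `feature_llr(4, positive, negative)` is about 2, evidence toward the positive class, while `x = 3` lies midway between the class means and gives an LLR of 0.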
This produces a representation where:
Derived signals:
d_dist: relative proximity to class distributions
proj: directional accumulation of feature-level deviations
This project explores inventory forecasting under limited and variable data using a Bayesian approach.
Key components:
Artifacts:
Takeaway: Explicit uncertainty modeling revealed limitations of fully automated forecasting under non-stationary conditions, highlighting the importance of human-in-the-loop decision-making.
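One common way to make forecast uncertainty explicit with sparse demand data is a Gamma–Poisson model: a Gamma prior on the daily demand rate, updated with observed daily counts, and a posterior predictive distribution for demand over the replenishment lead time. The sketch below assumes this conjugate model, a Gamma(1, 0.1) prior, and a hypothetical function name; the project's actual model is not specified here.

```python
import numpy as np


def posterior_demand_quantile(daily_demand, lead_time_days, service_level=0.95,
                              prior_shape=1.0, prior_rate=0.1,
                              n_draws=20_000, seed=0):
    """Service-level quantile of lead-time demand under a Gamma-Poisson model.

    Prior: daily rate ~ Gamma(prior_shape, prior_rate) (assumed values).
    Likelihood: each day's demand count ~ Poisson(rate).
    Returns the quantile and the Monte Carlo draws of lead-time demand.
    """
    counts = np.asarray(daily_demand)
    # Conjugate update: add total demand to the shape, number of days to the rate.
    post_shape = prior_shape + counts.sum()
    post_rate = prior_rate + len(counts)

    rng = np.random.default_rng(seed)
    # Draw plausible daily rates, then lead-time demand given each rate.
    rates = rng.gamma(shape=post_shape, scale=1.0 / post_rate, size=n_draws)
    demand = rng.poisson(rates * lead_time_days)
    return float(np.quantile(demand, service_level)), demand


q95, draws = posterior_demand_quantile([3, 5, 2, 4, 6], lead_time_days=7)
```

Because parameter uncertainty flows into the predictive draws, a one-day history yields a much wider lead-time demand distribution than twenty days of identical demand, which makes the cost of limited data visible before a reorder decision is automated.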