About
Data Enthusiast (Self-driven / Versatile / Highly motivated)
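I am interested in medical data collection, machine learning, reproducible data analysis, and workflow automation.
I dream of a matching platform that satisfies both patients and healthcare professionals.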
Projects
Age-matching of kidney donors and candidates with ESRD
In this project, we tried to identify the optimal age-matching strategy by applying XGBoost to national registry data. This demo presents the results: try various candidate conditions and see how our model predicts mortality and graft survival by donor age.
The app may take a while to load fully.
Standardizing the FDA's National Drug Codes
The FDA's National Drug Codes (NDCs) come in two forms: 10-digit and 11-digit. My colleagues wanted to standardize the codes to the 11-digit form before modeling, so I wrote an R package for the task. This small package streamlines the conversion process.
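As a rough illustration of the padding rule behind this kind of conversion (a minimal sketch, not the package's actual code or API): hyphenated 10-digit NDCs are written in one of three layouts, 4-4-2, 5-3-2, or 5-4-1, and the 11-digit form is 5-4-2, so the short segment gets a leading zero. The function name `ndc10_to_ndc11` is hypothetical, and the sketch assumes the input still carries its hyphens, since an unhyphenated 10-digit code is ambiguous.

```r
# Hypothetical helper for illustration only; not the package's real interface.
# Converts hyphenated NDCs (4-4-2, 5-3-2, 5-4-1, or already 5-4-2)
# to the unhyphenated 11-digit form. Anything else becomes NA.
ndc10_to_ndc11 <- function(ndc) {
  target <- c(5L, 4L, 2L)  # segment widths of the 11-digit form

  convert_one <- function(parts) {
    ok <- length(parts) == 3L &&
      all(grepl("^[0-9]+$", parts)) &&
      all(nchar(parts) <= target) &&
      sum(nchar(parts)) %in% c(10L, 11L)
    if (!ok) return(NA_character_)

    # Left-pad each segment with zeros up to its target width
    padded <- paste0(strrep("0", target - nchar(parts)), parts)
    paste(padded, collapse = "")
  }

  vapply(strsplit(ndc, "-", fixed = TRUE), convert_one, character(1))
}

# Made-up codes, one per layout
ndc10_to_ndc11(c("1234-5678-90", "12345-678-90", "12345-6789-0"))
```

Returning `NA` for codes without hyphens is a deliberate choice in this sketch: it avoids silently guessing which segment needs the zero.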
Project generator for Stata
Structuring a data analysis project is often a pain in the neck. This Stata package takes care of that chore and lets analysts focus solely on their projects.
Gists
Make It Purrr
Often, we need to apply the same data processing to multiple data frames.
For this particular task, I thought it would be a good idea to make the dplyr functions applicable to a list of data frames by emulating Python's decorators.
# Required packages
library(dplyr)
library(purrr)
library(rlang)

# Function decorator: turns a data-frame function into one that
# works on a list of data frames.
# The list of data frames comes first so the wrapper works in a pipe.
make_it_purrr <- function(fn) {
  wrapper <- function(list_df, ...) {
    # Capture the arguments unevaluated so they can be spliced into `fn`
    exprs <- enquos(...)
    list_df %>%
      map(
        ~ fn(.x, !!!exprs)
      )
  }
  wrapper
}

# Examples:
# Extensions of some `dplyr` functions for a list of data frames
map_mutate <- make_it_purrr(mutate)
list_dfs <- list(mtcars, mtcars, mtcars)
list_dfs %>%
  map_mutate(mpg2 = mpg^2) %>%
  map_mutate(mpg3 = mpg^3)

map_group_by <- make_it_purrr(group_by)
map_summarize <- make_it_purrr(summarize)
list_dfs %>%
  map_group_by(am) %>%
  map_summarize(mean_mpg_by_am = mean(mpg))
Makefile
This Makefile is one of my favorite gists for my data analysis projects.
# None of the targets produce a file of the same name, and `data` is also
# a directory, so declare them phony to make sure they always run.
.PHONY: dirs start data figures tables

# Targets for project preparation
dirs:
	mkdir -p data/interim
	mkdir -p data/processed
	mkdir -p data/raw
	mkdir -p models
	mkdir -p notebooks
	mkdir -p docs
	mkdir -p jobs
	mkdir -p reports/figures
	mkdir -p reports/tables
	mkdir -p reports/slides
	mkdir -p reports/papers
	mkdir -p src/data
	mkdir -p src/models
	mkdir -p src/figures
	mkdir -p src/tables

start:
	git init
	# `printf` expands `\n` in every shell; plain `echo` does not
	printf '.*\n!/.gitignore\nnotebooks/\nreports/\njobs/\ndocs/\ndata/\n' > .gitignore
	git add .
	git commit -m 'initial commit'
	git branch -M main
	$(MAKE) dirs

# Targets for data analysis
# `lr`: from `littler`
data:
	lr

figures:
	lr

tables:
	lr