About

Data Enthusiast (Self-driven / Versatile / Highly motivated)

I am interested in medical data collection, machine learning, reproducible data analysis, and workflow automation.

I dream of a matching platform that satisfies both patients and healthcare professionals.

Projects

Age-matching of kidney donors and candidates with ESRD

In this project, we looked for the optimal age-matching strategy by applying XGBoost to national registry data. Here is a demo of the results. Try various candidate conditions and see how our model predicts mortality and graft survival by donor age.

The app may take a while to load fully.

Standardizing the National Drug Codes of FDA

The FDA's National Drug Codes (NDCs) appear in two forms: 10-digit and 11-digit. My colleagues wanted to standardize the codes to the 11-digit form before modeling, so I wrote a small R package that streamlines the conversion.

https://github.com/jaylkim/ndc
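
To illustrate the conversion, here is a minimal sketch in base R. It is not the interface of the `ndc` package: the function name `to_ndc11` is hypothetical, and it assumes hyphenated 10-digit input in one of the three layouts 4-4-2, 5-3-2, or 5-4-1 (labeler-product-package). Each segment is left-padded with zeros to the 5-4-2 widths of the 11-digit form. Codes without hyphens are ambiguous, so this sketch returns `NA` for them.

```r
# Hypothetical helper for illustration only; NOT the API of the `ndc` package.
# Assumes hyphenated 10-digit NDCs in a 4-4-2, 5-3-2, or 5-4-1 layout.
to_ndc11 <- function(ndc10) {
  pattern <- paste0(
    "^([0-9]{4}-[0-9]{4}-[0-9]{2}",
    "|[0-9]{5}-[0-9]{3}-[0-9]{2}",
    "|[0-9]{5}-[0-9]{4}-[0-9])$"
  )
  valid <- grepl(pattern, ndc10)

  # Anything that does not match a known layout stays NA
  out <- rep(NA_character_, length(ndc10))
  segments <- strsplit(ndc10[valid], "-", fixed = TRUE)

  out[valid] <- vapply(segments, function(seg) {
    # Left-pad each segment with zeros to the 5-4-2 widths
    padded <- paste0(strrep("0", c(5, 4, 2) - nchar(seg)), seg)
    paste(padded, collapse = "")
  }, character(1))

  out
}

to_ndc11(c("1234-5678-90", "12345-678-90", "12345-6789-0", "123-45"))
# "01234567890" "12345067890" "12345678900" NA
```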

Project generator for Stata

Structuring a data analysis project is often a pain in the neck. This Stata package takes care of that chore so analysts can focus on the analysis itself.

https://github.com/jaylkim/mold

Gists

Make It Purrr

Often, we need to apply the same data processing to multiple data frames. For this task, I thought it would be a good idea to make the dplyr functions work on a list of data frames by emulating Python's decorators.

# Required packages
library(dplyr)
library(purrr)
library(rlang)

# Function decorator: wraps `fn` so that it is applied to every data frame in a list
# The list of data frames is the first argument so that the wrapper works with pipes
make_it_purrr <- function(fn) {
  wrapper <- function(list_df, ...) {
    exprs <- enquos(...)
    list_df %>%
      map(
        ~ fn(.x, !!!exprs)
      )
  }
  wrapper
}

# Examples:
# Extensions of some `dplyr` functions for a list of data frames
map_mutate <- make_it_purrr(mutate)
list_dfs <- list(mtcars, mtcars, mtcars)
list_dfs %>%
  map_mutate(mpg2 = mpg^2) %>%
  map_mutate(mpg3 = mpg^3)
map_group_by <- make_it_purrr(group_by)
map_summarize <- make_it_purrr(summarize)
list_dfs %>%
  map_group_by(am) %>%
  map_summarize(mean_mpg_by_am = mean(mpg))

Makefile

This Makefile is one of my favorite gists for my data analysis projects.

# None of the targets produce a file of the same name.
# Without .PHONY, `make data` would be skipped once the `data/` directory exists.
.PHONY: dirs start data figures tables

# Targets for project preparation
dirs:
	mkdir -p data/interim
	mkdir -p data/processed
	mkdir -p data/raw
	mkdir -p models
	mkdir -p notebooks
	mkdir -p docs
	mkdir -p jobs
	mkdir -p reports/figures
	mkdir -p reports/tables
	mkdir -p reports/slides
	mkdir -p reports/papers
	mkdir -p src/data
	mkdir -p src/models
	mkdir -p src/figures
	mkdir -p src/tables

start:
	git init
	# printf writes one pattern per line regardless of the shell's `echo` behavior
	printf '%s\n' '.*' '!/.gitignore' 'notebooks/' 'reports/' 'jobs/' 'docs/' 'data/' > .gitignore
	git add .
	git commit -m 'initial commit'
	git branch -M main
	$(MAKE) dirs

# Targets for data analysis
# `lr`: from `littler`
data:
	lr

figures:
	lr

tables:
	lr