2016-Present: Ph.D. Dept. of Computer Science, Johns Hopkins University
Advisor: Ben Langmead
Research Interests: Computational Genomics, Next Generation Sequencing, Text Indexing, Succinct Data Structures, Big Data, Sketching.
2012-2016: B.S. Tufts University
Majors: Computer Science and Biology
2016-Present: Johns Hopkins University. Baltimore, MD
Develop efficient and scalable tools for analysis of next-generation sequencing reads (see “Projects”).
Published 3 papers (2 as first author) and presented at 3 conferences (see “Publications”).
2 semesters as teaching assistant for “Computational Genomics: Sequences”.
Summer 2020: Illumina, Inc. San Diego, CA
Developed tool to detect false positives in population-scale database of thousands of structural variants.
Leveraged information from short-read (Illumina) and long-read (PacBio HiFi/CCS) technologies.
Added initial support for extending to short tandem repeats (STRs).
Presented research findings internally to multiple departments.
Summer 2016: Berg Health. Framingham, MA
Developed web portal to streamline analysis of multi-omics data (PHP, R).
Created NLP machine learning tool to extract information from clinical health records (Python).
2012-2016: Tufts Academic Resource Center. Medford, MA
Duties: 3 one-on-one tutoring sessions every week with beginner comp. sci. students.
Topics covered: programming basics, data structures, discrete mathematics, algorithms, C/C++.
rowbowt (C++) : Query large, repetitive genomic collections quickly with space sublinear to input size.
Collaboration with researchers across multiple universities.
levioSAM (C++) : Lift over alignments from variant-aware alternate references.
Personal Genome Constructor : Use low-coverage imputation to improve NGS read alignment accuracy and alleviate reference bias in downstream analyses (e.g., variant calling, allele-specific expression).
Draws upon alignment data from SRA and variant data from the 1000 Genomes Project.
pfbwt-f (fork) (C++) : Efficiently build a Burrows-Wheeler Transform from a sequence containing high amounts of repetition.
varcount (C++) : Calculate NGS alignment coverage over a predefined set of variants.
Led team of fellow interns from wide range of disciplines to provide non-profit with playbook for teaching STEM and professional development skills to students in Zimbabwe.
Programming Languages: Bash, C++, C, LaTeX, Python, R, Rust
Tools/Frameworks: Unix, Docker, HPC (SLURM), tidyverse, NumPy/SciPy, Snakemake