Data and Statistical Analyst
Columbia
Wednesday, 22 April 2026
To keep the ESD Lab on track with its five-year NIH grant timeline, the lab needs a dedicated Data and Statistical Analyst to manage the critical transition from raw data to high-impact results. This role is essential for clearing the technical backlogs that often delay papers and reports. The analyst will work closely with the Principal Investigator (PI) and Research Associates and provide the specialized computational support needed to execute the grant's machine learning (ML) aims. This includes handling large, complex datasets that exceed the capacity of general staff. The position combines day-to-day data cleaning with advanced modeling. In doing so, it directly helps the team meet grant milestones and keeps the lab in ongoing compliance with current NIH Data Management and Sharing (DMS) standards.

Job Related Minimum Required Education and Experience
Requires a bachelor’s degree in a job-related field and 2 or more years of job-related experience. This may be substituted by an equivalent combination of job-related certification, training, education, and/or experience.

Required Certification, Licensure/Other Credentials

Preferred Qualifications
Technical Expertise: Proficiency in R or Python for longitudinal modeling (e.g., latent growth curve modeling or mixed-effects models) to track developmental changes over 36 months.
Machine Learning: Experience applying ML to heterogeneous datasets (e.g., combining behavioral scores, clinical observations, and environmental variables) to predict diagnostic outcomes.
Domain Knowledge: Familiarity with standard autism assessments (e.g., ADOS-2, M-CHAT, Mullen Scales) and with the particular challenges of “noisy” behavioral data from infants and toddlers.
NIH Compliance: Expertise in formatting and uploading data to the National Database for Autism Research (NDA), which has specific, rigorous requirements for each six-month data submission cycle.
Data Cleaning: Specialized skills in handling missing data and attrition, which are common in three-year longitudinal studies of families and young children.

Knowledge/Skills/Abilities
Advanced Programming: Expert-level proficiency in R (tidyverse, lme4) or Python (pandas, scikit-learn) for building reproducible data pipelines.
Trajectory Modeling: Ability to perform growth curve analysis, mixed-effects modeling, and latent class growth analysis to track development over 36 months.
Machine Learning Implementation: Skill in building and validating models (e.g., Random Forests, XGBoost, clustering) to identify early risk factors for autism.
Data Harmonization: Expertise in cleaning and merging multi-source data (e.g., behavioral scores, environmental exposures, and medical records).
Data Visualization: Ability to create clear, publication-ready visuals that “translate” complex ML findings for the PI and research team.

Job Duties

Job Duty: Machine Learning Pipeline Development & Execution
Build, test, and deploy supervised and unsupervised ML models (e.g., Random Forests, XGBoost, or clustering) in R or Python to identify early autism risk markers. Perform feature engineering, cross-validation, and sensitivity analyses on high-dimensional infant datasets to meet specific R01 grant aims.
Essential Function: Yes
Percentage of Time: 30%

Job Duty: Analytical Strategy & Manuscript Dissemination
Collaborate with the PI and Research Associates to translate research questions into formal statistical plans. Execute longitudinal analyses (e.g., mixed-effects modeling, latent growth curves), generate publication-quality data visualizations, and draft the technical methods and results sections of peer-reviewed journal articles.
Essential Function: Yes
Percentage of Time: 25%

Job Duty: Programming Novel Data Collection & Integration Tools
Design and program custom digital tools and APIs (e.g., Datavyu JavaScript, REDCap hooks, MATLAB pulls, or mobile-syncing scripts) to automate the capture of behavioral and environmental data. Ensure that new data collection streams are scalable, validated for accuracy, and integrated directly into the lab’s primary longitudinal database.
Essential Function: Yes
Percentage of Time: 20%

Job Duty: Automated Data Cleaning & Trajectory Harmonization
Author reproducible scripts to clean, merge, and de-identify multi-source data spanning three years of infant development. Implement complex data-wrangling procedures to handle non-random attrition, perform multiple imputation for missing values, and harmonize phenotypic scores across diverse clinical assessments.
Essential Function: Yes
Percentage of Time: 15%

Job Duty: Technical Leadership & Peer Consultation
Serve as the technical lead for the data team, providing expert troubleshooting for code errors and model convergence issues. Advise the PI and Research Associates on the computational feasibility of new hypotheses, and maintain version control (Git) to ensure research reproducibility across the lab.
Essential Function:
Percentage of Time: 5%

Job Duty: Data Infrastructure & HIPAA Security Management
Manage secure storage, backup, and access permissions for the longitudinal autism dataset in compliance with HIPAA and institutional ethics standards. Document all data transformations and maintain the “master” data dictionary to ensure the long-term integrity of the 0–3-year-old cohort data.
Essential Function: Yes
Percentage of Time: 5%

Position Attributes
Hazardous Weather Category: Non-Essential
Employees in Safety-Sensitive or Security-Sensitive positions will be subject to pre-employment and post-employment drug testing in accordance with University policy HR 1.95 Drug and Alcohol Testing.
Safety-Sensitive or Security-Sensitive Position: No

Posting Detail Information
Number of Vacancies: 1
Desired Start Date: 05/16/2026
Position End Date: 05/15/2029
Job Open Date: 04/21/2026
Job Close Date: 06/05/2026
Open Until Filled: No