Sirui Chen · sirui.dev

Experience

Where I've worked

Research and engineering roles across NLP, ML, full-stack, ETL, and embedded toolchains.

  1. Research Assistant — Prof. Chris Callison-Burch

    University of Pennsylvania · Philadelphia, PA

    May 2025 – Aug 2025

    • Built and evaluated retrieval-augmented generation workflows for DARPA SciFy and OpenScholar using score-based filtering, contrastive learning, and domain-specific data preparation.
    • Turned research goals into reusable scripts, datasets, and documentation so model and retriever changes could be tested consistently across the lab.
    Python · RAG · Retrieval · Contrastive Learning · Datasets · Reproducible Pipelines
  2. Python Developer Intern

    CambioML · Remote

    Dec 2023 – Mar 2024

    • Built an LLM-powered data portal with Danswer AI for searching datasets from AWS, Datarade, and Snowflake, covering ingestion, indexing, and user-facing discovery workflows.
    • Wrote Python crawlers and cleaners for 5,000+ datasets; resolved metadata issues that affected search relevance and downstream data quality.
    • Automated AWS search infrastructure with CDK and deployment scripts for reproducible development, testing, and release support.
    Python · AWS · AWS CDK · Snowflake · Danswer AI · ETL · Search
  3. Lab Assistant — Vision-Assisted Self-Driving F1Tenth Car

    Washington University in St. Louis · St. Louis, MO

    Aug 2023 – May 2024

    • Improved model robustness in glare and shadow conditions through targeted data augmentation, experiment tracking, and evaluation.
    • Built a ROS accelerometer driver for sensor integration and applied self-attention distillation to improve convergence speed by 20%.
    Python · PyTorch · ROS · Self-Attention Distillation · C++
  4. Data Processing Intern

    Guwave Technology Co., Ltd. · Shanghai, China

    May 2023 – Aug 2023

    • Designed Python ETL modules and a unified schema for 10+ semiconductor clients, supporting analytics and reporting workflows.
    • Reduced data processing time by 50% through modularization, stronger error handling, and reusable parsing logic.
    Python · ETL · Schema Design · Pandas · SQL
  5. Toolchain Development Intern

    Black Sesame Technology Co., Ltd. · Shanghai, China

    Jun 2022 – Jul 2022

    • Built test automation for quantized neural networks on embedded systems, with MD5 checks, concurrent multi-device deployment, and structured client reports.
    Python · Embedded · Quantized NN · Test Automation · Concurrency