Yicong Huang

Software Engineer @ Databricks | Adjunct Assistant Professor @ UMass Amherst


Mountain View, CA

Welcome to my personal website!

I am currently a Software Engineer at Databricks Inc., where I work on the Apache Spark team to advance one of the fastest and most scalable data engines in the world. I also serve as an Adjunct Assistant Professor at UMass Amherst.

I obtained my Ph.D. in Computer Science from the Information Systems Group (ISG) in the Department of Computer Science at the University of California, Irvine, advised by Dr. Chen Li. My dissertation is titled “UDF-Centric Dataflow Systems for Supporting User-Defined Functions in Collaborative Data Science, AI, and ML.” I also received my B.S. in Computer Science from the University of California, Irvine.

My research focuses on big data management, data-processing systems, and machine learning systems. I take a systems-driven approach by co-designing the key components of modern data-intensive pipelines, including workflow engines, UDF debugging frameworks, pipelining optimizers, and machine learning acceleration systems for streaming data. To optimize performance, usability, and scalability, I integrate techniques across data management, distributed systems, program analysis, and machine learning.

I have contributed extensively to the Apache Texera (Incubating) project, a collaborative and interactive system for data science and AI/ML using workflows. My research has been published in database venues such as SIGMOD, VLDB, and ICDE, and my interdisciplinary work spans venues including TOCHI, PNAS Nexus, JAMIA, AMIA, and PLOS ONE.

My research was recognized with a SIGMOD 2024 Best Demo Runner-Up Award and has been supported by fellowships including the 2025 Joseph & Dorothy Fischer Memorial Endowed Fellowship, the 2025 Beall Family Foundation Graduate Student Entrepreneur Award in Computer Science, the 2024 Graduate Dean’s Dissertation Fellowship, and the 2023 Public Impact Fellowship. I also received the 2025 Most Promising Future Faculty Award at the 33rd UCI Teach Day.

I have interned at Observe, VISA, and ByteDance. I have also led collaborative research projects and mentored over 50 students at various levels.

Education

Work

  • 2025.08 - Present
    Software Engineer
    Databricks Inc.
    • Working on PySpark.
  • 2024.06 - 2024.09
    Software Engineer Intern
    Observe Inc.
    • Contributed to the development of a dataset transformer for log store analytics, with a focus on window maintenance for live data streams.
    • Investigated the use of Snowflake Time Travel to optimize data partitioning and clustering for transformed datasets.
  • 2022.06 - 2022.09
    Research Intern
    VISA Inc.
    • Developed a real-time window aggregation framework with support for out-of-order events.
    • Designed a space-efficient, versatile list structure for in-memory raw-event storage.
    • Proposed out-of-order handling algorithms that outperform existing Flink designs.
  • 2020.06 - 2020.09
    Research Intern
    ByteDance Inc.
    • Worked on an HTAP database supporting instant queries on real-time data.
    • Implemented conversion from TP (MySQL) metadata to AP (Kudu) schemas.
    • Integrated a lock-free data structure into the heap implementation.

Awards

News (Selected)

Nov 9, 2025 Glad to attend PyData Seattle 2025. Thanks to Databricks for sponsoring!
Sep 4, 2025 Our tutorial titled “ML-Asset Management: Curation, Discovery, and Utilization” was presented at VLDB 2025 in London. Thanks to Mengying Wang, Moming Duan, Chen Li, Bingsheng He, and Yinghui Wu for all their great effort in making the presentation a success!
Aug 18, 2025 Today is my first day at Databricks Inc. as a Software Engineer on the Spark Core team. I am excited to start this new journey and contribute to the development of Apache Spark, one of the fastest and most scalable data engines in the world! Looking forward to the challenges and opportunities ahead!
Aug 7, 2025 I have successfully defended my Ph.D. thesis titled “UDF-Centric Dataflow Systems for Supporting User-Defined Functions in Collaborative Data Science, AI, and ML”. I would like to express my deepest gratitude to my advisor, Dr. Chen Li, for his unwavering support and guidance throughout my Ph.D. journey. I am officially a Dr. now!
Aug 6, 2025 The Texera codebase has been officially donated to the Apache Software Foundation, and the project is now named “Apache Texera (Incubating)”. The repository has moved from Texera/texera to Apache/texera. Our team will continue development and work toward graduating from the incubator!

Publications

2025

  1. ML-Asset Management: Curation, Discovery, and Utilization
    Mengying Wang, Moming Duan, Yicong Huang, Chen Li, Bingsheng He, and 1 more author
    Proc. VLDB Endow., 2025
  2. Dissertation
    UDF-Centric Dataflow Systems for Supporting User-Defined Functions in Collaborative Data Science, AI, and ML
    Yicong Huang
    University of California, Irvine, 2025
  3. DSE-K12
    DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service
    Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor, Jeehyun Hwang, and 7 more authors
    In Data Science Education K-12: Research to Practice Annual Conference, Feb 2025

2024

  1. IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems
    Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li
    Proc. VLDB Endow., Feb 2024
  2. Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs
    Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and 1 more author
    Proc. ACM Manag. Data, Feb 2024
  3. Texera: A System for Collaborative and Interactive Data Analytics Using Workflows
    Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, and 4 more authors
    Proc. VLDB Endow., Feb 2024
  4. Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows
    Yicong Huang, Zuozhi Wang, and Chen Li
    In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024
  5. Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly
    Alexander K. Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, and 2 more authors
    In 40th International Conference on Data Engineering, ICDE 2024 - Workshops, Utrecht, Netherlands, May 13-16, 2024
  6. fncir
    Brain image data processing using collaborative data workflows on Texera
    Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen Chilaparasetti, and 3 more authors
    Frontiers in Neural Circuits, Feb 2024

2023

  1. Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control
    Yicong Huang, Zuozhi Wang, and Chen Li
    Proc. ACM Manag. Data, Feb 2023
  2. PNAS Nexus
    Understanding underlying moral values and language use of COVID-19 vaccine attitudes on twitter
    Judith Borghouts, Yicong Huang, Sydney Gibbs, Suellen Hopfer, Chen Li, and 1 more author
    PNAS Nexus, Feb 2023
  3. Subst. Use Misuse
    The marketing and perceptions of non-tobacco blunt wraps on Twitter
    Joshua Rhee, Yicong Huang, Sadeem Alsudais, Shengquan Ni, Avinash Kumar, and 2 more authors
    Substance Use and Misuse, Feb 2023
  4. TOCHI
    Wording Matters: the Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets
    Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark
    Transactions on Computer-Human Interaction, Feb 2023

2022

  1. Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera
    Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, and 2 more authors
    Proc. VLDB Endow., Feb 2022
  2. Optimizing Machine Learning Inference Queries with Correlative Proxy Models
    Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and 1 more author
    Proc. VLDB Endow., Feb 2022
  3. Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models
    Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, and 2 more authors
    Proc. VLDB Endow., Feb 2022
  4. arXiv
    Reshape: Adaptive Result-aware Skew Handling for Exploratory Analysis on Big Data
    Avinash Kumar, Sadeem Alsudais, Shengquan Ni, Zuozhi Wang, Yicong Huang, and 1 more author
    CoRR, Feb 2022
  5. AMIA
    Public Opinions toward COVID-19 Vaccine Mandates: A Machine Learning-based Analysis of U.S. Tweets
    Yawen Guo, Jun Zhu, Yicong Huang, Lu He, Changyang He, and 2 more authors
    In AMIA 2022, American Medical Informatics Association Annual Symposium, Washington, DC, USA, November 5-9, 2022
  6. MED ONCOL
    c-myc-mediated upregulation of NAT10 facilitates tumor development via cell cycle regulation in non-small cell lung cancer
    Zimu Wang, Yicong Huang, Wanjun Lu, Jiaxin Liu, Xinying Li, and 3 more authors
    Medical Oncology, Feb 2022

2021

  1. JAMIA
    Why do people oppose mask wearing? A comprehensive analysis of U.S. tweets during the COVID-19 pandemic
    Lu He, Changyang He, Tera L. Reynolds, Qiushi Bai, Yicong Huang, and 3 more authors
    J. Am. Medical Informatics Assoc., Feb 2021
  2. PloS One
    The social amplification and attenuation of COVID-19 risk perception shaping mask wearing behavior: A longitudinal twitter analysis
    Suellen Hopfer, Emilia J Fields, Yuwen Lu, Ganesh Ramakrishnan, Ted Grover, and 4 more authors
    PLOS ONE, Feb 2021