Yicong Huang
Ph.D. Candidate @ UC Irvine | Incoming Software Engineer @ Databricks | Adjunct Assistant Professor @ UMass Amherst

2059 Donald Bren Hall
Irvine, CA 92617
Welcome to my personal website!
I am currently a sixth-year Ph.D. candidate in the Information Systems Group (ISG) of the Computer Science department at the University of California, Irvine. I am advised by Dr. Chen Li and expect to graduate in July 2025. I also received my B.S. in Computer Science from the University of California, Irvine.
My research focuses on big data management, data-processing systems, and machine learning systems. I take a systems-driven approach by co-designing the key components of modern data-intensive pipelines, including workflow engines, UDF debugging frameworks, pipelining optimizers, and machine learning acceleration systems for streaming data. To optimize performance, usability, and scalability, I integrate techniques across data management, distributed systems, program analysis, and machine learning.
I have contributed extensively to the Texera project (Apache Incubating), a collaborative and interactive system for data science and AI/ML using workflows. My research has been published in database venues such as SIGMOD (Udon, Pasta, etc.) and VLDB (Texera, IcedTea, CORE, etc.), and my interdisciplinary work has appeared in venues including TOCHI, PNAS Nexus, JAMIA, AMIA, and PLOS ONE.
My research was recognized with a SIGMOD 2024 Best Demo Runner-Up Award and has been supported by fellowships including the 2025 Joseph & Dorothy Fischer Memorial Endowed Fellowship, the 2025 Beall Family Foundation Graduate Student Entrepreneur Award in Computer Science, the 2024 Graduate Dean’s Dissertation Fellowship, and the 2023 Public Impact Fellowship.
My expertise has been further shaped through internships at Observe, ByteDance, and VISA. I have also led collaborative research projects and mentored over 50 students across various levels.
Updates:
- 🚀 Thrilled to share that I’ll be joining Databricks in 2025, working on the Apache Spark Runtime team to advance one of the fastest and most scalable data engines in the world!
- 🎓 Looking ahead, in Fall 2027, I’ll transition to the Manning College of Information and Computer Sciences (CICS) at UMass Amherst as a Tenure-Track Assistant Professor!
Education
- 2019.09 - 2025.07
Ph.D. in Computer Science
University of California, Irvine
Big data management, data-processing systems, machine learning systems
- 2015.09 - 2019.06
B.S. in Computer Science
University of California, Irvine
Work
- 2024.06 - 2024.09
Software Engineer Intern
Observe Inc.
- Contributed to the development of a dataset transformer for log store analytics, especially focusing on window maintenance of live data streams.
- Investigated the use of Snowflake Time Travel to optimize data partitioning and clustering for transformed datasets.
- 2022.06 - 2022.09
Research Intern
VISA Inc.
- Developed a real-time window aggregation framework with out-of-order event support.
- Designed a space-efficient, versatile list for in-memory raw-event storage.
- Proposed out-of-order handling algorithms that outperform existing Flink designs.
- 2020.06 - 2020.09
Research Intern
ByteDance Inc.
- Worked on an HTAP database to support instant queries on real-time data.
- Implemented TP (MySQL) metadata to AP (Kudu) schema conversion.
- Integrated a lock-free data structure for the heap implementation.
News (Selected)
Apr 1, 2025 | I received two fellowships: the Beall Family Foundation Graduate Student Entrepreneur Award in Computer Science ($5,000) and the Joseph & Dorothy Fischer Memorial Endowed Fellowship ($3,000). Thanks for the support! |
---|---|
Apr 1, 2025 | I will serve as the Web/Publicity Chair for SIGMOD 2027. Let’s make SIGMOD 2027 a great success! |
Mar 14, 2025 | I gave an invited talk at the University at Buffalo. |
Mar 11, 2025 | I gave an invited talk at Lehigh University. |
Mar 8, 2025 | I gave an invited talk at the University of Maryland, Baltimore County. |
Publications
2025
- DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service. In Data Science Education K-12: Research to Practice Annual Conference, Feb 2025. To appear.
2024
- IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems. Proc. VLDB Endow., Feb 2024
- Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs. Proc. ACM Manag. Data, Feb 2024
- Texera: A System for Collaborative and Interactive Data Analytics Using Workflows. Proc. VLDB Endow., Feb 2024
- Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows. In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024, Feb 2024
- Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly. In 40th International Conference on Data Engineering, ICDE 2024 - Workshops, Utrecht, Netherlands, May 13-16, 2024, Feb 2024
- Brain image data processing using collaborative data workflows on Texera. Frontiers in Neural Circuits, Feb 2024
2023
- Understanding underlying moral values and language use of COVID-19 vaccine attitudes on Twitter. PNAS Nexus, Feb 2023
- The marketing and perceptions of non-tobacco blunt wraps on Twitter. Substance Use and Misuse, Feb 2023
- Wording Matters: the Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets. Transactions on Computer-Human Interaction, Feb 2023
2022
- Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera. Proc. VLDB Endow., Feb 2022
- Optimizing Machine Learning Inference Queries with Correlative Proxy Models. Proc. VLDB Endow., Feb 2022
- Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models. Proc. VLDB Endow., Feb 2022
- Reshape: Adaptive Result-aware Skew Handling for Exploratory Analysis on Big Data. CoRR, Feb 2022
- Public Opinions toward COVID-19 Vaccine Mandates: A Machine Learning-based Analysis of U.S. Tweets. In AMIA 2022, American Medical Informatics Association Annual Symposium, Washington, DC, USA, November 5-9, 2022, Feb 2022
- c-myc-mediated upregulation of NAT10 facilitates tumor development via cell cycle regulation in non-small cell lung cancer. Medical Oncology, Feb 2022
2021
- Why do people oppose mask wearing? A comprehensive analysis of U.S. tweets during the COVID-19 pandemic. J. Am. Medical Informatics Assoc., Feb 2021
- PloS One