Yicong Huang
Information Systems Group, University of California, Irvine.
2059 Donald Bren Hall
Irvine, CA 92617
Yicong Huang is a 6th-year Ph.D. candidate in the Information Systems Group (ISG), Department of Computer Science, University of California, Irvine. He is set to graduate in June 2025. Advised by Dr. Chen Li, he specializes in big data management, data processing systems, and machine learning systems. He also received his B.S. in Computer Science from the University of California, Irvine.
Yicong has made major contributions to the Texera project, an interactive workflow-based big data analytics system, in both system development and research. His publications in database venues such as VLDB and SIGMOD include “Udon,” an interactive UDF debugger; “Pasta,” an optimizer for pipelining schedules; and “CORE,” a machine learning acceleration framework for streaming data. His interdisciplinary work has appeared in venues such as TOCHI, PNAS Nexus, JAMIA, AMIA, and PLOS ONE.
Beyond academia, Yicong has completed research internships at ByteDance and VISA, contributing to projects such as an HTAP database and windowed aggregation operators, which led to system patents and papers. He has also managed collaborative research projects and mentored over 40 students, combining academic research with project leadership.
Education
- 2019.09 - 2025.03
Ph.D. in Computer Science
University of California, Irvine
Big data management, data processing systems, machine learning systems
- 2015.09 - 2019.06
Work
- 2024.06 - 2024.09
Software Engineer Intern
Observe Inc.
- Contributed to the development of a dataset transformer for log store analytics, focusing on window maintenance for live data streams.
- Investigated the use of Snowflake Time Travel to optimize data partitioning and clustering for transformed datasets.
- 2022.06 - 2022.09
Research Intern
VISA Inc.
- Developed a real-time window aggregation framework with out-of-order event support.
- Designed a space-efficient, versatile list for in-memory raw-event storage.
- Proposed out-of-order handling algorithms, outperforming existing Flink designs.
- 2020.06 - 2020.09
Research Intern
ByteDance Inc.
- Worked on an HTAP database to support instant queries on real-time data.
- Implemented TP (MySQL) metadata to AP (Kudu) schema conversion.
- Integrated a lock-free data structure for the heap implementation.
News (Selected)
Oct 29, 2024 | Our paper “DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service” has been accepted by the Data Science Education K-12: Research to Practice Conference (DSE-K12’25)! |
---|---|
Aug 31, 2024 | I attended VLDB 2024 in Guangzhou, China, presenting the research paper for Texera, the system we have been building for the past 8 years! |
Aug 6, 2024 | Our paper “Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs” has been accepted by SIGMOD 2025! |
Jun 13, 2024 | I attended SIGMOD 2024 in Santiago, Chile, presenting a research paper and a demo for Udon, the line-by-line UDF debugger on big data systems. Our demonstration won the Best Demo Runner-Up Award! |
Apr 26, 2024 | I received the UCI Graduate Dean’s Dissertation Fellowship for 2024! |
Publications
2025
- [DSE-K12] DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service. In Data Science Education K-12: Research to Practice Annual Conference, Feb 2025
- Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Feb 2025. To appear
2024
- Texera: A System for Collaborative and Interactive Data Analytics Using Workflows. Proceedings of the VLDB Endowment (PVLDB), Feb 2024
- Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows. In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago, Chile, June 9-15, 2024, Feb 2024
- Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly. In 40th International Conference on Data Engineering, ICDE 2024 - Workshops, Utrecht, Netherlands, May 13-16, 2024, Feb 2024
- [fncir] Brain image data processing using collaborative data workflows on Texera. Frontiers in Neural Circuits, Feb 2024
2023
- Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control. Proc. ACM Manag. Data, Feb 2023
- [PNAS Nexus] Understanding underlying moral values and language use of COVID-19 vaccine attitudes on twitter. PNAS Nexus, Feb 2023
- The marketing and perceptions of non-tobacco blunt wraps on Twitter. Substance Use and Misuse, Feb 2023
- [TOCHI] Wording Matters: the Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets. Transactions on Computer-Human Interaction, Feb 2023
2022
- Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera. Proc. VLDB Endow., Feb 2022
- Optimizing Machine Learning Inference Queries with Correlative Proxy Models. Proc. VLDB Endow., Feb 2022
- Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models. Proc. VLDB Endow., Feb 2022
- [arXiv] Reshape: Adaptive Result-aware Skew Handling for Exploratory Analysis on Big Data. CoRR, Feb 2022
- [AMIA] Public Opinions toward COVID-19 Vaccine Mandates: A Machine Learning-based Analysis of U.S. Tweets. In AMIA 2022, American Medical Informatics Association Annual Symposium, Washington, DC, USA, November 5-9, 2022, Feb 2022
- [Medical Oncology] c-myc-mediated upregulation of NAT10 facilitates tumor development via cell cycle regulation in non-small cell lung cancer. Medical Oncology, Feb 2022
2021
- [JAMIA] Why do people oppose mask wearing? A comprehensive analysis of U.S. tweets during the COVID-19 pandemic. J. Am. Medical Informatics Assoc., Feb 2021
- PloS One