Mars Su
https://www.linkedin.com/in/huei-yuan-su-a16458134/
Work Experience
- Trend Micro - Staff Data Engineer (Present)
- Gogolook - Sr. Data/ML Engineer
- Wavenet - Data Scientist
- adGeek - Advertising AI Engineer
Notable Experience
- PyCon APAC 2022 Speaker
- Published Book - Apache NiFi 讓你輕鬆設計 Data Pipeline (Apache NiFi Lets You Easily Design Data Pipelines)
- iThome 2021 AI&Data - Champion
Education
- National Taiwan University of Science and Technology - Master's Degree, IM
- National Changhua University of Education - Bachelor's Degree, IM
Sessions
In today's data-driven world, we constantly face problems of large-volume data storage, analytics, and machine-learning applications. In the past, we used databases, data lakes, or data warehouses to store different kinds of data, including structured, semi-structured, and unstructured data. Although many storage systems and tools now exist for these problems and scenarios, they still have limitations and shortcomings.
To address these limitations, one concept has been increasingly discussed in recent years: the Lakehouse, which combines the advantages of the data lake and the data warehouse into a powerful architecture for implementing the modern data stack. Several mature services and tools implement this concept, including Databricks Delta Lake, Apache Iceberg, and Apache Hudi.
In this session, I will briefly describe and analyze the concepts, benefits, and drawbacks of databases, data lakes, data warehouses, and lakehouses, and introduce some representative services. Finally, I will show a few lakehouse demos so that attendees can understand the concept more concretely.