
Academic Notice Board


How Big Data Enhances AI

Posted: 2018-05-15

Time: 9:30 a.m., May 16, 2018

Venue: Meeting Room 503, East Teaching Building 3, Wangjiang Campus

Speaker: Mingjie Tang (唐明潔)

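Speaker bio: Received a bachelor's degree in computer science in 2007, a master's degree in computer science from the Graduate University of the Chinese Academy of Sciences in 2010, a master's degree in computer science from Purdue University in 2013, and a Ph.D. in computer science from Purdue University in 2016. Previously worked at Microsoft and IBM Research in the United States; currently a research scientist at the big data company Hortonworks, working mainly on the research and development of Spark and TensorFlow. Published more than 20 papers during the Ph.D. in conferences and journals including VLDB, TKDE, ICDE, EDBT, SIGSPATIAL, and IEEEIntelj. Received the Best Paper Award at the database conference SISAP201 and the Best Application Paper Award at the data mining conference ADMA2009. Some of this research has been adopted by the open-source PostgreSQL and Spark communities.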

Abstract: TensorFlow and XGBoost are state-of-the-art platforms for deep learning and machine learning, respectively. However, neither of them is well suited to big data processing in real production environments. For example, TensorFlow does not provide OLAP or ETL over big data, which makes it harder to train a deep learning model efficiently on data that is both clean and sufficient in volume. Similarly, although XGBoost performs better than other gradient-boosting implementations, training an XGBoost model on big data is still time-consuming. Obtaining a highly accurate model also usually requires extensive parameter tuning, which creates a strong need to speed up the whole process.

In this talk, we mainly introduce how Spark can improve TensorFlow and XGBoost in real applications, and demonstrate how these platforms can benefit from big data techniques. More specifically, we first introduce how Spark ML supports automatic parameter tuning, and how transfer learning is applied to enhance real applications such as recommendation systems and image search. Second, we cover the implementation and performance improvements of the GPU-based XGBoost algorithm, summarize model-tuning experience and best practices, share insights on how to build a heterogeneous data analytics and machine learning pipeline based on Spark in a GPU-equipped YARN cluster, and show how to push models into production.
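The automatic parameter tuning mentioned above is, in Spark ML, exposed through `ParamGridBuilder` and `CrossValidator` in `pyspark.ml.tuning`. The sketch below does not use Spark. It is a plain-Python illustration of the same underlying idea (grid search scored by k-fold cross-validation), assuming a hypothetical one-hyperparameter model (1-D ridge regression without intercept, regularization strength `lam`) and a toy dataset. It is not the speaker's implementation.

```python
from itertools import product
from statistics import mean


def fit_ridge_1d(xs, ys, lam):
    """Hypothetical model: closed-form 1-D ridge regression without intercept,
    w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w*x against y."""
    return mean((w * x - y) ** 2 for x, y in zip(xs, ys))


def grid_search_cv(xs, ys, param_grid, k=3):
    """Try every combination in param_grid and return the one with the lowest
    mean validation MSE over k interleaved folds."""
    names = sorted(param_grid)
    n = len(xs)
    # Fold i holds indices i, i+k, i+2k, ... (interleaved split, no shuffling).
    folds = [list(range(i, n, k)) for i in range(k)]
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for val_idx in folds:
            val_set = set(val_idx)
            train_idx = [i for i in range(n) if i not in val_set]
            # Fit on the training folds, score on the held-out fold.
            w = fit_ridge_1d([xs[i] for i in train_idx],
                             [ys[i] for i in train_idx],
                             params["lam"])
            scores.append(mse(w, [xs[i] for i in val_idx],
                              [ys[i] for i in val_idx]))
        score = mean(scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data that follows y = 2x exactly, so no regularization (lam = 0) fits best.
xs = list(range(1, 10))
ys = [2 * x for x in xs]
best_params, best_score = grid_search_cv(xs, ys, {"lam": [0.0, 1.0, 10.0]}, k=3)
print(best_params, best_score)
```

Every grid point is scored independently of the others, which is why this search parallelizes naturally across a cluster; Spark's `CrossValidator` also exposes a `parallelism` parameter for evaluating models concurrently.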

