
Academic Activities
The evolution history of large language models and several topics related to optimization
Speaker:
Xin Shen, Ph.D., The Chinese University of Hong Kong
Inviter:
Yuhong Dai, Professor
Title:
The evolution history of large language models and several topics related to optimization
Time and Venue:
15:30-16:30, October 25 (Wednesday), N109
Abstract:

Recently, ChatGPT has attracted attention for its ability to generate intelligent conversations. The backbone of ChatGPT is a large language model (LLM) pre-trained on the Transformer architecture, which has come to dominate the natural language processing (NLP) community. This talk consists of two parts. In the first part, I will review the evolution of LLMs, discussing the milestone works in NLP over the past ten years. These topics include word embeddings, language modeling, sequence-to-sequence learning, pre-trained language models (BERT, GPT, etc.), fine-tuning, and some recently emerging techniques.

In the second part of the talk, I will share some preliminary ideas at the intersection of NLP and optimization. Specifically, learning to optimize (L2O) leverages machine learning methods to solve optimization problems, aiming to reduce the need for manually designed optimization algorithms. Currently, graph neural networks (GNNs) are the mainstream neural architecture for learning to solve linear programming (LP) and mixed-integer linear programming (MILP) problems, and theoretical justifications have been provided by several researchers. Meanwhile, the Transformer has come to dominate NLP, computer vision, and many other research fields, which suggests its universality. A natural question is therefore: is the Transformer suitable for modeling LP and MILP problems? In the talk, I will share some attempts in this direction. Another topic is the pretrain-finetune paradigm, which has brought great success to the NLP field; motivated by this, I will share some ideas on pretraining a universal L2O optimizer.
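To make the Transformer-for-LP question concrete, the sketch below shows one plausible way (not taken from the talk) to serialize an LP instance min c^T x s.t. Ax <= b, x >= 0 into a flat token sequence and feed it to a small Transformer encoder that predicts a candidate solution. The class name LPTransformer, the dimensions, and the coefficient-per-token serialization are all illustrative assumptions.

import torch
import torch.nn as nn

class LPTransformer(nn.Module):
    """Toy encoder mapping an LP instance (c, A, b) to a predicted x."""
    def __init__(self, n_vars, n_cons, d_model=64):
        super().__init__()
        self.n_vars = n_vars
        # Lift each scalar coefficient to a d_model-dimensional token.
        self.embed = nn.Linear(1, d_model)
        # Learned positions tell the model which entries belong to c, A, or b.
        seq_len = n_vars + n_cons * n_vars + n_cons
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, c, A, b):
        # Serialize (c, A, b) into one flat token sequence per instance.
        seq = torch.cat([c, A.flatten(1), b], dim=1).unsqueeze(-1)
        h = self.encoder(self.embed(seq) + self.pos)
        # Read the predicted solution off the first n_vars token positions.
        return self.head(h[:, :self.n_vars, :]).squeeze(-1)

# Usage: a batch of 8 random LPs with 3 variables and 2 constraints.
model = LPTransformer(n_vars=3, n_cons=2)
c = torch.randn(8, 3)
A = torch.randn(8, 2, 3)
b = torch.randn(8, 2)
x_hat = model(c, A, b)  # shape (8, 3): candidate primal solutions

In practice such a model would be trained against solver-produced labels or a self-supervised objective; whether a fixed-length serialization like this can capture the permutation symmetries and variable sizes of realistic LP and MILP instances, as GNNs naturally do, is exactly the kind of open question the talk raises.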

Curriculum Vitae: Xin Shen received his bachelor's degree in software engineering from Fudan University, his master's degree in operations research from the University of Chinese Academy of Sciences, and his Ph.D. from the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. His research interests include machine learning, natural language processing, and optimization.