
Academic Activities
The evolution history of large language models and several topics related to optimization
Speaker:
Xin Shen, Ph.D., The Chinese University of Hong Kong
Inviter:
Yuhong Dai, Professor
Title:
The evolution history of large language models and several topics related to optimization
Time and Venue:
15:30-16:30, October 25 (Wednesday), N109
Abstract:

Recently, ChatGPT has attracted attention for its ability to generate intelligent conversations. The backbone of ChatGPT is a large language model (LLM) pre-trained on the Transformer architecture, which has come to dominate the natural language processing (NLP) community. This talk consists of two parts. In the first part, I will review the evolution of LLMs, discussing the milestone works in NLP over the past ten years. These topics include word embeddings, language modeling, sequence-to-sequence learning, pre-trained language models (BERT, GPT, etc.), fine-tuning, and some recently emerging techniques.

In the second part of the talk, I will share some preliminary ideas at the intersection of NLP and optimization. Specifically, learning to optimize (L2O) leverages machine learning methods to solve optimization problems, aiming to reduce the need for manually designed optimization algorithms. Currently, graph neural networks (GNNs) are the mainstream neural architecture for learning to solve linear programming (LP) and mixed-integer linear programming (MILP) problems, and theoretical justifications have been provided by several researchers. Meanwhile, the Transformer has come to dominate NLP, computer vision, and many other research fields, which suggests its universality. A natural question is therefore: is the Transformer suitable for modeling LP and MILP problems? In the talk, I will share some attempts in this direction. Another topic is the pretrain-finetune paradigm, which has brought great success to the NLP field; motivated by this, I will share some ideas on pretraining a universal L2O optimizer.
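To make the Transformer-for-LP question concrete, the sketch below shows one plausible way (not taken from the talk) to serialize an LP instance min c^T x s.t. Ax <= b, x >= 0 into a flat token sequence and feed it to a small Transformer encoder that predicts a candidate solution. The class name LPTransformer, the dimensions, and the coefficient-per-token serialization are all illustrative assumptions.

import torch
import torch.nn as nn

class LPTransformer(nn.Module):
    """Toy encoder mapping an LP instance (c, A, b) to a predicted x."""
    def __init__(self, n_vars, n_cons, d_model=64):
        super().__init__()
        self.n_vars = n_vars
        # Lift each scalar coefficient to a d_model-dimensional token.
        self.embed = nn.Linear(1, d_model)
        # Learned positions tell the model which entries belong to c, A, or b.
        seq_len = n_vars + n_cons * n_vars + n_cons
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, c, A, b):
        # Serialize (c, A, b) into one flat token sequence per instance.
        seq = torch.cat([c, A.flatten(1), b], dim=1).unsqueeze(-1)
        h = self.encoder(self.embed(seq) + self.pos)
        # Read the predicted solution off the first n_vars token positions.
        return self.head(h[:, :self.n_vars, :]).squeeze(-1)

# Usage: a batch of 8 random LPs with 3 variables and 2 constraints.
model = LPTransformer(n_vars=3, n_cons=2)
c = torch.randn(8, 3)
A = torch.randn(8, 2, 3)
b = torch.randn(8, 2)
x_hat = model(c, A, b)  # shape (8, 3): candidate primal solutions

In practice such a model would be trained against solver-produced labels or a self-supervised objective; whether a fixed-length serialization like this can capture the permutation symmetries and variable sizes of realistic LP and MILP instances, as GNNs naturally do, is exactly the kind of open question the talk raises.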

Curriculum Vitae: Xin Shen received his bachelor's degree in software engineering from Fudan University, his master's degree in operations research from the University of Chinese Academy of Sciences, and his Ph.D. from the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. His research interests include machine learning, natural language processing, and optimization.