Posted 2022-05-19 | Updated 2022-08-30 | paper | 14 min read (about 2164 words)

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Author

Authors: Yinhan Liu∗§, Myle Ott∗§, Naman Goyal∗§, Jingfei Du∗§, Mandar Joshi†, Danqi Chen§, Omer Levy§, Mike Lewis§, Luke Zettlemoyer†§, Veselin Stoyanov§
- † Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- § Facebook AI

Impressions

Abstract

hyperparameter choices have significant impact on the final results
carefully measures the impact of many key hyperparameters and training data size
find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it

Introduction

We present a replication study of BERT pretraining (Devlin et al., 2019), which includes a careful evaluation of the effects of hyperparameter tuning and training set size.
modifications
- (1) training the model longer, with bigger batches, over more data;
- (2) removing the next sentence prediction objective;
- (3) training on longer sequences; and
- (4) dynamically changing the masking pattern applied to the training data.
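
Modification (4) can be illustrated with a minimal sketch. It assumes a whitespace tokenizer and replaces every selected token with `[MASK]`; the function name `dynamic_mask` and the toy sentence are hypothetical, and BERT's split between mask, random, and unchanged replacements is left out for brevity.

```python
import random

MASK = "[MASK]"

def dynamic_mask(tokens, mask_prob=0.15, rng=None):
    """Return a freshly masked copy of `tokens` and the masked positions."""
    rng = rng if rng is not None else random.Random()
    # Each position is selected independently with probability mask_prob.
    positions = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions

tokens = "the quick brown fox jumps over the lazy dog".split()
rng = random.Random(0)

# Static masking would call dynamic_mask once during preprocessing and reuse the result.
# Dynamic masking draws a new pattern every time the sequence is fed to the model.
for epoch in range(3):
    masked, positions = dynamic_mask(tokens, rng=rng)
    print(epoch, " ".join(masked))
```

The only difference from static masking in this sketch is where the call happens: inside the training loop rather than once up front.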
contributions
- (1) We present a set of important BERT design choices and training strategies and introduce alternatives that lead to better downstream task performance;
- (2) We use a novel dataset, CC-NEWS, and confirm that using more data for pretraining further improves performance on downstream tasks;
- (3) Our training improvements show that masked language model pretraining, under the right design choices, is competitive with all other recently published methods.

Posted 2022-01-10 | Updated 2022-08-30 | paper | 2 min read (about 369 words)

A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More

Author

Authors:
- Iddo Drori^{1,a,b}, Sunny Tran^a, Roman Wang^b, Newman Cheng^b, Kevin Liu^a, Leonard Tang^c, Elizabeth Ke^a, Nikhil Singh^a, Taylor L. Patti^c, Jayson Lynch^d, Avi Shporer^a, Nakul Verma^b, Eugene Wu^b, and Gilbert Strang^a (wait, the famous Gilbert Strang..)
- ^a MIT; ^b Columbia University; ^c Harvard University; ^d University of Waterloo

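Impressions

I learned that large-scale models can do more than I expected.. Fine-tune one on code and it even writes code that solves math problems (although code like that was probably already on GitHub..!)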

Abstract

Argues that a pretrained LM fine-tuned on code (the Codex Transformer model) can solve math problems through program synthesis
Also a study of generating university-level mathematics course questions (?)

Introduction

Posted 2021-12-20 | Updated 2022-08-30 | paper | 12 min read (about 1834 words)

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding

Author

Authors:
- Dong Wang^{1,2}∗, Ning Ding^{1,2}∗, Piji Li^3†, Hai-Tao Zheng^{1,2}†
- ^1 Department of Computer Science and Technology, Tsinghua University; ^2 Tsinghua ShenZhen International Graduate School, Tsinghua University; ^3 Tencent AI Lab
- Hard to find on Google Scholar

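Impressions

This paper seems to use "adversarial" for same-meaning examples and "contrastive" for opposite-meaning ones..
The key idea: when training the PLM, the second sentence of a pair is not an arbitrary sentence but one that differs in meaning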
https://github.com/kandorm/CLINE

Abstract

PLMs produce high-quality semantic representations but are vulnerable to simple perturbations
Work on making PLMs robust focuses on adversarial training
Unlike image processing, text is discrete, so replacing just a few words can make a very large difference
To study this effect, the authors ran several pilot experiments on perturbations
They found that adversarial training is useless or even harmful to the model
To address this, they propose Contrastive Learning with semantIc Negative Examples (CLINE)
They build semantically negative samples in an unsupervised way, aiming to make the model robust to semantically adversarial attacks
Empirically, this improves tasks such as sentiment analysis, reasoning, and MRC
At the sentence level, CLINE separates different meanings and pulls the same meaning together (probably talking about embeddings..)
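
A rough sketch of the idea, not the authors' implementation. It assumes positives come from synonym replacement and negatives from antonym replacement; the tiny `SYNONYMS` and `ANTONYMS` lexicons, the function names, and the InfoNCE-style loss form are all illustrative assumptions.

```python
import math

# Toy lexicons, purely illustrative (hypothetical). A real setup would need a lexical resource.
SYNONYMS = {"good": "great", "movie": "film"}
ANTONYMS = {"good": "bad", "love": "hate"}

def replace_words(tokens, lexicon):
    """Swap every token that has an entry in `lexicon`; keep the rest."""
    return [lexicon.get(t, t) for t in tokens]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negative, temperature=0.1):
    """Low when the anchor is close to the positive and far from the negative."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = math.exp(cosine(anchor, negative) / temperature)
    return -math.log(pos / (pos + neg))

anchor_tokens = "a good movie".split()
positive_tokens = replace_words(anchor_tokens, SYNONYMS)  # same meaning
negative_tokens = replace_words(anchor_tokens, ANTONYMS)  # opposite meaning
print(positive_tokens, negative_tokens)
```

Feeding sentence embeddings of the three versions into `contrastive_loss` gives the "pull together, push apart" behavior described in the last line of the abstract notes.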

Introduction

Posted 2021-12-13 | Updated 2022-08-30 | paper | 5 min read (about 812 words)

GPT Understands, Too

Author

Authors:
- Xiao Liu*^{1,2}, Yanan Zheng*^{1,2}, Zhengxiao Du^{1,2}

Impressions

Neural models.. are too sensitive to small changes?!

Abstract

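For GPT-style models, conventional fine-tuning struggled to give good results on NLU tasks
Proposes a new method, p-tuning, which lets GPTs of a size similar to BERT achieve good results
Reaches 64% (P@1) on the knowledge probing (LAMA) benchmark; on SuperGLUE it outperforms BERT under supervised learning
Finds that p-tuning also improves BERT's performance (in few-shot and supervised settings)
p-tuning is SOTA on few-shot SuperGLUE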

Introduction

Posted 2021-12-06 | Updated 2022-08-30 | paper | 7 min read (about 1089 words)

WARP: Word-level Adversarial ReProgramming

Author

Authors:
- Karen Hambardzumyan^1, Hrant Khachatrian^{1,2}, Jonathan May^3 (^1 YerevaNN, ^2 Yerevan State University,
  ^3 Information Sciences Institute, University of Southern California), 2021

Impressions

PET + p-tuning

Abstract

Most transfer learning maximizes parameter sharing and trains one or more task-specific layers stacked on top of the LM
This paper takes a different route: an adversarial reprogramming method built on prior work on automatic prompt generation
Adversarial reprogramming learns task-specific word embeddings which, when concatenated with a given input text, make the LM solve the specified task (maybe this is why they call it an extension of prompt research..)
With 25K trainable params it outperformed models with up to 25M trainable params (on the GLUE benchmark)
With task-specific human-readable prompts, it beats GPT-3 on 2 SuperGLUE tasks in the few-shot setting (32 training samples)
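
The "task-specific word embeddings concatenated with the input" idea can be sketched as follows. This is a toy illustration under assumptions: the sizes, the names `build_input` and `count_params`, and prepending the prompts at the front are all hypothetical choices, and no actual LM is involved.

```python
import random

random.seed(0)
VOCAB_SIZE, HIDDEN, N_PROMPT = 100, 16, 4  # toy sizes, not the paper's

def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Stands in for the frozen embedding table of a pretrained LM.
frozen_embeddings = random_matrix(VOCAB_SIZE, HIDDEN)
# The task-specific prompt embeddings: the only parameters that would be trained.
prompt_embeddings = random_matrix(N_PROMPT, HIDDEN)

def build_input(token_ids, prompts, table):
    """Concatenate the trainable prompt vectors with the frozen token embeddings."""
    return list(prompts) + [table[i] for i in token_ids]

def count_params(matrix):
    return sum(len(row) for row in matrix)

x = build_input([5, 17, 42], prompt_embeddings, frozen_embeddings)
print(len(x), "input vectors,", count_params(prompt_embeddings), "trainable parameters")
```

Because only `prompt_embeddings` would receive gradient updates, the trainable parameter count scales with the number of prompt tokens times the hidden size rather than with the size of the LM.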

Introduction

Posted 2021-11-29 | Updated 2022-08-30 | paper | 7 min read (about 1079 words)

Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training

Author

Authors:
- Oshin Agarwal∗^1, Heming Ge^2, Siamak Shakeri^2, Rami Al-Rfou^2
  (^1 University of Pennsylvania, ^2 Google Research), 2021

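Impressions

Made me wonder whether there had been less work than I expected on turning KG triples into natural-language sentences (or maybe it just had not worked well, e.g. things not being expressed explicitly)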

Abstract

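Work on converting KG triples into natural language (Data-To-Text Generation) has mostly centered on domain-specific benchmark datasets
This work shows that a dataset like Wikidata can also be used to combine structured KGs with natural language corpora (and can be integrated with existing LMs)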

Introduction

Posted 2020-10-22 | Updated 2022-08-28 | paper | 6 min read (about 948 words)

Document Expansion by Query Prediction

Author

Authors:
- Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho
  (New York University, Facebook AI Research), 2019
Co-work with Prof. Kyunghyun Cho
Passage Re-ranking with BERT, written the same year, is also highly cited (even though it is only 4 pages long)

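Impressions

These days, approaches that use T5 are reported to give good results
There are surprisingly few papers on DE (document expansion).. it is all QE (query expansion)
Papers that apply BERT to search mostly stop at re-ranking.. almost none apply it to the inverted index, and it looks a bit like black magic..
References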
- https://paperswithcode.com/paper/document-expansion-by-query-prediction
- https://github.com/castorini/docTTTTTquery

Abstract

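One effective way to improve retrieval is to expand the terms of a document
From a QA system's point of view, a document can be seen as potentially containing questions
Proposes training a seq2seq model on (query, relevant document) pairs to predict queries
Combined with a re-ranking component, it achieves SOTA results on two retrieval tasks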
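
A minimal sketch of the expansion step, assuming the predicted queries are simply appended to the document text before indexing. `toy_predict_queries` is a hypothetical stand-in for the trained seq2seq model, and `term_overlap` is a crude proxy for term matching, not a real retrieval score.

```python
def toy_predict_queries(doc):
    # Hypothetical stand-in: a trained seq2seq model would generate these from `doc`.
    return ["what causes tides", "why does the sea level rise and fall"]

def expand_document(doc, predict_queries, n_queries=3):
    """Append up to n_queries predicted queries to the document before indexing."""
    queries = predict_queries(doc)[:n_queries]
    return doc + " " + " ".join(queries)

def term_overlap(query, text):
    """Number of distinct query terms that also occur in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

doc = "The gravitational pull of the moon moves ocean water twice a day."
expanded = expand_document(doc, toy_predict_queries)

query = "what causes tides"
print(term_overlap(query, doc), term_overlap(query, expanded))
```

The original document shares no terms with the query, so a term-matching index cannot retrieve it; the expanded version contains all three query terms.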

Introduction

Posted 2020-06-18 | Updated 2022-08-28 | paper | 10 min read (about 1443 words)

BERT-based Lexical Substitution

Author

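Has written a lot of papers
Published at top-tier conferences such as AAAI, ACL, and ICLR
Interned at Microsoft and is looking for a PhD position for fall 2021 (meaning still a master's student)
Runs a personal blog: https://michaelzhouwang.github.io/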

Authors:
- Wangchunshu Zhou, Ke Xu (Beihang University)
- Tao Ge, Furu Wei, Ming Zhou (Microsoft Research Asia)

Abstract

Previous work looked up synonyms of the target word in lexical resources (e.g. WordNet) to obtain substitute candidates, then ranked them by looking at the context
These approaches have two limitations
- They cannot find good substitute candidates that are missing from the target word's synonym dictionary
- They do not consider how the substitution affects the global context of the sentence
To address this, the paper proposes an end-to-end BERT-based lexical substitution approach
It proposes and validates substitute candidates produced without annotated data or manually curated resources
It applies dropout to the target word's embedding so that both the target word's semantics and its context are considered when proposing substitute candidates
Achieves SOTA (LS07 and LS14 benchmarks)
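
The embedding-dropout step can be sketched like this. It is a simplified illustration under assumptions: `dropout_target` is a hypothetical helper that zeroes dimensions of the target token's vector element-wise and skips the usual rescaling, and the toy vectors stand in for real BERT input embeddings.

```python
import random

def dropout_target(embeddings, target_index, drop_prob=0.3, rng=None):
    """Zero out random dimensions of the target token's embedding only.

    Other tokens are left intact, so the context stays fully visible
    while the target word is only partially hidden.
    """
    rng = rng if rng is not None else random.Random()
    out = [list(vec) for vec in embeddings]  # copy, do not mutate the input
    out[target_index] = [
        0.0 if rng.random() < drop_prob else x for x in out[target_index]
    ]
    return out

# Toy 4-dimensional embeddings for a 3-token sentence; the target is token 1.
sentence = [[0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 3.0, 4.0], [0.5, 0.6, 0.7, 0.8]]
noisy = dropout_target(sentence, target_index=1, rng=random.Random(0))
print(noisy[1])
```

Masking the target completely would discard its semantics, and leaving it untouched would bias the model toward predicting the original word; partial dropout sits between the two.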

Introduction

Posted 2020-05-26 | Updated 2022-08-28 | paper | 14 min read (about 2071 words)

Deeper Text Understanding for IR with Contextual Neural Language Modeling

Contents

Author
Abstract
Introduction
Related Work
Document Search with BERT
Experimental Setup
Results and Discussion
Conclusion

Author

PhD student at CMU (https://www.cs.cmu.edu/~zhuyund/)
Works on language understanding applied to IR
Three papers in deep retrieval and conversational search got accepted into SIGIR 2020!

Abstract

Neural networks offer new possibilities for automatically learning complex language patterns and query-document relations
Neural IR models show good results in learning query-document relevance patterns, but there has been little work on understanding the text content of the query or document (?)
This paper examines how effective recently proposed contextual neural LMs such as BERT are for deeper text understanding in IR
The experiments show that the contextual text representations from BERT are more effective than traditional word embeddings
Compared with BoW retrieval models, contextual LMs make better use of language structure and can bring large gains on queries written in natural language
Combining text understanding ability with search knowledge lets a pretrained BERT do better on search tasks when training data is limited (not an exact translation, but roughly this meaning)

Posted 2020-02-06 | Updated 2022-08-28 | ML | 33 min read (about 4937 words)

Towards a Human-like Open-Domain Chatbot

Author

Authors:
- Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le (Google Research, Brain Team)

Who is an Author?

No real profile on Google Scholar or anywhere else
His activity is on Twitter.. https://twitter.com/xpearhead
And on Medium.. https://medium.com/@dmail07

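Impressions

First of all, the paper is quite long
It feels like the emphasis is more on proposing an automatic evaluation metric than on the model
For the modeling side I should probably read the Evolved Transformer paper
How to put it.. a wordy paper with a lot of explanation. It defines many new concepts and mostly explains why the proposed concepts are needed.
Metric + large scale + tips seem to be the main contributions of this paper; the modeling side is not described in much detail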