RoBERTa: A Robustly Optimized BERT Pretraining Approach
Authors
- Yinhan Liu∗§, Myle Ott∗§, Naman Goyal∗§, Jingfei Du∗§, Mandar Joshi†, Danqi Chen§, Omer Levy§, Mike Lewis§, Luke Zettlemoyer†§, Veselin Stoyanov§
- † Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- § Facebook AI
Takeaways
Abstract
- hyperparameter choices have a significant impact on the final results
- carefully measures the impact of many key hyperparameters and training data size
- find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it
Introduction
- We present a replication study of BERT pretraining (Devlin et al., 2019), which includes a careful evaluation of the effects of hyperparameter tuning and training set size.
- modifications
- (1) training the model longer, with bigger batches, over more data;
- (2) removing the next sentence prediction objective;
- (3) training on longer sequences; and
    - (4) dynamically changing the masking pattern applied to the training data (see the sketch after this list).
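- To make (4) concrete, below is a minimal sketch of dynamic masking in PyTorch-style Python. The function and token ids (`dynamic_mask`, `MASK_ID`, `PAD_ID`, `VOCAB_SIZE`) are hypothetical names of my own, not the paper's or fairseq's implementation; the point is only that BERT's 80/10/10 masking scheme is re-drawn every time a sequence is fed to the model, rather than fixed once during preprocessing.

```python
import torch

# Hypothetical special-token ids; real values depend on the tokenizer in use.
MASK_ID = 103
PAD_ID = 0
VOCAB_SIZE = 30522


def dynamic_mask(input_ids: torch.Tensor, mask_prob: float = 0.15):
    """Apply BERT-style masking on the fly to a batch of token ids.

    Because this runs inside the training loop, every pass over a sequence
    sees a freshly drawn masking pattern ("dynamic masking"), instead of one
    pattern generated once during preprocessing and reused for all epochs.
    """
    labels = input_ids.clone()

    # Select ~15% of non-padding positions as prediction targets.
    candidate = input_ids != PAD_ID
    selected = candidate & (torch.rand_like(input_ids, dtype=torch.float) < mask_prob)
    labels[~selected] = -100  # common "ignore" index for the cross-entropy loss

    # Of the selected positions: 80% -> [MASK], 10% -> random token, 10% -> unchanged.
    rand = torch.rand_like(input_ids, dtype=torch.float)
    masked_ids = input_ids.clone()
    masked_ids[selected & (rand < 0.8)] = MASK_ID
    random_pos = selected & (rand >= 0.8) & (rand < 0.9)
    masked_ids[random_pos] = torch.randint(VOCAB_SIZE, input_ids.shape)[random_pos]
    # The remaining 10% of selected positions keep their original token.

    return masked_ids, labels


# Usage: masking the same batch twice gives different patterns.
batch = torch.randint(5, VOCAB_SIZE, (2, 16))
x1, _ = dynamic_mask(batch)
x2, _ = dynamic_mask(batch)
print((x1 == x2).all().item())  # almost certainly False
```

- In the original BERT setup the mask was generated once during data preprocessing (with the data duplicated several times to get a few different static masks), whereas the dynamic scheme above gives a new mask on every pass over the data.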
- contributions
- (1) We present a set of important BERT design choices and training strategies and introduce alternatives that lead to better downstream task performance;
- (2) We use a novel dataset, CC-NEWS, and confirm that using more data for pretraining further improves performance on downstream tasks;
- (3) Our training improvements show that masked language model pretraining, under the right design choices, is competitive with all other recently published methods.