BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Pre-train deep bidirectional representations once, then fine-tune them for a wide range of language understanding tasks

The major limitation of earlier pre-training approaches is that standard language models are unidirectional, which restricts the choice of architectures that can be used during pre-training. Such restrictions are sub-optimal for sentence-level tasks, and they can be especially harmful when applying fine-tuning based approaches to token-level tasks such as question answering, where it is crucial to incorporate context from both directions.

BERT alleviates this unidirectionality constraint by using a “masked language model” (MLM) pre-training objective, inspired by the Cloze task (Taylor, 1953). The MLM objective randomly masks some of the input tokens, and the model must predict the original tokens from both the left and right context. In addition to the masked language model, BERT uses a “next sentence prediction” (NSP) task that jointly pre-trains text-pair representations by predicting whether one sentence actually follows another in the original text.
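
As a rough illustration of the two objectives, here is a minimal sketch assuming the Hugging Face `transformers` library and the publicly released `bert-base-uncased` checkpoint (neither is part of the original paper). It fills in a `[MASK]` token using bidirectional context, then scores a sentence pair for next-sentence prediction.

```python
# A minimal sketch of BERT's two pre-training objectives, assuming the
# Hugging Face `transformers` library and the `bert-base-uncased` checkpoint.
import torch
from transformers import pipeline, BertTokenizer, BertForNextSentencePrediction

# Masked language modeling: predict the token hidden behind [MASK]
# using context from both the left and the right of the gap.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The man went to the [MASK] to buy milk."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")

# Next sentence prediction: score whether sentence B follows sentence A.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
nsp_model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
inputs = tokenizer("The man went to the store.",
                   "He bought a gallon of milk.",
                   return_tensors="pt")
with torch.no_grad():
    logits = nsp_model(**inputs).logits  # index 0 = "is next", 1 = "is not next"
print(torch.softmax(logits, dim=-1))
```

Note that these checkpoints were produced by the full pre-training procedure; the sketch only probes the two objectives, it does not reproduce the pre-training itself.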