Attention Is All You Need
What is the structure, or what are the symmetries, in my dataset — and is there an existing model whose inductive biases can capture those properties?
Video
Slide
Deep learning is all about representation learning, and building the right tools for learning representations is an important factor in achieving empirical success.
The authors propose a new, simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
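The mechanism the architecture is built on, scaled dot-product attention, can be sketched in a few lines. This is a minimal NumPy illustration of the formula softmax(QK^T / sqrt(d_k))V, not the paper's implementation; the array shapes and names here are chosen for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    # Similarity scores between each query and each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Because every output position is computed independently from the same Q, K, V matrices, the whole operation is a pair of matrix multiplications — which is what makes the Transformer more parallelizable than a recurrent model.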
Comments