
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU.
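For context, the core operation the Transformer is built on is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as defined in the paper. The sketch below is a minimal NumPy illustration of that formula, not the paper's reference implementation; the function name and the toy shapes are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax over keys
    return weights @ V                             # weighted sum of value vectors

# Toy usage: 4 queries attending over 6 key/value pairs of dimension 64.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 64))
K = rng.standard_normal((6, 64))
V = rng.standard_normal((6, 64))
out = scaled_dot_product_attention(Q, K, V)        # shape (4, 64)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.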