The prediction and avoidance of large-scale disruptions are crucial steps towards successful magnetic confinement fusion power production in tokamaks. With a priori simulations unable to address this critical issue in a practical way, data-driven statistical methods have proven more successful. Nevertheless, until now they have: (i) been unable to employ the majority of high-dimensional sensory data; (ii) required prohibitively long training times; and (iii) failed to deliver adequate predictions on experimental devices other than the ones they were trained on. Since powerful future reactors will not be able to afford major disruptions, such generalization is critical. Inspired by the recent success of deep learning (DL) methods in learning and generalizing from multi-modal and high-dimensional data across diverse domains, we present a fundamentally different DL approach using convolutional and recurrent neural networks to forecast disruptions. Trained on experimental data from the JET and DIII-D tokamaks, our approach: (i) overcomes the key limitations of past approaches; (ii) provides new insights into fusion physics and associated discovery science; and (iii) is capable of engaging supercomputing at the largest scale to deliver solutions with greater accuracy and speed.