Abstract: Modern-day data analysis systems have been hugely successful at making accurate data-driven predictions in domains such as computer vision and natural language processing. Much of this success can be attributed to the automatic feature discovery enabled by frameworks such as Deep Neural Networks (DNNs), which learn predictive representations of the data for the task at hand (for instance, classification). A DNN model is said to be (well) trained if it has learned to make accurate predictions on the training data; in other words, the training error approaches zero. Naturally, one would assume that if training is successful (the training error is nearly zero), the representations learned by the model are also *optimal* for the task. Unfortunately, it turns out that understanding when such representations are guaranteed to be optimal for the prediction task at hand is a hard problem, even for a two-layer neural network. The quest is, therefore, to find models that can learn rich representations of data that are guaranteed to be optimal. In this talk, I will introduce one approach to tackling this problem using convex relaxations in a two-layer neural network setting with an activation function. This paper, presented as an oral talk at ICML 2018, was joint work with Zhan Shi, Xinhua Zhang, and Yaoliang Yu (Univ. of Waterloo).
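
To make the two-layer setting concrete, the sketch below trains a small network of the form prediction = W2 · sigma(W1 · x) with gradient descent until the training error is small. This is only an illustration of the setting and of why the standard objective is non-convex in the weights (which motivates a convex relaxation); it is not the method from the paper. The ReLU activation, dimensions, learning rate, and synthetic data are all assumptions made for illustration.

```python
# Illustrative two-layer network y_hat = W2 @ relu(W1 @ x), trained by plain
# gradient descent on squared loss. NOT the convex relaxation from the paper;
# it only shows the standard (non-convex) training objective in (W1, W2).
# All dimensions, the ReLU choice, and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n, d, h = 200, 5, 16                    # samples, input dim, hidden units (assumed)
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))      # synthetic regression targets

W1 = rng.normal(scale=0.5, size=(d, h))
W2 = rng.normal(scale=0.5, size=(h, 1))
lr = 1e-2

relu = lambda z: np.maximum(z, 0.0)

for step in range(2000):
    H = relu(X @ W1)                    # hidden representation, n x h
    pred = H @ W2                       # predictions, n x 1
    err = pred - y[:, None]
    loss = 0.5 * np.mean(err ** 2)      # non-convex jointly in (W1, W2)

    # Backpropagation for the squared loss.
    grad_W2 = H.T @ err / n
    grad_H = err @ W2.T
    grad_W1 = X.T @ (grad_H * (H > 0)) / n

    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print(f"final training loss: {loss:.4f}")
```

Even when this training loss is driven close to zero, the abstract's point is that nothing in this procedure certifies that the learned hidden representation H is optimal for the prediction task; the convex-relaxation approach discussed in the talk is aimed at obtaining such guarantees.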