Over the last decade, Deep Neural Networks (DNNs) have proven successful in a wide range of applications and promise to have a positive impact on our lives. However, especially in high-stakes situations in which a wrong decision can be disastrous, it is imperative that we can understand and obtain an explanation for a model’s ‘decision’. This thesis studies this problem for image classification models from three directions. First, we evaluate methods that explain DNNs post hoc and highlight the promises and shortcomings of existing approaches. Second, we study how to design inherently interpretable DNNs. In contrast to explaining models post hoc, this approach not only takes the training procedure and the DNN architecture into account, but also modifies them to ensure that the decision process becomes inherently more transparent. In particular, two novel DNN architectures are introduced: the CoDA Networks and the B-cos Networks. For every prediction, the computations of these models can be expressed by an equivalent linear transformation of the input. As the corresponding matrix is optimised during training to align with task-relevant input patterns, it is shown to localise relevant input features well and thus lends itself to being used as an explanation for humans. Finally, we investigate how to leverage explanations to guide models during training, e.g., to suppress reliance on spuriously correlated features or to increase the fidelity of knowledge distillation approaches.
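
The claim that each prediction can be expressed by an equivalent linear transformation can be illustrated with a small NumPy sketch. This is a toy construction under stated assumptions, not the CoDA or B-cos definition from the thesis: the function names `dynamic_linear_layer` and `explain`, the exponent `b`, and the cosine-based scaling are all hypothetical choices made for illustration. Each layer computes its output as W(x) x with an input-dependent matrix W(x), so the product of the per-layer matrices reproduces the network output for that particular input.

```python
import numpy as np

def dynamic_linear_layer(x, weights, b=2.0):
    """Toy layer (an assumption for illustration) whose output is W(x) @ x."""
    # Normalise each weight row to unit length.
    w_hat = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    # Cosine similarity between the input and each unit-norm weight row.
    cos = (w_hat @ x) / np.linalg.norm(x)
    # Scale each row by |cos|^(b-1): rows aligned with x keep more weight.
    scale = np.abs(cos) ** (b - 1.0)
    w_x = scale[:, None] * w_hat
    return w_x @ x, w_x

def explain(x, layers, b=2.0):
    """Run x through all layers and accumulate one equivalent matrix."""
    w_total = np.eye(x.shape[0])
    h = x
    for weights in layers:
        h, w_x = dynamic_linear_layer(h, weights, b)
        # Chain the input-dependent matrices: W_total = W_L(...) ... W_1(x).
        w_total = w_x @ w_total
    return h, w_total

rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 16)), rng.normal(size=(3, 8))]
x = rng.normal(size=16)

output, w_total = explain(x, layers)

# For this input, one matrix-vector product reproduces the network output.
assert np.allclose(output, w_total @ x)

# Row k of w_total, multiplied elementwise with x, gives per-feature
# contributions that sum to output k.
contributions = w_total[0] * x
assert np.isclose(contributions.sum(), output[0])
```

In this sketch, the rows of `w_total` play the role of the explanation: because the contributions sum exactly to the corresponding output, the attribution is a faithful summary of the toy model's computation for that input. Whether such a matrix is also meaningful to humans depends on how it is shaped during training, which the sketch does not model.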