The high performance of modern computer vision methods has generated considerable interest in applying them to radiology. To galvanize research in this area, a number of research groups have released large, publicly available datasets, particularly of chest radiographs, including NIH ChestX-ray14, CheXpert, PadChest, and MIMIC-CXR. A key advantage of these images is that each comes with a free-text interpretation written by a practicing domain expert, which provides a human-interpretable label for the image. However, caution must be taken when developing models on data acquired during routine clinical practice, because several implicit biases exist: the image is acquired based on clinical need, its interpretation is a response to a specific clinical question, and the data are not structured with retrospective research in mind. In this tutorial, we will build high-performance computer vision models using large, publicly available datasets. We will evaluate the performance of these classifiers on data from distinct institutions and highlight the resulting generalization issues. We will further use class-dependent model interpretation methods to inspect our classifier and identify the source of its biases. We will end with suggestions for researchers who aim to build machine learning models on retrospectively collected clinical data.