Abstract
Deep learning has become increasingly common in aerial imagery analysis. As its use continues to grow, it is crucial that we understand and can explain its behavior. One eXplainable AI (XAI) approach is to generate linguistic summarizations of data and/or models. However, the number of summaries can increase significantly with the number of data attributes, posing a challenge. Herein, we proposed a hierarchical approach for generating and evaluating linguistic statements of black box deep learning models. Our approach scores and ranks statements according to user-specified criteria. A systematic process was outlined for the evaluation of an object detector on a low altitude aerial drone. A deep learning model trained on real imagery was evaluated on a photorealistic simulated dataset with known ground truth across different contexts. The effectiveness and versatility of our approach was demonstrated by showing tailored linguistic summaries for different user types. Ultimately, this process is an efficient human-centric way of identifying successes, shortcomings, and biases in data and deep learning models.