Abstract

The field of digital forensics draws on expertise from multiple domains, including computer science, criminology, and law. It also depends on a range of toolsets and on an analyst’s skill in parsing large volumes of user-generated data for clues that help solve a case. This investigative analysis is often done manually. Artificial Intelligence (AI) offers practical ways to mine such data efficiently for patterns that can support criminal investigations. Natural Language Processing (NLP) is a subfield of AI that deals with unstructured data, specifically language. NLP provides several techniques for analyzing text, including topic modeling, pairwise correlation, word vector cosine distance measurement, and sentiment analysis. In this research, we propose a digital forensic investigative technique that uses an ensemble of NLP tools to produce a list of persons of interest from a corpus of text. Our proposed method serves as a type of human feature reduction, in which the total pool of suspects is filtered down to a short list of candidates who are more strongly associated with the crime being investigated.
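To make the idea of filtering a suspect pool concrete, the following is a minimal sketch of one ingredient only, cosine similarity, and not the ensemble method proposed here. It assumes plain bag-of-words counts as a stand-in for trained word vectors; the function names, the toy corpus, and the crime-related term list are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts (assumption: a stand-in for word vectors)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def shortlist(messages_by_person, crime_terms, k):
    """Rank people by similarity of their text to the crime terms; keep the top k."""
    query = term_vector(" ".join(crime_terms))
    scores = {
        person: cosine_similarity(term_vector(" ".join(msgs)), query)
        for person, msgs in messages_by_person.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: one list of messages per person.
corpus = {
    "person_a": ["transfer the funds to the offshore account tonight"],
    "person_b": ["lunch at noon?", "see you at the gym"],
    "person_c": ["the account is ready", "move the funds before the audit"],
}

result = shortlist(corpus, ["funds", "account", "offshore"], k=2)
for person, score in result:
    print(f"{person}: {score:.3f}")
```

In this toy setting, the pool of three people is reduced to the two whose messages overlap most with the crime-related terms. A fuller pipeline would presumably combine such a score with the topic modeling, pairwise correlation, and sentiment signals named above.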