Automatically Analyzing Text, Categorizing Documents, and Managing Electronic Records to Meet NARA Directives
Erudite is a microservice-based solution that automatically analyzes documents and intelligently deciphers crucial metadata. Erudite determines which topic or category best aligns with a given document – in the machine learning field this is called document classification or categorization. These categories can be drawn from any records management framework, including the Bucket-Records Control Schedule (B-RCS) – the primary framework flowed down by the National Archives and Records Administration (NARA). Erudite also provides intuitive visualizations that help records officers understand what they manage, create file plans more easily, identify misplaced records, and handle the incoming flood of new electronic records.
Erudite beat out three other vendors during a government bake-off by demonstrating a high level of document categorization accuracy – achieving ~97% average accuracy on a benchmark dataset, while also being resilient to noise. It also demonstrated the ability to scale elastically in a cloud environment. Erudite was chosen as the most promising solution, and our team was awarded a follow-on contract to deploy Erudite to the IC’s Commercial Cloud Services (C2S) – an example of Stratagem taking a program from concept to operations.
- Traditional and Cutting-Edge: Erudite employs a unique combination of traditional and state-of-the-art technology. It pairs a robust Latent Dirichlet Allocation (LDA) technique with cutting-edge models that top the General Language Understanding Evaluation (GLUE) benchmark index. The Erudite ensemble uses these complementary techniques together to deliver greater accuracy, superior resiliency, and more secure operations than any single approach.
- Proven Accuracy: Erudite delivers 97% accuracy on the open-source 20-newsgroup dataset. It also outperforms AWS Comprehend on the same data. This high level of accuracy led to the selection of Erudite for delivery into an operational environment.
- Doesn’t Require Labelled Data: Our solution takes full advantage of any labelled data, but it can also work with unlabelled data through unsupervised machine learning. Erudite generates “fingerprints” for documents and can analyze whether documents designated for the same B-RCS category are in the correct place.
- Records Management Dashboards: Erudite provides intuitive visualizations to help records managers organize and understand the proper disposition strategy for documents. With this web-based UI, it is easy to create or evaluate a file plan by highlighting the composition of records an officer needs to manage.
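
To make the "fingerprint" idea above more concrete, here is a minimal sketch of how a misplaced-record check could work. It is not Erudite's implementation: it swaps in plain term-frequency vectors and cosine similarity as a stand-in for LDA topic fingerprints, and the function names, stopword list, category names, and sample records are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"and", "for", "the", "with"}

def fingerprint(text):
    """Term-frequency vector for one document (stand-in for an LDA topic vector)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(prints):
    """Sum of fingerprints, used as the profile of a category."""
    total = Counter()
    for p in prints:
        total.update(p)
    return total

def find_misplaced(docs_by_category):
    """Return (filed_category, index, better_category) for documents that
    resemble another category more than the one they are filed under."""
    prints = {c: [fingerprint(d) for d in docs] for c, docs in docs_by_category.items()}
    flagged = []
    for cat, plist in prints.items():
        for i, p in enumerate(plist):
            scores = {}
            for other, olist in prints.items():
                # Leave the document itself out of its own category profile.
                members = [q for j, q in enumerate(olist) if not (other == cat and j == i)]
                scores[other] = cosine(p, centroid(members)) if members else 0.0
            best = max(scores, key=scores.get)
            if best != cat:
                flagged.append((cat, i, best))
    return flagged

# Hypothetical categories and records, not real B-RCS buckets.
records = {
    "finance": [
        "quarterly budget report with invoice totals",
        "invoice payment schedule for the annual budget",
        "employee hiring interview and onboarding forms",
    ],
    "personnel": [
        "employee onboarding checklist and hiring forms",
        "interview notes for employee hiring",
        "employee benefits forms and onboarding interview",
    ],
}

print(find_misplaced(records))
```

In this toy data, the third "finance" record shares no vocabulary with the other finance records and is flagged as a better fit for "personnel". A production system would compare learned topic distributions rather than raw word counts, and would likely apply a similarity threshold before flagging anything for a records officer to review.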