Text and Data Mining

Recognizing the value of data analysis to the digital humanities, Gale, a Cengage company, has made content from its Gale Primary Sources available to academic researchers for text and data mining. Text and data mining, the process by which software crawls text or datasets and recognizes entities, relationships, and actions, helps researchers draw new conclusions across disparate data and is emerging as an important area of scholarly research.

Gale is the first publisher to make content sets such as these available in this format to academics; as a visionary market leader, it aims to be a vibrant part of moving the digital humanities forward.

Gale delivers this content on customer request, in a cost-effective manner, for use in text and data mining. Content from most Gale Primary Sources is available, including essential research databases like Eighteenth Century Collections Online and Nineteenth Century Collections Online, as well as content from Gale’s extensive newspaper archives and other collections.

Textual Analysis

In addition to content, Gale offers enhanced textual analysis tools within the digital archives to assist researchers who may not have programming experience or digital humanities programs at their institution. Gale Artemis: Primary Sources, in particular, offers a cutting-edge interface that supports both advanced textual analysis and cross-searching of more than fifty Gale Primary Sources in one place. Available textual analysis tools include:

Term Clusters assist researchers in thoroughly developing their research topic. By identifying and organizing commonly occurring themes, this tool reveals hidden connections to search terms, helping scholars shape their research and integrate diverse content with relevant information. This tool is now more powerful than ever with the addition of Term Cluster Tiles as a new way to visually organize results.
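For researchers who want to explore a similar idea in their own scripts, the sketch below counts which words most often appear in documents that mention a search term. It is a simple co-occurrence count used as a stand-in, not the method behind Term Clusters, which this page does not describe. The function name, the stopword list, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real projects use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "is", "for", "on", "by"}

def co_occurring_terms(texts, search_term, top_n=5):
    """Return the words that most often appear in documents mentioning search_term."""
    search_term = search_term.lower()
    counts = Counter()
    for text in texts:
        # Use a set so each word is counted at most once per document.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if search_term in words:
            counts.update(words - STOPWORDS - {search_term})
    return counts.most_common(top_n)

# Hypothetical sample documents.
texts = [
    "The railway strike halted trade in the north.",
    "Parliament debated the railway strike.",
    "A poor harvest followed the wet summer.",
]

print(co_occurring_terms(texts, "railway", top_n=1))  # [('strike', 2)]
```

Counting each word once per document keeps a single long text from dominating the results.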

Figure: Textual analysis tools include Term Clusters, Term Frequency graphs, and more.

The Term Frequency tool helps researchers track central themes and ideas. Researchers can now see how frequently their search term appears within the content and begin assessing how individuals, events, and ideas interacted and developed over time.
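The same kind of count can be approximated outside the interface. The following is a minimal sketch that tallies occurrences of a term per publication year. It assumes the documents are already available as (year, text) pairs, which is an assumption for this example and not a Gale export format. The function name and sample data are hypothetical.

```python
import re
from collections import Counter

def term_frequency_by_year(documents, term):
    """Count whole-word, case-insensitive occurrences of `term` per year.

    `documents` is an iterable of (year, text) pairs (an assumed structure).
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, text in documents:
        counts[year] += len(pattern.findall(text))
    # Return the years in ascending order for easy plotting or printing.
    return dict(sorted(counts.items()))

# Hypothetical sample data.
sample = [
    (1850, "The railway opened; the railway company prospered."),
    (1850, "No mention here."),
    (1851, "A new Railway line was proposed."),
]

print(term_frequency_by_year(sample, "railway"))  # {1850: 2, 1851: 1}
```

Raw counts favor years with more surviving material, so dividing by the number of documents or words per year is usually a sensible next step.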

The primary source documents within Gale Primary Sources are keyword and full-text searchable thanks to Optical Character Recognition (OCR). Users can now download this OCR text in .txt format, enabling a new level of access to their search results.
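Once downloaded, OCR text usually benefits from light cleanup before analysis. The sketch below rejoins words hyphenated across line breaks, collapses whitespace, and splits the result into lowercase tokens. It is one possible approach under simple assumptions, not a Gale-provided procedure. The function names and the sample string are hypothetical.

```python
import re

def clean_ocr_text(raw):
    """Rejoin words hyphenated across line breaks and collapse whitespace.

    Caveat: genuine compounds split at a line end (e.g. "well-" / "known")
    are also joined, so review the results for your corpus.
    """
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", joined).strip()

def tokenize(text):
    """Split text into lowercase word tokens, keeping simple apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

# In practice the text would come from a downloaded file, for example
# pathlib.Path("export.txt").read_text(encoding="utf-8"), where the
# filename is a placeholder.
raw = "The parlia-\nment met in   Lon-\ndon on Tues-\nday."

cleaned = clean_ocr_text(raw)
print(cleaned)            # The parliament met in London on Tuesday.
print(tokenize(cleaned))  # ['the', 'parliament', 'met', 'in', 'london', 'on', 'tuesday']
```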

Please keep an eye on this page for more exciting updates regarding Gale’s ongoing efforts to bring this content to digital humanities scholars.