Soutron Summarizer

Manual data entry (cataloguing) is a process that all information professionals and archivists have to perform. It’s an essential but time-consuming task that can be a drain on limited resources.

Soutron already offers dynamic real-time bibliographic lookup and auto-cataloguing. We can also index the full text of any document added to a Soutron database, but a reader still has to download the file and read the entire document to understand what it contains. The conventional way to capture this additional content is to write abstracts or summaries by hand. This is a slow and sometimes tedious task, and it can introduce inconsistencies, errors and omissions when different people work on the same document set.

Soutron provides a better way to help automate and speed up such a process.

Word / PDF Metadata Summarizer

Soutron Summarizer identifies metadata elements in a text-based document, such as a PDF or Microsoft Word file, and extracts the text. The process automatically creates the record within Soutron, with the metadata matched to the correct fields in the record.

Next, the tool creates a summary of the document. This part of the process begins by teaching the Summarizer about the documents to be processed. Different types of documents with different content may need to be separately analyzed and learned.

Using machine learning, the Summarizer application learns how to recognize the metadata to be catalogued.

Analyse, Learn & Extract

This is approached in two ways:

  • Automatically extract the text of a document in an existing Soutron record and import the metadata back into the record.
  • Automatically import a document and create a new record in Soutron with the associated metadata.

For example, once PDF documents are uploaded to a designated file location in Soutron, the Summarizer can extract metadata including Title, Authors, Corporate Authors, Keywords and ISBN/ISSN. It can also generate an Abstract from text within the PDF.
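To give a feel for what extracting one of these fields involves, here is a minimal Python sketch that pulls ISBN-13 and ISSN candidates out of already-extracted document text and keeps only those that pass the standard check-digit tests. It is an illustration only: the function names, the regular expressions and the output field names are assumptions made for this example and do not describe how Soutron Summarizer works internally.

```python
import re

# Hypothetical candidate patterns for this sketch (not Soutron's actual rules).
ISBN13_RE = re.compile(r"\b97[89][-\s]?(?:\d[-\s]?){9}\d\b")
ISSN_RE = re.compile(r"\b\d{4}-\d{3}[\dXx]\b")


def is_valid_isbn13(digits: str) -> bool:
    """ISBN-13 check: weights alternate 1, 3 and the total must divide by 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_issn(issn: str) -> bool:
    """ISSN check: weights 8..2 on the first seven digits, modulo 11, X = 10."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def extract_identifiers(text: str) -> dict:
    """Return validated identifiers keyed by an illustrative field name."""
    isbns, issns = [], []
    for match in ISBN13_RE.finditer(text):
        digits = re.sub(r"[-\s]", "", match.group())
        if is_valid_isbn13(digits) and digits not in isbns:
            isbns.append(digits)
    for match in ISSN_RE.finditer(text):
        candidate = match.group().upper()
        if is_valid_issn(candidate) and candidate not in issns:
            issns.append(candidate)
    return {"ISBN": isbns, "ISSN": issns}
```

Validating the check digit filters out look-alike strings such as reference numbers or phone extensions, which is one reason automated extraction can be more consistent than retyping identifiers by hand.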

Note that you must have the copyright owner's permission to process any content you upload. The PDFs must also be text-based, not images of text.
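If you are unsure whether a batch of PDFs is text-based, a quick pre-check can help before uploading. The Python sketch below assumes you have already extracted the text of each page with a PDF library of your choice and simply tests whether enough pages contain a meaningful amount of text. The function name and both thresholds are hypothetical values chosen for illustration, not Soutron requirements.

```python
def looks_text_based(page_texts, min_chars_per_page=50, min_fraction=0.5):
    """Return True when enough pages carry an extractable text layer.

    page_texts: one string per page, as returned by a PDF text extractor.
    Both thresholds are illustrative assumptions and may need tuning.
    """
    if not page_texts:
        return False
    pages_with_text = sum(
        1
        for text in page_texts
        # Ignore whitespace so blank or near-blank pages do not count.
        if len("".join(text.split())) >= min_chars_per_page
    )
    return pages_with_text / len(page_texts) >= min_fraction
```

A document that fails this check is probably a scanned image and would likely need optical character recognition before its text can be extracted.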

The end result? Records for your PDF collections (meetings, papers, reports, corporate information or historic records) are created and catalogued in Soutron automatically, ready to be published and searched.

Users of the Summarizer have been impressed by both the labor it saves and its accuracy.

Next Steps

If you have a large collection of materials that require cataloguing, then get in touch with Soutron today so we can introduce you to Soutron Summarizer. Save hours, days, weeks and perhaps even months of manual data entry.