How to handle duplicate journals, title variations, and unusual identifiers

Overview

When reviewing your reports in CELUS, you may occasionally notice the same journal listed multiple times. These duplicates often differ slightly in title (e.g., with or without "The", or using "and" vs. "&"), or they carry different and sometimes unusual identifiers, such as several ISBNs instead of a standard ISSN.

This article explains why this happens, how it affects your statistics, and how you can manage these duplicates to get accurate usage data.

Why does this happen?

The short answer is that publisher metadata can be highly inconsistent. CELUS handles more than 3 million journal titles. At that volume, manually curating or altering the data is not feasible, so we display the data exactly as we receive it from the publishers. The publishers alone decide how identifiers are assigned and how titles are formatted. This leads to a few common anomalies:

  • Platform Discrepancies: The same journal can appear under the exact same name on two platforms but with a different ISSN on each.
  • Unusual Identifiers: Sometimes publishers mistakenly assign ISBNs to journals (which normally use ISSNs only), leading to multiple records for the same journal, each with a different ISBN.
  • Formatting Variations: Slight naming differences, such as "The Quarterly Review of Biology" vs. "Quarterly Review of Biology".
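
If you work with exported data outside CELUS, the anomalies above can be spotted with a small script. The following is a minimal Python sketch, not part of CELUS: the function names `normalize_title` and `identifier_kind` are hypothetical, the normalization rules cover only the variations mentioned in this article, and the identifier check looks only at the shape of the value (it does not verify check digits). The sample identifiers are made up.

```python
import re


def normalize_title(title: str) -> str:
    """Return a comparison key for a journal title (illustrative rules only)."""
    key = title.casefold().replace("&", " and ")
    key = re.sub(r"[^\w\s]", " ", key)    # replace punctuation with spaces
    key = re.sub(r"^\s*the\s+", "", key)  # drop a leading "The"
    return " ".join(key.split())          # collapse repeated whitespace


def identifier_kind(identifier: str) -> str:
    """Guess whether a value is shaped like an ISSN or an ISBN (shape only)."""
    compact = identifier.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"\d{7}[\dX]", compact):
        return "ISSN"  # 8 characters, last one may be X
    if re.fullmatch(r"\d{9}[\dX]|\d{13}", compact):
        return "ISBN"  # ISBN-10 or ISBN-13 shape
    return "unknown"


# Two title variations collapse to the same key.
same = normalize_title("The Quarterly Review of Biology") == normalize_title(
    "Quarterly Review of Biology"
)
print(same)

# Made-up identifiers, used only to show the shape check.
print(identifier_kind("1234-5678"))
print(identifier_kind("978-0-12-345678-9"))
```

Records whose titles share a key, or journal records whose identifier is shaped like an ISBN, are good candidates for a closer look.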

How should I interpret the statistics?

When you see multiple listings for the same journal, these are duplicate records of the same overarching title.

To understand the true total usage of that journal across your institution, add together the usage statistics of all its duplicate records. Treat them not as distinct publications, but as fragmented data for a single publication.
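
As a worked example of that arithmetic, here is a minimal Python sketch. The records, identifiers, and usage numbers are invented for illustration, and the field names (`title`, `identifier`, `usage`) are assumptions rather than the layout of an actual CELUS export.

```python
# Three hypothetical duplicate records of one journal (made-up values).
records = [
    {"title": "The Quarterly Review of Biology", "identifier": "ID-A", "usage": 120},
    {"title": "Quarterly Review of Biology", "identifier": "ID-B", "usage": 45},
    {"title": "Quarterly Review of Biology", "identifier": "ID-C", "usage": 10},
]


def total_usage(duplicates):
    """Add up the usage of records that all describe the same journal."""
    return sum(record["usage"] for record in duplicates)


total = total_usage(records)
print(total)  # 120 + 45 + 10 = 175
```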

Managing Duplicates: Best Practices

1. Use the "All Available" Date Range to Find Variations

To ensure you are capturing every variation of a messy title, adjust your search parameters.

  • In the top menu where you set the Date range, select All available.
  • Search for the core keywords of the journal. This will reveal all historical variations, spelling differences, and differing identifiers associated with that publication.

2. Aggregate Usage Using Tags (Workaround)

Currently, the most effective way to combine the usage of duplicate titles into a single, clean statistic is to use CELUS's Tagging feature.

  1. Identify all the duplicate records for a specific journal (e.g., all 6 variations of "Journal of Aesthetics and Art Criticism").
  2. Create a unified tag for them (e.g., label them all as "Journal of Aesthetics and Art Criticism").
  3. When running your reports, filter or group your data specifically by this tag.

Pro tip: This tagging method automatically aggregates the usage of all tagged variations into one combined metric, so you don't have to add the numbers together manually in an exported spreadsheet.

Learn how to use Tags in CELUS

Coming Soon: Built-in Deduplication

We understand that dealing with publisher data inconsistencies can be frustrating. We are actively working on a new reporting feature, expected to launch in the coming months.

This update will introduce a "Merge rows by" function. Once you generate a list of results, you will be able to easily merge rows by Title, ISSN, or other identifiers directly within the platform, allowing for simple, automated deduplication of your results.
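
To illustrate the general idea of merging rows by a shared field, here is a minimal Python sketch. It is not the CELUS implementation, and the behavior of the upcoming feature may differ; the function name `merge_rows_by`, the field names, and all values are hypothetical.

```python
from collections import defaultdict


def merge_rows_by(rows, key):
    """Sum the usage of all rows that share the same value for `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["usage"]
    return dict(totals)


# Made-up rows: two title variations share one identifier.
rows = [
    {"title": "The Example Journal", "issn": "1234-5678", "usage": 30},
    {"title": "Example Journal", "issn": "1234-5678", "usage": 12},
    {"title": "Another Journal", "issn": "8765-4321", "usage": 7},
]

merged = merge_rows_by(rows, "issn")
print(merged)
```

Merging by identifier combines rows whose titles differ, while merging by title would combine rows whose identifiers differ, so the choice of field depends on which kind of duplicate you are dealing with.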