
Normalizes UTF-8 character strings to a deterministic, ASCII-only, uppercase representation suitable for identifier-style matching and comparison (e.g. CISAC IPI-style name matching).

Usage

normalize_for_ipi(x, sep = "\\|")

normalise_for_ipi(x, sep = "\\|")

Arguments

x

A character vector to be normalized.

sep

A regular expression used to split multiple name variants within a single string. Defaults to "\\|", the R string for the regular expression \| that matches a literal pipe character.
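To illustrate why the default is written with a double backslash, here is a small base R sketch that uses strsplit() as a stand-in. It only demonstrates the regular expression itself; whether normalize_for_ipi() calls strsplit() internally is an assumption, not something this page states.

```r
# Illustration only: strsplit() stands in for whatever splitting the
# function performs internally (an assumption).
variants <- "SEDOY URAL|OLGA TIKHONOVA"

# "\\|" is the R string for the regex \|, which matches a literal pipe.
by_regex <- strsplit(variants, "\\|")[[1]]

# The same split with the pipe treated as fixed text instead of a regex.
by_fixed <- strsplit(variants, "|", fixed = TRUE)[[1]]

# Both give two elements: "SEDOY URAL" and "OLGA TIKHONOVA".
identical(by_regex, by_fixed)
```

An unescaped "|" is the regex alternation operator, not a literal pipe, which is why the default is escaped.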

Value

A character vector of normalized strings.

Details

The normalization:

  • transliterates accented Latin characters to ASCII letters (for example, ö to O and ð to D),

  • applies deterministic Cyrillic-to-Latin transliteration aligned with common CISAC / CMO practice,

  • removes punctuation and non-alphanumeric characters,

  • standardizes whitespace,

  • preserves pipe-separated name variants.

This function produces IPI-style normalized strings for internal matching and reconciliation. It does not generate official CISAC IPI Names and carries no CISAC or ISO authority.
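The steps listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: sketch_normalize() and the translit table are hypothetical names, the table covers only the characters needed for the two examples on this page, and the order of the steps is a guess.

```r
# Hypothetical transliteration table (illustrative subset only).
# A real table would cover the full accented Latin and Cyrillic ranges.
translit <- c(
  "ö" = "o", "ð" = "d", "ó" = "o",
  "С" = "S", "У" = "U", "О" = "O", "Т" = "T",
  "а" = "a", "в" = "v", "г" = "g", "д" = "d", "е" = "e", "и" = "i",
  "й" = "y", "л" = "l", "н" = "n", "о" = "o", "р" = "r", "х" = "kh",
  "ь" = ""
)

# Hypothetical helper: a sketch of the documented steps, not the real function.
sketch_normalize <- function(x, sep = "\\|") {
  normalize_one <- function(s) {
    if (is.na(s)) return(NA_character_)
    # 1. Deterministic, table-driven transliteration to ASCII.
    for (from in names(translit)) {
      s <- gsub(from, translit[[from]], s, fixed = TRUE)
    }
    # 2. Split into name variants so each one is cleaned separately.
    parts <- strsplit(s, sep)[[1]]
    # 3. Uppercase, then normalize whitespace runs to single spaces.
    parts <- toupper(parts)
    parts <- gsub("[[:space:]]+", " ", parts)
    # 4. Drop everything that is not A-Z, 0-9, or a space.
    parts <- gsub("[^A-Z0-9 ]", "", parts)
    # 5. Collapse spaces left behind by removed characters and trim the ends.
    parts <- trimws(gsub(" +", " ", parts))
    # 6. Rejoin the variants with a pipe.
    paste(parts, collapse = "|")
  }
  vapply(x, normalize_one, character(1), USE.NAMES = FALSE)
}

sketch_normalize("Björk Guðmundsdóttir")
sketch_normalize("Седой Урал|Ольга Тихонова")
```

A fixed lookup table, rather than locale-dependent conversion such as iconv() with //TRANSLIT, is one way to keep the output identical across platforms, which is the property the word "deterministic" refers to. Any character missing from the table is simply dropped in step 4, so this sketch is only reliable for the inputs shown.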

Examples

normalize_for_ipi("Björk Guðmundsdóttir")
#> [1] "BJORK GUDMUNDSDOTTIR"

normalize_for_ipi("Седой Урал|Ольга Тихонова")
#> [1] "SEDOY URAL|OLGA TIKHONOVA"