Normalize character strings for IPI-style matching — normalize_for_ipi

Normalizes UTF-8 character strings to a deterministic, ASCII-only, uppercase representation suitable for identifier-style matching and comparison (e.g., CISAC IPI-style name matching).

Usage

normalize_for_ipi(x, sep = "\\|")

normalise_for_ipi(x, sep = "\\|")

Arguments

x: A character vector to be normalized.
sep: A regular expression used to split multiple name variants within a single string. Defaults to "\\|", which matches a literal pipe character.
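Because sep is a regular expression, the pipe has to be escaped. The base R snippet below is only an illustration of that point; it assumes nothing about how the function splits its input internally.

```r
x <- "SEDOY URAL|OLGA TIKHONOVA"

# Escaped pipe: the regex matches a literal "|" and yields the two variants.
variants <- strsplit(x, "\\|")[[1]]

# Same result with a literal (non-regex) match.
variants_fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

# Unescaped pipe: "|" is regex alternation between two empty patterns,
# so the string is not split into the two name variants.
unescaped <- strsplit(x, "|")[[1]]

print(variants)
print(length(unescaped))
```

A custom separator such as ";" needs no escaping, since it has no special meaning in a regular expression.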

Value

A character vector of normalized strings.

Details

The normalization:

transliterates accented Latin characters to their ASCII equivalents,
applies deterministic Cyrillic-to-Latin transliteration aligned with common CISAC / CMO practice,
converts the result to uppercase,
removes punctuation and other non-alphanumeric characters,
standardizes whitespace, and
preserves pipe-separated name variants.
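The steps listed above can be sketched in base R as follows. This is a hypothetical illustration, not the package's implementation: sketch_normalize and its tiny transliteration tables are assumptions made for the example, and they cover only the handful of characters used in the demonstration.

```r
# Hypothetical sketch; the real function's tables and step order may differ.
sketch_normalize <- function(x, sep = "\\|") {
  # Illustrative single-character Latin table: o-umlaut, o-acute, eth
  # (lower case, then upper case), written as \u escapes.
  latin_from <- "\u00f6\u00f3\u00f0\u00d6\u00d3\u00d0"
  latin_to   <- "oodOOD"
  # Illustrative Cyrillic table: Te, i, kha, o, en. One source letter
  # may map to two Latin letters ("kh").
  cyr_from <- c("\u0422", "\u0438", "\u0445", "\u043e", "\u043d")
  cyr_to   <- c("T", "i", "kh", "o", "n")

  normalize_variant <- function(v) {
    v <- chartr(latin_from, latin_to, v)
    for (i in seq_along(cyr_from)) {
      v <- gsub(cyr_from[i], cyr_to[i], v, fixed = TRUE)
    }
    v <- toupper(v)                     # input is ASCII for covered letters
    v <- gsub("[[:space:]]+", " ", v)   # tabs and newlines become spaces
    v <- gsub("[^A-Z0-9 ]", "", v)      # drop punctuation and leftovers
    v <- gsub(" +", " ", v)             # collapse gaps left by removals
    trimws(v)
  }

  # Split on sep, normalize each variant, then rejoin with a pipe.
  vapply(strsplit(x, sep), function(parts) {
    paste(normalize_variant(parts), collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

sketch_normalize("Bj\u00f6rk|\u0422\u0438\u0445\u043e\u043d")
#> [1] "BJORK|TIKHON"
sketch_normalize("  O'Neil,   J.R. ")
#> [1] "ONEIL JR"
```

Splitting before cleaning is what keeps the pipe intact: each variant is stripped of punctuation on its own, and the separator is added back only when the variants are rejoined.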

This function produces IPI-style normalized strings for internal matching and reconciliation. It does not generate official CISAC IPI Names and carries no CISAC or ISO authority.
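For reconciliation, normalized strings typically serve as a join key between two name lists. The sketch below shows one possible pattern; match_normalized and toy_normalizer are hypothetical names, and the ASCII-only stand-in normalizer is there only so that the snippet runs without the package installed.

```r
# Hypothetical helper: for each query, the position of the first candidate
# whose normalized form is identical (NA when nothing matches).
match_normalized <- function(query, candidates, normalizer) {
  match(normalizer(query), normalizer(candidates))
}

# Stand-in normalizer (an assumption; handles ASCII input only).
toy_normalizer <- function(x) {
  x <- gsub("[^A-Za-z0-9 ]", "", x)  # drop punctuation
  x <- gsub(" +", " ", x)            # collapse repeated spaces
  toupper(trimws(x))
}

registry <- c("ONEIL JR", "SMITH JOHN")
incoming <- c("O'Neil, J.R.", "smith  john", "Unknown Person")

idx <- match_normalized(incoming, registry, toy_normalizer)
idx
#> [1]  1  2 NA
```

With the package loaded, normalize_for_ipi could be passed as the normalizer argument in place of the stand-in.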

Examples

normalize_for_ipi("Björk Guðmundsdóttir")
#> [1] "BJORK GUDMUNDSDOTTIR"

normalize_for_ipi("Седой Урал|Ольга Тихонова")
#> [1] "SEDOY URAL|OLGA TIKHONOVA"