International Support

Most matching tools corrupt international characters or strip them entirely. The FoundryOps Sheets addon preserves Unicode end-to-end — your CJK, Cyrillic, and Arabic data stays intact.

What the Sheets Addon Handles

Unicode Preservation

Chinese (中文), Japanese (日本語), Korean (한국어), Arabic (العربية), Cyrillic (Русский), Hebrew, Thai, and all Latin scripts pass through without corruption.

UTF-8 end-to-end. No character stripping, no mojibake.

International Corporate Markers

Automatically strips 株式会社, 有限公司, 控股, ООО, ПАО, GmbH, S.A., and dozens more so "ソニー株式会社" matches "ソニー" cleanly.

Markers are removed before comparison — the actual company name is preserved.
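
As a rough illustration of the idea, the sketch below strips a handful of the markers named above. The marker lists and the `strip_markers` helper are hypothetical and chosen only for this example; they are not the addon's actual list or implementation. CJK markers are removed as substrings because those scripts do not separate words with spaces, while Latin and Cyrillic markers are removed only as whole tokens.

```python
import re

# Hypothetical marker lists for illustration only; the addon's real list is larger.
CJK_MARKERS = ("株式会社", "有限公司", "控股")
TOKEN_MARKERS = {"ооо", "пао", "gmbh", "s.a."}  # compared case-insensitively

def strip_markers(name: str) -> str:
    """Remove known corporate markers and tidy quotes and whitespace."""
    # CJK text has no word spacing, so these markers are removed as substrings.
    for marker in CJK_MARKERS:
        name = name.replace(marker, "")
    # Treat straight and typographic quotes as separators.
    name = re.sub(r'["«»“”]', " ", name)
    # Space-delimited scripts: drop a token only if the whole token is a marker.
    tokens = [t for t in name.split() if t.casefold() not in TOKEN_MARKERS]
    return " ".join(tokens)

print(strip_markers("ソニー株式会社"))        # ソニー
print(strip_markers('ООО "Рога и копыта"'))  # Рога и копыта
```

This sketch leaves letter case untouched; lowercasing is assumed to belong to a separate normalization step.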

IDN Domain Handling

Internationalized Domain Names (.рф, .中国, etc.) and Punycode inputs are normalized correctly. Domains are extracted from URLs and email addresses.

Handles www, ports, paths, and registrable-root extraction.
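
A minimal sketch of this kind of extraction, using only the Python standard library, might look like the following. The `extract_host` helper is hypothetical and is not the addon's code. It relies on Python's built-in `idna` codec (IDNA 2003) to decode Punycode, and it deliberately leaves out registrable-root extraction, which would need Public Suffix List data.

```python
from urllib.parse import urlsplit

def extract_host(value: str) -> str:
    """Return a lowercased Unicode hostname from a URL, bare domain, or email."""
    value = value.strip()
    if "@" in value and "://" not in value:
        # Email address: keep everything after the last "@".
        host = value.rsplit("@", 1)[1]
    else:
        # urlsplit only recognizes the host when a "//" prefix is present.
        if "://" not in value:
            value = "//" + value
        host = urlsplit(value).hostname or ""  # .hostname drops port and path
    host = host.rstrip(".").lower()
    if host.startswith("www."):
        host = host[4:]
    try:
        # Punycode labels (xn--...) are decoded to their Unicode form.
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass  # Already Unicode (or not valid Punycode): keep as-is.
    return host

print(extract_host("https://www.Example.com:8080/path"))  # example.com
print(extract_host("sales@example.com"))                  # example.com
```

Under these assumptions, a Punycode input and its Unicode spelling end up as the same string, which is what allows the two forms to be compared directly.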

Script-Aware Normalization

Latin diacritics are folded when appropriate ("José" → "jose"). CJK, Cyrillic, and Arabic characters are never accent-folded or transliterated: only Latin script gets accent folding.

NFKC normalization ensures consistent Unicode handling.
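
One way to picture script-aware folding is the sketch below, built on Python's `unicodedata` module. The `fold_latin_accents` helper is hypothetical, and the final `casefold()` step is an assumption based on the lowercase outputs shown in the examples on this page. A character loses its combining marks only when its base letter is a Latin letter, so characters such as Cyrillic "й" or Japanese "が" keep theirs.

```python
import unicodedata

def fold_latin_accents(text: str) -> str:
    """Apply NFKC, then strip accents from Latin letters only."""
    text = unicodedata.normalize("NFKC", text)
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0]
        is_latin = unicodedata.name(base, "").startswith("LATIN")
        if len(decomposed) > 1 and is_latin:
            # Keep the base letter and drop the combining marks.
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
        else:
            # Non-Latin scripts and unaccented characters pass through.
            out.append(ch)
    # Assumption: case is folded for comparison, as the page's examples suggest.
    return "".join(out).casefold()

print(fold_latin_accents("José"))              # jose
print(fold_latin_accents("Société Générale"))  # societe generale
```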

RTL Script Support

Arabic and Hebrew text passes through matching and deduplication without corruption or reordering. Right-to-left rendering is preserved.

No special configuration needed — just paste your data.

Graph-Backed Coverage

Foundry Graph includes multi-language labels from Wikidata across 230 countries. Strongest coverage in US/EU with growing APAC representation.

Graph enrichment adds context the fuzzy matcher alone can't provide.

Real-World Examples

These transformations happen in the Sheets addon before matching.

Japanese Corporate Markers
ソニー株式会社 → ソニー

株式会社 stripped, company name preserved intact

Chinese Corporate Markers
腾讯控股有限公司 → 腾讯

控股 and 有限公司 stripped as corporate markers

Cyrillic Corporate Markers
ООО "Рога и копыта" → рога и копыта

ООО stripped, quotes cleaned, Cyrillic preserved

Latin Diacritics
Société Générale → societe generale

Accents folded for Latin scripts only — CJK/Cyrillic untouched

What We're Honest About

We'd rather tell you what works today than overpromise. Here's where we stand:

What works in Sheets today
  • Full Unicode preservation — no data loss, no corruption
  • International corporate marker cleanup (JP, CN, RU, DE, FR, etc.)
  • IDN domain normalization and extraction
  • Script-aware accent folding (Latin only)
  • RTL script preservation (Arabic, Hebrew)

Limitations to know about
  • Fuzzy matching uses standard algorithms — no CJK-specific matching in Sheets yet
  • No cross-script matching (e.g., Chinese name to English name)
  • Title/industry classification uses English keywords — won't infer non-English semantics
  • UI is English-only (data processing works in any language)