International Support
Most matching tools corrupt international characters or strip them entirely. The FoundryOps Sheets addon preserves Unicode end-to-end — your CJK, Cyrillic, and Arabic data stays intact.
What the Sheets Addon Handles
Chinese (中文), Japanese (日本語), Korean (한국어), Arabic (العربية), Cyrillic (Русский), Hebrew, Thai, and all Latin scripts pass through without corruption.
UTF-8 end-to-end. No character stripping, no mojibake.
Corporate markers such as 株式会社, 有限公司, 控股, ООО, ПАО, GmbH, S.A., and dozens more are stripped automatically, so "ソニー株式会社" matches "ソニー" cleanly.
Markers are removed before comparison — the actual company name is preserved.
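As a rough illustration of marker stripping, here is a minimal Python sketch. It is not the addon's actual code: the function name `strip_corporate_markers`, the two short marker lists, and the quote-cleaning rule are all assumptions made for this example, and the real marker list is much longer.

```python
import re

# Hypothetical, partial marker lists built from the examples in this section.
CJK_MARKERS = ["株式会社", "有限公司", "控股"]
WORD_MARKERS = ["ООО", "ПАО", "GmbH", "S.A."]


def strip_corporate_markers(name: str) -> str:
    """Remove known corporate markers and tidy quotes and whitespace."""
    result = name
    # CJK text has no spaces between words, so these are removed as substrings.
    # (A simplification: a marker inside a real name would be removed too.)
    for marker in CJK_MARKERS:
        result = result.replace(marker, "")
    # Space-delimited scripts: only remove the marker when it stands alone.
    for marker in WORD_MARKERS:
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        result = re.sub(pattern, " ", result)
    # Drop quotation marks that often wrap the name, then collapse whitespace.
    result = re.sub(r'["«»“”„]', "", result)
    return " ".join(result.split())


print(strip_corporate_markers("ソニー株式会社"))        # ソニー
print(strip_corporate_markers("腾讯控股有限公司"))      # 腾讯
print(strip_corporate_markers('ООО "Рога и копыта"'))  # Рога и копыта
```

Lowercasing and accent folding are deliberately left out of this sketch so that it shows only the marker removal.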
Internationalized Domain Names (.рф, .中国, etc.) and Punycode inputs are normalized correctly. Domains are extracted from URLs and email addresses.
Handles www, ports, paths, and registrable-root extraction.
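The domain handling described above could be sketched roughly as follows in Python. This is an assumption-laden illustration, not the addon's implementation: `extract_domain` is a hypothetical name, Python's built-in `idna` codec (IDNA 2003) stands in for whatever normalization the addon uses, and registrable-root extraction is omitted because it needs the Public Suffix List.

```python
from urllib.parse import urlsplit


def extract_domain(value: str) -> str:
    """Return a lowercase ASCII (Punycode) host from a URL, email, or bare domain."""
    value = value.strip()
    if "@" in value and "://" not in value:
        # Email address: the host is everything after the last "@".
        host = value.rsplit("@", 1)[1]
    else:
        # urlsplit only recognizes a host after "//", so add it for bare domains.
        if "://" not in value:
            value = "//" + value
        # .hostname drops the port, path, query, and any userinfo.
        host = urlsplit(value).hostname or ""
    host = host.lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]
    try:
        # Unicode labels become Punycode; labels that are already ASCII pass through.
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # Leave hosts the codec rejects untouched rather than guessing.
        return host


print(extract_domain("https://www.Example.com:8080/pricing?x=1"))  # example.com
print(extract_domain("sales@example.com"))                         # example.com
```

Because Unicode and Punycode spellings of the same domain reduce to the same ASCII form, they compare as equal after this step.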
Latin diacritics are folded when appropriate ("José" → "jose"). CJK, Cyrillic, and Arabic characters are never modified — only Latin script gets accent folding.
NFKC normalization ensures consistent Unicode handling.
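One way to picture script-aware folding is the Python sketch below. It is an illustration under stated assumptions, not the addon's code: `fold_latin_accents` is a hypothetical name, a character counts as Latin when its Unicode name contains "LATIN", and lowercasing is included only to mirror the "José" → "jose" example.

```python
import unicodedata


def fold_latin_accents(text: str) -> str:
    """Apply NFKC, then drop combining marks that follow Latin base letters only."""
    normalized = unicodedata.normalize("NFKC", text)
    # Decompose so accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", normalized)
    kept = []
    base_is_latin = False
    for ch in decomposed:
        if unicodedata.combining(ch):
            # Keep marks on non-Latin bases (e.g. Cyrillic й, Arabic vowel signs).
            if not base_is_latin:
                kept.append(ch)
        else:
            base_is_latin = "LATIN" in unicodedata.name(ch, "")
            kept.append(ch)
    # Recompose what remains so non-Latin text returns to its original form.
    return unicodedata.normalize("NFC", "".join(kept)).lower()


print(fold_latin_accents("José"))              # jose
print(fold_latin_accents("Société Générale"))  # societe generale
print(fold_latin_accents("йогурт"))            # йогурт
```

Letters with no decomposition, such as "ø" or "ß", pass through this sketch unchanged; a production folder would need an explicit mapping for them.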
Arabic and Hebrew text passes through matching and deduplication without corruption or reordering. Right-to-left rendering is preserved.
No special configuration needed — just paste your data.
Foundry Graph includes multi-language labels from Wikidata across 230 countries. Strongest coverage in US/EU with growing APAC representation.
Graph enrichment adds context the fuzzy matcher alone can't provide.
Real-World Examples
These transformations happen in the Sheets addon before matching.
ソニー株式会社 → ソニー (株式会社 stripped, company name preserved intact)
腾讯控股有限公司 → 腾讯 (控股 and 有限公司 stripped as corporate markers)
ООО "Рога и копыта" → рога и копыта (ООО stripped, quotes cleaned, Cyrillic preserved)
Société Générale → societe generale (accents folded for Latin scripts only; CJK/Cyrillic untouched)
What We're Honest About
We'd rather tell you what works today than overpromise. Here's where we stand:
Works today:
- Full Unicode preservation: no data loss, no corruption
- International corporate marker cleanup (JP, CN, RU, DE, FR, etc.)
- IDN domain normalization and extraction
- Script-aware accent folding (Latin only)
- RTL script preservation (Arabic, Hebrew)
Current limitations:
- Fuzzy matching uses standard algorithms; there is no CJK-specific matching in Sheets yet
- No cross-script matching (e.g., Chinese name to English name)
- Title/industry classification uses English keywords and won't infer non-English semantics
- UI is English-only (data processing works in any language)