International Support

Most matching tools corrupt international characters or strip them entirely. The FoundryOps Sheets add-on preserves Unicode end to end, so your CJK, Cyrillic, and Arabic data stays intact.


What the Sheets Add-on Handles

Unicode Preservation

Chinese (中文), Japanese (日本語), Korean (한국어), Arabic (العربية), Cyrillic (Русский), Hebrew, Thai, and all Latin scripts pass through without corruption.

UTF-8 end-to-end. No character stripping, no mojibake.

International Corporate Markers

Automatically strips 株式会社, 有限公司, 控股, ООО, ПАО, GmbH, S.A., and dozens more so "ソニー株式会社" matches "ソニー" cleanly.

Markers are removed before comparison — the actual company name is preserved.
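The marker cleanup described above can be sketched in a few lines of Python. This is an illustration only, not the add-on's implementation: the marker lists are a small subset taken from this page, and `strip_corporate_markers` is a hypothetical name. The sketch assumes CJK markers can be removed as plain substrings (those scripts have no word spacing), while markers in space-delimited scripts are matched as whole tokens.

```python
import re

# Illustrative subset only; the add-on's full marker list is not shown on this page.
CJK_MARKERS = ["株式会社", "有限公司", "控股"]       # removed as substrings
TOKEN_MARKERS = ["ООО", "ПАО", "GmbH", "S.A."]      # removed as whole tokens
QUOTES = "\"«»“”„"


def strip_corporate_markers(name: str) -> str:
    """Remove known corporate markers, then tidy quotes and whitespace."""
    for marker in CJK_MARKERS:
        name = name.replace(marker, " ")
    for marker in TOKEN_MARKERS:
        # Require a non-word character (or string edge) on both sides,
        # so a marker embedded inside a longer word is left alone.
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        name = re.sub(pattern, " ", name)
    # Turn quote characters into spaces, then collapse runs of whitespace.
    name = name.translate(str.maketrans(QUOTES, " " * len(QUOTES)))
    return " ".join(name.split())


print(strip_corporate_markers("ソニー株式会社"))
print(strip_corporate_markers("腾讯控股有限公司"))
print(strip_corporate_markers('ООО "Рога и копыта"'))
```

Case folding is deliberately left out of this sketch, so the Cyrillic result keeps its capital letter; lowercasing would be a separate normalization step.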

IDN Domain Handling

Internationalized Domain Names (.рф, .中国, etc.) and Punycode inputs are normalized correctly. Domains are extracted from URLs and email addresses.

Handles www, ports, paths, and registrable-root extraction.
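As a rough sketch of what this kind of domain handling involves, the Python below pulls a host out of a URL, email address, or bare domain and decodes Punycode labels to Unicode using only the standard library. `extract_domain` is a hypothetical name, and the choice to return the Unicode form is an assumption, not a statement of the add-on's behavior. Registrable-root extraction is omitted because it needs Public Suffix List data, which the standard library does not ship.

```python
from urllib.parse import urlsplit


def extract_domain(value: str) -> str:
    """Return the host of a URL, email address, or bare domain in Unicode form."""
    value = value.strip()
    if "@" in value and "://" not in value:
        # Email address: the host is everything after the last "@".
        host = value.rsplit("@", 1)[1]
    else:
        if "://" not in value:
            # Prefix "//" so urlsplit treats a bare domain as a network location.
            value = "//" + value
        # .hostname lowercases the host and drops the port, path, and userinfo.
        host = urlsplit(value).hostname or ""
    host = host.lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]
    labels = []
    for label in host.split("."):
        if label.startswith("xn--"):
            try:
                # Python's built-in "idna" codec (IDNA 2003) decodes Punycode labels.
                label = label.encode("ascii").decode("idna")
            except UnicodeError:
                pass  # leave a malformed label unchanged
        labels.append(label)
    return ".".join(labels)


print(extract_domain("https://www.example.com:8080/a/b"))
print(extract_domain("user@пример.рф"))
```

With this approach, a Punycode host and its Unicode spelling reduce to the same string, so the two forms compare equal during matching.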

Script-Aware Normalization

Latin diacritics are folded when appropriate ("José" → "jose"). Accent folding applies to Latin script only; diacritics and marks in CJK, Cyrillic, and Arabic text are left in place.

NFKC normalization ensures consistent Unicode handling.
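One way to picture Latin-only accent folding is the Python sketch below, built on the standard `unicodedata` module. It is a minimal illustration under stated assumptions, not the add-on's code: `normalize_name` is a hypothetical name, a character counts as Latin if its Unicode name contains "LATIN", and the final case folding is inferred from the lowercased examples on this page.

```python
import unicodedata


def normalize_name(text: str) -> str:
    """NFKC-normalize, drop combining marks on Latin base letters, then case-fold."""
    text = unicodedata.normalize("NFKC", text)
    kept = []
    base_is_latin = False
    # NFD splits each accented letter into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_latin:
                continue  # fold the accent away, for Latin bases only
        else:
            base_is_latin = "LATIN" in unicodedata.name(ch, "")
        kept.append(ch)
    # Recompose so non-Latin letters (й, が, 한) return to their original form.
    return unicodedata.normalize("NFC", "".join(kept)).casefold()


print(normalize_name("José"))
print(normalize_name("Société Générale"))
print(normalize_name("ソニー"))
```

Because marks are dropped only when the preceding base letter is Latin, a Cyrillic "й" or a Japanese "が" survives the round trip unchanged, while NFKC still unifies compatibility forms such as fullwidth Latin letters.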

RTL Script Support

Arabic and Hebrew text passes through matching and deduplication without corruption or reordering. Right-to-left rendering is preserved.

No special configuration needed — just paste your data.

Graph-Backed Coverage

Foundry Graph includes multi-language labels from Wikidata across 230 countries. Strongest coverage in US/EU with growing APAC representation.

Graph enrichment adds context the fuzzy matcher alone can't provide.

Real-World Examples

These transformations happen in the Sheets add-on before matching.

Japanese Corporate Markers

ソニー株式会社 → ソニー

株式会社 stripped, company name preserved intact

Chinese Corporate Markers

腾讯控股有限公司 → 腾讯

控股 and 有限公司 stripped as corporate markers

Cyrillic Corporate Markers

ООО "Рога и копыта" → рога и копыта

ООО stripped, quotes cleaned, text lowercased, Cyrillic letters preserved

Latin Diacritics

Société Générale → societe generale

Accents folded for Latin scripts only — CJK/Cyrillic untouched

What We're Honest About

We'd rather tell you what works today than overpromise. Here's where we stand:

What works in Sheets today

Full Unicode preservation — no data loss, no corruption
International corporate marker cleanup (JP, CN, RU, DE, FR, etc.)
IDN domain normalization and extraction
Script-aware accent folding (Latin only)
RTL script preservation (Arabic, Hebrew)

Limitations to know about

Fuzzy matching uses standard algorithms; there is no CJK-specific matching in Sheets yet
No cross-script matching (e.g., Chinese name to English name)
Title/industry classification uses English keywords and won't infer non-English semantics
UI is English-only (data processing works in any language)

Explore More Platform Capabilities

FoundryOne™ Engine

Multi-algorithm matching with reason chips. Built for 10M+ rows and explainable precision.


Foundry Graph™

The only corporate graph where every field traces to its source. Included with every FoundryOps plan.
