How to Measure UTF-8 Byte Length
UTF-8 Bytes
When Byte Length Matters
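Byte-based limits, such as database VARCHAR columns declared with byte-length semantics, API payload caps, and storage quotas, count encoded bytes, not JavaScript string length. Emoji and accented characters use multiple bytes in UTF-8, so a message of 280 four-byte emoji takes 1,120 bytes. SMS segments are also sized by encoded length, but SMS uses GSM-7 or UCS-2 rather than UTF-8, so a UTF-8 count is only a rough guide there.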
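A minimal sketch of checking text against a byte limit before sending it to a backend. The helper names (`utf8ByteLength`, `fitsWithinBytes`) and the 255-byte limit are hypothetical and chosen only for illustration; substitute the limit your own system enforces.

```typescript
// Hypothetical helpers; the 255-byte limit below is an illustrative assumption.
const encoder = new TextEncoder();

// Number of bytes the text occupies when encoded as UTF-8.
function utf8ByteLength(text: string): number {
  return encoder.encode(text).length;
}

// True when the UTF-8 encoding of the text fits in maxBytes.
function fitsWithinBytes(text: string, maxBytes: number): boolean {
  return utf8ByteLength(text) <= maxBytes;
}

const asciiText = "a".repeat(255);          // 255 one-byte characters
const emojiText = "\u{1F600}".repeat(100);  // 100 emoji (U+1F600), four bytes each

console.log(fitsWithinBytes(asciiText, 255)); // true: 255 bytes
console.log(fitsWithinBytes(emojiText, 255)); // false: 400 bytes
```

Note that `emojiText.length` is 200, so a check based on string length would pass a 255 limit even though the encoded text is 400 bytes.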
Characters vs Bytes
JavaScript's string length counts UTF-16 code units, not characters or bytes. This tool uses TextEncoder to count UTF-8 bytes. UTF-8 is the encoding most web APIs and databases use for text storage.
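The difference between the three counts can be seen side by side. This is a sketch, not the tool's actual source; the `measure` function and its field names are assumptions made for the example.

```typescript
const encoder = new TextEncoder();

// Hypothetical helper that reports three different "lengths" of one string.
function measure(text: string): { utf16Units: number; codePoints: number; utf8Bytes: number } {
  return {
    utf16Units: text.length,               // what String.prototype.length reports
    codePoints: Array.from(text).length,   // string iteration walks code points
    utf8Bytes: encoder.encode(text).length // bytes after UTF-8 encoding
  };
}

const plain = measure("hello");      // ASCII only
const accented = measure("caf\u00e9"); // "café" with precomposed é (U+00E9)
const emoji = measure("\u{1F600}");  // one emoji, U+1F600

console.log(plain);    // { utf16Units: 5, codePoints: 5, utf8Bytes: 5 }
console.log(accented); // { utf16Units: 4, codePoints: 4, utf8Bytes: 5 }
console.log(emoji);    // { utf16Units: 2, codePoints: 1, utf8Bytes: 4 }
```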
Developer Tips
- Check your database's character set and length semantics (bytes vs. characters) when near limits; collation affects comparison and sorting, not storage size
- Normalize Unicode before counting if your backend applies NFC/NFD
- Use the Unicode converter tab for per-code-point byte inspection
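To illustrate the normalization tip, the sketch below counts the same visible character in composed and decomposed form. It assumes the backend normalizes to NFC or NFD; which form (if any) your backend applies is something to confirm separately. The helper name `utf8Bytes` is hypothetical.

```typescript
const encoder = new TextEncoder();

// Hypothetical helper: UTF-8 byte count of a string.
const utf8Bytes = (text: string): number => encoder.encode(text).length;

const composed = "\u00e9";    // é as a single code point (NFC form)
const decomposed = "e\u0301"; // e followed by a combining acute accent (NFD form)

console.log(utf8Bytes(composed));                    // 2
console.log(utf8Bytes(decomposed));                  // 3
console.log(utf8Bytes(decomposed.normalize("NFC"))); // 2
console.log(utf8Bytes(composed.normalize("NFD")));   // 3
```

Both strings render as the same character, yet their byte counts differ, so counting before normalization can disagree with what the backend stores.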
Frequently Asked Questions
Why is byte length larger than character count?
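ASCII characters take one UTF-8 byte each, while non-ASCII characters take two, three, or four. Most emoji code points use four bytes, and emoji built from several code points (for example, with a skin-tone modifier) use more.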
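A per-code-point breakdown makes the byte widths visible. This is a rough sketch under the assumption that the input contains no unpaired surrogates; `bytesPerCodePoint` is a hypothetical name, not part of this tool.

```typescript
const encoder = new TextEncoder();

// Hypothetical helper: lists each code point with its UTF-8 byte width.
function bytesPerCodePoint(text: string): Array<{ codePoint: string; bytes: number }> {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) => ({
    codePoint: "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
    bytes: encoder.encode(ch).length,
  }));
}

// "A", "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600)
const breakdown = bytesPerCodePoint("A\u00e9\u20ac\u{1F600}");
console.log(breakdown);
// [
//   { codePoint: "U+0041", bytes: 1 },
//   { codePoint: "U+00E9", bytes: 2 },
//   { codePoint: "U+20AC", bytes: 3 },
//   { codePoint: "U+1F600", bytes: 4 }
// ]
```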
Is UTF-16 byte length shown?
No. Output is UTF-8 byte length only.
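If a UTF-16 byte count is needed, it can be derived from the string itself, since each UTF-16 code unit is two bytes. The sketch below is an illustration only and is not a feature of this tool; `utf16ByteLength` is a hypothetical name, and the result excludes any byte order mark.

```typescript
// Hypothetical helper: each UTF-16 code unit occupies 2 bytes (no BOM counted).
function utf16ByteLength(text: string): number {
  return text.length * 2;
}

// UTF-8 byte count for comparison.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(utf16ByteLength("hello"), utf8ByteLength("hello"));   // 10 5
console.log(utf16ByteLength("\u20ac"), utf8ByteLength("\u20ac")); // 2 3 (the euro sign)
```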