What percent-encoding is and why URLs need it
A URL is a string of ASCII characters, and RFC 3986 tightly restricts which characters may appear in each part of it. Characters outside the safe set (including spaces, non-ASCII Unicode, and reserved delimiters used as data) must be percent-encoded: each byte of the character's UTF-8 encoding is replaced by a % followed by two hexadecimal digits.
A space becomes %20. An ampersand becomes %26. The Thai letter ก (U+0E01) becomes %E0%B8%81 (its UTF-8 encoding in hex).
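These three encodings can be checked directly in JavaScript. The snippet below is a minimal sketch using the built-in encodeURIComponent and decodeURIComponent; the variable names are illustrative only.

```javascript
// encodeURIComponent converts each character to UTF-8 and percent-encodes the bytes.
const space = encodeURIComponent(' ');      // '%20'
const ampersand = encodeURIComponent('&');  // '%26'
const thaiKo = encodeURIComponent('ก');     // '%E0%B8%81' (three UTF-8 bytes)

// Decoding reverses the process.
const roundTrip = decodeURIComponent('%E0%B8%81'); // 'ก'

console.log(space, ampersand, thaiKo, roundTrip);
```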
Two contexts: full URL vs component
This is the detail most developers get wrong.
Full URL encoding (like encodeURI in JavaScript) should leave structural characters intact: :, /, ?, #, &, =, and @ are all legal in a URL and must not be encoded in their structural positions. Encoding a slash in a path would corrupt the path.
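Component encoding (like encodeURIComponent in JavaScript) encodes almost everything, including the structural delimiters. The RFC 3986 unreserved set (A–Z a–z 0–9 - _ . ~) is never encoded; encodeURIComponent additionally leaves ! * ' ( ) untouched, even though RFC 3986 classes those as reserved. Use component encoding when encoding a single value that will be inserted into a URL, such as a search query or a parameter value.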
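The difference between the two contexts is easiest to see by running both functions on the same input. The sketch below also includes a stricter component encoder for cases where only the RFC 3986 unreserved set should remain unencoded; encodeStrictComponent is a hypothetical helper name, not a built-in.

```javascript
const input = 'a/b?c=d&e f';

// Full-URL encoding keeps the delimiters / ? = & and only escapes the space.
const asFullUrl = encodeURI(input);            // 'a/b?c=d&e%20f'

// Component encoding escapes the delimiters as well.
const asComponent = encodeURIComponent(input); // 'a%2Fb%3Fc%3Dd%26e%20f'

// Hypothetical helper: also escapes ! ' ( ) *, which encodeURIComponent
// leaves as-is, so that only A-Z a-z 0-9 - _ . ~ pass through unencoded.
function encodeStrictComponent(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const strict = encodeStrictComponent("it's (fine)!*"); // 'it%27s%20%28fine%29%21%2A'

console.log(asFullUrl, asComponent, strict);
```

The stricter variant is mainly useful when the receiving system treats those five characters specially; for ordinary query values encodeURIComponent is usually enough.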
Common mistake:
// Wrong: encodes the colon, slashes, ? and = too, so the result is no longer a usable URL
const wrong = encodeURIComponent('https://example.com/search?q=hello world');
// Correct: encode only the parameter value
const correct = `https://example.com/search?q=${encodeURIComponent('hello world')}`;
%20 vs + for spaces
Both %20 and + represent a space, but in different contexts:
%20 is the RFC 3986 standard and works everywhere: in path segments, query strings, and fragments.
+ as a space is a legacy convention from HTML form encoding (application/x-www-form-urlencoded). It only means a space inside query strings. In a path segment, + is a literal plus sign.
Modern codebases should use %20 unless they are specifically handling HTML form data. Many frameworks decode + as a space in query strings by default, which can cause subtle bugs when your data contains a literal +.
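The literal-plus pitfall can be reproduced with built-ins alone. This sketch contrasts decodeURIComponent, which follows RFC 3986, with URLSearchParams, which follows the form-encoding rules; the sample values and variable names are made up for illustration.

```javascript
// decodeURIComponent follows RFC 3986: '+' stays a literal plus sign.
const rfcDecoded = decodeURIComponent('a+b%20c'); // 'a+b c'

// URLSearchParams follows application/x-www-form-urlencoded: '+' becomes a space.
const formDecoded = new URLSearchParams('q=a+b%20c').get('q'); // 'a b c'

// A literal plus has to be sent as %2B to survive form decoding.
const literalPlus = new URLSearchParams('q=1%2B1').get('q'); // '1+1'

// When serialising, URLSearchParams writes spaces as '+' and a literal plus as %2B.
const serialised = new URLSearchParams({ q: '1+1 = 2' }).toString(); // 'q=1%2B1+%3D+2'

console.log(rfcDecoded, formDecoded, literalPlus, serialised);
```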
When decoding goes wrong
decodeURIComponent throws a URIError if the input contains a % that is not followed by two valid hex digits, or if the decoded bytes are not valid UTF-8. The first case is common when processing user-supplied input that contains a literal % which was not itself percent-encoded. The fix belongs at the source: always percent-encode a value before embedding it, so that a literal % is sent as %25.
Partial decoding (decoding only the sequences that are valid) is a safer approach when you cannot control the input format.
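Both strategies can be sketched in a few lines. tryDecode and decodeLenient are hypothetical helper names, and this is one possible approach rather than a standard API: the first reports malformed input instead of throwing, the second decodes only the runs of escapes that are valid and leaves everything else as it was.

```javascript
// Hypothetical helper: returns the decoded string, or null if the input is malformed.
function tryDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err; // anything else is unexpected, so let it propagate
  }
}

// Hypothetical helper: decodes each run of well-formed %XX escapes and
// leaves stray '%' characters and invalid UTF-8 sequences untouched.
function decodeLenient(value) {
  return value.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    try {
      return decodeURIComponent(run);
    } catch {
      return run; // bytes are not valid UTF-8, keep the original text
    }
  });
}

console.log(tryDecode('100%'));                               // null
console.log(decodeLenient('50%25 off%20now, 100% sure'));     // '50% off now, 100% sure'
```

Lenient decoding is not idempotent: a result that still contains escapes (for example from double-encoded input) would decode further on a second pass, so it is best applied exactly once.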
Unicode in URLs
Modern browsers and most HTTP libraries encode non-ASCII characters as their UTF-8 byte sequences in percent-encoded form. ñ (U+00F1) → %C3%B1 (the two UTF-8 bytes 0xC3 and 0xB1). The Internationalized Domain Names (IDN) standard handles non-ASCII hostnames separately via Punycode encoding.
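A short sketch of both mechanisms, using the built-in TextEncoder and the WHATWG URL parser available in browsers and Node.js. The hostname and path are invented examples.

```javascript
// Percent-encoding works on the UTF-8 bytes of the character.
const encoded = encodeURIComponent('ñ');                 // '%C3%B1'
const bytes = Array.from(new TextEncoder().encode('ñ')); // [195, 177], i.e. 0xC3 0xB1

// The URL parser treats the two parts differently: the hostname is
// converted to Punycode, while the path is percent-encoded as UTF-8.
const parsed = new URL('https://münchen.example/straße');
const host = parsed.hostname; // 'xn--mnchen-3ya.example'
const path = parsed.pathname; // '/stra%C3%9Fe'

console.log(encoded, bytes, host, path);
```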
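If you are building an API that receives URLs, decode each component on the server side and re-encode it consistently before storing, so that stored URLs are always in one normalised percent-encoded form. Work component by component rather than on the whole URL at once: an escaped delimiter such as %2F inside a path segment is data, and turning it into a literal / changes what the URL means.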
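One way to approach normalisation is sketched below, following the case and percent-encoding normalisation rules described in RFC 3986 section 6.2.2: hex digits in escapes are upper-cased, and escapes of unreserved characters are decoded because they never needed encoding. normalizePercentEncoding is a hypothetical helper, and it assumes the input contains only well-formed %XX escapes.

```javascript
// Hypothetical helper: normalises the percent-escapes in an already-encoded string.
function normalizePercentEncoding(encoded) {
  return encoded.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Unreserved characters are stored literally; every other escape is
    // kept, with its hex digits upper-cased.
    return /[A-Za-z0-9\-._~]/.test(ch) ? ch : '%' + hex.toUpperCase();
  });
}

const normalised = normalizePercentEncoding('/%7euser/a%2fb/%c3%b1');
console.log(normalised); // '/~user/a%2Fb/%C3%B1'
```

Escaped reserved characters such as %2F are deliberately left encoded, so two URLs that differ only in escape case or in needlessly escaped unreserved characters end up stored identically without changing their structure.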