Why list deduplication is so often a tiny crisis
Ask someone how often they need to dedupe a list and they'll say "rarely." Ask them how often the list they were handed has duplicates in it and the answer flips: every time. CSVs exported from CRMs, lead lists scraped from websites, email subscriber dumps, log lines, branch names, todo lists: duplicates leak in everywhere data is copy-pasted between systems.
The fix is fast: split on newlines, run each line through a Set, write the unique entries back. That's a one-line shell command (sort -u if you also want the result sorted, or awk '!seen[$0]++' to keep the original order), and it's also exactly what this tool does in your browser.
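The split, Set, join recipe can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source; the function name dedupeLines is made up for the example.

```javascript
// Hypothetical helper: dedupe newline-separated text, keeping first occurrences.
function dedupeLines(text) {
  // Accept both Unix (\n) and Windows (\r\n) line endings.
  const lines = text.split(/\r?\n/);
  // A Set remembers insertion order, so the first copy of each line wins.
  return Array.from(new Set(lines)).join("\n");
}

console.log(dedupeLines("apple\nbanana\napple\ncherry\nbanana"));
// prints:
// apple
// banana
// cherry
```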
The trim-whitespace gotcha
The single biggest reason "duplicates" don't dedupe is invisible whitespace. Two lines that look identical but differ by a trailing space, a leading tab, or a non-breaking space are technically different strings and the deduplicator sees them as two distinct values.
The "Trim whitespace" toggle on this tool is on by default for that exact reason. It strips the leading and trailing whitespace from each line before comparing, so "apple " and "apple" collapse together. The original spacing is preserved in the output for the lines that survive. Only the comparison is normalized.
If you want to dedupe based on exact byte-equal lines (rare, but useful for code or fingerprints), turn the trim toggle off.
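One way to normalize only the comparison is to keep a Set of trimmed keys while pushing the untouched original lines to the output. The sketch below assumes that approach; dedupeTrimmed and its trim parameter are illustrative names, not the tool's real API.

```javascript
// Hypothetical helper: compare on a trimmed key, but output the original line.
function dedupeTrimmed(lines, trim = true) {
  const seen = new Set();
  const kept = [];
  for (const line of lines) {
    // String.prototype.trim() also removes tabs and non-breaking spaces (U+00A0).
    const key = trim ? line.trim() : line;
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(line); // the surviving line keeps its original spacing
    }
  }
  return kept;
}

const sample = ["apple ", "apple", "\tapple", "apple\u00a0"];
console.log(dedupeTrimmed(sample).length);        // 1 (all four collapse)
console.log(dedupeTrimmed(sample, false).length); // 4 (byte-exact comparison)
```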
Case sensitivity matters more than you think
Email addresses are technically case-sensitive on the local part (RFC 5321 § 2.4 says so), but in practice most providers treat the local part as case-insensitive anyway. Domain names are always case-insensitive. Usernames depend on the system. So when you're deduping a list of emails, "User@Example.com" and "user@example.com" should almost always collapse to one, which is why case-insensitive comparison is on by default.
When you're deduping case-sensitive identifiers (commit hashes, generated IDs, base64 tokens), switch to case-sensitive comparison. The tool will then treat "Abc" and "abc" as separate entries.
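The same key-versus-output idea covers case handling. The sketch below assumes plain toLowerCase() folding, which may differ from what the tool does for non-ASCII text; dedupeByCase and caseSensitive are hypothetical names.

```javascript
// Hypothetical helper: optional case-insensitive comparison, original casing kept.
function dedupeByCase(lines, caseSensitive = false) {
  const seen = new Set();
  return lines.filter((line) => {
    // Assumption: simple lowercasing is a good enough comparison key.
    const key = caseSensitive ? line : line.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const emails = ["User@Example.com", "user@example.com"];
console.log(dedupeByCase(emails));       // [ 'User@Example.com' ]
console.log(dedupeByCase(emails, true)); // [ 'User@Example.com', 'user@example.com' ]
```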
"Keep first" vs "Keep last": the configuration-file problem
Most lists don't care which duplicate gets kept; one of them survives and that's that. But two cases benefit from picking explicitly:
Keep first keeps the earliest occurrence: the first time a value appears is where it stays. Use this for lead lists where you want to remember the first time someone showed up, or for ordered config where the earliest entry wins.
Keep last preserves the most recent occurrence. Use this for "last value wins" semantics, like environment files where a later definition overrides an earlier one, or log files where you want the most recent state of an entity.
Both options preserve the relative order of the kept lines. Neither sorts unless you flip the "Sort A-Z" toggle.
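A compact way to get both behaviors from one loop is to walk the list backwards for "keep last" and then flip the result back. This is a sketch under that assumption; dedupeKeep and its keep parameter are invented for the example.

```javascript
// Hypothetical helper: keep either the first or the last copy of each line.
function dedupeKeep(lines, keep = "first") {
  // For "last", scan from the end so the final occurrence is seen first.
  const source = keep === "last" ? [...lines].reverse() : lines;
  const seen = new Set();
  const kept = source.filter((line) => {
    if (seen.has(line)) return false;
    seen.add(line);
    return true;
  });
  // Undo the reversal so survivors stay in their original relative order.
  return keep === "last" ? kept.reverse() : kept;
}

console.log(dedupeKeep(["a", "b", "a", "c"], "first")); // [ 'a', 'b', 'c' ]
console.log(dedupeKeep(["a", "b", "a", "c"], "last"));  // [ 'b', 'a', 'c' ]
```

Note that this compares whole lines. For env-style "last value wins" on keys, the comparison key would have to be the part before the "=" rather than the full line.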
Empty lines: structure or noise?
The "Ignore empty lines" toggle decides what happens to blanks. By default it is off, so blank lines participate in deduplication like any other line and only the first blank survives.
Turn it on when blank lines are meaningful structure that you want preserved exactly as in the input (paragraph breaks in a draft, section dividers in a config file). Leave it off for shopping lists, email lists, and URL lists where extra blanks are just noise.
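The toggle's two modes might look like this in code. The sketch assumes "empty" means a line that is blank after trimming, which is a guess about the tool's definition; dedupeWithBlanks and ignoreEmpty are illustrative names.

```javascript
// Hypothetical helper: optionally let blank lines bypass deduplication.
function dedupeWithBlanks(lines, ignoreEmpty = false) {
  const seen = new Set();
  return lines.filter((line) => {
    // Assumption: whitespace-only lines count as empty.
    if (ignoreEmpty && line.trim() === "") return true; // pass through untouched
    if (seen.has(line)) return false;
    seen.add(line);
    return true;
  });
}

const draft = ["a", "", "b", "", "a"];
console.log(dedupeWithBlanks(draft));       // [ 'a', '', 'b' ]
console.log(dedupeWithBlanks(draft, true)); // [ 'a', '', 'b', '' ]
```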
Why deduping is only half the job
Dedupe + sort is the canonical pair. sort -u does both in one command on Unix because the two operations are almost always wanted together. The "Sort A-Z" toggle in this tool handles the most common case (alphabetical ascending) inline. For numeric, natural, or reverse sort modes, paste the deduped output into the Sort Lines tool.
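In JavaScript, the dedupe-then-sort pair is a Set followed by a sort. The sketch below uses localeCompare for the ordering, which is an assumption about what "A-Z" means here; dedupeAndSort is a hypothetical name.

```javascript
// Hypothetical helper: unique lines, sorted alphabetically ascending.
function dedupeAndSort(lines) {
  // Spread the Set into a fresh array so the caller's array is not reordered.
  return [...new Set(lines)].sort((a, b) => a.localeCompare(b));
}

console.log(dedupeAndSort(["pear", "apple", "pear", "fig"]));
// [ 'apple', 'fig', 'pear' ]
```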
For complex data (CSVs with multiple columns, JSON arrays of objects), dedupe in the source format. JavaScript: Array.from(new Set(arr)) for primitives, or Array.from(new Map(arr.map(x => [x.id, x])).values()) for objects. SQL: SELECT DISTINCT. This tool handles the line-by-line case, which is most of what comes up day-to-day.
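One detail of the Map one-liner is worth seeing in action: a later duplicate overwrites the value but keeps the position of the first one, so it behaves as "last value wins, first position kept". The sketch below shows that next to a keep-first variant. The sample rows and both function names are made up for illustration.

```javascript
// Hypothetical sample data: two rows share id 1.
const rows = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
  { id: 1, name: "Ada L." },
];

// The Map one-liner: the last object for each id wins, at the first id's position.
function dedupeById(items) {
  return Array.from(new Map(items.map((x) => [x.id, x])).values());
}

// Keep-first variant: only store an id the first time it appears.
function dedupeByIdKeepFirst(items) {
  const byId = new Map();
  for (const item of items) {
    if (!byId.has(item.id)) byId.set(item.id, item);
  }
  return Array.from(byId.values());
}

console.log(dedupeById(rows).map((r) => r.name));          // [ 'Ada L.', 'Grace' ]
console.log(dedupeByIdKeepFirst(rows).map((r) => r.name)); // [ 'Ada', 'Grace' ]
```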