ToolBook

Remove Duplicate Lines

How to remove duplicate lines online

Paste your list, pick the comparison rules, copy the deduped output. Free, fast, and processed locally in your browser.

  1. Paste your list

    Drop the text into the input panel. Emails, URLs, a CSV column, log lines: anything with one entry per line works. The deduped output updates live.

  2. Pick the comparison rules

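    Matching is case-insensitive by default, which suits emails and domain names. Trim whitespace to collapse near-duplicates that differ only by stray spaces. Turn on "Ignore empty lines" if blank lines are meaningful structure you want kept rather than deduped.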

  3. Choose which duplicate to keep

    "Keep first" preserves the original first occurrence. "Keep last" keeps the most recent, useful for last-value-wins configs and environment files.

  4. Sort or copy the result

    Flip "Sort A-Z" to get an alphabetised list. Hit the copy button to send the deduped output to your clipboard, ready to paste anywhere.

Frequently asked questions

How does this duplicate line remover work?

Paste a list (each line is one entry), and we hash every line into a Set. The first occurrence of each unique line is kept and later repeats are removed. The output updates live as you type, and the original line order is preserved unless you turn on the sort toggle.
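Here is a minimal sketch of that Set-based approach. It assumes the input is split on newlines (`\n` or `\r\n`) and compared exactly, with no trimming or case folding. The function name `dedupeLines` and its `sort` option are illustrative and are not this tool's actual source.

```javascript
// Keep-first line deduplication using a Set of lines already emitted.
function dedupeLines(text, { sort = false } = {}) {
  const seen = new Set();   // lines we have already emitted
  const output = [];
  for (const line of text.split(/\r?\n/)) {
    if (!seen.has(line)) {  // first time we meet this exact line
      seen.add(line);
      output.push(line);
    }
  }
  // Sorting runs after deduplication, so it never changes the count.
  // Array.prototype.sort() with no comparator orders by UTF-16 code units,
  // so "Zebra" sorts before "apple"; the tool's own A-Z order may differ.
  if (sort) output.sort();
  return output.join("\n");
}

console.log(dedupeLines("pear\napple\npear\nfig\napple"));
// pear
// apple
// fig
console.log(dedupeLines("pear\napple\npear\nfig\napple", { sort: true }));
// apple
// fig
// pear
```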

Does this work for huge lists?

Yes, comfortably up to a few hundred thousand lines. The algorithm is O(n) on average: each line is hashed once into a Set. Browsers handle 100k to 500k entries without breaking a sweat. For multi-million-line datasets, use a command-line tool such as sort -u (which also sorts the output) or awk '!seen[$0]++' (which keeps the original order).

Why are some lines I think are duplicates not removed?

Usually whitespace. Lines like "apple " and "apple" look the same but differ by a trailing space. Make sure "Trim whitespace" is on and they collapse together. Tabs, non-breaking spaces, and zero-width characters cause the same issue.
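If you script this yourself, watch for one caveat. JavaScript's `String.prototype.trim()` removes spaces, tabs, and non-breaking spaces, but it does not remove U+200B ZERO WIDTH SPACE, because Unicode classifies that as a format character rather than whitespace. The sketch below builds a comparison key that strips a few common zero-width characters before trimming. The character list and the `comparisonKey` name are assumptions made for illustration. This page doesn't say how the tool itself handles zero-width characters.

```javascript
// Comparison key: remove selected zero-width characters, then trim.
// The list below (ZWSP, ZWNJ, ZWJ, WORD JOINER) is an assumed, non-exhaustive set.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060]/g;

function comparisonKey(line) {
  return line.replace(ZERO_WIDTH, "").trim();
}

console.log("apple\u200B".trim() === "apple");          // false: trim() keeps U+200B
console.log(comparisonKey("apple\u200B") === "apple");   // true
console.log(comparisonKey("\u00A0apple\t") === "apple"); // true: NBSP and tab are trimmed
```

Stripping U+200D (ZERO WIDTH JOINER) can make distinct emoji sequences compare as equal. Drop it from the pattern if your lists contain emoji.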

What's the difference between "Keep first" and "Keep last"?

Keep first keeps the first occurrence of each line, in its original position. Keep last keeps the most recent occurrence instead, which is useful when later entries override earlier ones (like a config file where the last value wins). Both preserve the relative order of the lines that survive.

Can I sort the output alphabetically while deduping?

Yes. Flip the "Sort A-Z" toggle and the deduped output gets sorted in ascending order. Leave it off to preserve original input order. Sorting happens after deduplication, so the line counts stay accurate either way.

Are empty lines counted as duplicates?

By default, yes. Every blank line after the first is removed. Toggle "Ignore empty lines" on to preserve every blank entry, including consecutive ones. Useful when blank lines are meaningful structure (paragraph breaks in a draft, section dividers in a config file).

Is my data sent to a server?

No. Every line of text you paste is processed entirely in your browser. Nothing leaves your device, nothing is logged, nothing is stored. Safe for confidential lists like customer emails, internal IDs, or proprietary configs.

How do I dedupe a list of emails or URLs with mixed case?

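Turn "Case-sensitive" off so "User@Example.com" and "user@example.com" collapse to a single entry. Domain names are case-insensitive by RFC, and most mail providers treat the part before the @ the same way, so this is the right default for email lists. For URLs, only the scheme and host are case-insensitive. Paths and query strings can be case-sensitive, so check before collapsing mixed-case URLs. Turn it on only when you need exact matching.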

The dedupe field guide

Why whitespace breaks dedupe, when case matters, and how to choose between keep-first and keep-last.

Why list deduplication is so often a tiny crisis

Ask someone how often they need to dedupe a list and they'll say "rarely." Ask them how often the list they were handed has duplicates in it and the answer flips: every time. CSVs exported from CRMs, lead lists scraped from websites, email subscriber dumps, log lines, branch names, todo lists: duplicates leak in everywhere data is copy-pasted between systems.

The fix is fast: split on newlines, run each line through a Set, write the unique entries back. That's a one-line shell command (sort -u, which also sorts the output, or awk '!seen[$0]++', which keeps the original order), and it's also exactly what this tool does in your browser.

The trim-whitespace gotcha

The single biggest reason "duplicates" don't dedupe is invisible whitespace. Two lines that look identical but differ by a trailing space, a leading tab, or a non-breaking space are technically different strings and the deduplicator sees them as two distinct values.

The "Trim whitespace" toggle on this tool is on by default for that exact reason. It strips the leading and trailing whitespace from each line before comparing, so "apple " and "apple" collapse together. The original spacing is preserved in the output for the lines that survive. Only the comparison is normalized.
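The "compare on a normalized key, output the original" pattern looks roughly like this. It is a sketch only: `dedupeTrimmed` is a hypothetical name, and it uses plain `trim()` as the normalization.

```javascript
// Dedupe on the trimmed value, but emit each surviving line unchanged.
function dedupeTrimmed(lines) {
  const seen = new Set();
  const kept = [];
  for (const line of lines) {
    const key = line.trim();   // used for comparison only
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(line);           // original spacing survives in the output
  }
  return kept;
}

console.log(JSON.stringify(dedupeTrimmed(["apple ", "apple", "  banana", "banana"])));
// ["apple ","  banana"]
```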

If you want to dedupe based on exact byte-equal lines (rare, but useful for code or fingerprints), turn the trim toggle off and turn "Case-sensitive" on.

Case sensitivity matters more than you think

Technically, the local part of an email address (before the @) can be case-sensitive. RFC 5321 § 2.4 says it must be treated that way, while discouraging anyone from relying on it, and most providers ignore case in practice. Domain names are always case-insensitive. Usernames depend on the system. So when you're deduping a list of emails, "User@Example.com" and "user@example.com" should collapse to one. That's why case-insensitive matching is on by default.
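If you script email dedupe yourself, you can choose how strictly to follow RFC 5321. This sketch uses two hypothetical key functions. One lowercases the whole address, which is the common practical choice. The other lowercases only the domain and leaves the local part exact. `dedupeBy` is a generic keep-first helper written for this example.

```javascript
// Treat the whole address as case-insensitive.
const lowerAll = (email) => email.toLowerCase();

// Lowercase only the domain (after the last "@"); keep the local part exact.
function lowerDomainOnly(email) {
  const at = email.lastIndexOf("@");
  if (at === -1) return email;   // not an address: compare as-is
  return email.slice(0, at) + email.slice(at).toLowerCase();
}

// Keep the first line for each distinct key.
function dedupeBy(lines, keyFn) {
  const seen = new Set();
  return lines.filter((line) => {
    const key = keyFn(line);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const emails = ["User@Example.com", "user@example.com", "User@EXAMPLE.COM"];
console.log(JSON.stringify(dedupeBy(emails, lowerAll)));
// ["User@Example.com"]
console.log(JSON.stringify(dedupeBy(emails, lowerDomainOnly)));
// ["User@Example.com","user@example.com"]
```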

When you're deduping case-sensitive identifiers (commit hashes, generated IDs, base64 tokens), turn "Case-sensitive" on. The tool will treat "Abc" and "abc" as separate entries.

"Keep first" vs "Keep last": the configuration-file problem

Most lists don't care which duplicate gets kept; one of them survives and that's that. But two cases benefit from picking explicitly:

Keep first keeps the earliest occurrence of each value. The first time a value appears is where it stays. Use this for lead lists where you want to remember the first time someone showed up, or for ordered config where the earliest entry wins.

Keep last preserves the most recent occurrence. Use this for "last value wins" semantics, like environment files where a later definition overrides an earlier one, or log files where you want the most recent state of an entity.

Both options preserve the relative order of the kept lines. Neither sorts unless you flip the "Sort A-Z" toggle.
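You can sketch keep last by scanning the lines from the end and keeping the first occurrence you meet from that direction. Reversing the result then puts the survivors back in their input order. The optional `keyFn` parameter is an assumption added for illustration. With whole-line comparison, `PORT=3000` and `PORT=8080` are different lines and both survive, so true last-value-wins for a config file needs a key such as the variable name.

```javascript
// Keep the last occurrence of each key, preserving the survivors' relative order.
function keepLast(lines, keyFn = (line) => line) {
  const seen = new Set();
  const kept = [];
  for (let i = lines.length - 1; i >= 0; i--) {   // walk backwards
    const key = keyFn(lines[i]);
    if (seen.has(key)) continue;                  // an occurrence later in the input already won
    seen.add(key);
    kept.push(lines[i]);
  }
  return kept.reverse();                          // restore input order
}

console.log(JSON.stringify(keepLast(["a", "b", "a", "c", "b"])));
// ["a","c","b"]

// Hypothetical env-file use: key each line by the name before "=".
const env = ["PORT=3000", "HOST=localhost", "PORT=8080"];
console.log(JSON.stringify(keepLast(env, (line) => line.split("=")[0])));
// ["HOST=localhost","PORT=8080"]
```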

Empty lines: structure or noise?

The "Ignore empty lines" toggle decides what happens to blanks. By default it is off, so blank lines participate in deduplication like any other line and only the first blank survives.

Turn it on when blank lines are meaningful structure that you want preserved exactly as in the input (paragraph breaks in a draft, section dividers in a config file). Leave it off for shopping lists, email lists, and URL lists where extra blanks are just noise.
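Here is a sketch of the two behaviours. It assumes that with the toggle on, blank lines bypass the Set entirely, and that a line containing only whitespace counts as blank. `dedupeWithBlanks` is a hypothetical name.

```javascript
// ignoreEmpty = true: every blank line is emitted as-is.
// ignoreEmpty = false: blanks dedupe like any other line.
function dedupeWithBlanks(lines, ignoreEmpty) {
  const seen = new Set();
  return lines.filter((line) => {
    if (ignoreEmpty && line.trim() === "") return true;  // keep every blank
    if (seen.has(line)) return false;
    seen.add(line);
    return true;
  });
}

const draft = ["Intro", "", "Body", "", "", "Intro"];
console.log(JSON.stringify(dedupeWithBlanks(draft, false)));
// ["Intro","","Body"]
console.log(JSON.stringify(dedupeWithBlanks(draft, true)));
// ["Intro","","Body","",""]
```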

Why deduping is only half the job

Dedupe + sort is the canonical pair. sort -u does both in one command on Unix because the two operations are almost always wanted together. The "Sort A-Z" toggle in this tool handles the most common case (alphabetical ascending) inline. For numeric, natural, or reverse sort modes, paste the deduped output into the Sort Lines tool.
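For comparison, here is a sketch of the sort-first strategy that sort -u conceptually follows. Once the list is sorted, equal lines sit next to each other, so a single pass that compares each line with its predecessor removes every duplicate. This costs O(n log n) instead of the Set approach's average O(n), and the original order is lost. `sort -u` orders lines by the current locale's collation, while JavaScript's default `sort()` compares UTF-16 code units, so the two can disagree on ordering. `sortUnique` is an illustrative name.

```javascript
// Sort a copy, then drop any line equal to the one before it.
function sortUnique(lines) {
  const sorted = [...lines].sort();   // copy so the caller's array is untouched
  return sorted.filter((line, i) => i === 0 || line !== sorted[i - 1]);
}

console.log(JSON.stringify(sortUnique(["pear", "apple", "pear", "fig", "apple"])));
// ["apple","fig","pear"]
```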

For complex data (CSVs with multiple columns, JSON arrays of objects), dedupe in the source format. JavaScript: Array.from(new Set(arr)) for primitives, or Array.from(new Map(arr.map(x => [x.id, x])).values()) for objects (this keeps the last object seen for each id). SQL: SELECT DISTINCT. This tool handles the line-by-line case, which is most of what comes up day-to-day.
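The Map one-liner has a subtle behaviour. `Map.prototype.set` on an existing key replaces the value but keeps the key's original insertion position, so each id ends up holding its last object in the slot where that id first appeared. If you want keep-first semantics for objects instead, a Set of seen ids does it. The sketch below assumes each object has an `id` field; `uniqueById` is a hypothetical name.

```javascript
// Keep the first object seen for each id.
function uniqueById(items) {
  const seen = new Set();
  return items.filter((item) => {
    if (seen.has(item.id)) return false;
    seen.add(item.id);
    return true;
  });
}

const rows = [
  { id: 1, name: "first" },
  { id: 2, name: "other" },
  { id: 1, name: "second" },
];
console.log(JSON.stringify(uniqueById(rows).map((r) => r.name)));
// ["first","other"]

// The Map version keeps id 1 in first position but with its last value.
const viaMap = Array.from(new Map(rows.map((x) => [x.id, x])).values());
console.log(JSON.stringify(viaMap.map((r) => r.name)));
// ["second","other"]
```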