What is XML and where is it still used?
XML (Extensible Markup Language) was the dominant data exchange format of the 2000s. While JSON has replaced it in most web API contexts, XML remains ubiquitous in:
- Enterprise integrations: SOAP web services, EDI, SAP, and most ERP systems still use XML heavily.
- Android resources: Layout files, strings, and manifests are all XML.
- Microsoft Office formats: DOCX, XLSX, and PPTX are ZIP archives containing XML files.
- SVG images: SVG is an XML vocabulary, so every SVG icon and vector graphic on the web is an XML document.
- RSS and Atom feeds: Still widely used for content syndication.
- Java build tools: Maven (pom.xml) and Ant (build.xml) build files are XML.
- Configuration files: Spring (Java), .NET config files, and many IDE settings are XML.
XML structure: the key concepts
An XML document is a tree of elements, each with an opening tag, optional attributes, content, and a closing tag. The document must have exactly one root element.
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element attribute="value">text content</element>
<self-closing />
</root>
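To see the tree structure concretely, the sample document above can be loaded with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. Python is an assumption here (any XML parser exposes the same tree), and describe is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from above, as bytes so the encoding declaration is honored.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<element attribute="value">text content</element>
<self-closing />
</root>"""


def describe(xml_bytes):
    """Return (tag, attributes, stripped text) for every element, in document order."""
    root = ET.fromstring(xml_bytes)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]


for tag, attrs, text in describe(DOC):
    print(tag, attrs, repr(text))
```

The walk visits the single root element first, then its two children. The self-closing element appears as an ordinary element with no attributes and no text.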
Key structural rules:
- Well-formed: Tags are properly nested and closed. Every attribute value is quoted.
- Valid: The document conforms to a schema (DTD or XSD). The formatter checks well-formedness but not schema validity.
- Case-sensitive: <Element> and <element> are different tags.
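The well-formedness rules can be checked by handing a document to any conforming parser and seeing whether it is rejected. The sketch below assumes Python's standard xml.etree.ElementTree. The function name is_well_formed is hypothetical, and the check says nothing about schema validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b></b></a>"))       # properly nested
print(is_well_formed("<a><b></a></b>"))       # overlapping tags
print(is_well_formed("<a x=1/>"))             # unquoted attribute value
print(is_well_formed("<Element></element>"))  # case mismatch between open and close
```

Only the first document is accepted. The last case fails because tag names are case-sensitive, so </element> does not close <Element>.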
Special XML syntax
The formatter handles four special token types beyond regular elements:
- Processing instructions (<?...?>): Directives to the XML processor, including the XML declaration at the top (<?xml version="1.0"?>).
- Comments (<!-- ... -->): Preserved in formatted output, stripped in minified output.
- CDATA sections (<![CDATA[...]]>): Raw text that should not be parsed as XML. Used to embed HTML or scripts in XML documents.
- DTD declarations (<!DOCTYPE...>): Treated as a single token and preserved as-is.
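One plausible way to recognize these token types is a single regular expression with one alternative per token. The sketch below is an illustration in Python, not the formatter's actual implementation. TOKEN_RE and tokenize are hypothetical names, and the DOCTYPE pattern assumes there is no internal subset containing ">".

```python
import re

# Order matters: the special tokens are tried before the generic tag pattern.
TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<cdata><!\[CDATA\[.*?\]\]>)"
    r"|(?P<pi><\?.*?\?>)"
    r"|(?P<doctype><!DOCTYPE[^>]*>)"
    r"|(?P<tag><[^>]+>)"
    r"|(?P<text>[^<]+)",
    re.DOTALL,
)


def tokenize(xml_text):
    """Split XML text into (kind, raw_text) pairs without interpreting them."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(xml_text)]


sample = (
    '<?xml version="1.0"?><!DOCTYPE note><note>'
    "<!-- hi --><![CDATA[<b>raw</b>]]>text</note>"
)
for kind, raw in tokenize(sample):
    print(kind, raw)
```

Because the CDATA alternative is tried before the generic tag pattern, the markup inside the CDATA section is kept as one raw token and is not split into tags.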
Attribute values and encoding
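XML has five predefined character entities. The characters & and < must always be escaped in text content and attribute values, and a quote character must be escaped when it appears inside an attribute value delimited by the same quote. Escaping > is optional in most positions but is common practice: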
- & → &amp;
- < → &lt;
- > → &gt;
- " → &quot;
- ' → &apos;
The formatter preserves these entities as-is — it does not decode and re-encode them, so well-formed entities in the input remain well-formed in the output.
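For producing these entities programmatically, Python's standard xml.sax.saxutils module provides escape and unescape. The sketch below is illustrative. escape_text, escape_attr, and QUOTES are hypothetical names, and the two quote entities are passed explicitly because escape only handles &, <, and > by default.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings for the two quote entities; escape() handles &, <, > by default.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_text(value):
    """Escape a string for use as element text content."""
    return escape(value)


def escape_attr(value):
    """Escape a string for use inside a quoted attribute value."""
    return escape(value, QUOTES)


print(escape_text("Tom & Jerry <3"))
print(escape_attr('say "hi"'))
print(unescape("&lt;b&gt; &amp; more"))
```

Escaping input that is already escaped double-encodes it (&amp; becomes &amp;amp;), which is why leaving existing entities untouched is the safer behavior for a formatter.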
Minification savings
XML is verbose by design: element names repeat in opening and closing tags, and attribute names are spelled out in full. Minification removes the whitespace between tags (indentation and line breaks) without changing any element, attribute, or text content. For typical indented XML documents, minification saves roughly 20–40%. The savings depend on how much of the file is indentation: deeply nested documents with many short elements can save more than 50%, while documents dominated by long text nodes save less, because the text itself is left untouched.
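The effect is easy to measure on a small input. The sketch below is a deliberately naive illustration in Python, not the formatter's implementation: minify and savings are hypothetical names, and the regular expression would also alter a comment or CDATA section that happens to contain whitespace between ">" and "<".

```python
import re


def minify(xml_text):
    """Drop whitespace-only runs between a closing '>' and the next '<'."""
    return re.sub(r">\s+<", "><", xml_text.strip())


def savings(original, minified):
    """Fraction of bytes saved, measured on the UTF-8 encoding."""
    before = len(original.encode("utf-8"))
    after = len(minified.encode("utf-8"))
    return 1 - after / before


formatted = "<root>\n  <a>1</a>\n  <b>2</b>\n</root>\n"
small = minify(formatted)
print(small)
print(f"saved {savings(formatted, small):.0%}")
```

Whitespace inside a text node, as in <a>  keep  me  </a>, is left alone because it is not directly between two tags.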
XPath and querying minified XML
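Minified XML carries the same elements, attributes, and text as the formatted version, so XPath queries, XSLT transforms, and DOM parsers that work on that content produce the same results from both. Minification only removes whitespace between elements, which is insignificant in most document types. Whitespace inside text nodes is significant and is preserved. The exception is code that depends on whitespace-only text nodes, such as a query that counts every child node, or mixed content where the whitespace between inline elements carries meaning.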
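A quick way to confirm this is to run the same query against both forms. The sketch below assumes Python's standard xml.etree.ElementTree, which supports a subset of XPath. The catalog document and the item_texts helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

formatted = """<catalog>
  <item id="1">apple</item>
  <item id="2">pear</item>
</catalog>"""
minified = '<catalog><item id="1">apple</item><item id="2">pear</item></catalog>'


def item_texts(xml_text):
    """Run the same XPath-style query and return the matched text values."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./item[@id]")]


print(item_texts(formatted) == item_texts(minified))

# The raw whitespace before the first child is the only observable difference.
print(repr(ET.fromstring(formatted).text), repr(ET.fromstring(minified).text))
```

Queries that select elements and their text return the same values from both inputs. Only APIs that expose the whitespace-only text between elements, such as the root element's text property here, can tell the two apart.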