DOMDocument allows manipulating HTML/XML documents in PHP. This cheat sheet aims to be the most comprehensive reference possible for working with DOMDocument.
Capabilities Covered
Loading Documents
Initialize DOMDocument and load markup:
From string:
$dom = new DOMDocument;
$dom->loadHTML('<html><body/></html>');
From file:
$dom->loadHTMLFile('page.html');
From URL (requires allow_url_fopen to be enabled):
$dom->loadHTMLFile('http://example.com');
Helper function:
function loadHTML(string $html) : DOMDocument {
$dom = new DOMDocument;
$dom->loadHTML($html);
return $dom;
}
Load XML:
$dom->loadXML($xml);
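Load a fragment without the implied <html>/<body> wrapper or default doctype:
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
Force UTF-8 (the flags above do not affect encoding; a common workaround is to prepend an XML encoding declaration):
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);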
Suppress errors:
libxml_use_internal_errors(true);
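Real-world HTML is rarely clean, so it often helps to load a document and collect the parser's complaints in one step. The following is a minimal sketch, not a definitive implementation; loadHtmlCollectingErrors() is a hypothetical helper name introduced here for illustration.

```php
<?php
// Hypothetical helper: parse an HTML string and return the document together
// with any libxml parse errors, instead of letting warnings reach the output.
function loadHtmlCollectingErrors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $errors = libxml_get_errors();          // array of LibXMLError objects
    libxml_clear_errors();                  // empty the buffer for the next parse
    libxml_use_internal_errors($previous);  // restore the caller's setting
    return [$dom, $errors];
}

[$dom, $errors] = loadHtmlCollectingErrors('<html><body><p>Hi</p></body></html>');
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.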
Selecting Nodes
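Query DOM nodes using XPath expressions. DOMXPath does not understand CSS selectors, so a selector such as h2 is written as the XPath //h2:
Create the XPath object:
$selector = new DOMXPath($dom);
All h2 elements (CSS: h2):
$headings = $selector->query('//h2');
All p elements (CSS: p):
$paras = $selector->query('//p');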
Get element by ID:
$el = $dom->getElementById('header');
By tag name:
$items = $dom->getElementsByTagName('li');
Document node:
$doc = $selector->document;
Document root:
$root = $selector->document->documentElement;
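Selecting by class is the most common case where a CSS selector has to be translated into XPath by hand. Here is a rough sketch of the equivalent of the CSS selector ".nav a[href]"; hrefsForClass() is a hypothetical helper, and it assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: collect the href of every <a> nested inside an
// element that carries the given class.
function hrefsForClass(DOMDocument $dom, string $class): array
{
    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so that "nav" does not also
    // match "navbar". Assumes $class contains no quote characters.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]//a[@href]",
        $class
    );
    $hrefs = [];
    foreach ($xpath->query($query) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body><ul class="nav main"><li><a href="/a">A</a></li>'
    . '<li><a href="/b">B</a></li></ul><a href="/c">C</a></body></html>'
);
print_r(hrefsForClass($dom, 'nav')); // the link to /c is outside .nav and is skipped
```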
Looping Elements
Iterate through node lists:
foreach($headings as $heading) {
// ...
}
Indexed loop:
for($i = 0; $i < $items->length; $i++) {
$item = $items->item($i);
}
While loop (start at index 0 and post-increment so the first node is not skipped):
$i = 0;
while ($node = $nodelist->item($i++)) {
    // ...
}
Convert to array:
$itemsArray = iterator_to_array($items);
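As an illustration of the foreach form, this small sketch collects the trimmed text of every list item. listItemTexts() is a hypothetical helper name, and the sample markup is made up for the example.

```php
<?php
// Hypothetical helper: return the trimmed text of each <li> in the document.
function listItemTexts(DOMDocument $dom): array
{
    $texts = [];
    foreach ($dom->getElementsByTagName('li') as $li) {
        // textContent concatenates the text of the node and its descendants
        $texts[] = trim($li->textContent);
    }
    return $texts;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li> One </li><li>Two <b>bold</b></li></ul></body></html>');
print_r(listItemTexts($dom));
```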
Creating Elements
Generate new DOM nodes:
Create element:
$para = $dom->createElement('p');
Create text node:
$text = $dom->createTextNode('Hello World');
From markup (DOMDocumentFragment has appendXML(), not appendHTML(); the string must be well-formed XML):
$frag = $dom->createDocumentFragment();
$frag->appendXML('<b>Hello</b>');
Helper function:
function createTag(DOMDocument $dom, string $name) : DOMElement {
return $dom->createElement($name);
}
Inserting Elements
Insert nodes into the document:
Append child:
$el->appendChild($new);
Prepend child:
$el->insertBefore($new, $el->firstChild);
Insert after:
$el->parentNode->insertBefore($new, $el->nextSibling);
Insert before:
$el->parentNode->insertBefore($new, $el);
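Append markup (DOMElement has no appendHTML(); build a fragment from well-formed XML):
$frag = $el->ownerDocument->createDocumentFragment();
$frag->appendXML('<span>Text</span>');
$el->appendChild($frag);
Insert markup after the element (DOMElement has no insertAdjacentHTML(); insert a fragment before the next sibling):
$frag = $el->ownerDocument->createDocumentFragment();
$frag->appendXML('<span>Text</span>');
$el->parentNode->insertBefore($frag, $el->nextSibling);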
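Putting creation and insertion together, the sketch below builds a list from a PHP array and appends it to an existing element. buildList() is a hypothetical helper and the sample markup is invented for the example.

```php
<?php
// Hypothetical helper: build a <ul> with one <li> per array entry.
function buildList(DOMDocument $dom, array $items): DOMElement
{
    $ul = $dom->createElement('ul');
    foreach ($items as $item) {
        $li = $dom->createElement('li');
        // Text nodes are escaped on output, so "&" and "<" are safe here
        $li->appendChild($dom->createTextNode($item));
        $ul->appendChild($li);
    }
    return $ul;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="main"></div></body></html>');
$main = $dom->getElementById('main');
$main->appendChild(buildList($dom, ['Tea & cake', 'Coffee']));
echo $dom->saveHTML($main), "\n";
```

Using createTextNode() rather than passing the value to createElement() avoids problems with unescaped ampersands in the text.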
Removing Elements
Detach nodes from the document:
Remove child:
$el->removeChild($child);
Remove node:
$el->parentNode->removeChild($el);
Replace node:
$el->parentNode->replaceChild($new, $el);
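Clear children (DOMElement has no innerHTML property; remove the child nodes one by one):
while ($el->firstChild) {
    $el->removeChild($el->firstChild);
}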
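A frequent removal task is stripping every element of one type, such as script tags. Because getElementsByTagName() returns a live list, this sketch copies the list to an array before removing anything; removeElementsByTag() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: remove every element with the given tag name and
// return how many were removed.
function removeElementsByTag(DOMDocument $dom, string $tag): int
{
    // Snapshot the live node list so removals do not shift the items
    // that are still waiting to be visited.
    $nodes = iterator_to_array($dom->getElementsByTagName($tag));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($nodes);
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<html><head><script>var a;</script></head>'
    . '<body><p>Text</p><script>var b;</script></body></html>'
);
echo removeElementsByTag($dom, 'script'), " script element(s) removed\n";
```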
Modifying Elements
Edit nodes and their content:
Get attribute:
echo $el->getAttribute('class');
Set attribute:
$el->setAttribute('class', 'bold');
Set namespaced attribute (use a prefixed qualified name):
$el->setAttributeNS('http://ns.example.com', 'ns:attr', 'value');
Remove attribute:
$el->removeAttribute('class');
Get text value:
echo $el->textContent;
Set text value:
$el->textContent = 'New text';
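Get inner HTML (DOMElement has no innerHTML property; serialize each child):
$inner = '';
foreach ($el->childNodes as $child) {
    $inner .= $el->ownerDocument->saveHTML($child);
}
Set inner HTML (clear the children, then append a fragment of well-formed XML):
while ($el->firstChild) { $el->removeChild($el->firstChild); }
$frag = $el->ownerDocument->createDocumentFragment();
$frag->appendXML('New <strong>HTML</strong>');
$el->appendChild($frag);
Get outer HTML:
echo $el->ownerDocument->saveHTML($el); // $el->C14N() returns canonical XML instead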
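The attribute getters and setters can be combined into small utilities. As one possible sketch, addClass() below (a hypothetical helper, not part of the DOM extension) adds a class name without duplicating it.

```php
<?php
// Hypothetical helper: add a class name to an element unless it is
// already present.
function addClass(DOMElement $el, string $class): void
{
    // getAttribute() returns an empty string when the attribute is missing
    $classes = preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    if (!in_array($class, $classes, true)) {
        $classes[] = $class;
    }
    $el->setAttribute('class', implode(' ', $classes));
}

$dom = new DOMDocument();
$el = $dom->createElement('p');
$el->setAttribute('class', 'intro');
addClass($el, 'bold');
addClass($el, 'bold'); // second call is a no-op
echo $el->getAttribute('class'), "\n";
```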
Namespaces
Work with XML namespaces:
Register a namespace prefix for XPath queries (the method lives on DOMXPath, not DOMDocument):
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('ns', 'http://example.com/ns');
Create namespaced node:
$node = $dom->createElementNS('http://example.com/ns', 'ns:element');
Get namespaced elements:
$elements = $dom->getElementsByTagNameNS('http://example.com/ns', 'element');
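To show how namespace lookups behave in practice, here is a short sketch that queries namespaced elements with XPath. The sample XML and the prefix x are assumptions made for the example; the prefix registered on the DOMXPath object does not have to match the one used in the document, only the URI does.

```php
<?php
$xml = '<feed xmlns:ns="http://example.com/ns">'
     . '<ns:element>one</ns:element><ns:element>two</ns:element>'
     . '<element>plain</element></feed>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind our own prefix "x" to the namespace URI for use in queries
$xpath->registerNamespace('x', 'http://example.com/ns');

$values = [];
foreach ($xpath->query('//x:element') as $node) {
    $values[] = $node->textContent;
}
print_r($values); // the un-namespaced <element> is not matched
```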
DOM Events
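PHP's DOM extension runs on the server and does not implement the DOM Events API: DOMElement has no addEventListener() or dispatchEvent() method, and there is no DOMEvent class. Event handling belongs to client-side JavaScript; from PHP you can only emit the markup that wires it up.
Set an inline handler attribute:
$el->setAttribute('onclick', "console.log('Clicked')");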
Cloning Nodes
Import node (copies a node from another document into $dom; the copy still has to be inserted):
$imported = $dom->importNode($el); // Shallow copy
$imported = $dom->importNode($el, true); // Deep copy
Clone node:
$cloned = $el->cloneNode(); // Shallow
$cloned = $el->cloneNode(true); // Deep
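Importing matters when moving content between two documents, because a node can only be appended to the document that owns it. The sketch below copies an element from one document into another; the document contents are made up for the example.

```php
<?php
$source = new DOMDocument();
$source->loadXML('<root><item id="1"><name>Widget</name></item></root>');

$target = new DOMDocument();
$target->loadXML('<catalog/>');

$item = $source->getElementsByTagName('item')->item(0);

// Deep-copy the node into $target; the original stays in $source
$copy = $target->importNode($item, true);
$target->documentElement->appendChild($copy);

echo $target->saveXML($target->documentElement), "\n";
```

Appending $item directly to $target without importing it raises a DOMException (wrong document error).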
Outputting HTML
Render and save DOM documents:
Get full HTML:
$html = $dom->saveHTML();
Get outer HTML of a single element:
$html = $dom->saveHTML($element); // $element->C14N() returns canonical XML instead
Save to file:
$dom->saveHTMLFile('page.html');
Send in response:
header('Content-Type: text/html; charset=utf-8');
echo $dom->saveHTML();
Output text (all text in the document, without markup):
echo $dom->documentElement->textContent;
Pretty print XML:
$dom->formatOutput = true;
echo $dom->saveXML();
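One detail that often trips people up with formatOutput is that existing whitespace-only text nodes prevent re-indentation. A minimal sketch, assuming the document is loaded from a string, sets preserveWhiteSpace to false before loading:

```php
<?php
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // drop whitespace-only text nodes while parsing
$dom->formatOutput = true;        // indent the output on save
$dom->loadXML("<root>\n<a>1</a>\n\n<b>2</b></root>");

$pretty = $dom->saveXML();
echo $pretty;
```

Both properties are set before loadXML() here so that the parser discards the original line breaks and the serializer can add its own.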
Validation
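Validate against the document's DTD or an XSD schema. Both methods return a bool and emit warnings rather than throwing:
libxml_use_internal_errors(true); // collect errors instead of emitting warnings
$validDtd = $dom->validate(); // checks against the DTD declared in the document
libxml_clear_errors();
if (!$dom->schemaValidate('schema.xsd')) {
    foreach (libxml_get_errors() as $error) {
        // Validation error: $error->message, $error->line
    }
}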
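Disable external entity loading (this guards against XXE attacks and does not affect validation; deprecated since PHP 8.0, where external entity loading is already off by default):
libxml_disable_entity_loader(true);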
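To see what schema validation errors look like, the sketch below validates against an inline schema with schemaValidateSource() and returns the messages as strings. validateAgainstXsd() is a hypothetical helper, and the schema and documents are invented for the example.

```php
<?php
// Hypothetical helper: validate a document against an XSD string and
// return the error messages (an empty array means the document is valid).
function validateAgainstXsd(DOMDocument $dom, string $xsd): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $dom->schemaValidateSource($xsd);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

$valid = new DOMDocument();
$valid->loadXML('<note><to>Ann</to></note>');

$invalid = new DOMDocument();
$invalid->loadXML('<note><from>Bob</from></note>');

print_r(validateAgainstXsd($valid, $xsd));   // no messages expected
print_r(validateAgainstXsd($invalid, $xsd)); // at least one message expected
```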
Optimization
Improve performance of DOM scraping:
Cache XPath queries:
$xpath = new DOMXPath($dom);
// Reusable query
$query = '//div/p';
for($i = 0; $i < $loopCount; $i++) {
$results = $xpath->query($query);
// ...
}
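Avoid stale lists (node lists from getElementsByTagName() are live, so removing nodes while iterating forward skips elements):
// Risky: the list shrinks as nodes are removed
for ($i = 0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
}
// Safer: iterate backwards when removing nodes
for ($i = $nodelist->length - 1; $i >= 0; $i--) {
    $node = $nodelist->item($i);
    $node->parentNode->removeChild($node);
}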
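Another way to keep scraping code cheap is to create one DOMXPath object and scope later queries to a context node instead of searching the whole document again. A rough sketch, using a made-up table as input:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body><table><tr><td>a</td><td>b</td></tr>'
    . '<tr><td>c</td></tr></table></body></html>'
);

$xpath = new DOMXPath($dom); // created once, reused for every query

$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $cells = [];
    // The second argument limits the relative query to this row
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = $td->textContent;
    }
    $rows[] = $cells;
}

// evaluate() returns scalar results such as counts directly
$cellCount = (int) $xpath->evaluate('count(//td)');

print_r($rows);
echo $cellCount, "\n";
```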
Real World Use Cases
Example applications:
This covers the most common capabilities and practices for DOM manipulation in PHP. With this handy reference, you can traverse, edit, and scrape documents with ease!