XPath is a query language for navigating and selecting nodes in an XML document. In web scraping, it is often applied to HTML documents to select elements based on their text content. There are two primary ways to select by text in XPath: a substring match with the contains function, or an exact match.
1. Using the contains Function
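The contains function is used when you want to select elements whose text includes the specified substring. The match is case-sensitive.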
Example Code:
import requests
from lxml import etree
# Fetching the HTML content
html_content = requests.get('https://example.com').content
# Parsing the HTML content with lxml
dom = etree.HTML(html_content)
# XPath query using contains
elements_with_text = dom.xpath('//*[contains(text(), "example")]')
for element in elements_with_text:
    print(element.text)
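In this example, replace https://example.com with the URL of the page you want to scrape, and "example" with the substring you are searching for.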
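One subtlety of contains(text(), ...) is easier to see without a network request. The following is a minimal sketch, assuming lxml is installed. The SAMPLE_HTML markup and the variable names are invented for illustration. It compares contains(text(), ...), which in XPath 1.0 looks only at an element's first text node, with contains(., ...), which looks at the element's full string value including nested tags.

```python
from lxml import etree

# Hypothetical markup standing in for a fetched page
SAMPLE_HTML = """
<html><body>
  <h1>Example Domain</h1>
  <p>This domain is for use in illustrative examples.</p>
  <p>More <b>example</b> text</p>
</body></html>
"""

dom = etree.HTML(SAMPLE_HTML)

# contains(text(), ...) converts text() to a string using only the first text node,
# so the second <p> (first text node "More ") is not matched
first_node_matches = dom.xpath('//p[contains(text(), "example")]')

# contains(., ...) uses the element's full string value, descendants included
string_value_matches = dom.xpath('//p[contains(., "example")]')

# contains() is case-sensitive, so "Example Domain" does not match "example"
heading_matches = dom.xpath('//h1[contains(text(), "example")]')

for p in string_value_matches:
    # itertext() joins text from nested tags; element.text stops at the first child
    print("".join(p.itertext()))
```

Restricting the query to a tag such as //p also keeps ancestors like html and body out of the results, since their string values include all descendant text.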
2. Using Exact Match
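An exact match is used when you want to select elements whose text is exactly the specified string, with nothing before or after it. The comparison is case-sensitive and includes any leading or trailing whitespace.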
Example Code:
import requests
from lxml import etree
# Fetching the HTML content
html_content = requests.get('https://example.com').content
# Parsing the HTML content with lxml
dom = etree.HTML(html_content)
# XPath query using exact match
exact_elements = dom.xpath('//*[text() = "Example Domain"]')
for element in exact_elements:
    print(element.text)
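In this example, replace https://example.com with the URL of the page you want to scrape, and "Example Domain" with the exact text you want to match.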
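Exact matching can fail on real pages because of surrounding whitespace in the markup. The sketch below, assuming lxml is installed and using invented SAMPLE_HTML markup, contrasts a strict comparison with one wrapped in the standard XPath normalize-space function, which trims leading and trailing whitespace and collapses internal runs of whitespace before comparing.

```python
from lxml import etree

# Hypothetical markup: the <h1> text is padded with newlines and spaces
SAMPLE_HTML = """
<html><body>
  <h1>
    Example Domain
  </h1>
  <p>Example Domain</p>
  <p>Example Domain registry</p>
</body></html>
"""

dom = etree.HTML(SAMPLE_HTML)

# Strict comparison: a text node must equal the string character for character
strict = dom.xpath('//*[text() = "Example Domain"]')

# normalize-space() trims and collapses whitespace in the first text node
normalized = dom.xpath('//*[normalize-space(text()) = "Example Domain"]')

print([el.tag for el in strict])
print([el.tag for el in normalized])
```

Here the strict query finds only the first paragraph, while the normalized query also finds the padded heading. Neither matches "Example Domain registry", because the comparison is still against the whole string.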