The Ultimate Floki Cheatsheet for Elixir

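Floki parses HTML documents in Elixir and lets you query them with CSS selectors. A parsed document is a list of plain tuples: each element is {tag, attributes, children}, for example {"a", [{"href", "/"}], ["Home"]}, so you can also transform it with ordinary Elixir functions.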

Getting Started

Add the dependency to mix.exs:

def deps do
  [
    {:floki, "~> 0.36"}
  ]
end

Parse HTML:

html = File.read!("index.html")
doc = Floki.parse_document!(html)

Find elements:

Floki.find(doc, "div.content")

Get text:

Floki.text(doc)
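As a minimal end-to-end sketch, the script below parses an inline HTML string (standing in for index.html), finds a node and reads its text. It assumes Floki ~> 0.36 installed with Mix.install/1; the markup is made up for illustration.

```elixir
Mix.install([{:floki, "~> 0.36"}])

# Illustrative markup standing in for the contents of index.html
html = """
<html>
  <body>
    <div class="content"><p>Hello, <b>Floki</b>!</p></div>
    <div class="sidebar"><p>Not this one</p></div>
  </body>
</html>
"""

doc = Floki.parse_document!(html)

# find/2 returns a list of matching nodes, each a {tag, attributes, children} tuple
content = Floki.find(doc, "div.content")

# text/1 joins the text of every descendant node
content_text = Floki.text(content)

IO.inspect(content)
IO.puts(content_text)
```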

Selecting

By CSS selector:

Floki.find(doc, "div.main")

By tag name:

Floki.find(doc, "img")

By id:

Floki.find(doc, "#header")

By attribute:

Floki.find(doc, "[href]")
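Selectors combine freely, and Floki.attribute/3 reads attribute values from whatever a selector matches. A small sketch, assuming Floki ~> 0.36 via Mix.install/1 and made-up markup, shows what each form matches; attribute/3 only returns values for elements that actually carry the attribute.

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = """
<body>
  <div id="header"><a href="/home">Home</a></div>
  <img src="a.png" alt="Logo"><img src="b.png">
  <a name="top">Anchor without href</a>
</body>
"""

doc = Floki.parse_document!(html)

header = Floki.find(doc, "#header")        # by id
images = Floki.find(doc, "img")            # by tag name
with_alt = Floki.find(doc, "img[alt]")     # tag plus attribute presence
linked = Floki.find(doc, "[href]")         # any element with an href
hrefs = Floki.attribute(doc, "a", "href")  # href values of matching <a> tags

IO.inspect(hrefs)
```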

Traversing

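Get parent (nodes keep no reference to their parent, so select from the parent's side instead):

Floki.find(doc, "ul") |> Enum.filter(fn ul -> Floki.find(ul, "li.active") != [] end)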

Get children:

Floki.children(element)

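Get siblings (use the CSS sibling combinators):

Floki.find(doc, "h2 + p")  # the <p> right after each <h2>
Floki.find(doc, "h2 ~ p")  # every later <p> with the same parent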
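Because Floki nodes carry no parent pointer, "parent" and "sibling" lookups are expressed as selectors, while Floki.children/2 works on a single node. A sketch under the assumption of Floki ~> 0.36 via Mix.install/1, with illustrative markup:

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = """
<body>
  <ul id="menu"><li>One</li><li class="active">Two</li><li>Three</li></ul>
  <ol><li>Other</li></ol>
  <section><h2>Title</h2><p>First</p><p>Second</p></section>
</body>
"""

doc = Floki.parse_document!(html)

# Children: children/2 takes a single node; include_text: false drops text nodes
[menu] = Floki.find(doc, "ul#menu")
items = Floki.children(menu, include_text: false)

# "Parent": keep only the lists that contain an active item
parents =
  doc
  |> Floki.find("ul, ol")
  |> Enum.filter(fn list -> Floki.find([list], "li.active") != [] end)

# Siblings: adjacent (+) and general (~) sibling combinators
next_p = Floki.find(doc, "h2 + p")
later_ps = Floki.find(doc, "h2 ~ p")

IO.inspect(items)
```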

Manipulation

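Floki trees are immutable: each function below returns a new tree, so bind the result (doc = ...).

Insert element after a target (rebuild the parent's children):

Floki.traverse_and_update(doc, fn
  {"body", attrs, children} ->
    {"body", attrs, Enum.flat_map(children, fn
      {"div", [{"id", "content"}], _} = target -> [target, new_el]
      other -> [other]
    end)}
  other -> other
end)

Replace element:

Floki.traverse_and_update(doc, fn
  {"img", _, _} -> new_el
  other -> other
end)

Remove element:

Floki.filter_out(doc, "div.ad")

Update attribute:

Floki.attr(doc, "img", "src", fn _old -> "new.jpg" end)

Append html:

{:ok, new_div} = Floki.parse_fragment("<div>New div</div>")

Floki.traverse_and_update(doc, fn
  {"body", attrs, children} -> {"body", attrs, children ++ new_div}
  other -> other
end)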
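Floki's manipulation functions are filter_out/2, attr/4, find_and_update/3 and traverse_and_update/2, and they chain naturally in a pipeline. A sketch, assuming Floki ~> 0.36 via Mix.install/1; the markup and the appended fragment are illustrative.

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = """
<body>
  <div id="content"><img class="old" src="old.jpg"><b>Note</b></div>
  <div class="ad">Buy now</div>
</body>
"""

doc = Floki.parse_document!(html)
{:ok, new_div} = Floki.parse_fragment("<div>New div</div>")

updated =
  doc
  # Remove: drop every node that matches the selector
  |> Floki.filter_out("div.ad")
  # Update attribute: attr/4 maps the old value to a new one
  |> Floki.attr("img.old", "src", fn _old -> "new.jpg" end)
  # Rename a tag: find_and_update/3 rewrites the {tag, attributes} of each match
  |> Floki.find_and_update("b", fn {"b", attrs} -> {"strong", attrs} end)
  # Append: add the parsed fragment as the last children of <body>
  |> Floki.traverse_and_update(fn
    {"body", attrs, children} -> {"body", attrs, children ++ new_div}
    other -> other
  end)

IO.puts(Floki.raw_html(updated))
```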

Parsing HTML

From string:

html = "<html>...</html>"
doc = Floki.parse_document!(html)

From file:

doc = Floki.parse_document!(File.read!("index.html"))

From URL (requires the HTTPoison package):

doc = Floki.parse_document!(HTTPoison.get!(url).body)
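Floki.parse_document/1 is the non-raising counterpart of parse_document!/1, and a file is just another string source. A sketch, assuming Floki ~> 0.36 via Mix.install/1 and a throwaway file in the system temp directory (the file name is illustrative):

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"

# Non-bang variant: returns {:ok, tree} or {:error, reason} instead of raising
title =
  case Floki.parse_document(html) do
    {:ok, doc} -> doc |> Floki.find("title") |> Floki.text()
    {:error, reason} -> raise "could not parse HTML: #{inspect(reason)}"
  end

# Reading from a file yields the same tree as parsing the string directly
path = Path.join(System.tmp_dir!(), "floki_cheatsheet_demo.html")
File.write!(path, html)
from_file = path |> File.read!() |> Floki.parse_document!()

IO.puts(title)
```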

Extracting Data

Extract text:

Floki.text(doc)

Find links:

Floki.find(doc, "a[href]") |> Floki.attribute("href")

Extract images:

Floki.find(doc, "img") |> Floki.attribute("src")
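When a scrape needs each link's text alongside its URL, map over the matched nodes. A sketch, assuming Floki ~> 0.36 via Mix.install/1 and made-up markup; nodes are wrapped in a list before calling attribute/2.

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = """
<body>
  <a href="/a">First</a>
  <a href="/b">Second</a>
  <a name="top">No href</a>
  <img src="logo.png"><img alt="no src">
</body>
"""

doc = Floki.parse_document!(html)

# Pair each link's text with its href
links =
  doc
  |> Floki.find("a[href]")
  |> Enum.map(fn a -> {Floki.text(a), [a] |> Floki.attribute("href") |> hd()} end)

# Images without a src are skipped: attribute/3 only returns values that exist
imgs = Floki.attribute(doc, "img", "src")

IO.inspect(links)
IO.inspect(imgs)
```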

Advanced Usage

Parse fragments:

{:ok, doc} = Floki.parse_fragment(html_fragment)

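Encode special chars (raw_html/1 serializes a parsed tree; text is entity-encoded by default):

Floki.raw_html(doc)

Decode entities (the parser decodes them, so text/1 returns plain characters):

html |> Floki.parse_document!() |> Floki.text()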

Inspect HTML tree:

IO.inspect(doc) # print HTML tree
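Entity handling is symmetric: parsing decodes entities into literal characters, and raw_html/1 encodes them again on the way out (the default unless encoding is disabled in config). A sketch, assuming Floki ~> 0.36 via Mix.install/1:

```elixir
Mix.install([{:floki, "~> 0.36"}])

{:ok, fragment} = Floki.parse_fragment("<p>10 &gt; 5 &amp; 3 &lt; 4</p>")

# The parser decodes entities, so text nodes hold the literal characters
decoded = Floki.text(fragment)

# raw_html/1 serializes the tree and re-encodes special characters in text
encoded = Floki.raw_html(fragment)

IO.inspect(fragment)
IO.puts(decoded)
IO.puts(encoded)
```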

More Examples

Find by class name:

Floki.find(doc, ".article")

Nest selectors:

Floki.find(doc, "div.content ul li a")

Traverse tree:

children = Floki.children(element)
elements_only = Floki.children(element, include_text: false)

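Manipulate HTML (each call returns a new tree):

doc = Floki.filter_out(doc, "div.ad")
doc = Floki.attr(doc, "img.old", "src", fn _old -> "new.jpg" end)
doc = Floki.find_and_update(doc, "b", fn {"b", attrs} -> {"strong", attrs} end)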
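Replacing, removing and inserting whole nodes can all happen in one traverse_and_update/2 pass: return a different tuple to replace a node, nil to delete it, or rebuild a parent's child list to insert next to a target. A sketch, assuming Floki ~> 0.36 via Mix.install/1; new_img and new_div are illustrative nodes written directly as Floki tuples.

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = ~s(<body><div id="content"><img src="old.jpg"></div><div class="ad">Ad</div></body>)
doc = Floki.parse_document!(html)

# Illustrative nodes, written directly as Floki tuples
new_img = {"img", [{"src", "new.jpg"}], []}
new_div = {"div", [{"class", "note"}], ["Inserted"]}

# Children are visited before their parent, so <body> sees already-updated children
updated =
  Floki.traverse_and_update(doc, fn
    # Replace: return a different node
    {"img", _attrs, _children} ->
      new_img

    # Remove: returning nil deletes the node
    {"div", [{"class", "ad"}], _children} ->
      nil

    # Insert after: rebuild the parent's child list with the new node spliced in
    {"body", attrs, children} ->
      spliced =
        Enum.flat_map(children, fn
          {"div", [{"id", "content"}], _} = content -> [content, new_div]
          other -> [other]
        end)

      {"body", attrs, spliced}

    other ->
      other
  end)

IO.puts(Floki.raw_html(updated))
```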

Extract text, links, images:

text = Floki.text(doc)
links = Floki.find(doc, "a[href]") |> Floki.attribute("href")
imgs = Floki.find(doc, "img") |> Floki.attribute("src")

Advanced Usage

Parse fragments:

fragment = "<div>...</div>"
{:ok, doc} = Floki.parse_fragment(fragment)

Escape HTML (raw_html/1 takes a parsed tree, not a string):

{:ok, tree} = Floki.parse_fragment("<div>10 > 5</div>")
escaped = Floki.raw_html(tree)

Unescape HTML (entities are decoded during parsing):

html = "&lt;div&gt;Hello&lt;/div&gt;"
{:ok, tree} = Floki.parse_fragment(html)
unescaped = Floki.text(tree)

Inspect tree:

html
|> Floki.parse_document!()
|> IO.inspect()

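Large Documents

Floki has no lazy or streaming parser: Floki.parse_document!/1 builds the whole tree in memory before any query runs. For large documents, parse once and reuse the tree for every query:

html = File.read!("large.html")
tree = Floki.parse_document!(html)

# Each query reuses the already-parsed tree
meta = Floki.find(tree, "meta")
head = Floki.find(tree, "head")

If parsing itself is the bottleneck, Floki can be configured to use another parser backend, such as Floki.HTMLParser.Html5ever or Floki.HTMLParser.FastHtml (each needs an extra dependency).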

Scoping Find

Floki has no separate search function: Floki.find/2 searches the whole tree you pass in. To scope a query, use a descendant selector or pass in a subtree:

Floki.find(tree, "meta")       # every <meta> in the document
Floki.find(tree, "head meta")  # only <meta> inside <head>

A narrower selector returns only the nodes you need, so there is less to filter afterwards.
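Scoping with a descendant selector and scoping by passing a subtree give the same nodes. A sketch, assuming Floki ~> 0.36 via Mix.install/1 and its default parser, which keeps the <meta> inside <body> where it was written; the markup is illustrative.

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = """
<html>
  <head><meta charset="utf-8"><meta name="description" content="Demo page"></head>
  <body><meta itemprop="name" content="Body meta"></body>
</html>
"""

tree = Floki.parse_document!(html)

everywhere = Floki.find(tree, "meta")    # all <meta> tags
in_head = Floki.find(tree, "head meta")  # descendant selector scopes to <head>

# Same scope by passing a subtree: find/2 only searches what it is given
[head] = Floki.find(tree, "head")
in_head_subtree = Floki.find([head], "meta")

description = Floki.attribute(tree, "meta[name=\"description\"]", "content")

IO.inspect(description)
```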

LiveView Integration

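Floki can process HTML on the server in a Phoenix LiveView before it is rendered for the client:

def handle_info(%{topic: "new_html"}, socket) do
  html = ExternalApi.fetch_html()
  doc = Floki.parse_document!(html)

  # Manipulate doc, e.g. drop <script> elements
  doc = Floki.filter_out(doc, "script")

  # handle_info/2 must return {:noreply, socket}; render @html in the template with raw/1
  {:noreply, assign(socket, :html, Floki.raw_html(doc))}
end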
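The parse, manipulate and serialize steps can live in a plain module, which keeps the LiveView thin and lets the logic run without Phoenix. HtmlCleaner is a hypothetical name, and the sketch assumes Floki ~> 0.36 via Mix.install/1. It only removes <script> elements; it is not a complete HTML sanitizer (inline handlers such as onclick survive).

```elixir
Mix.install([{:floki, "~> 0.36"}])

defmodule HtmlCleaner do
  @moduledoc "Hypothetical helper a LiveView could call before assigning HTML."

  # Parse, drop every <script> element, and serialize back to an HTML string
  def clean(html) do
    html
    |> Floki.parse_document!()
    |> Floki.filter_out("script")
    |> Floki.raw_html()
  end
end

IO.puts(HtmlCleaner.clean("<div><script>alert(1)</script><p>ok</p></div>"))
```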

HTML to CSV/JSON

Use Floki to extract data from HTML to other formats like CSV/JSON:

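# CSV.encode/1 is from the csv package and expects each row as a list of strings
html
|> Floki.parse_document!()
|> Floki.find("table tr")
|> Enum.map(fn row -> row |> Floki.children(include_text: false) |> Enum.map(&Floki.text/1) end)
|> CSV.encode()
|> Enum.to_list()
|> IO.write()

# post_to_map/1 is your own function; JSON is built into Elixir 1.18+ (use Jason.encode!/1 on older versions)
html
|> Floki.parse_document!()
|> Floki.find("div.post")
|> Enum.map(&post_to_map/1)
|> JSON.encode!()
|> IO.write()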
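Before handing data to a CSV or JSON encoder, it helps to see the intermediate shapes: one list of cell strings per row, and one map per data row keyed by the header. A dependency-free sketch, assuming Floki ~> 0.36 via Mix.install/1; the table is made up, and the CSV join is deliberately naive (no quoting or escaping).

```elixir
Mix.install([{:floki, "~> 0.36"}])

html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ana</td><td>31</td></tr>
  <tr><td>Bo</td><td>27</td></tr>
</table>
"""

# One list of cell texts per <tr>; include_text: false skips whitespace text nodes
rows =
  html
  |> Floki.parse_document!()
  |> Floki.find("tr")
  |> Enum.map(fn tr ->
    tr
    |> Floki.children(include_text: false)
    |> Enum.map(&Floki.text/1)
  end)

# Data rows as maps keyed by the header row, ready for a JSON encoder
[header | data] = rows
records = Enum.map(data, fn row -> header |> Enum.zip(row) |> Map.new() end)

# Naive CSV without quoting or escaping, only to show the row shape
csv = Enum.map_join(rows, "\n", &Enum.join(&1, ","))

IO.puts(csv)
```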

Invalid HTML

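Floki's default parser (based on Mochiweb) is lenient: it builds a tree from malformed HTML, such as unclosed or stray tags, instead of raising an error. How the structure is repaired differs between parser backends, so inspect the tree when scraping badly formed pages.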
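A quick way to check how a page was repaired is to parse it and inspect the result. A sketch, assuming Floki ~> 0.36 via Mix.install/1 with its default parser; the exact tree it prints depends on the parser, so only the text content is relied on here.

```elixir
Mix.install([{:floki, "~> 0.36"}])

# Unclosed <p> and <b> tags
html = "<div><p>Unclosed paragraph<div>Nested <b>bold</div>"

# No error is raised; the parser repairs the structure as best it can
doc = Floki.parse_document!(html)

IO.inspect(doc)
IO.puts(Floki.text(doc))
```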

Idempotent HTML

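Sort each element's attributes to normalize HTML, so documents that differ only in attribute order serialize identically:

doc
|> Floki.traverse_and_update(fn
  {tag, attrs, children} when is_binary(tag) -> {tag, Enum.sort(attrs), children}
  other -> other
end)
|> Floki.raw_html()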
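Normalizing is most useful when comparing documents, for example in snapshot tests. A sketch, assuming Floki ~> 0.36 via Mix.install/1: two links that differ only in attribute order compare unequal as parsed trees and equal after sorting. The is_binary/1 guard leaves non-element tuples, such as XML processing instructions, untouched.

```elixir
Mix.install([{:floki, "~> 0.36"}])

# Sort the attribute list of every element; other nodes pass through unchanged
normalize = fn tree ->
  Floki.traverse_and_update(tree, fn
    {tag, attrs, children} when is_binary(tag) -> {tag, Enum.sort(attrs), children}
    other -> other
  end)
end

a = Floki.parse_document!(~s(<a href="/x" class="link">X</a>))
b = Floki.parse_document!(~s(<a class="link" href="/x">X</a>))

# The parsed trees differ only in attribute order; normalized, they are equal
IO.inspect(a == b)
IO.inspect(normalize.(a) == normalize.(b))
IO.puts(Floki.raw_html(normalize.(a)))
```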
