On this page
class HTML::Selector
Selects HTML elements using CSS 2 selectors.
The Selector
class uses CSS selector expressions to match and select HTML elements.
For example:
selector = HTML::Selector.new "form.login[action=/login]"
creates a new selector that matches any form
element with the class login
and an attribute action
with the value /login
.
Matching Elements
Use the match method to determine if an element matches the selector.
For simple selectors, the method returns an array with that element, or nil
if the element does not match. For complex selectors (see below) the method returns an array with all matched elements, of nil
if no match found.
For example:
if selector.match(element)
puts "Element is a login form"
end
Selecting Elements
Use the select method to select all matching elements starting with one element and going through all children in depth-first order.
This method returns an array of all matching elements, an empty array if no match is found
For example:
selector = HTML::Selector.new "input[type=text]"
matches = selector.select(element)
matches.each do |match|
puts "Found text field with name #{match.attributes['name']}"
end
Expressions
Selectors can match elements using any of the following criteria:
name
– Match an element based on its name (tag name). For example,p
to match a paragraph. You can use*
to match any element.#id
– Match an element based on its identifier (theid
attribute). For example,#page
..class
– Match an element based on its class name, all class names if more than one specified.[attr]
– Match an element that has the specified attribute.[attr=value]
– Match an element that has the specified attribute and value. (More operators are supported see below):pseudo-class
– Match an element based on a pseudo class, such as:nth-child
and:empty
.:not(expr)
– Match an element that does not match the negation expression.
When using a combination of the above, the element name comes first followed by identifier, class names, attributes, pseudo classes and negation in any order. Do not separate these parts with spaces! Space separation is used for descendant selectors.
For example:
selector = HTML::Selector.new "form.login[action=/login]"
The matched element must be of type form
and have the class login
. It may have other classes, but the class login
is required to match. It must also have an attribute called action
with the value /login
.
This selector will match the following element:
<form class="login form" method="post" action="/login">
but will not match the element:
<form method="post" action="/logout">
Attribute Values
Several operators are supported for matching attributes:
name
– The element must have an attribute with that name.name=value
– The element must have an attribute with that name and value.name^=value
– The attribute value must start with the specified value.name$=value
– The attribute value must end with the specified value.name*=value
– The attribute value must contain the specified value.name~=word
– The attribute value must contain the specified word (space separated).name|=word
– The attribute value must start with specified word.
For example, the following two selectors match the same element:
#my_id
[id=my_id]
and so do the following two selectors:
.my_class
[class~=my_class]
Alternatives, siblings, children
Complex selectors use a combination of expressions to match elements:
expr1 expr2
– Match any element against the second expression if it has some parent element that matches the first expression.expr1 > expr2
– Match any element against the second expression if it is the child of an element that matches the first expression.expr1 + expr2
– Match any element against the second expression if it immediately follows an element that matches the first expression.expr1 ~ expr2
– Match any element against the second expression that comes after an element that matches the first expression.expr1, expr2
– Match any element against the first expression, or against the second expression.
Since children and sibling selectors may match more than one element given the first element, the match method may return more than one match.
Pseudo classes
Pseudo classes were introduced in CSS 3. They are most often used to select elements in a given position:
:root
– Match the element only if it is the root element (no parent element).:empty
– Match the element only if it has no child elements, and no text content.:content(string)
– Match the element only if it hasstring
as its text content (ignoring leading and trailing whitespace).:only-child
– Match the element if it is the only child (element) of its parent element.:only-of-type
– Match the element if it is the only child (element) of its parent element and its type.:first-child
– Match the element if it is the first child (element) of its parent element.:first-of-type
– Match the element if it is the first child (element) of its parent element of its type.:last-child
– Match the element if it is the last child (element) of its parent element.:last-of-type
– Match the element if it is the last child (element) of its parent element of its type.:nth-child(b)
– Match the element if it is the b-th child (element) of its parent element. The valueb
specifies its index, starting with 1.:nth-child(an+b)
– Match the element if it is the b-th child (element) in each group ofa
child elements of its parent element.:nth-child(-an+b)
– Match the element if it is the first child (element) in each group ofa
child elements, up to the firstb
child elements of its parent element.:nth-child(odd)
– Match element in the odd position (i.e. first, third). Same as:nth-child(2n+1)
.:nth-child(even)
– Match element in the even position (i.e. second, fourth). Same as:nth-child(2n+2)
.:nth-of-type(..)
– As above, but only counts elements of its type.:nth-last-child(..)
– As above, but counts from the last child.:nth-last-of-type(..)
– As above, but counts from the last child and only elements of its type.:not(selector)
– Match the element only if the element does not match the simple selector.
As you can see, :nth-child
pseudo class and its variant can get quite tricky and the CSS specification doesn't do a much better job explaining it. But after reading the examples and trying a few combinations, it's easy to figure out.
For example:
table tr:nth-child(odd)
Selects every second row in the table starting with the first one.
div p:nth-child(4)
Selects the fourth paragraph in the div
, but not if the div
contains other elements, since those are also counted.
div p:nth-of-type(4)
Selects the fourth paragraph in the div
, counting only paragraphs, and ignoring all other elements.
div p:nth-of-type(-n+4)
Selects the first four paragraphs, ignoring all others.
And you can always select an element that matches one set of rules but not another using :not
. For example:
p:not(.post)
Matches all paragraphs that do not have the class .post
.
Substitution Values
You can use substitution with identifiers, class names and element values. A substitution takes the form of a question mark (?
) and uses the next value in the argument list following the CSS expression.
The substitution value may be a string or a regular expression. All other values are converted to strings.
For example:
selector = HTML::Selector.new "#?", /^\d+$/
matches any element whose identifier consists of one or more digits.
Public Class Methods
Creates a new selector for the given class name.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 216
def for_class(cls)
self.new([".?", cls])
end
Creates a new selector for the given id.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 225
def for_id(id)
self.new(["#?", id])
end
Creates a new selector from a CSS 2 selector expression.
The first argument is the selector expression. All other arguments are used for value substitution.
Throws InvalidSelectorError is the selector expression is invalid.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 241
def initialize(selector, *values)
raise ArgumentError, "CSS expression cannot be empty" if selector.empty?
@source = ""
values = values[0] if values.size == 1 && values[0].is_a?(Array)
# We need a copy to determine if we failed to parse, and also
# preserve the original pass by-ref statement.
statement = selector.strip.dup
# Create a simple selector, along with negation.
simple_selector(statement, values).each { |name, value| instance_variable_set("@#{name}", value) }
@alternates = []
@depends = nil
# Alternative selector.
if statement.sub!(/^\s*,\s*/, "")
second = Selector.new(statement, values)
@alternates << second
# If there are alternate selectors, we group them in the top selector.
if alternates = second.instance_variable_get(:@alternates)
second.instance_variable_set(:@alternates, [])
@alternates.concat alternates
end
@source << " , " << second.to_s
# Sibling selector: create a dependency into second selector that will
# match element immediately following this one.
elsif statement.sub!(/^\s*\+\s*/, "")
second = next_selector(statement, values)
@depends = lambda do |element, first|
if element = next_element(element)
second.match(element, first)
end
end
@source << " + " << second.to_s
# Adjacent selector: create a dependency into second selector that will
# match all elements following this one.
elsif statement.sub!(/^\s*~\s*/, "")
second = next_selector(statement, values)
@depends = lambda do |element, first|
matches = []
while element = next_element(element)
if subset = second.match(element, first)
if first && !subset.empty?
matches << subset.first
break
else
matches.concat subset
end
end
end
matches.empty? ? nil : matches
end
@source << " ~ " << second.to_s
# Child selector: create a dependency into second selector that will
# match a child element of this one.
elsif statement.sub!(/^\s*>\s*/, "")
second = next_selector(statement, values)
@depends = lambda do |element, first|
matches = []
element.children.each do |child|
if child.tag? && subset = second.match(child, first)
if first && !subset.empty?
matches << subset.first
break
else
matches.concat subset
end
end
end
matches.empty? ? nil : matches
end
@source << " > " << second.to_s
# Descendant selector: create a dependency into second selector that
# will match all descendant elements of this one. Note,
elsif statement =~ /^\s+\S+/ && statement != selector
second = next_selector(statement, values)
@depends = lambda do |element, first|
matches = []
stack = element.children.reverse
while node = stack.pop
next unless node.tag?
if subset = second.match(node, first)
if first && !subset.empty?
matches << subset.first
break
else
matches.concat subset
end
elsif children = node.children
stack.concat children.reverse
end
end
matches.empty? ? nil : matches
end
@source << " " << second.to_s
else
# The last selector is where we check that we parsed
# all the parts.
unless statement.empty? || statement.strip.empty?
raise ArgumentError, "Invalid selector: #{statement}"
end
end
end
Public Instance Methods
Matches an element against the selector.
For a simple selector this method returns an array with the element if the element matches, nil otherwise.
For a complex selector (sibling and descendant) this method returns an array with all matching elements, nil if no match is found.
Use +first_only=true+ if you are only interested in the first element.
For example:
if selector.match(element)
puts "Element is a login form"
end
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 365
def match(element, first_only = false)
# Match element if no element name or element name same as element name
if matched = (!@tag_name || @tag_name == element.name)
# No match if one of the attribute matches failed
for attr in @attributes
if element.attributes[attr[0]] !~ attr[1]
matched = false
break
end
end
end
# Pseudo class matches (nth-child, empty, etc).
if matched
for pseudo in @pseudo
unless pseudo.call(element)
matched = false
break
end
end
end
# Negation. Same rules as above, but we fail if a match is made.
if matched && @negation
for negation in @negation
if negation[:tag_name] == element.name
matched = false
else
for attr in negation[:attributes]
if element.attributes[attr[0]] =~ attr[1]
matched = false
break
end
end
end
if matched
for pseudo in negation[:pseudo]
if pseudo.call(element)
matched = false
break
end
end
end
break unless matched
end
end
# If element matched but depends on another element (child,
# sibling, etc), apply the dependent matches instead.
if matched && @depends
matches = @depends.call(element, first_only)
else
matches = matched ? [element] : nil
end
# If this selector is part of the group, try all the alternative
# selectors (unless first_only).
if !first_only || !matches
@alternates.each do |alternate|
break if matches && first_only
if subset = alternate.match(element, first_only)
if matches
matches.concat subset
else
matches = subset
end
end
end
end
matches
end
Returns the next element after this one. Skips sibling text nodes.
With the name
argument, returns the next element with that name, skipping other sibling elements.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 495
def next_element(element, name = nil)
if siblings = element.parent.children
found = false
siblings.each do |node|
if node.equal?(element)
found = true
elsif found && node.tag?
return node if (name.nil? || node.name == name)
end
end
end
nil
end
Selects and returns an array with all matching elements, beginning with one node and traversing through all children depth-first. Returns an empty array if no match is found.
The root node may be any element in the document, or the document itself.
For example:
selector = HTML::Selector.new "input[type=text]"
matches = selector.select(element)
matches.each do |match|
puts "Found text field with name #{match.attributes['name']}"
end
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 455
def select(root)
matches = []
stack = [root]
while node = stack.pop
if node.tag? && subset = match(node, false)
subset.each do |match|
matches << match unless matches.any? { |item| item.equal?(match) }
end
elsif children = node.children
stack.concat children.reverse
end
end
matches
end
Similar to select but returns the first matching element. Returns nil
if no element matches the selector.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 473
def select_first(root)
stack = [root]
while node = stack.pop
if node.tag? && subset = match(node, true)
return subset.first if !subset.empty?
elsif children = node.children
stack.concat children.reverse
end
end
nil
end
Protected Instance Methods
Create a regular expression to match an attribute value based on the equality operator (=, ^=, |=, etc).
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 689
def attribute_match(equality, value)
regexp = value.is_a?(Regexp) ? value : Regexp.escape(value.to_s)
case equality
when "=" then
# Match the attribute value in full
Regexp.new("^#{regexp}$")
when "~=" then
# Match a space-separated word within the attribute value
Regexp.new("(^|\s)#{regexp}($|\s)")
when "^="
# Match the beginning of the attribute value
Regexp.new("^#{regexp}")
when "$="
# Match the end of the attribute value
Regexp.new("#{regexp}$")
when "*="
# Match substring of the attribute value
regexp.is_a?(Regexp) ? regexp : Regexp.new(regexp)
when "|=" then
# Match the first space-separated item of the attribute value
Regexp.new("^#{regexp}($|\s)")
else
raise InvalidSelectorError, "Invalid operation/value" unless value.empty?
# Match all attributes values (existence check)
//
end
end
Called to create a dependent selector (sibling, descendant, etc). Passes the remainder of the statement that will be reduced to zero eventually, and array of substitution values.
This method is called from four places, so it helps to put it here for reuse. The only logic deals with the need to detect comma separators (alternate) and apply them to the selector group of the top selector.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 802
def next_selector(statement, values)
second = Selector.new(statement, values)
# If there are alternate selectors, we group them in the top selector.
if alternates = second.instance_variable_get(:@alternates)
second.instance_variable_set(:@alternates, [])
@alternates.concat alternates
end
second
end
Returns a lambda that can match an element against the nth-child pseudo class, given the following arguments:
a
– Value of a part.b
– Value of b part.of_type
– True to test only elements of this type (of-type).reverse
– True to count in reverse order (last-).
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 724
def nth_child(a, b, of_type, reverse)
# a = 0 means select at index b, if b = 0 nothing selected
return lambda { |element| false } if a == 0 && b == 0
# a < 0 and b < 0 will never match against an index
return lambda { |element| false } if a < 0 && b < 0
b = a + b + 1 if b < 0 # b < 0 just picks last element from each group
b -= 1 unless b == 0 # b == 0 is same as b == 1, otherwise zero based
lambda do |element|
# Element must be inside parent element.
return false unless element.parent && element.parent.tag?
index = 0
# Get siblings, reverse if counting from last.
siblings = element.parent.children
siblings = siblings.reverse if reverse
# Match element name if of-type, otherwise ignore name.
name = of_type ? element.name : nil
found = false
for child in siblings
# Skip text nodes/comments.
if child.tag? && (name == nil || child.name == name)
if a == 0
# Shortcut when a == 0 no need to go past count
if index == b
found = child.equal?(element)
break
end
elsif a < 0
# Only look for first b elements
break if index > b
if child.equal?(element)
found = (index % a) == 0
break
end
else
# Otherwise, break if child found and count == an+b
if child.equal?(element)
found = (index % a) == b
break
end
end
index += 1
end
end
found
end
end
Creates a only child lambda. Pass of-type
to only look at elements of its type.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 774
def only_child(of_type)
lambda do |element|
# Element must be inside parent element.
return false unless element.parent && element.parent.tag?
name = of_type ? element.name : nil
other = false
for child in element.parent.children
# Skip text nodes/comments.
if child.tag? && (name == nil || child.name == name)
unless child.equal?(element)
other = true
break
end
end
end
!other
end
end
Creates a simple selector given the statement and array of substitution values.
Returns a hash with the values tag_name
, attributes
, pseudo
(classes) and negation
.
Called the first time with can_negate
true to allow negation. Called a second time with false since negation cannot be negated.
# File actionview/lib/action_view/vendor/html-scanner/html/selector.rb, line 522
def simple_selector(statement, values, can_negate = true)
tag_name = nil
attributes = []
pseudo = []
negation = []
# Element name. (Note that in negation, this can come at
# any order, but for simplicity we allow if only first).
statement.sub!(/^(\*|[[:alpha:]][\w\-]*)/) do |match|
match.strip!
tag_name = match.downcase unless match == "*"
@source << match
"" # Remove
end
# Get identifier, class, attribute name, pseudo or negation.
while true
# Element identifier.
next if statement.sub!(/^#(\?|[\w\-]+)/) do
id = $1
if id == "?"
id = values.shift
end
@source << "##{id}"
id = Regexp.new("^#{Regexp.escape(id.to_s)}$") unless id.is_a?(Regexp)
attributes << ["id", id]
"" # Remove
end
# Class name.
next if statement.sub!(/^\.([\w\-]+)/) do
class_name = $1
@source << ".#{class_name}"
class_name = Regexp.new("(^|\s)#{Regexp.escape(class_name)}($|\s)") unless class_name.is_a?(Regexp)
attributes << ["class", class_name]
"" # Remove
end
# Attribute value.
next if statement.sub!(/^\[\s*([[:alpha:]][\w\-:]*)\s*((?:[~|^$*])?=)?\s*('[^']*'|"[^*]"|[^\]]*)\s*\]/) do
name, equality, value = $1, $2, $3
if value == "?"
value = values.shift
else
# Handle single and double quotes.
value.strip!
if (value[0] == ?" || value[0] == ?') && value[0] == value[-1]
value = value[1..-2]
end
end
@source << "[#{name}#{equality}'#{value}']"
attributes << [name.downcase.strip, attribute_match(equality, value)]
"" # Remove
end
# Root element only.
next if statement.sub!(/^:root/) do
pseudo << lambda do |element|
element.parent.nil? || !element.parent.tag?
end
@source << ":root"
"" # Remove
end
# Nth-child including last and of-type.
next if statement.sub!(/^:nth-(last-)?(child|of-type)\((odd|even|(\d+|\?)|(-?\d*|\?)?n([+\-]\d+|\?)?)\)/) do |match|
reverse = $1 == "last-"
of_type = $2 == "of-type"
@source << ":nth-#{$1}#{$2}("
case $3
when "odd"
pseudo << nth_child(2, 1, of_type, reverse)
@source << "odd)"
when "even"
pseudo << nth_child(2, 2, of_type, reverse)
@source << "even)"
when /^(\d+|\?)$/ # b only
b = ($1 == "?" ? values.shift : $1).to_i
pseudo << nth_child(0, b, of_type, reverse)
@source << "#{b})"
when /^(-?\d*|\?)?n([+\-]\d+|\?)?$/
a = ($1 == "?" ? values.shift :
$1 == "" ? 1 : $1 == "-" ? -1 : $1).to_i
b = ($2 == "?" ? values.shift : $2).to_i
pseudo << nth_child(a, b, of_type, reverse)
@source << (b >= 0 ? "#{a}n+#{b})" : "#{a}n#{b})")
else
raise ArgumentError, "Invalid nth-child #{match}"
end
"" # Remove
end
# First/last child (of type).
next if statement.sub!(/^:(first|last)-(child|of-type)/) do
reverse = $1 == "last"
of_type = $2 == "of-type"
pseudo << nth_child(0, 1, of_type, reverse)
@source << ":#{$1}-#{$2}"
"" # Remove
end
# Only child (of type).
next if statement.sub!(/^:only-(child|of-type)/) do
of_type = $1 == "of-type"
pseudo << only_child(of_type)
@source << ":only-#{$1}"
"" # Remove
end
# Empty: no child elements or meaningful content (whitespaces
# are ignored).
next if statement.sub!(/^:empty/) do
pseudo << lambda do |element|
empty = true
for child in element.children
if child.tag? || !child.content.strip.empty?
empty = false
break
end
end
empty
end
@source << ":empty"
"" # Remove
end
# Content: match the text content of the element, stripping
# leading and trailing spaces.
next if statement.sub!(/^:content\(\s*(\?|'[^']*'|"[^"]*"|[^)]*)\s*\)/) do
content = $1
if content == "?"
content = values.shift
elsif (content[0] == ?" || content[0] == ?') && content[0] == content[-1]
content = content[1..-2]
end
@source << ":content('#{content}')"
content = Regexp.new("^#{Regexp.escape(content.to_s)}$") unless content.is_a?(Regexp)
pseudo << lambda do |element|
text = ""
for child in element.children
unless child.tag?
text << child.content
end
end
text.strip =~ content
end
"" # Remove
end
# Negation. Create another simple selector to handle it.
if statement.sub!(/^:not\(\s*/, "")
raise ArgumentError, "Double negatives are not missing feature" unless can_negate
@source << ":not("
negation << simple_selector(statement, values, false)
raise ArgumentError, "Negation not closed" unless statement.sub!(/^\s*\)/, "")
@source << ")"
next
end
# No match: moving on.
break
end
# Return hash. The keys are mapped to instance variables.
{:tag_name=>tag_name, :attributes=>attributes, :pseudo=>pseudo, :negation=>negation}
end
© 2004–2016 David Heinemeier Hansson
Licensed under the MIT License.