Parsing with Nokogiri is a breeze, especially if you’re super familiar with CSS selectors and you’re good at traversing the DOM.
1. First, you need to install the Nokogiri gem:
gem install nokogiri
If you are parsing something that lives remotely, you might want to use the
open-uri library, which lets you open a URL much like a local file.
Depending on your app, remember to add require 'nokogiri' and require 'open-uri' at the top of your script.
2. Now open the page you want to parse:
page = Nokogiri::HTML(URI.open("http://www.upworthy.com/")) # plain open(url) no longer works with open-uri on Ruby 3.0+
3. Select the element you want.
There are tons of ways of getting what you want. But the basic thing is to pass a CSS selector to page.css, which returns every element that matches.
This will give you the whole nugget of info for each match, markup and all. If you wanted the image URLs, you’d select the img elements and read each one’s src attribute.
And there you go. If you know how to select elements, then you can get text, attributes, anything you want. Suppose you selected all the links on a page:
links = page.css("a")
What you’ll get is a Nokogiri::XML::NodeSet, which acts like an array of link nodes. Therefore, you can call methods on the set, or on a single node inside it:
links.length # => 6
links.first.text # => "Click here"
links.first["href"] # => "http://www.google.com"
You can even iterate through them, as with this example:
links.each { |link| puts link['href'] }
Of course there’s a lot more to it than this, but you can do a lot of parsing with just the above and a sense of how things flow in the DOM.