juretta.com

Ruby: Net::Http and open-uri

August 13, 2006
Tags: Ruby

Ruby has several libraries that provide high-level access to network protocols such as FTP, HTTP and HTTPS. This article shows how to use net/http, net/https, open-uri and the rio library.

open-uri

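open-uri is part of the Ruby standard library. It extends Kernel#open so that it also accepts http://, https:// and ftp:// URLs, and it is a wrapper around the net/http, net/https and net/ftp libraries. (Since Ruby 3.0, Kernel#open no longer opens URLs; call URI.open instead.)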
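Conceptually, the extended open decides by looking at its argument: a string that starts with a scheme such as http:// is parsed as a URI and fetched over the network, and anything else is opened as a local file. The sketch below imitates that dispatch. read_resource and URI_LIKE are hypothetical names, and the pattern is a simplification, not open-uri's exact code.

```ruby
require 'open-uri'

# Simplified imitation of how open-uri used to extend Kernel#open:
# "scheme://..." strings are fetched as URIs, anything else is a local file.
URI_LIKE = %r{\A[A-Za-z][A-Za-z0-9+\-.]*://}

# read_resource is a hypothetical helper, not part of open-uri
def read_resource(name)
  if URI_LIKE.match?(name)
    URI.open(name, &:read) # Net::HTTP / Net::FTP do the actual work
  else
    File.read(name)
  end
end
```

With it, read_resource('http://www.juretta.com/') returns the page source, while read_resource('open.rb') reads the local script.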

open.rb:
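require 'open-uri'
require 'pp'

# open-uri used to hook into Kernel#open; since Ruby 3.0 use URI.open
URI.open('http://www.juretta.com/') do |f|
  # hash with meta information (the response headers)
  pp f.meta

  # convenience accessors for individual headers
  pp "Content-Type: " + f.content_type
  pp "last modified" + f.last_modified.to_s

  # print the first three lines, numbered from 1
  f.each_line.with_index(1) do |line, no|
    print "#{no}: #{line}"
    break if no == 3
  end
end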

Running this code results in:

powerbook:~ sts$ ruby open.rb
{"last-modified"=>"Sun, 13 Aug 2006 17:46:36 GMT",
 "x-cache"=>"MISS from www.juretta.com",
 "date"=>"Mon, 14 Aug 2006 05:32:54 GMT",
 "etag"=>"1864126947",
 "content-type"=>"text/html",
 "server"=>"lighttpd/1.3.13",
 "content-length"=>"33242",
 "accept-ranges"=>"bytes"}
"Content-Type: text/html"
"last modifiedSun Aug 13 19:46:36 CEST 2006"
1: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2:         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
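The hash printed first is the table of response headers, and content_type and last_modified are parsed from it. The sketch below builds such an object by hand, without any network access, to show where those values come from. It relies on OpenURI::Meta.init and meta_add_field, which are internal (:nodoc:) helpers of open-uri and may change between Ruby versions, so treat it as an illustration rather than a supported API.

```ruby
require 'open-uri'
require 'stringio'
require 'pp'

# Fake an open-uri result offline: a StringIO extended with OpenURI::Meta.
# Meta.init and meta_add_field are internal (:nodoc:) open-uri helpers.
io = StringIO.new(+"<html></html>")
OpenURI::Meta.init(io)
io.meta_add_field("Content-Type", "text/html; charset=iso-8859-1")
io.meta_add_field("Last-Modified", "Sun, 13 Aug 2006 17:46:36 GMT")

pp io.meta            # raw header values, keyed by lower-case field name
puts io.content_type  # media type without its parameters
puts io.charset       # taken from the charset parameter
puts io.last_modified # parsed into a Time object
```

In real use you never build this object yourself; URI.open returns a StringIO or Tempfile that already carries these methods.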

HTTPS with Basic Authentication using net/https

The following example uses the net/https library to access the del.icio.us API, which requires SSL and HTTP Basic Authentication.

require 'net/https'
require 'rexml/document'

username = "" # your del.icio.us username
password = "" # your del.icio.us password
host = "api.del.icio.us"

resp = ""
begin
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.start do |conn|
    req = Net::HTTP::Get.new("/v1/tags/get",
                             {"User-Agent" => "juretta.com RubyLicious 0.2"})
    # send the credentials using HTTP Basic Authentication
    req.basic_auth(username, password)
    response = conn.request(req)
    # stop on 401 Unauthorized or any other non-2xx status
    unless response.is_a?(Net::HTTPSuccess)
      raise "request failed: #{response.code} #{response.message}"
    end
    resp = response.body
  end
  # parse the XML document
  doc = REXML::Document.new(resp)
  # iterate over each element <tag count="200" tag="Rails"/>
  doc.root.elements.each do |elem|
    puts "#{elem.attributes['tag']} -> #{elem.attributes['count']}"
  end
rescue SocketError
  raise "Host #{host} is not reachable"
rescue REXML::ParseException => e
  puts "error parsing XML: #{e}"
end
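Both halves of the del.icio.us example can be tried without an account or a network connection. The sketch below builds, but never sends, the same request to show the header that basic_auth adds ("Basic " followed by the Base64 encoding of "username:password"), then parses a hand-written sample body. The credentials are placeholders, and the <tags> root element of the sample is an assumption about the response shape; only the <tag count="..." tag="..."/> form comes from the comment in the code above.

```ruby
require 'net/https'
require 'rexml/document'

# 1. Build (but do not send) the request and inspect the auth header.
req = Net::HTTP::Get.new("/v1/tags/get")
req.basic_auth("alice", "secret") # placeholder credentials
puts req["Authorization"]

# 2. Parse a response body. This sample document is hand-written (assumed shape).
sample = <<~XML
  <tags>
    <tag count="200" tag="Rails"/>
    <tag count="42" tag="ruby"/>
  </tags>
XML

# Collect tag => count, converting the count attribute to an Integer
counts = {}
REXML::Document.new(sample).root.elements.each("tag") do |elem|
  counts[elem.attributes["tag"]] = elem.attributes["count"].to_i
end

# most frequently used tags first
counts.sort_by { |_tag, count| -count }.each do |tag, count|
  puts "#{tag} -> #{count}"
end
```

Converting the counts to integers makes it easy to sort or filter tags, which the string attributes alone would not allow.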

Net::HTTP with Hpricot

The following example uses Net::HTTP directly (although open-uri is generally the simpler and recommended choice). Hpricot is used to parse the HTML and select the elements of interest.

Hpricot is a nice, forgiving HTML parser for Ruby, written in C.

require 'net/http'
require 'uri'
require 'rubygems'
# use 'gem install hpricot --source code.whytheluckystiff.net'
# to install hpricot
require 'hpricot'

require 'pp'

# Use Net::HTTP to fetch some html
html = Net::HTTP.get(URI.parse('http://www.juretta.com/log/'))

# use hpricot
doc = Hpricot(html)

# get all entries
doc.search("//div[@class='entry']/h3/a").each do |a|
  print a.inner_html + "\n  -> " + a.attributes['href'] + "\n\n"
end
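One practical reason to prefer open-uri: Net::HTTP.get simply returns the body of whatever response arrives, so a 301 or 302 yields the redirect page rather than the target, whereas open-uri follows redirects for you. The sketch below follows redirects by hand with Net::HTTP.get_response. fetch and its getter: keyword are hypothetical; getter exists only so the logic can be exercised without a network.

```ruby
require 'net/http'
require 'uri'

# Follow up to +limit+ redirects. +getter+ is a hypothetical injection point;
# by default it performs a real GET via Net::HTTP.get_response.
def fetch(uri, limit = 5, getter: ->(u) { Net::HTTP.get_response(u) })
  raise ArgumentError, "too many redirects" if limit.zero?

  response = getter.call(uri)
  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URI
    fetch(URI.join(uri.to_s, response["location"]), limit - 1, getter: getter)
  else
    response.value # raises an exception for 4xx/5xx responses
  end
end
```

With it, the Hpricot example could start with html = fetch(URI.parse('http://www.juretta.com/log/')).body instead of calling Net::HTTP.get.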

rio

rio is yet another convenience library that wraps Ruby's I/O classes. It uses open-uri to access network streams and offers one uniform interface for many kinds of input and output streams.

# (sudo) gem install rio
require 'rubygems'
require 'rio'
# open a URI and copy the content into a file
rio('http://www.juretta.com/') > rio('juretta_index.html')
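For comparison, the same copy can be written with open-uri and the standard library alone. In this sketch, download is a hypothetical helper, and its opener: keyword is a test seam that defaults to URI.open; IO.copy_stream copies from the opened resource into the file in chunks.

```ruby
require 'open-uri'
require 'stringio'

# Plain-Ruby counterpart to rio(url) > rio(file). download and its
# +opener+ keyword are hypothetical; opener defaults to open-uri's URI.open.
# Returns the number of bytes written.
def download(url, path, opener: URI.method(:open))
  opener.call(url) do |remote|
    File.open(path, "wb") { |local| IO.copy_stream(remote, local) }
  end
end
```

Usage: download('http://www.juretta.com/', 'juretta_index.html').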

For simple downloads outside of Ruby, you may also want to take a look at the command-line tools curl or wget.



About

juretta.com is the personal workspace of Stefan Saasen. More about this site can be found in the „About“ section.
