web-scraping, ruby-on-rails-4

Error running a scrape rake task in Ruby on Rails


I am trying to set up my app on a new computer and run a scrape to build the database. When I run rake scraper:scrape for the first time, I get the error below. I am not sure why; any help would make my day. Cheers!

Art West@ARTWESTIV ~/desktop/duckduckjeep-master
$ rake scraper:scrape --trace
** Invoke scraper:scrape (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute scraper:scrape
rake aborted!
NoMethodError: undefined method `value' for nil:NilClass
c:/Users/Art West/desktop/duckduckjeep-master/lib/tasks/scraper.rake:17:in `block (3 levels) in <top (required)>'
c:/Users/Art West/desktop/duckduckjeep-master/lib/tasks/scraper.rake:12:in `loop'
c:/Users/Art West/desktop/duckduckjeep-master/lib/tasks/scraper.rake:12:in `block (2 levels) in <top (required)>'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:240:in `call'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:240:in `block in execute'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:235:in `each'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:235:in `execute'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:172:in `invoke_with_call_chain'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/task.rb:165:in `invoke'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:150:in `invoke_task'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `block (2 levels) in top_level'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `each'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `block in top_level'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:115:in `run_with_threads'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:100:in `top_level'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:78:in `block in run'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:176:in `standard_exception_handling'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/lib/rake/application.rb:75:in `run'
c:/RailsInstaller/Ruby2.1.0/lib/ruby/gems/2.1.0/gems/rake-10.4.2/bin/rake:33:in `<top (required)>'
c:/RailsInstaller/Ruby2.1.0/bin/rake:23:in `load'
c:/RailsInstaller/Ruby2.1.0/bin/rake:23:in `<main>'
Tasks: TOP => scraper:scrape

Here is my scraper.rake:

namespace :scraper do
  desc "Fetch Craigslist posts from 3Taps"
  task scrape: :environment do
    require 'open-uri'
    require 'json'
    # Set API token and URL
    auth_token = "b077632d17da8857e2fa92c053115e43"
    polling_url = "http://polling.3taps.com/poll"

  # Grab data until up-to-date
    loop do

    # Specify request parameters
    params = {
      auth_token: auth_token,
      anchor: Anchor.first.value,
      source:"CRAIG",
      category_group: "VVVV",
      category: "VAUT",
      'location.country' => "USA",
      retvals: "location,external_url,heading,body,timestamp,price,images,annotations"

    }

    # Prepare API request
    uri = URI.parse(polling_url)
    uri.query = URI.encode_www_form(params)

    # Submit request
    result = JSON.parse(open(uri).read)

    # Display results to screen
    #puts result["postings"].first["annotations"]["year"]

    Anchor.first.update(value: result["anchor"])
    puts Anchor.first.value
    break if result["postings"].empty?

    # #store results in Database
    result["postings"].each do |posting|

      #ADD HARD FILTER (IN PROGRESS....)
      if posting["annotations"]["make"] == "Jeep"

        #create new post
        @post= Post.new
        @post.heading = posting["heading"]
        @post.body = posting["body"]
        @post.price = posting["price"]
        @post.neighborhood = posting["location"]["locality"]
        @post.external_url = posting["external_url"]
        @post.timestamp = posting["timestamp"]
        @post.year = posting["annotations"]["year"] if posting["annotations"]["year"].present?
        @post.make = posting["annotations"]["make"] if posting["annotations"]["make"].present?
        @post.model = posting["annotations"]["model"] if posting["annotations"]["model"].present?
        @post.title_status = posting["annotations"]["title_status"] if posting["annotations"]["title_status"].present?
        @post.transmission = posting["annotations"]["transmission"] if posting["annotations"]["transmission"].present?
        @post.mileage = posting["annotations"]["mileage"] if posting["annotations"]["mileage"].present?
        @post.source_account = posting["annotations"]["source_account"] if posting["annotations"]["source_account"].present?
        @post.phone = posting["annotations"]["phone"] if posting["annotations"]["phone"].present?
        @post.lat = posting["location"]["lat"]
        @post.lng = posting["location"]["long"]
        @post.zipcode = posting["location"]["zipcode"]
        #Save Post
        @post.save

        # Loop over images and save to Image database
        posting["images"].each do |image|
          @image = Image.new
          @image.url = image["full"]
          @image.post_id = @post.id 
          @image.save
        end
      end

    end
    end
  end




desc "Destroy All Posting Data"
task destroy_all_posts: :environment do
  Post.destroy_all
end

desc "Save neighborhood codes in a reference table"
task scrape_neighborhoods: :environment do
  require 'open-uri'
  require 'json'

      # Set API token and URL
      auth_token = "b077632d17da8857e2fa92c053115e43"
      location_url = "http://reference.3taps.com/locations"

    # Specify request parameters
    params = {
      auth_token: auth_token,
      level: "locality",
      country: "USA"
    }


# Prepare API request
uri = URI.parse(location_url)
uri.query = URI.encode_www_form(params)

    # Submit request
    result = JSON.parse(open(uri).read)

    # Display results to screen
    # puts JSON.pretty_generate result
    
    # Store results in database
    result["locations"].each do |location|
      @location = Location.new
      @location.code = location["code"]
      @location.name = location["short_name"]
      @location.save
    end
  end




  desc "Discard old data"
  task discard_old_data: :environment do
    Post.all.each do |post|
      if post.created_at < 72.hours.ago
        post.destroy
      end
    end
  end
end

Solution

  • The error message says it all:

    NoMethodError: undefined method `value' for nil:NilClass
    

    You are calling value on nil somewhere, most likely here:

    # Specify request parameters
    params = {
      auth_token: auth_token,
      anchor: Anchor.first.value, # this is the line causing the problem
      source:"CRAIG",
      category_group: "VVVV",
      category: "VAUT",
      'location.country' => "USA",
      retvals: "location,external_url,heading,body,timestamp,price,images,annotations"
    
    }
    

    The very first time anchor: Anchor.first.value runs, the anchors table is empty, so Anchor.first returns nil. You are effectively calling nil.value, which is why the rake task fails with this error.

    Make sure the anchors table is populated (it needs at least one record) so that Anchor.first does not return nil.
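
One way to guarantee a record exists is to create it before the loop starts. The sketch below shows the idea in plain Ruby. InMemoryAnchor is a hypothetical stand-in for the Anchor model so the snippet runs without Rails, ensure_anchor is a made-up helper name, and the initial value "0" is only a placeholder, since the right starting anchor depends on what the 3Taps API expects.

```ruby
# Hypothetical in-memory stand-in for the Anchor model (not ActiveRecord),
# exposing only the two class methods the sketch needs.
class InMemoryAnchor
  attr_accessor :value

  @records = []

  class << self
    # Mirrors Model.first: returns the first record, or nil when empty.
    def first
      @records.first
    end

    # Mirrors Model.create!: builds and stores a record.
    def create!(attrs)
      record = new
      record.value = attrs[:value]
      @records << record
      record
    end
  end
end

# Returns the existing first record, creating one only when none exists.
def ensure_anchor(model, initial_value)
  model.first || model.create!(value: initial_value)
end

anchor = ensure_anchor(InMemoryAnchor, "0") # "0" is a placeholder value
puts anchor.value
```

In the rake task itself, ActiveRecord's Anchor.first_or_create!(value: ...) should do the same thing in a single call.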

    Another way to avoid the error on this line is to use try:

      anchor: Anchor.first.try(:value)
    

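    That way, this line won't raise even if Anchor.first returns nil. Note that the later Anchor.first.update(value: result["anchor"]) call in your task will still fail on nil, so the table needs a record either way.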
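
As a rough illustration of what try does with a nil receiver, here is a plain-Ruby sketch. safe_value is a hypothetical helper that approximates ActiveSupport's try(:value) for this one case, and AnchorRecord is a made-up stand-in for an Anchor record so the snippet runs without Rails.

```ruby
# Made-up stand-in for an Anchor record so the snippet runs without Rails.
AnchorRecord = Struct.new(:value)

# Hypothetical helper approximating record.try(:value):
# returns nil for a nil receiver instead of raising NoMethodError.
def safe_value(record)
  record.nil? ? nil : record.value
end

# Calling .value directly on nil reproduces the error from the trace.
begin
  nil.value
rescue NoMethodError => e
  puts "direct call raised #{e.class}"
end

# The guarded version returns nil for nil and the value otherwise.
puts safe_value(nil).inspect
puts safe_value(AnchorRecord.new("123")).inspect
```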