ruby json reddit

Ruby JSON extractor failing, possibly due to overly large JSON


I am writing a script to extract all of the comments from a Reddit thread as JSON:

 require "rubygems"
 require "json"
 require "net/http"
 require "uri"
 require 'open-uri'
 require 'neatjson'

 #The URL.
 url = ("https://www.reddit.com/r/AskReddit/comments/46n0zc.json")

 #Sets up the JSON reader.
 result = JSON.parse(open(url).read)
 children = result["data"]["children"]

 #Prints the jsons.
 children.each do |child|
   puts "Author:       " + child["data"]["author"]
   puts "Body:         " + child["data"]["body"]
   puts "ID:           " + child["data"]["id"]
   puts "Upvotes:      " + child["data"]["ups"].to_s
   puts ""
 end

It raises an error, not in the code that prints the JSON, but in the code that reads it:

   005----extractallredditpostcomments.rb:17:in `[]': no implicit conversion of String into Integer (TypeError)
           from 005----extractallredditpostcomments.rb:17:in `<main>'

For some reason, this line

 children = result["data"]["children"]

isn't working, which is strange because it worked fine yesterday.

What I'm wondering is: could this be caused by the size of the JSON? If you go to the link (https://www.reddit.com/r/AskReddit/comments/46n0zc.json) you can see that the file is huge. I had so much trouble finding the keys I need in such a large page that it took me hours, and I'm still not sure I have the correct ones, which could be causing the error as well. I'm not sure what's failing here.

Oh, and one last thing: I tried simplifying the program by removing the printer:

 #Sets up the JSON reader.
 result = JSON.parse(open(url).read)
 children = result["data"]["children"]

 puts children

 #Prints the jsons.
 #children.each do |child|
 #  puts "Author:       " + child["data"]["author"]
 #  puts "Body:         " + child["data"]["body"]
 #  puts "ID:           " + child["data"]["id"]
 #  puts "Upvotes:      " + child["data"]["ups"].to_s
 #  puts ""
 #end

And it still fails:

005----extractallredditpostcomments.rb:13:in `[]': no implicit conversion of String into Integer (TypeError)
        from 005----extractallredditpostcomments.rb:13:in `<main>'

Solution

  • A quick look at the returned JSON shows that the top-level value is a JSON array of two JSON objects, not a single JSON object. It looks something like this:

    [
        {
            "data": {
                "after": null,
                "before": null,
                "children": [
                    {
                        "data": {
                            "approved_by": null,
                            "archived": false,
                            ...
            },
            "kind": "Listing"
        },
        {
            "data": {
                "after": null,
                "before": null,
                "children": [
                    {
                        "data": {
                            "approved_by": null,
                            "archived": false,
                            "author": "finkledinkle7",
                            "author_flair_css_class": null,
                            "author_flair_text": null,
                            "banned_by": null,
                            "body": "My mother was really sick in 2008.  I was turning 25 with a younger brother and sister.\n\nLost both of my grandparents on mom's side to cancer a few years prior.  Mom had to watch as her parents slowly passed away.  It destroyed her not having her mother around as t ...
        }
    ]
    

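    This means that the line children = result["data"]["children"] won't work, because it treats result as a JSON object (a Ruby Hash) when it is actually an Array. Indexing an Array with a String is what raises "no implicit conversion of String into Integer", so the size of the JSON is not the cause. The comments are in the second element, so it looks like you should use children = result[1]["data"]["children"].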
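A minimal sketch of the corrected lookup, run against a small inline stand-in for the response so that it works offline. The helper name extract_comments, the SAMPLE data, and the author "example_user" are made up for illustration. The filter on kind == "t1" is an assumption about Reddit's listing format: entries of kind "more" have no "author" or "body", so printing them as in the original loop would fail on nil.

```ruby
require "json"

# Hypothetical helper (not from the original script): collects the comment
# fields from an already-parsed Reddit thread response.
def extract_comments(result)
  # The response is an Array: element 0 holds the post, element 1 the comments.
  children = result[1]["data"]["children"]

  # Assumption: only children of kind "t1" are real comments; "more" stubs
  # carry no "author" or "body", so they are skipped.
  children.select { |child| child["kind"] == "t1" }.map do |child|
    data = child["data"]
    { author: data["author"], body: data["body"], id: data["id"], ups: data["ups"] }
  end
end

# Made-up stand-in for the real response, trimmed to the fields used here.
SAMPLE = <<~JSON
  [
    {"kind": "Listing", "data": {"children": [
      {"kind": "t3", "data": {"title": "Example post"}}
    ]}},
    {"kind": "Listing", "data": {"children": [
      {"kind": "t1", "data": {"author": "example_user", "body": "Example comment", "id": "c1", "ups": 3}},
      {"kind": "more", "data": {"id": "c2", "children": ["c3"]}}
    ]}}
  ]
JSON

result = JSON.parse(SAMPLE)

# Checking the class first would have revealed the problem immediately.
puts result.class

extract_comments(result).each do |comment|
  puts "Author:       #{comment[:author]}"
  puts "Body:         #{comment[:body]}"
  puts "ID:           #{comment[:id]}"
  puts "Upvotes:      #{comment[:ups]}"
  puts ""
end
```

To run this against the live thread, the parsed value would come from JSON.parse(URI.open(url).read) after require "open-uri"; on Ruby 3.0 and later, a bare open(url) no longer fetches URLs. Note that this only reads top-level comments; replies appear to be nested under each comment's "replies" field and would need a recursive walk.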