I was in the process of creating a script to extract all of the comments from a Reddit thread as JSON:
require "rubygems"
require "json"
require "net/http"
require "uri"
require 'open-uri'
require 'neatjson'
#The URL.
url = ("https://www.reddit.com/r/AskReddit/comments/46n0zc.json")
#Sets up the JSON reader.
result = JSON.parse(open(url).read)
children = result["data"]["children"]
#Prints the jsons.
children.each do |child|
  puts "Author: " + child["data"]["author"]
  puts "Body: " + child["data"]["body"]
  puts "ID: " + child["data"]["id"]
  puts "Upvotes: " + child["data"]["ups"].to_s
  puts ""
end
And for some reason it gives me an error. However, the error is not in the actual JSON printer, but in the reader:
005----extractallredditpostcomments.rb:17:in `[]': no implicit conversion of String into Integer (TypeError)
from 005----extractallredditpostcomments.rb:17:in `<main>'
For some reason,
children = result["data"]["children"]
isn't working, which is strange because it worked fine yesterday.
What I'm wondering is: could this be caused by the size of the JSON? If you actually go to the link (https://www.reddit.com/r/AskReddit/comments/46n0zc.json) you can see that the file is huge. Because of the sheer size of the page I'm having a lot of trouble finding the keys I need; it took me hours and I'm still not sure I have the correct ones, which could also be causing the error. I'm not sure what's failing here.
Oh, and one last thing: I tried simplifying the program by removing the printer:
#Sets up the JSON reader.
result = JSON.parse(open(url).read)
children = result["data"]["children"]
puts children
#Prints the jsons.
#children.each do |child|
# puts "Author: " + child["data"]["author"]
# puts "Body: " + child["data"]["body"]
# puts "ID: " + child["data"]["id"]
# puts "Upvotes: " + child["data"]["ups"].to_s
# puts ""
#end
And it still fails:
005----extractallredditpostcomments.rb:13:in `[]': no implicit conversion of String into Integer (TypeError)
from 005----extractallredditpostcomments.rb:13:in `<main>'
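When a response is too large to read by eye, a small helper can print the access path of every key instead of you scrolling through the page. The sketch below is hypothetical: key_paths and its limit parameter are names introduced here, not part of any library. It visits only the first few elements of each array so the output stays short. Run on the real thread, every path it prints would start with [0] or [1] rather than ["data"], which is the first hint that the top level is an array.

```ruby
require "json"

# Hypothetical helper: collects the Ruby access path of every Hash key in a
# parsed JSON value, e.g. [1]["data"]["children"]. Only the first `limit`
# elements of each Array are visited, so a huge response yields a short list.
def key_paths(value, limit = 2, prefix = "", paths = [])
  case value
  when Hash
    value.each do |key, child|
      path = "#{prefix}[#{key.inspect}]"
      paths << path
      key_paths(child, limit, path, paths)
    end
  when Array
    value.first(limit).each_with_index do |element, index|
      key_paths(element, limit, "#{prefix}[#{index}]", paths)
    end
  end
  paths
end

# Example on a tiny, made-up document with the same shape as a thread response.
sample = JSON.parse('[{"kind": "Listing", "data": {"children": []}},' \
                    ' {"kind": "Listing", "data": {"children": [{"kind": "t1", "data": {"author": "someone"}}]}}]')
puts key_paths(sample)
```

On the real response you would pass JSON.parse(open(url).read) instead of the sample and raise limit if you want to see more than the first couple of elements per array.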
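A quick look at the returned JSON value shows that it is a JSON array of two JSON objects, not a single JSON object. The first object is a Listing containing the post itself; the second is a Listing containing the comments. It looks somewhat like this: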
[
  {
    "data": {
      "after": null,
      "before": null,
      "children": [
        {
          "data": {
            "approved_by": null,
            "archived": false,
            ...
    },
    "kind": "Listing"
  },
  {
    "data": {
      "after": null,
      "before": null,
      "children": [
        {
          "data": {
            "approved_by": null,
            "archived": false,
            "author": "finkledinkle7",
            "author_flair_css_class": null,
            "author_flair_text": null,
            "banned_by": null,
            "body": "My mother was really sick in 2008. I was turning 25 with a younger brother and sister.\n\nLost both of my grandparents on mom's side to cancer a few years prior. Mom had to watch as her parents slowly passed away. It destroyed her not having her mother around as t ...
          }
]
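The error can be reproduced offline with a tiny made-up document of the same shape. This sketch shows that JSON.parse returns a Ruby Array for such input, that indexing it with a String raises the same TypeError as in the question, and that selecting element 1 first reaches the comments. The field values are invented; only the nesting mirrors the real response.

```ruby
require "json"

# Made-up stand-in for the thread response: an Array holding two Listings,
# the post first and the comments second. Only the shape mirrors Reddit's.
raw = '[{"kind": "Listing", "data": {"children": []}},' \
      ' {"kind": "Listing", "data": {"children": [{"kind": "t1", "data": {"author": "someone"}}]}}]'
result = JSON.parse(raw)
puts result.class

# Treating the Array like a Hash reproduces the error from the question,
# because Array#[] cannot take a String index.
error_message = begin
  result["data"]["children"]
  nil
rescue TypeError => e
  e.message
end
puts error_message

# Selecting the second element first reaches the comments listing.
children = result[1]["data"]["children"]
puts children.first["data"]["author"]
```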
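This means that the line
children = result["data"]["children"]
in your program won't work, because it treats result as a JSON object (a Ruby Hash) when it is actually an array (a Ruby Array). Array#[] cannot take a String index, so passing "data" raises the TypeError you are seeing; the size of the response has nothing to do with it. Since the comments are in the second element, it looks like you should do
children = result[1]["data"]["children"]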
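Indexing result[1] fixes the TypeError, but two more things are worth knowing if the goal is every comment. The comments listing can end with a "more" placeholder that has no author or body, and replies are nested inside each comment rather than listed at the top level. The sketch below handles both under stated assumptions about the response shape (comments have kind "t1", and "replies" is either an empty string or another Listing). fetch_thread, each_comment and comment_summaries are hypothetical names, and the User-Agent string is a placeholder. Comments hidden behind "more" placeholders are skipped, not fetched.

```ruby
require "json"
require "open-uri"

# Fetches a thread's .json URL and parses it. URI.open is used because bare
# Kernel#open no longer opens URLs on Ruby 3.0+. The User-Agent value is a
# placeholder; Reddit may throttle or reject requests with generic agents.
def fetch_thread(url)
  JSON.parse(URI.open(url, "User-Agent" => "comment-extractor-sketch/0.1").read)
end

# Passes the data Hash of every comment in a Listing to the block, depth first.
# Assumptions about the response shape: comments have kind "t1", "more"
# placeholders for unloaded comments are skipped, and a comment's "replies"
# is either an empty string or another Listing.
def each_comment(listing, &block)
  listing["data"]["children"].each do |child|
    next unless child["kind"] == "t1"
    block.call(child["data"])
    replies = child["data"]["replies"]
    each_comment(replies, &block) if replies.is_a?(Hash)
  end
end

# Collects the fields printed in the question from the comments listing,
# which is the second element (index 1) of the thread Array.
def comment_summaries(thread)
  summaries = []
  each_comment(thread[1]) do |comment|
    summaries << comment.slice("author", "body", "id", "ups")
  end
  summaries
end
```

For the thread in the question, puts JSON.pretty_generate(comment_summaries(fetch_thread(url))) would print the loaded top-level and nested comments as a single JSON array. Walking the tree recursively, rather than only result[1]["data"]["children"], is what picks up the replies.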