I wrote a script for getting some posts from reddit.
import praw
import pandas as pd
reddit = praw.Reddit(client_id='*******', \
client_secret='*******', \
user_agent='**********', \
username='********', \
password='*******8')
subreddit1 = reddit.subreddit("Tea")
subreddit2 = reddit.subreddit("Biophysics")
top_subreddit1 = subreddit1.top(limit=500)
top_subreddit2 = subreddit2.top(limit=500)
topics_dict = { "title":[],
"score":[],
"id":[], "url":[],
"comms_num": [],
"created": [],
"body":[]}
for submission1 in top_subreddit1:
topics_dict["title"].append(submission1.title)
topics_dict["score"].append(submission1.score)
topics_dict["id"].append(submission1.id)
topics_dict["url"].append(submission1.url)
topics_dict["comms_num"].append(submission1.num_comments)
topics_dict["created"].append(submission1.created)
topics_dict["body"].append(submission1.selftext)
for submission2 in top_subreddit2:
topics_dict["title"].append(submission2.title)
topics_dict["score"].append(submission2.score)
topics_dict["id"].append(submission2.id)
topics_dict["url"].append(submission2.url)
topics_dict["comms_num"].append(submission2.num_comments)
topics_dict["created"].append(submission2.created)
topics_dict["body"].append(submission2.selftext)
topics_data = pd.DataFrame(topics_dict)
topics_data
But it only displays in my jupyter. Now I want to save progress as a database file. All advice are appreciated.
You have a couple of options. I'll present two, each with their pros and cons:
1. CSV
Simply save your file to a .csv using DataFrame.to_csv:
topics_data.to_csv('path_to_file.csv')
You can then proceed to parse this csv file in your client application, i.e., whatever application is going to use your scraped data.
Pros
Cons
2. SQLITE
You can also opt to store the dataframe in sqlite using DataFrame.to_sql:
import sqlite3
db_file = 'my.db'
# This creates a new database file if it doesn't exist
db_conn = sqlite3.connect(db_file)
# This creates a new table 'topics_data' if it doesn't exist
topics_data.to_sql('topics_data', con=db_conn)
Pros
Cons
Find out more about sqlite here: sqlite tutorial