I pull data from news sites into MongoDB with Python every 10 minutes. Sometimes the same item gets recorded more than once, because nothing checks for duplicates. If an incoming item is already stored, it should not be saved to MongoDB again.
import feedparser
import datetime
import threading
import pymongo
from pymongo import MongoClient
client = pymongo.MongoClient("mongodb+srv://xxx:[email protected]/rss_feed?retryWrites=true&w=majority")
db = client["rss_feed"]
collection=db["rss_collection"]
def mynet():
    NewsFeedMynet = feedparser.parse("http://www.mynet.com/haber/rss/sondakika")
    entry = NewsFeedMynet.entries[1]
    post_mynet = {"baslik": entry.title, "kisa_bilgi": entry.summary, "link": entry.link, "zaman": entry.published, "saglayici": "Mynet"}
    collection.insert_one(post_mynet)
There are two ways you can go about this:
One is an upsert: if a record with the same title already exists, it is updated with the new values; if no such record exists, a new one is inserted.
post_mynet = {"baslik": entry.title, "kisa_bilgi": entry.summary, "link": entry.link, "zaman": entry.published, "saglayici": "Mynet"}
# first parameter is "what to match", aka the query
# second parameter is the update; update_one needs an update operator such as $set
# (passing the bare record raises a ValueError in PyMongo)
# third is the flag to upsert
collection.update_one({"baslik": entry.title}, {"$set": post_mynet}, upsert=True)
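If the goal is to leave an existing record untouched rather than overwrite it, the upsert can use `$setOnInsert` instead of `$set`. The following is a minimal sketch, not a definitive implementation: the helper names `entry_to_post` and `upsert_post` are hypothetical, the collection is passed in as a parameter, and matching on `baslik` simply follows the query used above.

```python
def entry_to_post(entry, provider="Mynet"):
    """Map a feedparser entry to the document layout used in the question."""
    return {
        "baslik": entry.title,
        "kisa_bilgi": entry.summary,
        "link": entry.link,
        "zaman": entry.published,
        "saglayici": provider,
    }


def upsert_post(collection, post):
    """Insert `post` only if no document with the same title exists.

    `collection` is assumed to be a PyMongo collection. With $setOnInsert
    the fields are written only when the upsert creates a new document,
    so a document that is already stored is left as it is.
    Returns True when a new document was inserted.
    """
    result = collection.update_one(
        {"baslik": post["baslik"]},   # what to match
        {"$setOnInsert": post},       # applied only on insert
        upsert=True,
    )
    return result.upserted_id is not None
```

Inside `mynet()` this could be called as `upsert_post(collection, entry_to_post(entry))`. The return value makes it easy to log whether the item was new.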
The other is to check whether the record is already there, and insert only if it is not:
post_mynet = {"baslik": entry.title, "kisa_bilgi": entry.summary, "link": entry.link, "zaman": entry.published, "saglayici": "Mynet"}
if not collection.find_one({"baslik": entry.title}):
    collection.insert_one(post_mynet)
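The same check can be applied to every entry in the feed instead of a single one. This is a sketch under assumptions: `save_new_entries` and its `key` parameter are hypothetical names, the collection is passed in as an argument, and the entries are expected to expose `title`, `summary`, `link` and `published` as in the question.

```python
def save_new_entries(collection, entries, provider="Mynet", key="baslik"):
    """Insert each entry whose `key` value is not stored yet.

    `collection` is assumed to be a PyMongo collection and `entries`
    a list such as feedparser's `feed.entries`.
    Returns the number of documents inserted.
    """
    inserted = 0
    for entry in entries:
        post = {
            "baslik": entry.title,
            "kisa_bilgi": entry.summary,
            "link": entry.link,
            "zaman": entry.published,
            "saglayici": provider,
        }
        # Skip the entry when a document with the same key value exists.
        if collection.find_one({key: post[key]}) is not None:
            continue
        collection.insert_one(post)
        inserted += 1
    return inserted
```

With the question's code this could be called as `save_new_entries(collection, NewsFeedMynet.entries)`. Passing `key="link"` would compare by URL instead of by title, which may be worth considering if headlines get edited after publication.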