I'm new to Python, and I back up my server data every day. I use a shell script to check my backup files, but as the number of web hosts grows, the shell script needs more and more changes. So I want to use Python to check my backup files instead.
My environment is:
OS: Ubuntu 16.04
Python Version: 3.4.3
My directory and file structure looks like this:
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_20171105_htdocs.tar.gz
/mnt/disk2/JP/TFP-1/Config/crontab_backup_20171105.txt
/mnt/disk2/JP/TFP-1/Config/mysql_config_backup_20170724.tar.gz
/mnt/disk2/JP/SPT_1/Web/2017/11/SPT_20171105_htdocs.tar.gz
/mnt/disk2/JP/SPT_1/Config/nginx_config_backup_20171030.tar.gz
/mnt/disk2/CN/LHD-1/Web/2017/11/LHD_20171105_htdocs.tar.gz
/mnt/disk2/CN/LHD-1/Config/crontab_backup_20171105.txt
/mnt/disk2/CN/LHD-1/Config/mysql_config_backup_20170724.tar.gz
/mnt/disk2/CN/TTY_1/Web/2017/11/TTY_20171105_htdocs.tar.gz
/mnt/disk2/CN/TTY_1/Config/nginx_config_backup_20171030.tar.gz
Because my backup files have the date in their names, my shell script subtracts yesterday's file size from today's. If the result is 0, the backup file has not changed; if it is not 0, the script sends an alert mail to notify me.
(A small difference in file size does not matter, though. I only need to be notified when the size difference is more than 1GB, which is why I don't use md5 or filecmp for the comparison.)
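The rule described above (ignore small differences, alert only when the sizes differ by more than about 1GB) could be sketched roughly as follows. The names `needs_alert` and `THRESHOLD_BYTES` are made up for illustration, and treating "1GB" as 1024**3 bytes is an assumption.

```python
THRESHOLD_BYTES = 1024 ** 3  # assumption: "1GB" is taken to mean 1024**3 bytes


def needs_alert(new_size, old_size, threshold=THRESHOLD_BYTES):
    """Return True when the two sizes differ by more than the threshold."""
    # abs() makes the check symmetric: a backup that shrinks a lot
    # is as suspicious as one that grows a lot.
    return abs(new_size - old_size) > threshold
```

With the sizes from my output, `needs_alert(76021633, 76012434)` would be False, because the two files differ by only a few kilobytes.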
Now I want to write the same program in Python, but I'm stuck on calculating the file sizes for two different dates.
This is my code:
## Import Module
import sys
import os
import re
from datetime import datetime, timedelta
# Global Variables
jpWebList = ["/mnt/disk2/JP/TFP-1/Web", "/mnt/disk2/JP/SPT_1/Web"]
jpConfigList = ["/mnt/disk2/JP/TFP-1/Config", "/mnt/disk2/JP/SPT_1/Config"]
## Function Program
#-- Get file name's time and calculate yesterday.
def findYtdFile(filePath):
    YtdData = ""
    fsize = 0
    now = datetime.now()
    aDay = timedelta(days=-1)
    yDay = now + aDay
    yDay = yDay.strftime("%Y%m%d")  # format yDay like 20170820.
    # print(yDay)  # Check yDay's value.
    # print(filePath)
    if re.search(yDay, filePath) is not None:
        # print(filePath)
        YtdData = filePath
        # print(YtdData)  # Check what kind of file we got.
        fsize = os.path.getsize(YtdData)
        print(YtdData, "--file size is", fsize)
    return fsize

#-- Get file name's time and calculate the day before yesterday
def findDbyFile(filePath):
    DbyData = ""
    fsize = 0
    now = datetime.now()
    aDay = timedelta(days=-2)
    byDay = now + aDay
    byDay = byDay.strftime("%Y%m%d")  # format byDay like 20170820
    # print(byDay)  # Check byDay's value.
    if re.search(byDay, filePath) is not None:
        DbyData = filePath
        fsize = os.path.getsize(DbyData)
        print(DbyData, "--file size is", fsize)
    return fsize

#--Main, Get tar.gz and txt file list.
for tmpList in jpWebList:
    for root, dirs, files in os.walk(tmpList):  # walk recursively to get the directory and file lists.
        for file in files:
            if file.endswith((".tar.gz", ".txt")):
                filePath = os.path.join(root, file)
                ytdFileSize = findYtdFile(filePath)
                dbyFileSize = findDbyFile(filePath)
                a = ytdFileSize - dbyFileSize
                print(a)
And the terminal shows:
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_backend_20171106_htdocs.tar.gz --file size is 76021633
76021633
0
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_backend_20171105_htdocs.tar.gz --file size is 76012434
-76012434
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_Test_backend_20171106_htdocs.tar.gz --file size is 62391961
62391961
0
0
0
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_Test_front_20171105_htdocs.tar.gz --file size is 82379384
-82379384
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_Test_front_20171106_htdocs.tar.gz --file size is 82379384
82379384
0
0
0
0
0
/mnt/disk2/JP/TFP-1/Web/2017/11/TFP_Test_backend_20171105_htdocs.tar.gz --file size is 62389231
-62389231
The answer is supposed to be something like "TFP_Test_front_20171106_htdocs.tar.gz" (82379384) minus "TFP_Test_backend_20171105_htdocs.tar.gz" (62389231), which equals 19990153.
I have tried glob, re.findall, and os.listdir, but it still doesn't work properly. Is there anything I have missed, or something I can refer to? Thank you for your help!
I can't see what is wrong in your code, but I've changed it and tried to simplify it:
import os
from datetime import datetime, timedelta

jpWebList = ["/mnt/disk2/JP/TFP-1/Web", "/mnt/disk2/JP/SPT_1/Web"]
jpConfigList = ["/mnt/disk2/JP/TFP-1/Config", "/mnt/disk2/JP/SPT_1/Config"]

def get_date(d):
    # Return the date d days from now (negative d = past) formatted like 20171105.
    aDay = timedelta(days=d)
    byDay = datetime.now() + aDay
    return byDay.strftime("%Y%m%d")

#main
today = get_date(0)
yesterday = get_date(-1)
for tmpList in jpWebList:
    for root, dirs, files in os.walk(tmpList):
        # Collect today's and yesterday's backup files in this directory.
        todays_files = [file for file in files if today in file and file.endswith((".tar.gz", ".txt"))]
        yesterdays_files = [file for file in files if yesterday in file and file.endswith((".tar.gz", ".txt"))]
        for todays_file in todays_files:
            # The matching file from yesterday has the same name except for the date.
            yesterdays_file = todays_file.replace(today, yesterday)
            if yesterdays_file in yesterdays_files:
                todays_path = os.path.join(root, todays_file)
                yesterdays_path = os.path.join(root, yesterdays_file)
                size_difference = os.path.getsize(todays_path) - os.path.getsize(yesterdays_path)
                print(size_difference)
I can't fully check it without the folders and files that you have, but I tried it with 2 files and it works fine. Let me know if it doesn't work.
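One likely reason the original loop prints one-sided values: each path contains only one date, so for any given file at most one of findYtdFile and findDbyFile can match and the other returns 0. The sizes have to be collected first and paired afterwards. Below is a rough sketch of that idea, which also pairs files that sit in different month directories (for example 2017/10 and 2017/11). The names `DATE_RE`, `collect_sizes` and `size_differences` are invented for this example, and it assumes the date is the first 8-digit run in the file name.

```python
import os
import re
from collections import defaultdict
from datetime import datetime, timedelta

DATE_RE = re.compile(r"\d{8}")  # assumption: the date is the first 8-digit run in the name


def collect_sizes(top):
    """Map each date-stripped file name to {date: size} for every backup under top."""
    sizes = defaultdict(dict)
    for root, _dirs, files in os.walk(top):
        for name in files:
            match = DATE_RE.search(name)
            if match and name.endswith((".tar.gz", ".txt")):
                # "TFP_20171105_htdocs.tar.gz" -> key "TFP__htdocs.tar.gz"
                key = name[:match.start()] + name[match.end():]
                sizes[key][match.group()] = os.path.getsize(os.path.join(root, name))
    return sizes


def size_differences(top, newer, older):
    """Return {key: newer_size - older_size} for backups present on both dates."""
    return {
        key: by_date[newer] - by_date[older]
        for key, by_date in collect_sizes(top).items()
        if newer in by_date and older in by_date
    }


if __name__ == "__main__":
    now = datetime.now()
    yesterday = (now - timedelta(days=1)).strftime("%Y%m%d")
    day_before = (now - timedelta(days=2)).strftime("%Y%m%d")
    for top in ["/mnt/disk2/JP/TFP-1/Web", "/mnt/disk2/JP/SPT_1/Web"]:
        for key, diff in size_differences(top, yesterday, day_before).items():
            print(key, diff)
```

The key is built from the file name only, so this assumes each directory passed as `top` belongs to a single host; if two files under the same `top` share a name once the date is removed, the later one found would overwrite the earlier one.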