How to create a copy of an existing DataFrame(panda)?

I have just started exploring pandas. I tried applying logarithmic scaling to a Dataframe column without affecting the source Dataframe. I passed the existing DataFrame(data_source) to the DataFrame constructor thinking that it would create a copy.

data_source = pd.read_csv("abc.csv")
log_data = pd.DataFrame(data = data_source).apply(lambda x: np.log(x + 1))

I think it works properly but is it a recommended/correct way of applying scaling on a copied DataFrame ? How is it different from the 'DataFrame.copy' function?

Solution

pd.DataFrame(data = data_source) does not make a copy. This is documented in the docs for the copy argument to the constructor:

copy : boolean, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input

This is also easily observed by trying to mutate the result:

>>> x = pandas.DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})
>>> y = pandas.DataFrame(x)
>>> x
   x    y
0  1  1.0
1  2  2.0
2  3  3.0
>>> y
   x    y
0  1  1.0
1  2  2.0
2  3  3.0
>>> y.iloc[0, 0] = 2
>>> x
   x    y
0  2  1.0
1  2  2.0
2  3  3.0

If you want a copy, call the copy method. You don't need a copy, though. apply already returns a new dataframe, and better yet, you can call numpy.log or numpy.log1p on dataframes directly:

>>> x = pandas.DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})
>>> numpy.log1p(x)
          x         y
0  0.693147  0.693147
1  1.098612  1.098612
2  1.386294  1.386294