python dataframe numbers data-science increment

How Do I Create a Dataframe from 1 to 100,000?


I am sure this is not hard, but I can't figure it out!

I want to create a dataframe that starts at 1 for the first row and ends at 100,000 in increments of 1, 2, 4, 5, or whatever. I could do this in my sleep in Excel, but is there a slick way to do this without importing a .csv or .txt file?

I have needed to do this in variations many times and just settled on importing a .csv, but I am tired of that.

Example in Excel


Solution

  • Generating numbers

    Generating numbers is not specific to pandas; the built-in range function (as mentioned by @Grismer) or the numpy module can do the job. Say you want to generate a series of numbers and put them in a dataframe. There are several ways to do this, and these are the two I prefer.

    • range function

    Take range(1,1000,1) as an example. This function takes three arguments, two of which are optional. The first argument is the start number, the second is the end number, and the third is the step size. The example above therefore produces the numbers 1 to 999 (note that the range is a half-open interval: closed at the start and open at the end, so the end number itself is not included).

    • numpy.arange function

    To get the same result as the previous example, use numpy.arange(1,1000,1). It takes the same start, end, and step arguments as range.
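    As a quick check of the half-open behaviour described above, here is a minimal sketch that compares the two calls. The variable names (py_numbers, np_numbers, inclusive) are made up for illustration.

```python
import numpy as np

# Both calls take (start, stop, step); the stop value itself is excluded.
py_numbers = list(range(1, 1000, 1))
np_numbers = np.arange(1, 1000, 1)

print(py_numbers[0], py_numbers[-1], len(py_numbers))  # 1 999 999
print(np_numbers[0], np_numbers[-1], len(np_numbers))  # 1 999 999

# To include 1000, move the stop value one past it.
inclusive = list(range(1, 1001, 1))
print(inclusive[-1])  # 1000
```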

    Assigning to dataframe

    Now, if you want to put these numbers in a dataframe, you can do so with the pandas module. The code below is an example of how to build the dataframe:

    import numpy as np
    import pandas as pd

    # Stop is exclusive, so 1001 yields the numbers 1 through 1000.
    # range(1, 1001, 1) would work here as well.
    myRange = np.arange(1, 1001, 1)
    df = pd.DataFrame({"numbers": myRange})
    df.head(5)
    

    which results in a dataframe like this (only the first five rows are shown):

       numbers
    0        1
    1        2
    2        3
    3        4
    4        5
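    Applied to the question as asked (1 to 100,000 with a chosen increment), the same idea could look like the sketch below. The helper make_number_frame and its parameter names are hypothetical, not part of pandas or numpy, and it assumes a positive integer step.

```python
import numpy as np
import pandas as pd


def make_number_frame(start=1, stop=100_000, step=1, column="numbers"):
    """Hypothetical helper: one-column DataFrame counting from start up to stop.

    Assumes integer arguments and a positive step. stop is included only
    when the step happens to land on it.
    """
    values = np.arange(start, stop + 1, step)  # +1 because arange excludes its stop
    return pd.DataFrame({column: values})


df_all = make_number_frame()         # 1, 2, 3, ..., 100000
df_by_5 = make_number_frame(step=5)  # 1, 6, 11, ..., 99996

print(len(df_all), df_all["numbers"].iloc[-1])  # 100000 100000
print(df_by_5["numbers"].head(3).tolist())      # [1, 6, 11]
print(df_by_5["numbers"].iloc[-1])              # 99996
```

    Note that with a step of 5 starting at 1, the last value is 99996 rather than 100,000, because the sequence never lands exactly on the end number.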

    Difference of numpy.arange and range

    To keep this answer short, I'd rather refer you to this answer by @hpaulj.
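    Without trying to summarise that answer, two differences that are easy to verify yourself are shown in this small sketch: range gives a lazy sequence of Python integers, while numpy.arange builds an array up front and also accepts a fractional step. The variable names are illustrative only.

```python
import numpy as np

lazy = range(1, 100_001)       # lazy sequence object; values are produced on demand
eager = np.arange(1, 100_001)  # NumPy array holding all values in memory

print(type(lazy).__name__)   # range
print(type(eager).__name__)  # ndarray

# arange accepts a fractional step...
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75]

# ...whereas range raises TypeError for non-integer arguments.
try:
    range(0, 1, 0.25)
    range_accepts_float = True
except TypeError:
    range_accepts_float = False
print(range_accepts_float)  # False
```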