QBoard » Artificial Intelligence & ML » AI and ML - Python » How to read a lot of excel files in python pandas?

How to read a lot of excel files in python pandas?

  • I have lots of excel files(xlsx format) and want to read and handle them.

    For example, file names are ex201901, ex201902,  201912.

    Its name is made by ex YYYYMM format.

    Anyway, to import these files in pandas as an usual case, it's easy.

      July 17, 2021 1:44 PM IST
    0
  • Given that you'll probably want to somehow work with all data frames at once afterwards, it's a smell if you even put them into separate local variables, and in general, whenever you're experiencing a "this task feels repetitive because I'm doing the same thing over and over again", that calls for introducing loops of some sort. As you're planning to use pandas, chances are that you'll be iterating soon again (now that you have your files, you're probably going to be performing some transformations on the rows of those files), in which case you'll probably be best off looking into how control flow a la loops works in Python (and indeed in pandas) in general; good tutorials are plentiful.

    In your particular case, depending on what kind of processing you are planning on doing afterwards, you'd probably benefit from having something like

    df2019 = [pd.read_excel(rf'C:\users\ex2019{str(i).zfill(2)}.xlsx') for i in range(1, 13)]
    ​

    With that, you can access the individual data frames through e.g. df2019[5]
    to get the data frame corresponding to June, or you can collapse all of them into a single data frame using
    df = pd.concat(df2019)
    if that's what suits your need.
    If you have less structure in your file names,
    glob
    can come in handy. With that, the above could become something like

    import glob
    df2019 = list(map(pd.read_excel, glob.glob(r'C:\users\ex2019*.xlsx')))
    This post was edited by Viaan Prakash at July 21, 2021 1:54 PM IST
      July 21, 2021 1:53 PM IST
    0
  • You can use OS module from python. It has a method listdir which stores all the file names in the folder. Check the code below:

    import os, re
    listDir = os.listdir(FILE_PATH)
    dfList = []
    for aFile in listDir:
        if re.search(r'ex20190[0-9]{1}.xlsx', aFile):
            tmpDf = pd.read_excel(FILE_PATH + aFile)
            dfList.append(tmpDf)
    
    outDf = pd.concat(dfList)​
      July 30, 2021 2:47 PM IST
    0
  • You can use OS module from python. It has a method listdir which stores all the file names in the folder. Check the code below:

    import os, re
    listDir = os.listdir(FILE_PATH)
    dfList = []
    for aFile in listDir:
        if re.search(r'ex20190[0-9]{1}.xlsx', aFile):
            tmpDf = pd.read_excel(FILE_PATH + aFile)
            dfList.append(tmpDf)
    
    outDf = pd.concat(dfList)​
      August 1, 2021 11:10 PM IST
    0
  • import pandas
    df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
    # or using sheet index starting 0
    df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)
      August 11, 2021 5:19 PM IST
    0