I have lots of excel files(xlsx format) and want to read and handle them.
For example, file names are ex201901, ex201902, 201912.
Its name is made by ex YYYYMM format.
Anyway, to import these files in pandas as an usual case, it's easy.
df2019 = [pd.read_excel(rf'C:\users\ex2019{str(i).zfill(2)}.xlsx') for i in range(1, 13)]
With that, you can access the individual data frames through e.g. df2019[5]
to get the data frame corresponding to June, or you can collapse all of them into a single data frame using
df = pd.concat(df2019)
if that's what suits your need.
If you have less structure in your file names,
glob
can come in handy. With that, the above could become something like
import glob
df2019 = list(map(pd.read_excel, glob.glob(r'C:\users\ex2019*.xlsx')))
This post was edited by Viaan Prakash at July 21, 2021 1:54 PM IST
import os, re
listDir = os.listdir(FILE_PATH)
dfList = []
for aFile in listDir:
if re.search(r'ex20190[0-9]{1}.xlsx', aFile):
tmpDf = pd.read_excel(FILE_PATH + aFile)
dfList.append(tmpDf)
outDf = pd.concat(dfList)
import os, re
listDir = os.listdir(FILE_PATH)
dfList = []
for aFile in listDir:
if re.search(r'ex20190[0-9]{1}.xlsx', aFile):
tmpDf = pd.read_excel(FILE_PATH + aFile)
dfList.append(tmpDf)
outDf = pd.concat(dfList)
import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
# or using sheet index starting 0
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)