
Get list from pandas DataFrame column headers

  • I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

    For example, if I'm given a DataFrame like this:

    >>> my_dataframe
        y  gdp  cap
    0   1    2    5
    1   2    3    9
    2   8    7    2
    3   3    4    7
    4   6    7    7
    5   4    8    3
    6   8    2    8
    7   9    9   10
    8   6    6    4
    9  10   10    7

    I would get a list like this:

    >>> header_list
    ['y', 'gdp', 'cap']



     

      December 3, 2020 9:43 PM IST
    0
  • It gets even simpler (as of pandas 0.16.0):

    df.columns.tolist()

    will give you the column names in a nice list.
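    As a minimal sketch, here is that call applied to the DataFrame from the question. The helper name get_header_list is only for illustration (it is not a pandas function); it takes any DataFrame, so the number and names of the columns do not need to be known in advance.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
my_dataframe = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
    "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
    "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
})


def get_header_list(df):
    """Return the column labels of any DataFrame as a plain Python list.

    Hypothetical helper for illustration; it only wraps df.columns.tolist().
    """
    return df.columns.tolist()


header_list = get_header_list(my_dataframe)
print(header_list)  # ['y', 'gdp', 'cap']
```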

     
      November 2, 2021 2:54 PM IST
    0
  • I feel the question deserves an additional explanation.
    As fixxxer noted, the answer depends on the pandas version you are using in your project, which you can check with the pd.__version__ attribute.
    If, like me, you are for some reason using a pandas version older than 0.16.0 (on Debian 8 (Jessie) I have 0.14.1), then you need to use
    df.keys().tolist() because, as far as I can tell, df.columns isn't implemented there yet.
    The advantage of the keys() method is that it also works in newer versions of pandas, so it is more universal.
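    A small sketch of the check described above, assuming a reasonably recent pandas (I have not run this on 0.14.1). The DataFrame here is a made-up stand-in; it prints the installed version and confirms that keys() and columns yield the same labels.

```python
import pandas as pd

# The installed pandas version, as a string.
print(pd.__version__)

# Small stand-in DataFrame, made up for this example.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# For a DataFrame, keys() returns the column labels.
via_keys = df.keys().tolist()
via_columns = df.columns.tolist()

print(via_keys)                 # ['y', 'gdp', 'cap']
print(via_keys == via_columns)  # True
```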
      November 2, 2021 7:15 PM IST
    0
  • %%timeit
    final_df.columns.values.tolist()
    948 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
    %%timeit
    list(final_df.columns)
    14.2 µs ± 79.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    %%timeit
    list(final_df.columns.values)
    1.88 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
    %%timeit
    final_df.columns.tolist()
    12.3 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    %%timeit
    list(final_df.head(1).columns)
    163 µs ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
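    The %%timeit cells above need IPython/Jupyter. As a rough sketch, a similar comparison can be run in plain Python with the standard timeit module. The contents of final_df are not shown in this thread, so the small DataFrame below is an assumed stand-in, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Assumed stand-in for final_df; the real one is not shown in the thread.
final_df = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# The five approaches timed above, wrapped as zero-argument callables.
candidates = {
    "columns.values.tolist()": lambda: final_df.columns.values.tolist(),
    "list(columns)": lambda: list(final_df.columns),
    "list(columns.values)": lambda: list(final_df.columns.values),
    "columns.tolist()": lambda: final_df.columns.tolist(),
    "list(head(1).columns)": lambda: list(final_df.head(1).columns),
}

# Every approach should produce the same list of labels.
results = {name: fn() for name, fn in candidates.items()}

n_calls = 2000
for name, fn in candidates.items():
    total_seconds = timeit.timeit(fn, number=n_calls)
    per_call_us = total_seconds / n_calls * 1e6
    print(f"{name:26s} {per_call_us:8.2f} us per call")
```

    Whichever variant is fastest, the differences are microseconds per call, so readability is usually the better reason to pick one.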


      December 4, 2020 9:13 PM IST
    0
  • This gives us the names of columns in a list:

    list(my_dataframe.columns)
    

    The tolist() method of the columns index can be used too:

    my_dataframe.columns.tolist()
      December 4, 2020 9:33 PM IST
    0
  • For a quick, neat, visual check, try this:

    for col in df.columns:
        print(col)
      October 23, 2021 1:55 PM IST
    0