How to read a CSV file partially into pandas dataframe?

  • I have a large CSV file but want to read only a few rows of it to avoid memory issues. How can I do this with pandas?
      January 8, 2021 3:54 PM IST
  • Iterating over a pandas DataFrame row by row is generally slow and defeats the purpose of using a DataFrame. It is an anti-pattern that you should resort to only after exhausting every other option. Prefer a vectorized solution, a list comprehension, or the DataFrame.apply() method.

    List comprehension example:

    result = [(x, y, z) for x, y, z in zip(df['Name'], df['Promoted'], df['Grade'])]
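    The three options mentioned above can be compared side by side. This is only a minimal sketch: the sample data, the Grade threshold, and the names promoted_high and labels are made up for illustration; only the column names come from the line above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Promoted": [True, False, True],
    "Grade": [7, 5, 9],
})

# List comprehension over zipped columns (no row-by-row DataFrame iteration).
result = [(x, y, z) for x, y, z in zip(df["Name"], df["Promoted"], df["Grade"])]

# Vectorized alternative: a boolean mask selects rows without a Python loop.
promoted_high = df[df["Promoted"] & (df["Grade"] > 6)]

# DataFrame.apply alternative: the function is called once per row.
labels = df.apply(lambda row: f"{row['Name']}:{row['Grade']}", axis=1)
```

    When an operation can be expressed on whole columns, as the boolean mask is here, the vectorized form is usually the fastest of the three.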
      April 21, 2021 11:03 AM IST
  • The nrows argument of pd.read_csv() defines how many rows to import. Instead of an iterator, you get a single DataFrame holding just nrows rows of the file. It can be combined with skiprows, which skips lines at the start of the file before reading begins.

    # Skip the first 1000 lines, then read the next 1000 rows (the file has no header row)
    df = pd.read_csv('matrix.txt', sep=',', header=None, skiprows=1000, nrows=1000)
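    If the file does have a header row, an integer skiprows would skip the header as well. One way around that, sketched here, is to pass skiprows a range of line numbers that starts at 1. The in-memory CSV and the names csv_text, head and middle are invented for the example and only stand in for a large file.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file: a header plus 10 rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# Read only the first 3 data rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=3)

# Skip file lines 1-4 (the first four data rows) but keep the header on line 0,
# then read the next 3 rows.
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5), nrows=3)
```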
    This post was edited by rick slator at May 10, 2021 11:20 AM IST
      May 10, 2021 11:17 AM IST
  • pandas provides the chunksize argument, which sets how many rows to read at a time.
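    import pandas as pd

    filepath = '<filepath>'  # replace with the path to your CSV file
    # With chunksize set, read_csv returns an iterator of DataFrames of up to 100 rows each
    data = pd.read_csv(filepath, chunksize=100)

    for chunk in data:
        df = chunk  # keep only the first chunk
        break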

    In the code above, pandas reads at most 100 rows on each iteration of the for loop, so breaking out after the first iteration leaves the first 100 rows in df.
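    The same iterator can also be used to work through the whole file without ever holding it all in memory. The sketch below uses an invented in-memory CSV and a chunk size of 4 to keep the example small; csv_text, total and row_count are illustrative names.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file: a header plus 10 rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# Take only the first chunk without writing a loop.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)
first_chunk = next(reader)
reader.close()

# Process the whole file chunk by chunk, keeping only running totals in memory.
total = 0
row_count = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
    row_count += len(chunk)
```

    Each chunk is an ordinary DataFrame, so any per-chunk filtering or aggregation works the same way as on a fully loaded file.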
      January 8, 2021 4:02 PM IST