How to Solve a Data Science Question Using Python's pandas Data Structure Syntax

  • Good afternoon.

    I have a question I am trying to solve using pandas data structures and the related Python syntax. I have already graduated from a US university and am employed; I am taking the Coursera.org course "Python for Data Science", offered online by the University of Michigan, purely for professional development. I am not sharing answers with anyone, as I abide by Coursera's Honor Code.

    First, I was given this pandas DataFrame of Olympic medals won by countries around the world:

                 # Summer  Gold  Silver  Bronze  Total  # Winter  Gold.1  Silver.1  Bronze.1  Total.1  # Games  Gold.2  Silver.2  Bronze.2  Combined total  ID
    Afghanistan        13     0       0       2      2         0       0         0         0        0       13       0         0         2               2  AFG
    Algeria            12     5       2       8     15         3       0         0         0        0       15       5         2         8              15  ALG
    Argentina          23    18      24      28     70        18       0         0         0        0       41      18        24        28              70  ARG
    Armenia             5     1       2       9     12         6       0         0         0        0       11       1         2         9              12  ARM
    Australasia         2     3       4       5     12         0       0         0         0        0        2       3         4         5              12  ANZ

     

    Second, the question asked is, "Which country has won the most gold medals in summer games?"

    Third, a hint I was given on how to answer using pandas syntax is this: "This function should return a single string value."

    Fourth, I tried entering this as the answer using pandas syntax:

    import pandas as pd
    df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)
    def answer_one():
        if df.columns[:2]=='00':
            df.rename(columns={col:'Country'+col[4:]}, inplace=True)    
        df_max = df[df[max('Gold')]]
        return df_max['Country']
    answer_one() 


    Fifth, I have tried various other answers like this in Coursera's auto-grader, but it keeps returning an error message.

    Could you please help me solve this question? Any hints, suggestions, or comments are welcome.

    Thanks, Kevin

     

      September 8, 2021 12:48 PM IST
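  • import pandas as pd
    def answer_one():
        # df is the DataFrame loaded in the question (country names as index)
        max_gold = df['Gold'].max()
        # Keep only the rows whose 'Gold' value equals that maximum
        top = df[df['Gold'] == max_gold]
        # The index holds the country names, so return the first label
        return top.index[0]
    
    answer_one()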
      September 13, 2021 11:52 PM IST
  • You can use pandas' loc indexer to find the country name corresponding to the maximum of the "Gold" column:

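    import pandas as pd

    # Small stand-in for the Olympics data
    data = [('Afghanistan', 13),
            ('Algeria', 12),
            ('Argentina', 23)]

    df = pd.DataFrame(data, columns=['Country', 'Gold'])

    # Select the 'Country' entries on the rows where 'Gold' equals its maximum
    df['Country'].loc[df['Gold'] == df['Gold'].max()]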


    The last line returns a one-element Series containing Argentina; append .iloc[0] to get the plain string.
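A minimal sketch of the same lookup wrapped in a function that returns a single string, as the hint in the question asks. The three-row frame is the toy data from this answer, and the function name top_gold_country is only an illustrative choice, not something from the course.

```python
import pandas as pd

# Toy data from this answer (not the full olympics.csv)
data = [('Afghanistan', 13),
        ('Algeria', 12),
        ('Argentina', 23)]
df = pd.DataFrame(data, columns=['Country', 'Gold'])


def top_gold_country(frame):
    """Return the name of the country with the largest 'Gold' value."""
    # Boolean mask that is True only on rows holding the maximum
    is_max = frame['Gold'] == frame['Gold'].max()
    # .loc with the mask and a column label gives a Series of matching names;
    # .iloc[0] picks the first one (with ties, only the first match is kept)
    return frame.loc[is_max, 'Country'].iloc[0]


result = top_gold_country(df)
print(result)
```

Passing the mask and the column label to .loc in one step avoids chained indexing, which is usually the safer habit in pandas.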

    Edit 1: I just noticed you import the .csv file using pd.read_csv('olympics.csv', index_col=0, skiprows=1). If the first line of the .csv file already holds the column names, leave out the skiprows argument so that pandas uses that line as the column names of the dataframe; this makes the dataframe much easier to handle. Second, I see that with the index_col=0 argument you use the country names as the index of the dataframe. In that case, select from the index rather than using loc, as follows:

    df.index[df['Gold'] == df['Gold'].max()][0]
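To make the index-based lookup concrete, here is a small sketch that reads a CSV with index_col=0 and then picks the winner from the index. The CSV text is an in-memory stand-in that I built from three rows and two columns of the table in the question; the layout of the real olympics.csv is an assumption and may differ.

```python
import io

import pandas as pd

# Stand-in for olympics.csv: a header line followed by three of the rows
# shown in the question, reduced to two data columns (assumed layout)
csv_text = """Country,# Summer,Gold
Afghanistan,13,0
Algeria,12,5
Argentina,23,18
"""

# index_col=0 turns the first column (country names) into the index;
# skiprows is not needed here because the first line already holds the headers
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Boolean-mask the index and take the first matching label
winner = df.index[df['Gold'] == df['Gold'].max()][0]
print(winner)
```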
      September 14, 2021 1:38 PM IST
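  • The idxmax() method returns the index label of the maximum element in a Series. With the country names as the index, that label is the country. (In pandas 1.0 and later, argmax() returns the integer position instead.)

    return df['Gold'].idxmax()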
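A short sketch of how the two calls differ, assuming pandas 1.0 or later: idxmax() gives the index label and argmax() gives the integer position. The three gold counts come from the table in the question; the variable names are illustrative.

```python
import pandas as pd

# Summer gold counts for three countries from the question, indexed by name
gold = pd.Series([0, 5, 18], index=['Afghanistan', 'Algeria', 'Argentina'])

label = gold.idxmax()     # index label of the largest value
position = gold.argmax()  # integer position of the largest value

print(label, position)

# The position can be mapped back to a label through the index
same_label = gold.index[position]
```

Since the question asks for a single string, the label from idxmax() is the value to return when the country names are the index.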
    
      November 20, 2021 12:16 PM IST