Introduction to Data Science in Python problem

Tags : python pandas numpy data-science

Advika Banerjee

319 1

Can anyone tell me what exactly this part does: town = thisLine[:thisLine.index('(')-1]?

import pandas as pd

def get_list_of_university_towns():
    '''Returns a DataFrame of towns and the states they are in from the 
    university_towns.txt list. The format of the DataFrame should be:
    DataFrame( [ ["Michigan", "Ann Arbor"], ["Michigan", "Yipsilanti"] ], 
    columns=["State", "RegionName"]  )

    The following cleaning needs to be done:
    1. For "State", removing characters from "[" to the end.
    2. For "RegionName", when applicable, removing every character from " (" to the end.
    3. Depending on how you read the data, you may need to remove newline character '\n'. '''

    data = []
    state = None
    state_towns = []
    with open('university_towns.txt') as file:
        for line in file:
            # strip only the newline; line[:-1] would cut a real character
            # from a final line that has no trailing newline
            thisLine = line.rstrip('\n')
            if thisLine[-6:] == '[edit]':
                state = thisLine[:-6]
                continue
            if '(' in thisLine:
                # keep everything before " (": the -1 also drops the space
                town = thisLine[:thisLine.index('(')-1]
                state_towns.append([state,town])
            else:
                town = thisLine
                state_towns.append([state,town])
            data.append(thisLine)
    df = pd.DataFrame(state_towns,columns = ['State','RegionName'])
    return df

get_list_of_university_towns()

October 1, 2021 1:32 PM IST

Vaibhav Mali

259
This line handles requirement 2 of the cleanup list: it keeps only the text that comes before " (".

For example, if the line is:
```
line = "Michigan, (Ann Arbor"
```
then your code will output "Michigan,", i.e. everything before the space that precedes the "(".
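
To make the slice easier to follow, here is a minimal sketch that splits it into two steps. The sample string and the snake_case variable names are made up for illustration; real lines come from university_towns.txt.

```python
# Hypothetical sample line, shaped like a town entry in the file
this_line = "Ann Arbor (University of Michigan)[1]"

# Step 1: find the position of the first '('
paren_pos = this_line.index('(')

# Step 2: slice up to one character before '(' so the space is dropped too
town = this_line[:paren_pos - 1]

print(paren_pos)
print(repr(town))
```

Slicing with `this_line[:paren_pos]` instead would leave a trailing space on the town name, which is why the original code subtracts 1.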
November 27, 2021 10:35 AM IST

0
Viaan Prakash

461
It performs this step:
```
2. For "RegionName", when applicable, removing every character from " (" to the end.
```
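Note that the -1 in this line is not a negative index. Used directly as an index, -1 refers to the last element of a list or string, but here it is subtracted from the position of "(" so that the slice also drops the space before the parenthesis.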
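
The two meanings of -1 can be compared side by side in a short sketch. The sample strings are invented for illustration and are not taken from the data file.

```python
s = "Ann Arbor (University of Michigan)"

# -1 used directly as an index counts from the end of the string
last_char = s[-1]
without_last = s[:-1]

# -1 used in arithmetic just moves the cut point one position to the left
cut = s.index('(') - 1
town = s[:cut]

# Edge case: if '(' is the very first character, index('(') - 1 evaluates
# to -1, which the slice then treats as a negative index
edge = "(odd line)"
edge_cut = edge.index('(') - 1
edge_result = edge[:edge_cut]

print(repr(last_char), repr(town), repr(edge_result))
```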
December 27, 2021 12:14 PM IST

0

Maryam Bains

317

import re
import pandas as pd

subs = '[edit]'
state = ''
rows = []

with open('university_towns.txt', 'r') as raw_data:
    for line in raw_data:
        # rstrip returns a new string, so the result must be reassigned
        line = line.rstrip('\n')
        if subs in line:
            state = line.replace(subs, '')
        else:
            # drop everything from " (" to the end of the line
            region = re.sub(r" \(.*", '', line)
            rows.append({'State': state, 'RegionName': region})

# build the DataFrame once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=['State', 'RegionName'])
df
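
As a rough check of what the regular expression does on its own, the sketch below applies it to two made-up lines instead of the real file. It shows why the newline has to be handled separately: `.` does not match "\n" by default, so the substitution stops just before it.

```python
import re

# Hypothetical sample lines, shaped like entries in university_towns.txt
samples = ["Ann Arbor (University of Michigan)[1]\n", "Auburn\n"]

# Without stripping first, the trailing newline survives the substitution
raw = [re.sub(r" \(.*", "", s) for s in samples]

# Stripping the newline first gives clean town names
cleaned = [re.sub(r" \(.*", "", s.rstrip("\n")) for s in samples]

print(raw)
print(cleaned)
```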

October 26, 2021 12:45 PM IST
