Cleaning a list of names in Python for Data Science

  • I received an Excel file containing names of individuals with titles such as Mr, Ms, Dr, Mrs, Judge, etc. Some of these names contain multiple titles, for example "Mr Mrs Ronderval" or "Dr Rev Johns Mr". I am trying to remove all but one title from each name, so the final result should be "Mr Ronderval" or "Mrs Ronderval", and "Dr Johns", "Rev Johns" or "Mr Johns"; any one of them is fine. So far I have converted the strings into a list of lists, name_list = [['Mr','Mrs', 'Ronderval'], ['Dr', 'Rev','Johns', 'Mr']], and I have a list of titles, title = ['Mr', 'Ms', 'Dr', 'Mrs', 'Judge','Rev']. I tried iterating through name_list and removing every value that appears in title, but the result is then just "Ronderval" and "Johns", and I want at least one title to be left in each name. How do I go about this?

    Here is my code using a list comprehension:

    name_list = [[x for x in l if x not in title] for l in name_list]

      November 23, 2021 1:15 PM IST
  • You can make a single pass through your name list, recording a title and a name (anything that is not a title) for each entry and stopping as soon as you have both. Note that this keeps exactly one title and one name token per entry.

    Example:

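    name_list = [['Mr','Mrs', 'Ronderval'], ['Dr', 'Rev','Johns', 'Mr']]
    title_list = ['Mr', 'Ms', 'Dr', 'Mrs', 'Judge','Rev']
    
    filtered_name_list = []
    
    for one_entry in name_list:
        # Reset for each entry; both stay None until found.
        title, name = None, None
    
        for name_or_title in one_entry:
            if name_or_title in title_list:
                title = name_or_title  # a later title overwrites an earlier one
            else:
                name = name_or_title
            # Stop scanning as soon as we have one title and one name.
            if title and name:
                break
    
        # title is None if the entry had no title at all.
        filtered_name_list.append([title, name])
    
    print(filtered_name_list)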

     

    Output:

    [['Mrs', 'Ronderval'], ['Rev', 'Johns']]
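
    If an entry can hold more than one name token (for example a first and a last name), or no title at all, a variant that keeps the first title and every non-title token may be safer. This is only a sketch: keep_first_title and TITLES are names made up for illustration, and it assumes titles are spelled exactly as in the title list.

```python
# Hypothetical helper, not part of the original answer.
TITLES = {"Mr", "Ms", "Dr", "Mrs", "Judge", "Rev"}


def keep_first_title(tokens, titles=TITLES):
    """Return tokens with every title dropped except the first one found."""
    # First title in the entry, or None if the entry has no title.
    first_title = next((t for t in tokens if t in titles), None)
    # Everything that is not a title is treated as part of the name.
    names = [t for t in tokens if t not in titles]
    return ([first_title] if first_title else []) + names


name_list = [["Mr", "Mrs", "Ronderval"], ["Dr", "Rev", "Johns", "Mr"]]
cleaned = [keep_first_title(entry) for entry in name_list]
print(cleaned)  # [['Mr', 'Ronderval'], ['Dr', 'Johns']]
```

    Using a set for the titles keeps each membership check cheap, and the title always ends up in front of the name regardless of where it appeared in the entry.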
    
      November 27, 2021 10:33 AM IST
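  • This worked for me. Split each entry into names and titles, then put the first title back in front of the names:

    name_list_all = [['Mr','Mrs', 'Ronderval'], ['Dr', 'Rev','Johns', 'Mr']]
    title = ['Mr', 'Ms', 'Dr', 'Mrs', 'Judge', 'Rev']
    name_list = [[x for x in l if x not in title] for l in name_list_all]
    title_list = [[x for x in l if x in title] for l in name_list_all]

    # Prepend the first title found; entries without any title are left as they are.
    name_list = [t[:1] + n for n, t in zip(name_list, title_list)]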
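
    Since the names start out as plain strings in the file, it may be simpler to clean each string directly instead of building a list of lists first. The following is a rough sketch under a few assumptions: clean_name and TITLES are hypothetical names, tokens are separated by whitespace, and a trailing period ("Mr.") is treated as the same title.

```python
# Hypothetical helper for illustration only.
TITLES = {"Mr", "Ms", "Dr", "Mrs", "Judge", "Rev"}


def clean_name(raw, titles=TITLES):
    """Keep the first title in raw, drop the other titles, keep name order."""
    kept = []
    seen_title = False
    for token in raw.split():
        if token.rstrip(".") in titles:  # treat "Mr." like "Mr"
            if seen_title:
                continue  # drop every title after the first
            seen_title = True
            kept.insert(0, token)  # move the kept title to the front
        else:
            kept.append(token)
    return " ".join(kept)


raw_names = ["Mr Mrs Ronderval", "Dr Rev Johns Mr", "Johns Mr"]
print([clean_name(n) for n in raw_names])
# ['Mr Ronderval', 'Dr Johns', 'Mr Johns']
```

    If the names are loaded into a pandas column, the same function could be applied per row, for example with Series.apply.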
      December 2, 2021 2:55 PM IST