Coursera Course - Introduction of Data Science in Python Assignment 1

  • I'm taking this course on Coursera, and I'm running into some issues with the first assignment. The task is to use regular expressions to extract certain values from the given file. The function should then output a dictionary containing these values:

    example_dict = {"host":"146.204.224.152",
                    "user_name":"feest6811",
                    "time":"21/Jun/2019:15:45:24 -0700",
                    "request":"POST /incentivize HTTP/1.1"}


    Below is just an excerpt of the file. For some reason, the link doesn't work unless it is opened directly from Coursera. I apologize in advance for the bad formatting. One thing I must point out is that in some cases, as in the first example below, there is no username; '-' is used instead.

    159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
    136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149 

     

    This is what I currently have. However, the output is None, so I guess there's something wrong with my pattern.

    import re
    def logs():
        
        with open("assets/logdata.txt", "r") as file:
            logdata = file.read()
        # YOUR CODE HERE
            
            pattern = """ 
            (?P<host>\w*)
            (\d+\.\d+.\d+.\d+\ )
            (?P<user_name>\w*)
            (\ -\ [a-z]+[0-9]+\ )
            (?P<time>\w*)
            (\[(.*?)\])
            (?P<request>\w*)
            (".*")
            """
            for item in re.finditer(pattern,logdata,re.VERBOSE):
           
                print(item.groupdict())
    
      December 2, 2021 3:02 PM IST
  • import re
    def names():
        simple_string = """Amy is 5 years old, and her sister Mary is 2 years old. Ruth and Peter, their parents, have 3 kids."""
    
        # YOUR CODE HERE
        p=re.findall('[A-Z][a-z]*',simple_string)
        return p
    
        #raise NotImplementedError()

     

    Check it using the following code:

    assert len(names()) == 4, "There are four names in the simple_string"
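
    As a quick illustration of why this pattern passes the check, and of one caveat, here is a minimal sketch. The second sentence is a made-up example, not part of the assignment; it shows that `[A-Z][a-z]*` matches any capitalised word, so it only works here because the names are the only capitalised words in simple_string.

```python
import re

text = ("Amy is 5 years old, and her sister Mary is 2 years old. "
        "Ruth and Peter, their parents, have 3 kids.")

# One uppercase letter followed by zero or more lowercase letters.
names = re.findall(r"[A-Z][a-z]*", text)
print(names)  # ['Amy', 'Mary', 'Ruth', 'Peter']

# Caveat (hypothetical sentence): a capitalised non-name is matched too.
other = re.findall(r"[A-Z][a-z]*", "The dog belongs to Amy.")
print(other)  # ['The', 'Amy']
```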
    

     

    For more information on regular expressions, see the Python documentation, which is very useful for beginners: https://docs.python.org/3/library/re.html#module-re

     
      December 4, 2021 1:12 PM IST
  • You can use the following expression:

    (?P<host>\d+(?:\.\d+){3}) # Group "host": 1+ digits, then 3 occurrences of . followed by 1+ digits
    \s+\S+\s+                 # 1+ whitespace chars, 1+ non-whitespace chars, 1+ whitespace chars
    (?P<user_name>\S+)\s+\[   # Group "user_name": 1+ non-whitespace chars, then 1+ whitespace chars and [
    (?P<time>[^\]\[]*)\]\s+   # Group "time": 0+ chars other than [ and ], then ] and 1+ whitespace chars
    "(?P<request>[^"]*)"      # ", Group "request": 0+ non-" chars, "

     

    Python demo:

    import re
    logdata = r"""159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
    136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149"""
    pattern = r'''
    (?P<host>\d+(?:\.\d+){3}) # Group "host": 1+ digits, then 3 occurrences of . followed by 1+ digits
    \s+\S+\s+                 # 1+ whitespace chars, 1+ non-whitespace chars, 1+ whitespace chars
    (?P<user_name>\S+)\s+\[   # Group "user_name": 1+ non-whitespace chars, then 1+ whitespace chars and [
    (?P<time>[^\]\[]*)\]\s+   # Group "time": 0+ chars other than [ and ], then ] and 1+ whitespace chars
    "(?P<request>[^"]*)"      # ", Group "request": 0+ non-" chars, "
    '''
    for item in re.finditer(pattern,logdata,re.VERBOSE):
        print(item.groupdict())

     

    Output:

    {'host': '159.253.153.40', 'user_name': '-', 'time': '21/Jun/2019:15:46:10 -0700', 'request': 'POST /e-business HTTP/1.0'}
    {'host': '136.195.158.6', 'user_name': 'feeney9464', 'time': '21/Jun/2019:15:46:11 -0700', 'request': 'HEAD /open-source/markets HTTP/2.0'}
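
    Since the original function printed the matches and returned nothing, here is a minimal sketch of how the same pattern could be wrapped so that the dictionaries are returned instead. It assumes the grader wants a list with one dictionary per log line; `parse_logs` and `LOG_PATTERN` are hypothetical names chosen for illustration, not part of the assignment.

```python
import re

# Same pattern as above, compiled once with re.VERBOSE.
LOG_PATTERN = re.compile(r'''
(?P<host>\d+(?:\.\d+){3})   # four dot-separated runs of digits
\s+\S+\s+                   # the field between host and user name
(?P<user_name>\S+)\s+\[     # user name, or - when it is missing
(?P<time>[^\]\[]*)\]\s+     # everything between [ and ]
"(?P<request>[^"]*)"        # everything between the double quotes
''', re.VERBOSE)

def parse_logs(logdata):
    """Return one dict per matching log line (hypothetical helper)."""
    return [m.groupdict() for m in LOG_PATTERN.finditer(logdata)]

sample = '''159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149'''

entries = parse_logs(sample)
print(len(entries))              # 2
print(entries[0]["user_name"])   # -
```

    Inside the assignment's logs() function, the same call could be made on the text read from assets/logdata.txt, followed by a return of the resulting list.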
      December 7, 2021 1:47 PM IST