How to Tokenize group of words in Python

Tags : nlp python tokenize

Maryam Bains

317

I am developing an application in Python that recommends jobs based on an uploaded resume. I am trying to tokenize the resume before processing it further, and I want to keep groups of words together as single tokens. For example, "Data Science" is a keyword, but when I tokenize I get "data" and "science" separately. How can I overcome this? Is there a Python library that extracts such multi-word terms?

October 26, 2021 1:22 PM IST

0
Sindhuja Martha

181

If you wish to tokenize all the words in the resume by some delimiter such as a space, then, based on your example input "Data Science" and output ["data", "science"], the following function will lowercase a string and split its contents on whitespace, returning a list of strings.

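def tokenize(resume_string):
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces or newlines do not produce empty tokens
    return resume_string.lower().split()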
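To keep a phrase such as "data science" together, one option is to merge known multi-word keywords after splitting. The sketch below is only an illustration: the function name `tokenize_with_phrases` and the `phrases` list are made up for this example, and it assumes you already have a list of the keywords you care about.

```python
def tokenize_with_phrases(text, phrases):
    """Split text into lowercase words, merging any known multi-word phrase."""
    words = text.lower().split()
    # Store each phrase as a tuple of words for fast lookup
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)

    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first (greedy match)
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in phrase_set:
                tokens.append(" ".join(candidate))
                i += n
                break
        else:
            # No phrase starts here, so keep the single word
            tokens.append(words[i])
            i += 1
    return tokens


# Hypothetical keyword list, for illustration only
keywords = ["data science", "machine learning"]
tokens = tokenize_with_phrases("Working in the Data Science field", keywords)
print(tokens)
```

If you would rather use a library, NLTK has a multi-word expression tokenizer (`nltk.tokenize.MWETokenizer`) that does a similar merge on an already tokenized list.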

October 27, 2021 2:09 PM IST

0
Viaan Prakash

461
Looks like you are looking to generate n-grams (bi-grams in particular). If that's the case, the following is one way to achieve this:
```
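from nltk import ngrams  # requires the nltk package

resume = '... working in the data science field for years ...'
n = 2  # bigrams: sequences of two adjacent words
bigrams = ngrams(resume.split(), n)
for grams in bigrams:
    print(grams)  # each item is a tuple of n words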
```
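The same idea also works without NLTK. As a rough sketch, you can build the bigrams with `zip` and then keep only those that appear in a keyword list. The names `word_ngrams` and `known_keywords` are assumptions made for this example, not part of any library.

```python
def word_ngrams(words, n):
    """Return all runs of n adjacent words as a list of tuples."""
    return list(zip(*(words[i:] for i in range(n))))


resume = "working in the data science field for years"
words = resume.lower().split()
bigrams = word_ngrams(words, 2)

# Hypothetical keyword set, for illustration only
known_keywords = {"data science", "machine learning"}

# Keep only the bigrams that match a known keyword
matches = [" ".join(gram) for gram in bigrams if " ".join(gram) in known_keywords]
print(matches)
```

Filtering against a keyword list matters because most bigrams in a resume ("in the", "for years") are not meaningful skills.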
October 28, 2021 4:40 PM IST

0
