How to remove Stopwords in Python

Stopwords are very common words, such as "the", "is" and "a", that do not add much meaning to a sentence. They can usually be ignored without losing the meaning of the sentence.
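To make the idea concrete without any library, here is a minimal sketch that filters a sentence against a tiny, hand-picked stopword set. The set `STOPWORDS` and the function `remove_stopwords` are hypothetical and only for illustration; real lists such as NLTK's are much longer, and splitting on whitespace is a simplification of proper tokenization.

```python
# A tiny, hand-picked stopword set (hypothetical, for illustration only)
STOPWORDS = {"is", "a", "the", "of", "and"}


def remove_stopwords(sentence):
    """Split on whitespace and drop words found in STOPWORDS (case-insensitive)."""
    return [word for word in sentence.split() if word.lower() not in STOPWORDS]


print(remove_stopwords("The cat is a friend of the dog"))  # ['cat', 'friend', 'dog']
```

Comparing `word.lower()` against the set means "The" at the start of a sentence is treated the same as "the".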

NLTK (Natural Language Toolkit) for Python includes stopword lists for many languages; the exact number depends on the version of the stopwords corpus. First, download the corpus to your Python environment:

import nltk

nltk.download('stopwords')

To see the list of English stopwords, run the following:

import nltk
from nltk.corpus import stopwords
print(stopwords.words('english'))

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
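The list above is a plain Python list, so it can be turned into a set and adjusted for your task. For example, sentiment analysis often needs to keep negations such as "not" and "no", because removing them can flip the meaning of a sentence. The sketch below uses a short excerpt of the list so it runs on its own; with NLTK you would start from `set(stopwords.words('english'))` instead. The extra words "rt" and "via" are hypothetical domain-specific additions.

```python
# A short excerpt of NLTK's English list; in practice use
# set(stopwords.words('english')) from nltk.corpus
base = {"i", "is", "a", "the", "not", "no", "nor", "very"}

# Keep negations, which can change the meaning of a sentence
negations = {"not", "no", "nor"}

# Hypothetical extra words for a specific domain (e.g. social media posts)
extra = {"rt", "via"}

# Set difference removes the negations, set union adds the extra words
custom_stop_words = (base - negations) | extra

tokens = ["the", "movie", "is", "not", "good", "rt"]
filtered = [t for t in tokens if t not in custom_stop_words]
print(filtered)  # ['movie', 'not', 'good']
```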

Removing stop words with NLTK

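import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer models:
# 'punkt' for older NLTK releases, 'punkt_tab' for newer ones
nltk.download('punkt')
nltk.download('punkt_tab')

example = "Ramu is a good boy."

# Use a different name so the imported stopwords module is not shadowed
stop_words = set(stopwords.words('english'))

tokens = word_tokenize(example)

filtered = []

# NLTK's stopword list is lowercase, so compare each token in lowercase
for word in tokens:
    if word.lower() not in stop_words:
        filtered.append(word)

print(tokens)
print(filtered)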

Output:

['Ramu', 'is', 'a', 'good', 'boy', '.']
['Ramu', 'good', 'boy', '.']
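The period survives in the filtered output because punctuation marks are not in the stopword list. If you also want to drop punctuation, one option is to check tokens against Python's `string.punctuation`. The sketch below starts from the token list printed above, so it does not need NLTK to run; the small `stop_words` set is an excerpt standing in for the full NLTK list.

```python
import string

# Tokens as produced by word_tokenize in the example above
tokens = ['Ramu', 'is', 'a', 'good', 'boy', '.']

# Small excerpt of the English stopword list; in practice use
# set(stopwords.words('english'))
stop_words = {"is", "a", "the"}

# Drop stopwords (case-insensitively) and tokens made only of punctuation
filtered = [
    t for t in tokens
    if t.lower() not in stop_words and not all(ch in string.punctuation for ch in t)
]
print(filtered)  # ['Ramu', 'good', 'boy']
```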
