How to tokenize a string in Python
String tokenization is the process of breaking a string into smaller parts; each part is called a token.
When working with data, we often need to tokenize the strings we receive as input. Tokenization is a common preprocessing step in many machine learning applications, such as text classification and natural language processing.
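Before tokenizing a whole list, it helps to see what str.split() does to a single string. A minimal sketch (the sample string is illustrative): called with no arguments, split() breaks on any run of whitespace and drops leading and trailing whitespace, so no empty tokens are produced.

```python
# split() with no arguments splits on any run of whitespace
# and ignores leading/trailing whitespace, so there are no empty tokens
text = '  Python is easy  '
tokens = text.split()
print(tokens)  # ['Python', 'is', 'easy']
```

This is why the leading space in a string like ' powerful' disappears after tokenization in the methods below.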
Method 1: Using list comprehension + split()
sample = ['Python is easy', ' powerful', 'programming language']
print('Original list: ' + str(sample))

# tokenizing each string in the list
out = [i.split() for i in sample]
print('After tokenizing: ' + str(out))
Output:
Original list: ['Python is easy', ' powerful', 'programming language']
After tokenizing: [['Python', 'is', 'easy'], ['powerful'], ['programming', 'language']]
Method 2: Using map() + split()
sample = ['Python is easy', ' powerful', 'programming language']
print('Original list: ' + str(sample))

# tokenizing each string by mapping str.split over the list
out = list(map(str.split, sample))
print('After tokenizing: ' + str(out))
Output:
Original list: ['Python is easy', ' powerful', 'programming language']
After tokenizing: [['Python', 'is', 'easy'], ['powerful'], ['programming', 'language']]
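Both methods above split on whitespace, but split() also accepts an explicit separator, so the same pattern works for other token boundaries. A short sketch (the comma-separated sample list is illustrative):

```python
# split() with an explicit separator tokenizes on that character instead
# of whitespace; here each string is split on commas
sample = ['a,b,c', 'd,e']
out = [s.split(',') for s in sample]
print(out)  # [['a', 'b', 'c'], ['d', 'e']]
```

Note that with an explicit separator, consecutive separators produce empty tokens (e.g. 'a,,b'.split(',') gives ['a', '', 'b']), unlike the whitespace-splitting behaviour used above.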