# HOW TO REMOVE OUTLIERS IN PYTHON

Outliers are values in a dataset that differ markedly from the rest of the data. They can result from a mistake during data collection, or they can reflect genuine variance in your data.

Pandas is a hugely popular Python package for data analysis, and its boolean indexing makes it convenient for filtering outliers out of a dataset.

In this article, we use the **Z-score method** to remove outliers.

A **Z-score** tells how many standard deviations a value lies above or below the mean of the dataset. A positive Z-score means the value is above the mean, and a negative Z-score means it is below the mean. Z-scores can be computed easily with SciPy's `stats.zscore`.
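To make the definition concrete, the Z-score can also be computed by hand as `(value - mean) / standard deviation`. The sketch below is only an illustration: the names `values`, `manual_z` and `scipy_z` are made up for this example, and it assumes the population standard deviation (`ddof=0`), which is SciPy's default for `stats.zscore`.

```python
import numpy as np
from scipy import stats

# Illustrative sample values
values = np.array([-1, 15, 21, 41, 48, 48, 158], dtype=float)

mean = values.mean()
std = values.std()  # population standard deviation (ddof=0)

# Z-score: distance from the mean, measured in standard deviations
manual_z = (values - mean) / std

# SciPy's zscore uses ddof=0 by default, so the two results should agree
scipy_z = stats.zscore(values)

print(np.round(manual_z, 2))
print(np.allclose(manual_z, scipy_z))
```

Values below the mean get a negative score and values above it get a positive one, matching the description above.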

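According to the empirical rule, about 99.7% of normally distributed data lies within three standard deviations of the mean, so a value whose Z-score has an absolute value above 3 is commonly treated as an outlier.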
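The percentages behind the empirical rule can be checked against the standard normal distribution. This is a minimal sketch that assumes the data is normally distributed. `norm.cdf` is SciPy's normal cumulative distribution function, and the `coverage` dictionary is just an illustrative name.

```python
from scipy.stats import norm

# Fraction of a normal distribution lying within k standard deviations of the mean
coverage = {}
for k in (1, 2, 3):
    coverage[k] = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage[k]:.4f}")
```

Roughly 0.3% of normally distributed values fall outside three standard deviations, which is why a cutoff of 3 is a common choice. For data that is far from normal, the cutoff may need to be reconsidered.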

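```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame([-1, 15, 21, 41, 48, 48, 158])
print(df)

# Calculate the z-score of every value, column by column
z_score = stats.zscore(df)
abs_zscore = np.abs(z_score)

# Keep only the rows where every column has |z| < 3
entries = (abs_zscore < 3).all(axis=1)
new_df = df[entries]

# Note: in this 7-value sample the largest |z| is about 2.29 (for 158),
# so no row is dropped at a threshold of 3.
print(new_df)
```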
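The same filtering steps can be wrapped in a small reusable function. The sketch below is one possible arrangement, not a standard API: `remove_outliers_zscore`, its `threshold` parameter, and the sample data are all hypothetical names and values chosen for illustration. It assumes the numeric columns contain no missing values.

```python
import numpy as np
import pandas as pd
from scipy import stats


def remove_outliers_zscore(frame, threshold=3.0):
    """Hypothetical helper: keep rows whose numeric columns all have |z| < threshold."""
    numeric = frame.select_dtypes(include="number").to_numpy(dtype=float)
    abs_z = np.abs(stats.zscore(numeric, axis=0))  # column-wise z-scores
    keep = (abs_z < threshold).all(axis=1)         # one boolean per row
    return frame.loc[keep]


# Illustrative data: twenty ordinary values plus one extreme value
sample = pd.DataFrame({"value": list(range(40, 60)) + [500]})
cleaned = remove_outliers_zscore(sample)

print(len(sample), len(cleaned))  # 21 20
```

With 21 values, the extreme value 500 has a Z-score of about 4.5 and is removed. In very small samples the largest possible Z-score is limited by the sample size, so a threshold of 3 may never be reached.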