HOW TO REMOVE OUTLIERS IN PYTHON
Outliers are values in a dataset that differ markedly from the rest of the data. They can result from a mistake during data collection, or they can indicate genuine variance in your data.
Pandas is a hugely popular data analysis package for Python, and it makes filtering outliers out of a dataset straightforward.
In this article, we use the Z-score method to remove outliers.
A Z-score tells how many standard deviations a value lies above or below the mean of the dataset. A positive Z-score means the value is that many standard deviations above the mean, and a negative Z-score means it is that many standard deviations below the mean. The Z-score can be computed easily using SciPy.
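To make the definition concrete, here is a minimal sketch that computes the Z-score by hand as (value - mean) / standard deviation and compares it with `scipy.stats.zscore`. The variable names `values`, `manual_z` and `scipy_z` are illustrative only, and the sketch assumes the population standard deviation (`ddof=0`), which is SciPy's default.

```python
import numpy as np
from scipy import stats

# Same sample values as the article's example.
values = np.array([-1, 15, 21, 41, 48, 48, 158], dtype=float)

# z = (x - mean) / std, using the population standard deviation (ddof=0),
# which matches the default of scipy.stats.zscore.
manual_z = (values - values.mean()) / values.std(ddof=0)
scipy_z = stats.zscore(values)

print(np.round(manual_z, 2))            # negative below the mean, positive above
print(np.allclose(manual_z, scipy_z))   # both approaches should agree
```

Values below the mean (such as -1) get a negative score and values above it (such as 158) get a positive one.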
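According to the empirical rule, about 99.7% of normally distributed values lie within three standard deviations of the mean, so a value whose absolute Z-score is above 3 is commonly considered an outlier.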
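import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame([-1, 15, 21, 41, 48, 48, 158])
print(df)

# Calculate the z-score of every value (column-wise).
z_score = stats.zscore(df)
abs_zscore = np.abs(z_score)

# Keep only the rows where every column has |z| < 3.
entries = (abs_zscore < 3).all(axis=1)
new_df = df[entries]
# Note: with only seven values, the largest |z| here is about 2.29 (for 158),
# so no row is actually dropped at a threshold of 3.
print(new_df)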
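One caveat about the sample above: with the population standard deviation, no value in a sample of n points can have an absolute Z-score larger than sqrt(n - 1). For seven values that limit is about 2.45, so a threshold of 3 cannot flag anything. The sketch below applies the same filter to a larger sample where the rule does remove a value. The data and the names `threshold`, `keep` and `filtered` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample: 19 tightly clustered values plus one extreme value (158).
data = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14,
        16, 12, 15, 14, 13, 15, 14, 16, 13, 158]
df = pd.DataFrame({"value": data})

threshold = 3  # cut-off suggested by the empirical rule

# Absolute z-score of every value in the column.
abs_z = np.abs(stats.zscore(df["value"]))

# Boolean mask: True for rows to keep, False for outliers.
keep = abs_z < threshold
filtered = df[keep]

print(len(df), "rows before,", len(filtered), "rows after")
print(df[~keep])  # the rows flagged as outliers
```

With 20 values the upper limit on the absolute Z-score is about 4.36, so the extreme value can cross the threshold and be filtered out while the clustered values stay.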