Science Spectrum

Science Spectrum is here to guide you on your personal path to understanding the fascinating world of science, mathematics, and related topics. Our goal is to make complex concepts accessible to everyone. We are happy to be a member of the Medium Boost family!

Extreme Value Theory: The Science of Outliers?

Artistic interpretation of value distributions.
Image Credit: Idyllic,

Extreme events often look like outliers and are consequently overlooked, even though they can be essential to understanding a given phenomenon. Extreme Value Theory (EVT) provides a framework for analyzing extremes, helping us make sense of a world that isn’t always as well-behaved as the average statistician would like. In this article, we’ll look at EVT through practical examples from my research on methods for studying epilepsy, alongside other intriguing models that capture rare and chaotic behaviors.

On Outliers

When I teach machine learning (ML), we always have a lengthy discussion about outliers.

How can we detect them? Once they’re on our radar, what should we do about them? How did they get into our dataset in the first place?

I am firmly on team outlier: I believe you should bench (i.e., remove) outliers only when they come from sensor or data entry errors. Outliers are exceedingly useful; for instance, they can often provide insight into whether or not a given machine learning model is overfitting:

How outliers can help you catch when a model is overfitting. An underfit model represents few of the points of the general distribution. A good model fits the points that fall within the bounds of the general distribution, excluding outliers. An overfit model captures all of the points, including the outliers, so it performs too well on the training set and poorly on the test set.
Image Credit:
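To make the figure's point concrete, here is a minimal sketch of the same idea in code. Everything in it is an assumption chosen for illustration, not taken from the article: the sine-shaped "general distribution", the noise level, the two injected outliers, and the polynomial degrees 1, 3, and 15 standing in for the underfit, good, and overfit models.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_fn(x):
    # Hypothetical "general distribution": a smooth curve plus noise.
    return np.sin(np.pi * x)


# Training set: 20 noisy samples, two of them pushed far off the curve.
x_train = np.linspace(-1.0, 1.0, 20)
y_train = true_fn(x_train) + rng.normal(0.0, 0.1, x_train.size)
outlier_idx = np.array([6, 13])
y_train[outlier_idx] += 3.0

# Test set: same process, but with no injected outliers.
x_test = rng.uniform(-1.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(0.0, 0.1, x_test.size)


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


results = {}
for name, degree in [("underfit", 1), ("good", 3), ("overfit", 15)]:
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    # How far the fitted curve stays from the two outliers, on average.
    gap = np.abs(model(x_train[outlier_idx]) - y_train[outlier_idx])
    results[name] = {
        "train_mse": mse(model, x_train, y_train),
        "test_mse": mse(model, x_test, y_test),
        "outlier_gap": float(gap.mean()),
    }

for name, r in results.items():
    print(
        f"{name:>8}: train MSE={r['train_mse']:.3f}  "
        f"test MSE={r['test_mse']:.3f}  "
        f"mean gap to outliers={r['outlier_gap']:.3f}"
    )
```

The thing to watch is how the gap to the outliers shrinks as the model gets more flexible. A low training error combined with a curve that bends toward the outliers and a noticeably worse test error is the pattern the figure describes.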

Further, in unbalanced datasets — like those encountered in trying to create fraud…

Published in Science Spectrum

Written by Laurel W

Mathematician/Data Scientist. Current interests include ML-forward EEG analysis, computational methods for exploring prime distributions, and climate models.
