How to apply Min-Max Normalization to your data

Sourabh Gupta
4 min read · Jan 6, 2022


But why do we need Normalization anyway?

Let’s say you have a dataset with multiple features. You notice that the ranges these features span are not comparable to each other: one feature varies between 1 and 10, while another varies from 1 to 1,000. If we ignore this and go straight to modelling, the second feature is going to mean a lot to the model! The weights the model assigns get affected heavily, and the model ends up giving a high weightage to the “bigger” variable.

Now how can this be fixed? Bring those damn features in the same or at least comparable ranges. This is what we call normalizing our data.

There are multiple ways to achieve data normalization, for example min-max normalization, decimal scaling, and z-score normalization. I have written about each of them in detail on my site; you should check them out to see which normalization is best for your data.

Let’s finally start with the Min-Max! It is one of the most common ways to normalize data. All you need to do is map the minimum and maximum values of a feature to 0 and 1 respectively.

So for every feature,

  • the minimum value of that feature gets transformed into 0,
  • the maximum value gets transformed into a 1,
  • and all the other values are transformed to a value between 0 and 1 linearly.
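The three rules above can be sketched in a few lines of plain Python (the function name `min_max_normalize` is my own, not from the article):

```python
# Min-max normalization to the range [0, 1]:
# the minimum maps to 0, the maximum to 1, everything else linearly in between.
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

feature = [1, 10, 1000]
print(min_max_normalize(feature))  # the smallest becomes 0.0, the largest 1.0
```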

In case you want to choose a different scale than 0 to 1, for example 10 to 100 or -1 to 1, then you can use the following formula:

v′ = ((v − min(A)) / (max(A) − min(A))) × (new_max(A) − new_min(A)) + new_min(A)

where:

  • v is the old value of an entry in feature A, and v′ is its new value,
  • min(A) and max(A) are the current minimum and maximum values of feature A,
  • new_max(A) and new_min(A) are the max and min values of the required range (i.e. its boundary values).
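The general formula can be sketched in Python as well (a minimal sketch; the function name `min_max_scale` is illustrative, not from the article):

```python
# Min-max scaling to an arbitrary [new_min, new_max] range.
def min_max_scale(values, new_min, new_max):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

# Scale a feature to the range -1 to 1:
print(min_max_scale([1, 10, 1000], -1, 1))  # smallest -> -1.0, largest -> 1.0
```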

Let us consider one example with feature ‘F’ to make the calculation method clear.

minimum value of F = $50,000

maximum value of F = $100,000

We want to normalize F to the range 0 to 1.

In accordance with min-max normalization, v = $80,000 is transformed to:

($80,000 − $50,000) / ($100,000 − $50,000) = 0.6

You can also use the scikit-learn class MinMaxScaler straight away as well.
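Here is the same $50,000–$100,000 example done with MinMaxScaler (a sketch; requires scikit-learn to be installed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature as a single column: $50,000, $80,000, $100,000.
F = np.array([[50_000.0], [80_000.0], [100_000.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
normalized = scaler.fit_transform(F)

# 50,000 -> 0.0, 80,000 -> 0.6, 100,000 -> 1.0, matching the hand calculation.
print(normalized.ravel())
```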

As you can see, this technique makes the data easy to interpret: there are no large numbers, only values between 0 and 1 that require no further transformation and can be used in the decision-making process immediately.

The bad side of the Min-max is that it doesn’t handle outliers very well. For example, if you have 99 values between 0 and 40, and one value is 100, then the 99 values will all be transformed to a value between 0 and 0.4.

That data is just as squished as before!

Take a look at the image below to see an example of this.

After normalizing, look at the diagram below: the squishing problem on the y-axis is fixed, but the x-axis remains problematic. Also, the point in orange is an outlier, which the min-max normalizer failed to handle.
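The outlier effect can also be reproduced numerically (a sketch; the data here is made up to match the 99-values example above):

```python
# 99 values spread evenly over 0-40, plus a single outlier at 100.
values = [i * 40 / 98 for i in range(99)] + [100]

lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# The outlier alone maps to 1.0, while the 99 "normal" values
# are all squeezed into [0, 0.4] -- just as squished as before.
print(max(normalized[:-1]), normalized[-1])
```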

Good practice when using MinMaxScaler and other scaling techniques is as follows:

  • Fit the scaler using available training data. For normalization, this means the training data will be used to estimate the minimum and maximum observable values. This is done by calling the fit() function.
  • Apply the scale to training data. This means you can use the normalized data to train your model. This is done by calling the transform() function.
  • Apply the scale to data going forward. This means you can prepare new data in the future on which you want to make predictions.
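The three steps above can be sketched with MinMaxScaler (the array names are illustrative; note that future data outside the training range can fall outside [0, 1]):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0], [5.0], [10.0]])
X_new = np.array([[2.0], [12.0]])  # future data to predict on

scaler = MinMaxScaler()
scaler.fit(X_train)                         # 1. estimate min/max from training data
X_train_scaled = scaler.transform(X_train)  # 2. scale the training data
X_new_scaled = scaler.transform(X_new)      # 3. scale new data with the SAME min/max

# 12.0 exceeds the training max of 10.0, so it scales to a value above 1.
print(X_new_scaled.ravel())
```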

