mlnext.pipeline.ClippingMinMaxScaler

class mlnext.pipeline.ClippingMinMaxScaler(feature_range: Tuple[float, float] = (0, 1), *, clip: Tuple[float, float] | None = None, p: float = 100.0, copy: bool = True)

Bases: OneToOneFeatureMixin, BaseEstimator, TransformerMixin

Normalizes the fitted data to the interval feature_range. The parameter p sets the maximum used for scaling to the p-th percentile of the fitted data, i.e., p% of the data lies below it. Data that falls outside feature_range after scaling can be clipped to specific values via clip.
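The arithmetic described above can be sketched in plain Python. This is an illustration only, not the mlnext implementation: the helper names `percentile`, `fit_min_max` and `scale_and_clip` are hypothetical, and the sketch assumes that the minimum is the plain data minimum and that the percentile uses linear interpolation between neighbouring ranks.

```python
from typing import List, Optional, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def fit_min_max(values: Sequence[float], p: float = 100.0) -> Tuple[float, float]:
    """Return (data_min, data_max), where data_max is the p-th percentile."""
    return min(values), percentile(values, p)


def scale_and_clip(
    values: Sequence[float],
    data_min: float,
    data_max: float,
    feature_range: Tuple[float, float] = (0.0, 1.0),
    clip: Optional[Tuple[float, float]] = None,
) -> List[float]:
    """Map [data_min, data_max] onto feature_range, then clip if requested.

    Assumes data_max > data_min (no guard against a constant feature).
    """
    low, high = feature_range
    factor = (high - low) / (data_max - data_min)
    scaled = [low + (v - data_min) * factor for v in values]
    if clip is None:
        return scaled
    return [min(max(s, clip[0]), clip[1]) for s in scaled]


# Fit on [1, 2, 3, 4], then transform unseen data that exceeds the fitted max.
data_min, data_max = fit_min_max([1, 2, 3, 4])
result = scale_and_clip([1, 4, 6, 8, 10], data_min, data_max,
                        feature_range=(0, 0.5), clip=(0, 1))
# result is approximately [0.0, 0.5, 0.8333, 1.0, 1.0]
```

With p below 100, outliers in the fitted data no longer determine the maximum. For instance, `fit_min_max([1, 2, 3, 4, 100], p=75.0)` yields a maximum of 4.0 in this sketch, so the value 100 would be scaled beyond feature_range and then limited by clip.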

Example

>>> import pandas as pd
>>> import mlnext
>>> df = pd.DataFrame({'a': [1, 2, 3, 4]})
>>> scaler = mlnext.ClippingMinMaxScaler(
...     feature_range=(0, 0.5),
...     clip=(0, 1))
>>> scaler.fit_transform(df)
    a
0       0.000000
1       0.166667
2       0.333333
3       0.500000
>>> df2 = pd.DataFrame({'a': [1, 4, 6, 8, 10]})
>>> scaler.transform(df2)
        a
0       0.000000
1       0.500000
2       0.833333
3       1.000000
4       1.000000

Methods

`fit`	Fits the scaler to the data.
`fit_transform`	Fit to data, then transform it.
`get_feature_names_out`	Get output feature names for transformation.
`get_params`	Get parameters for this estimator.
`set_output`	Set output container.
`set_params`	Set the parameters of this estimator.
`transform`	Transforms `X` to the new feature range.

fit(X, y=None)

Fits the scaler to the data.

Parameters:

X (np.array) – Data.
y (optional) – Unused.

Returns:

Returns self.

Return type:

ClippingMinMaxScaler

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features (array-like of str or None, default=None) –

Input features.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out – Same as input features.

Return type:

ndarray of str objects

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_output(*, transform=None)

Set output container.

See the scikit-learn example "Introducing the set_output API" for how to use the API.

Parameters:

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
None: Transform configuration is unchanged

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
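The <component>__<parameter> form can be illustrated with a small pipeline. The sketch below uses scikit-learn's MinMaxScaler as a stand-in, assuming ClippingMinMaxScaler behaves identically because get_params and set_params are inherited from BaseEstimator. The step name "scaler" is an arbitrary choice for this example.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler  # stand-in, not the mlnext class

# A pipeline with a single step named "scaler".
pipe = Pipeline([("scaler", MinMaxScaler())])

# Update a parameter of the nested step via <component>__<parameter>.
pipe.set_params(scaler__feature_range=(0, 0.5))

# get_params(deep=True) exposes nested parameters under the same keys.
params = pipe.get_params(deep=True)
scaler_range = params["scaler__feature_range"]
print(scaler_range)  # (0, 0.5)
```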

transform(X) → ndarray

Transforms X to the new feature range.

Parameters:

X (np.array) – Data.

Returns:

Returns the scaled X.

Return type:

np.array