
What is the use of Sklearn preprocessing?

  • What is the use of Sklearn preprocessing
      August 16, 2021 3:32 PM IST
  • Firstly, from sklearn.preprocessing, StandardScaler is imported for standardizing the training dataset. Standardizing means computing the z-score, z = (x − µ) / σ, for each training data point x, where µ and σ are the mean and standard deviation of the feature. When you call fit(), the scaler learns µ and σ from the training dataset. Thereafter, you can call transform() to apply that standardization to data. There is also a convenience method, fit_transform(), which does the work of both fit() and transform() in one step (see the sketch below this reply).
      August 16, 2021 9:45 PM IST
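A minimal sketch of the fit()/transform()/fit_transform() pattern described in the reply above; the toy array and printed values are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data: 4 samples, 2 features (values are arbitrary)
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

scaler = StandardScaler()
scaler.fit(X_train)                   # learns the per-feature mean and std
X_scaled = scaler.transform(X_train)  # applies z = (x - mean) / std

# fit_transform() combines both steps in one call
X_scaled_2 = StandardScaler().fit_transform(X_train)

print(scaler.mean_)            # per-feature means learned by fit()
print(X_scaled.mean(axis=0))   # ~0 for each feature after scaling
print(X_scaled.std(axis=0))    # ~1 for each feature after scaling
```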
  • StandardScaler removes the mean and scales each feature/variable to unit variance. This operation is performed independently for each feature. StandardScaler can be influenced by outliers (if they exist in the dataset), since it estimates the empirical mean and standard deviation of each feature (see the sketch below this reply).
      August 17, 2021 1:03 PM IST
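A small sketch of the two points made above: scaling is learned per feature, and a single outlier shifts the learned mean and standard deviation. The data here are made-up values, chosen only to make the effect visible:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features; the second one contains an outlier (1000.0)
X = np.array([[1.0,    2.0],
              [2.0,    3.0],
              [3.0, 1000.0]])

scaler = StandardScaler().fit(X)
print(scaler.mean_)    # per-feature means: [2.0, 335.0] -- the outlier drags the second mean up
print(scaler.scale_)   # per-feature std devs; the outlier inflates the second one

# After scaling, the non-outlier values of the second feature are squeezed close together
print(scaler.transform(X))
```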
  • The Normalizer transformer rescales samples individually to unit norm.

    Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of the other samples so that its norm (l1, l2 or inf) equals one.

    This transformer works with both dense numpy arrays and scipy.sparse matrices (use the CSR format if you want to avoid the cost of a copy / conversion).

    Scaling inputs to unit norms is a common operation for text classification or clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors, which is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community (see the sketch below this reply).

      December 14, 2021 11:54 AM IST
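A short sketch of the row-wise l2 normalization and the TF-IDF / cosine-similarity point made above. The two example sentences are arbitrary, and norm=None is passed to TfidfVectorizer only so that Normalizer visibly does the rescaling itself:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer

docs = ["the cat sat on the mat", "the dog sat on the log"]  # toy corpus

# Raw (un-normalized) TF-IDF scores, returned as a sparse CSR matrix
tfidf = TfidfVectorizer(norm=None).fit_transform(docs)

# Rescale each row to unit l2 norm; works directly on the sparse matrix
normalized = Normalizer(norm="l2").fit_transform(tfidf)

# The dot product of two l2-normalized rows equals their cosine similarity
cosine = normalized[0].multiply(normalized[1]).sum()
print(cosine)
```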
  • The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. In general, learning algorithms benefit from standardization of the data set (see the pipeline sketch below this reply).
     
      December 15, 2021 12:39 PM IST
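As a rough illustration of that point, a preprocessing transformer is typically placed in front of the downstream estimator inside a Pipeline, so the same scaling is applied at fit and predict time. The dataset and classifier below are just example choices, not prescribed by the answer above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardization happens inside the pipeline, before the SVM sees the data
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```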