
Building SVM with tensorflow's LinearClassifier and Panda's Dataframes

  • I'm aware of this question, but it is for an outdated function.

    Let's say I'm trying to predict whether a person will visit country 'X' given the countries they have already visited and their income.

    I have a training data set in a pandas DataFrame that's in the following format.

    Each row represents a different person, unrelated to the others in the matrix.
    The first 10 columns are all names of countries, and the values in those columns are binary (1 if the person has visited that country, 0 if they haven't).
    Column 11 is their income. It's a continuous decimal variable.
    Lastly, column 12 is another binary column that indicates whether or not they have visited 'X'.
    So essentially, if I have 100,000 people in my dataset, then I have a dataframe of dimensions 100,000 x 12. I want to be able to properly pass this into a linear classifier using tensorflow, but I'm not sure how to approach this.

    I am trying to pass the data into this function

    estimator = LinearClassifier(
        n_classes=n_classes,
        feature_columns=[sparse_column_a, sparse_feature_a_x_sparse_feature_b],
        label_keys=label_keys)


    (If there's a better suggestion on which estimator to use, I'd be open to trying that.)

    And I'm passing data as:

    df = pd.DataFrame(np.random.randint(0,2,size=(100, 12)), columns=list('ABCDEFGHIJKL'))
    tf_val = tf.estimator.inputs.pandas_input_fn(df.iloc[:, 0:9], df.iloc[:, 11], shuffle=True)


    However, I'm not sure how to take this output and properly pass it into a classifier. Am I setting up the problem properly? I'm not coming from a data science background, so any guidance would be very helpful!

    Concerns

    1. Column 11 is a covariate. Hence, I don't think it can just be passed in as a feature, can it?
    2. How can I incorporate column 11 into the classifier as well, since it is a completely different type of feature than columns 1 through 10?
    3. At the very least, even if I ignore column 11, how do I fit columns 1 through 10 with label = column 12 and pass this into a classifier?
      October 11, 2021 1:05 PM IST
    0
  • Since all of your features are already numerical, you can use them as they are.

    import numpy as np
    import pandas as pd
    import tensorflow as tf

    # Columns A-J are the binary country indicators, K is income, L is the label.
    df = pd.DataFrame(np.random.randint(0, 2, size=(100, 12)), columns=list('ABCDEFGHIJKL'))
    df['K'] = np.random.random(100)  # make the income column continuous

    # One numeric feature column per input column (A through K).
    numeric_features = [tf.feature_column.numeric_column(column) for column in df.columns[:11]]
    model = tf.estimator.LinearClassifier(feature_columns=numeric_features)

    # Features are columns A-K, the label is column L.
    tf_val = tf.estimator.inputs.pandas_input_fn(df.iloc[:, :11], df.iloc[:, 11], shuffle=True)
    model.train(input_fn=tf_val, steps=1000)

    print(list(model.predict(input_fn=tf_val))[0])
    {'logits': array([-1.7512109], dtype=float32), 'logistic': array([0.14789453], dtype=float32), 'probabilities': array([0.8521055 , 0.14789453], dtype=float32), 'class_ids': array([0]), 'classes': array([b'0'], dtype=object)}

     

    The probabilities in the prediction output are most likely what you are interested in. You get two probabilities, one for the target being False and one for it being True.

    If you want more details, take a look at this nice blog post about binary classification with TensorFlow.
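
    For instance, a minimal sketch (reusing the model and tf_val defined above; the loop variable names are just illustrative) for reading that probability out of each prediction dict:

    # Sketch: read P(label == 1) out of each prediction dict produced above.
    for pred in model.predict(input_fn=tf_val):
        prob_false, prob_true = pred['probabilities']
        print('P(visits X) = {:.3f}'.format(prob_true))
        break  # only show the first person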

      October 12, 2021 1:36 PM IST
    0
  • Linear SVM
    SVM is a max margin classifier, i.e. it maximizes the width of the margin separating the positive class from the negative class. The loss function of a linear SVM in the case of binary classification is given below.

    L_i = max(0, 1 − y_i (w · x_i))

    It can be derived from the more general multi-class linear SVM loss (also called the hinge loss) shown below (with Δ = 1):

    L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + Δ),   where s_j = w_j · x_i is the score of class j, i.e.

    L_i = Σ_{j ≠ y_i} max(0, w_j · x_i − w_{y_i} · x_i + Δ)

    Note: In all the above equations, the weight vector w includes the bias b.

    How on earth did someone come up with this loss? Let's dig in.

    [Figure: positive-class and negative-class data points separated by a hyperplane (solid line), with the margin boundaries shown as dotted lines]

    The image above shows the data points belonging to the positive class separated from the data points belonging to the negative class by a separating hyperplane (shown as a solid line). However, there can be many such separating hyperplanes. SVM finds the separating hyperplane such that the distance of the hyperplane to the nearest positive data point and to the nearest negative data point is maximal (the margin boundaries are shown as dotted lines).

    Mathematically, SVM finds the weight vector w (bias included) such that

    the margin 2/‖w‖ is maximized, i.e. ‖w‖ is minimized, subject to the hyperplane w · x = 0 correctly separating the two classes.

    If the labels (y) of the +ve class and the -ve class are +1 and -1 respectively, then SVM finds w such that

    y_i (w · x_i) ≥ 1   for every data point (x_i, y_i)

    • If a data point is on the correct side of the hyperplane (correctly classified) then

    y_i (w · x_i) ≥ 1,   i.e.   1 − y_i (w · x_i) ≤ 0

    • If a data point is on the wrong side (misclassified) then

    y_i (w · x_i) < 1,   i.e.   1 − y_i (w · x_i) > 0

    So the loss for a data point, which is a measure of misclassification, can be written as

    L_i = max(0, 1 − y_i (w · x_i))
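
    To make this concrete, here is a minimal NumPy sketch (the numbers and variable names are made up for illustration) that evaluates this loss for a single data point:

    import numpy as np

    def hinge_loss(w, x, y):
        # Hinge loss for one data point: max(0, 1 - y * (w . x)).
        # w includes the bias, so x carries a trailing 1.
        return max(0.0, 1.0 - y * np.dot(w, x))

    w = np.array([0.5, -0.3, 0.1])          # weight vector, last entry is the bias
    x_correct = np.array([4.0, 1.0, 1.0])   # point on the correct side of the margin
    x_wrong = np.array([-1.0, 2.0, 1.0])    # point on the wrong side

    print(hinge_loss(w, x_correct, +1))     # 0.0 -> correctly classified with margin, no loss
    print(hinge_loss(w, x_wrong, +1))       # 2.0 -> misclassified, so it is penalized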

    Regularization
    If a weight vector w correctly classifies the data (X), then any multiple λw of that weight vector with λ > 1 will also correctly classify the data (zero loss). This is because the transformation λw stretches all score magnitudes and hence also their absolute differences. L2 regularization penalizes large weights by adding a regularization loss to the hinge loss.

    L = (1/N) Σ_i max(0, 1 − y_i (w · x_i)) + λ ‖w‖²

    For example, take x = [1,1,1,1] and two weight vectors w1 = [1,0,0,0] and w2 = [0.25,0.25,0.25,0.25]. Then dot(w1,x) = dot(w2,x) = 1, i.e. both weight vectors lead to the same dot product and hence the same hinge loss. But the L2 penalty of w1 is 1.0 while the L2 penalty of w2 is only 0.25, so L2 regularization prefers w2 over w1. The classifier is encouraged to take all input dimensions into account in small amounts rather than a few input dimensions very strongly. This improves the generalization of the model and leads to less overfitting.
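
    A couple of lines of NumPy confirm that arithmetic:

    import numpy as np

    x = np.array([1.0, 1.0, 1.0, 1.0])
    w1 = np.array([1.0, 0.0, 0.0, 0.0])
    w2 = np.array([0.25, 0.25, 0.25, 0.25])

    print(np.dot(w1, x), np.dot(w2, x))    # 1.0 1.0   -> same score, hence same hinge loss
    print(np.sum(w1**2), np.sum(w2**2))    # 1.0 0.25  -> regularization prefers the spread-out w2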

    The L2 penalty leads to the max margin property in SVMs. If the SVM is expressed as an optimization problem, then the generalized Lagrangian form of the constrained quadratic optimization problem is as below:

    L(w, α) = ½ ‖w‖² − Σ_i α_i [ y_i (w · x_i) − 1 ],   with α_i ≥ 0

    Now that we know the loss function of the linear SVM, we can use gradient descent (or other optimizers) to find the weight vector that minimizes the loss.
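
    As an illustration (not from the original answer, and using made-up synthetic data), a minimal NumPy sketch of subgradient descent on the L2-regularized hinge loss could look like this:

    import numpy as np

    rng = np.random.RandomState(0)
    # Synthetic 2-D data with labels +1 / -1; a trailing 1 is appended so w's last entry acts as the bias.
    X = np.vstack([rng.randn(50, 2) + 2.0, rng.randn(50, 2) - 2.0])
    X = np.hstack([X, np.ones((100, 1))])
    y = np.hstack([np.ones(50), -np.ones(50)])

    w = np.zeros(3)
    lr, lam = 0.1, 0.01                     # learning rate and L2 strength

    for step in range(1000):
        margins = y * (X @ w)
        # Subgradient of the averaged hinge loss: -y_i * x_i for points violating the margin, else 0.
        viol = (margins < 1).astype(float)
        grad = (-(viol * y)[:, None] * X).mean(axis=0) + 2 * lam * w
        w -= lr * grad

    print(w, np.mean(np.sign(X @ w) == y))  # learned weights and training accuracy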
      December 30, 2021 1:18 PM IST
    0
  • Support Vector Machines (SVMs) are a type of classification algorithm that is more flexible: they can do linear classification, but can also use non-linear basis functions (kernels). The following example uses a linear classifier to fit a hyperplane that separates the data into two classes:

     
    import pandas as pd
    import os
    from sklearn import svm

    # Load the dataset (the CSV path is specific to the original author's machine).
    os.chdir('/Users/stevenhurwitt/Documents/Blog/Classification')
    heart = pd.read_csv('SAHeart.csv', sep=',', header=0)

    y = heart.iloc[:, 9]    # label: the 10th column
    X = heart.iloc[:, :9]   # features: the first 9 columns

    SVM = svm.LinearSVC()   # linear SVM classifier
    SVM.fit(X, y)
    SVM.predict(X.iloc[460:, :])   # predict the last few rows
    round(SVM.score(X, y), 4)      # mean accuracy on the training data
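
    Adapted to the question's setup, a rough sketch (with a randomly generated stand-in for the 100,000 x 12 DataFrame, columns A-L) could look like this:

    import numpy as np
    import pandas as pd
    from sklearn import svm

    # Stand-in for the questioner's data: binary country columns A-J, continuous income K, binary label L.
    df = pd.DataFrame(np.random.randint(0, 2, size=(1000, 12)), columns=list('ABCDEFGHIJKL'))
    df['K'] = np.random.random(1000)

    clf = svm.LinearSVC()
    clf.fit(df.iloc[:, :11], df.iloc[:, 11])    # features A-K, label L
    print(round(clf.score(df.iloc[:, :11], df.iloc[:, 11]), 4))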
      January 14, 2022 2:01 PM IST
    0