estimator = LinearClassifier(
    n_classes=n_classes,
    feature_columns=[sparse_column_a, sparse_feature_a_x_sparse_feature_b],
    label_keys=label_keys)
(If there's a better suggestion on which estimator to use, I'd be open to trying that.)
And I'm passing data as:
df = pd.DataFrame(np.random.randint(0,2,size=(100, 12)), columns=list('ABCDEFGHIJKL'))
tf_val = tf.estimator.inputs.pandas_input_fn(df.iloc[:, 0:9], df.iloc[:, 11], shuffle=True)
However, I'm not sure how to take this output and pass it into a classifier properly. Am I setting up the problem correctly? I'm not coming from a data science background, so any guidance would be very helpful!
Concerns
Since all of your features are already numerical, you can pass them to the model as they are, using one numeric feature column per feature.
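import numpy as np
import pandas as pd
import tensorflow as tf  # written against the TensorFlow 1.x estimator API
# Toy data: 11 feature columns (A-K) and a binary label column (L).
df = pd.DataFrame(np.random.randint(0,2,size=(100, 12)), columns=list('ABCDEFGHIJKL'))
df['K'] = np.random.random(100)
numeric_features = [tf.feature_column.numeric_column(column) for column in df.columns[:11]]
model = tf.estimator.LinearClassifier(feature_columns=numeric_features)
# Columns 0-10 are the features, column 11 is the label.
tf_val = tf.estimator.inputs.pandas_input_fn(df.iloc[:,:11], df.iloc[:,11], shuffle=True)
model.train(input_fn=tf_val, steps=1000)
print(list(model.predict(input_fn=tf_val))[0])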
{'logits': array([-1.7512109], dtype=float32), 'logistic': array([0.14789453], dtype=float32), 'probabilities': array([0.8521055 , 0.14789453], dtype=float32), 'class_ids': array([0]), 'classes': array([b'0'], dtype=object)}
The probabilities entry of the prediction output is most likely what you are interested in. It holds two probabilities: the first for the target being False (class 0) and the second for it being True (class 1).
If you want more detail, have a look at this nice blog post about binary classification with TensorFlow.
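To make the relationship between the fields of that prediction dictionary concrete, here is a minimal NumPy sketch; it is not TensorFlow's own code. It assumes the usual binary-classification convention that `logistic` is the sigmoid of the single logit, that `probabilities` is `[1 - p, p]`, and that `class_ids` is the index of the larger probability. The variable names are only for illustration.

```python
import numpy as np

# Logit copied from the example prediction output above.
logit = -1.7512109

# Assumed convention: 'logistic' is the sigmoid of the logit, i.e. P(label == 1).
p_true = 1.0 / (1.0 + np.exp(-logit))

# 'probabilities' lists P(label == 0) first, then P(label == 1).
probabilities = np.array([1.0 - p_true, p_true])

# 'class_ids' is the index of the most probable class.
class_id = int(np.argmax(probabilities))

print(p_true, probabilities, class_id)
```

If you need a decision rule other than "pick the more likely class", you can compare `p_true` against your own threshold instead of taking the argmax.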
Support Vector Machines (SVMs) are a more flexible type of classification algorithm: they can do linear classification, but they can also use non-linear basis functions. The following example uses a linear SVM to fit a hyperplane that separates the data into two classes:
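from sklearn import svm
import pandas as pd
import os

# Work from the directory that holds the data file.
os.chdir('/Users/stevenhurwitt/Documents/Blog/Classification')
heart = pd.read_csv('SAHeart.csv', sep=',', header=0)

# Column 9 is the response; the first nine columns are the predictors.
y = heart.iloc[:, 9]
X = heart.iloc[:, :9]

# Fit a linear SVM, predict for rows 460 onward, and report training accuracy.
SVM = svm.LinearSVC()
SVM.fit(X, y)
SVM.predict(X.iloc[460:, :])
round(SVM.score(X, y), 4)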
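The code above scores the model on the same rows it was fitted on, and it only tries the linear case. As a hedged sketch of the non-linear option mentioned earlier, the following compares `LinearSVC` with an RBF-kernel `SVC` on held-out data. It uses scikit-learn's synthetic `make_circles` data rather than `SAHeart.csv`, and the sample size, noise level, split size and random seeds are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Two concentric rings: no straight line separates the two classes.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Hold out a quarter of the rows so the score is not measured on training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Same data, two models: a linear SVM and an SVM with an RBF kernel.
linear_svm = LinearSVC().fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print(round(linear_acc, 4), round(rbf_acc, 4))
```

On data like this the kernel model should score clearly higher than the linear one, since the class boundary is a circle rather than a line. On real data such as the heart dataset the gap may be much smaller or absent, so it is worth checking both on a held-out split.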