Prasad Chaskar

Related Listings

stock prices predict...

0 comments, 1 review , 3 likes
Brain Stroke Prediction

0 comments, 4 reviews , 5 likes

Brain Stroke Prediction

0 comments, 4 reviews , 993 views, 5 likes
Thyroid Disease Prediction

0 comments, 2 reviews , 523 views, 3 likes

Major Concepts

Models Home » Domain Usecases » Health Care and Pharmaceuticals » Pancreatic Cancer Detection

Pancreatic Cancer Detection

Models Status

Model Overview

Introduction :

What Is Pancreatic Cancer?

Pancreatic cancer is a type of cancer that starts in the pancreas. Cancer starts when cells in the body begin to grow out of control.
Pancreatic cancer is usually not found until advanced stages because it is hard to detect. Signs of pancreatic cancer include jaundice and weight loss. Risk factors include having diabetes and exposure to certain chemicals. Specific treatment depends on the size and location of the tumor, and whether or not it has spread to other areas of the body.

What are the symptoms of pancreatic cancer?

Upper abdominal pain that may spread to the back.

Yellowing of the skin and the whites of the eyes.

Tiredness.

Loss of appetite.

Light-colored poop.

Dark-colored pee.

Weight loss.

Used Libraries :

pandas

numpy

matplotlib

sklearn

seaborn

pickle

About Data :

Dataset Link : https://www.kaggle.com/johnjdavisiv/urinary-biomarkers-for-pancreatic-cancer

The key features are four urinary biomarkers: creatinine, LYVE1, REG1B, and TFF1.

Creatinine is a protein that is often used as an indicator of kidney function.

YVLE1 is lymphatic vessel endothelial hyaluronan receptor 1, a protein that may play a role in tumor metastasis

REG1B is a protein that may be associated with pancreas regeneration

TFF1 is trefoil factor 1, which may be related to regeneration and repair of the urinary tract

Age and sex, both included in the dataset, may also play a role in who gets pancreatic cancer. The dataset includes a few other biomarkers as well, but these were not measured in all patients (they were collected partly to measure how various blood biomarkers compared to urine biomarkers).

Lets look code.
Import Data :

cancer_df = pd.read_csv('data.csv')

cancer_df.head()

# Shape of our dataset

cancer_df.shape

Check Correlation :

plt.figure(figsize=(8,7))

sns.heatmap(cancer_df.corr(),annot=True)

Drop Unwanted Columns :

cancer_df.drop(['sample_id','sample_origin','benign_sample_diagnosis','stage','patient_cohort'],axis=1,inplace=True)

cancer_df.head()

Handle Missing Values :

cancer_df.isnull().sum()

Inference :

plasma and REG1A contain null values.

# 1 - no pancreatic disease

# 2 - benign

# 3 - pancreatic cancer

plt.figure(figsize=(8,5))

sns.countplot(x='diagnosis',data=cancer_df,palette='Set2');

plt.title("Countplot for Target variable",{'fontsize':20});

Data Visualization :

sns.countplot(x='sex',data=cancer_df,hue='diagnosis',palette='Set1');

plt.title("Countplot for sex variable based on dignosis",{'fontsize':20});

Inference :

There are more male patients than females.
Boxplot for other features :

fig, axes = plt.subplots(2, 3, figsize=(12, 8), sharey=True)

sns.boxplot(ax=axes[0][0],x="plasma_CA19_9", data=cancer_df,palette='dark')

sns.boxplot(ax=axes[0][1],x="creatinine", data=cancer_df,palette='dark')

sns.boxplot(ax=axes[0][2],x="LYVE1", data=cancer_df,palette='dark')

sns.boxplot(ax=axes[1][0],x="REG1B", data=cancer_df,palette='dark')

sns.boxplot(ax=axes[1][1],x="TFF1", data=cancer_df,palette='dark')

sns.boxplot(ax=axes[1][2],x="REG1A", data=cancer_df,palette='dark');

sns.boxplot(x='age',data=cancer_df,palette='bright');

plt.title("Distibution of feature age",{'fontsize':20})

Split Data into dependent and independent feature :

X = cancer_df.drop('diagnosis',axis=1)

y = cancer_df.diagnosis

Normalization :

In target variable previously 1 was no pancreatic disease,2 was Benign Pancreatic Disease and 3 was Pancreatic Disease.
Now we change it into as follow:
0 - No Pancreatic Disease
1 - Benign Pancreatic Disease
2 - Pancreatic Disease

for i in X.columns:

    if X[i].dtype == 'O':

        print(f"{i} : ",X[i].unique())



encoder1=LabelEncoder()

X.sex = encoder1.fit_transform(X.sex)



def fun(df):

    if df==1:

        return 0

    if df==2:

        return 1

    else:

        return 2

y = y.apply(fun)

Split data int two subsets: training data and testing data :

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=11)

print(X_train.shape,X_test.shape)

Model Building :

models = list()

accuracy = list()

Hyperparameter Tuning :
1] Random Forest :

rf  = RandomForestClassifier(random_state=43)

param_grid = { 

    'n_estimators': [200, 500],

    'max_features': ['auto', 'sqrt', 'log2'],

    'max_depth' : [4,5,6,7,8],

    'criterion' :['gini', 'entropy']

}

CV_rfc = GridSearchCV(estimator=rf, param_grid=param_grid, cv= 5)

CV_rfc.fit(X_train, y_train)

CV_rfc.best_params_

new_rf = RandomForestClassifier(criterion='entropy',max_depth=8,n_estimators=240)

new_rf.fit(X_train,y_train)

models.append("Random Forest")

accuracy.append(abs(new_rf.score(X_test,y_test)))

Classification Report :

ypred1 = new_rf.predict(X_test)

print("Classification Report for Random Forest")

print(classification_report(y_test,ypred1))

2] Support Vector Machine :

param_grid = {'C': [0.1, 1, 10, 100, 1000],

              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],

              'kernel': ['rbf']}

 

grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3)

grid.fit(X_train,y_train)

grid.best_params_



#Output:{'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}

s = SVC(C=10,gamma=0.0001,kernel='rbf')

s.fit(X_train,y_train)

models.append("SVM")

accuracy.append(abs(s.score(X_test,y_test)))



ypred2 = s.predict(X_test)

print("Classification Report SVM")

print(classification_report(y_test,ypred2))

3] Logistic Regression :

logreg=LogisticRegression(max_iter=3500,penalty='l2')

logreg.fit(X_train,y_train)

models.append("Logistic Regression")

accuracy.append(abs(logreg.score(X_test,y_test)))



ypred3 = logreg.predict(X_test)

print("Classification Report Logistic Regression")

print(classification_report(y_test,ypred3))

4] Dicision Tree :

dt = DecisionTreeClassifier()

dt.fit(X_train,y_train)

models.append("Decision Tree")

accuracy.append(abs(dt.score(X_test,y_test)))



ypred4 = dt.predict(X_test)

print("Classification Report Dicision Tree")

print(classification_report(y_test,ypred4))

Accuracy of Models :

plt.figure(figsize=(8,6))

plt.bar(models,accuracy,color=['red','deepskyblue','orange','blue']);

plt.title("Accuracy of models",{'fontsize':20})

Thank You ):

0 comments

Prasad Chaskar, Maryam Bains, and Viaan Prakash like this

Related Listings

Prasad Chaskar's other Models Reports

Major Concepts

Pancreatic Cancer Detection

Models Status

Model Overview

Deployment

Photos

Reviews

Connect With Us

Member Sign In

Member Sign In

Create Account

Related Listings

Prasad Chaskar's other Models Reports

Major Concepts

Pancreatic Cancer Detection

Models Status

Model Overview

Deployment

Photos

Reviews

Connect With Us