Prasad Chaskar's other Models Reports


Stellar Classification

Model Overview

About Dataset:
The data consists of 100,000 observations of space taken by the SDSS (Sloan Digital Sky Survey). Every observation is described by 17 feature columns and 1 class column, which identifies the object as a star, a galaxy, or a quasar.

  1. obj_ID = Object Identifier, the unique value that identifies the object in the image catalog used by the CAS

  2. alpha = Right Ascension angle (at J2000 epoch)

  3. delta = Declination angle (at J2000 epoch)

  4. u = Ultraviolet filter in the photometric system

  5. g = Green filter in the photometric system

  6. r = Red filter in the photometric system

  7. i = Near Infrared filter in the photometric system

  8. z = Infrared filter in the photometric system

  9. run_ID = Run Number used to identify the specific scan

  10. rerun_ID = Rerun Number to specify how the image was processed

  11. cam_col = Camera column to identify the scanline within the run

  12. field_ID = Field number to identify each field

  13. spec_obj_ID = Unique ID used for optical spectroscopic objects (this means that 2 different observations with the same spec_obj_ID must share the output class)

  14. class = object class (galaxy, star or quasar object)

  15. redshift = redshift value based on the increase in wavelength

  16. plate = plate ID, identifies each plate in SDSS

  17. MJD = Modified Julian Date, used to indicate when a given piece of SDSS data was taken

  18. fiber_ID = fiber ID that identifies the fiber that pointed the light at the focal plane in each observation


Dataset Link: https://www.kaggle.com/fedesoriano/stellar-classification-dataset-sdss17

Import Libraries:



import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

Load Dataset:


star_df = pd.read_csv('star_classification.csv')
star_df.head()


star_df.columns

Drop Unwanted Columns:


star_df = star_df[['alpha', 'delta', 'u', 'g', 'r', 'i', 'z','class','redshift']]
star_df.head()
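
The same selection can also be written as a drop of the identifier columns, which makes it explicit what is being discarded. This is only a sketch: toy_df is a made-up one-row frame that reuses the column names from the list above, and id_columns is a hypothetical helper list, not something from the original notebook.

```python
import pandas as pd

# Stand-in frame with the same column names as the dataset (the values are made up).
columns = ['obj_ID', 'alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'run_ID',
           'rerun_ID', 'cam_col', 'field_ID', 'spec_obj_ID', 'class',
           'redshift', 'plate', 'MJD', 'fiber_ID']
toy_df = pd.DataFrame([[0] * len(columns)], columns=columns)

# Hypothetical list of identifier / bookkeeping columns to discard.
id_columns = ['obj_ID', 'run_ID', 'rerun_ID', 'cam_col', 'field_ID',
              'spec_obj_ID', 'plate', 'MJD', 'fiber_ID']

# Dropping the ID columns leaves the sky position, the five filters,
# the redshift and the target column, in their original order.
reduced = toy_df.drop(columns=id_columns)
print(list(reduced.columns))
```

Dropping by name raises a KeyError if a column is missing from the file, which is a quick way to notice a misspelled column name.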

EDA:


galaxy = star_df[star_df['class']=='GALAXY']
star = star_df[star_df['class']=='STAR']
qso = star_df[star_df['class']=='QSO']
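
Before plotting, a per-class summary can give a rough numeric view of how the classes differ. The sketch below uses a tiny made-up frame (toy_df, with invented redshift values) rather than the real data, so the numbers only illustrate the mechanics of groupby.

```python
import pandas as pd

# Made-up values, only to show the mechanics; not taken from the SDSS data.
toy_df = pd.DataFrame({
    'class': ['GALAXY', 'GALAXY', 'STAR', 'STAR', 'QSO', 'QSO'],
    'redshift': [0.10, 0.30, 0.00, 0.00, 1.50, 2.50],
})

# One row per class: number of observations and mean redshift.
summary = toy_df.groupby('class')['redshift'].agg(['count', 'mean'])
print(summary)
```

On the real data, the same call would be star_df.groupby('class')['redshift'].agg(['count', 'mean']).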

Alpha Vs Redshift:

plt.figure(figsize=(9,7))
sns.scatterplot(x='alpha',y='redshift',data=galaxy,color='r');
plt.title("Alpha Vs Redshift for Galaxy",{'fontsize':20});



plt.figure(figsize=(9,7))
sns.scatterplot(x='alpha',y='redshift',data=star,color='r');
plt.title("Alpha Vs Redshift for Star",{'fontsize':20});


plt.figure(figsize=(9,7))
sns.scatterplot(x='alpha',y='redshift',data=qso,color='r');
plt.title("Alpha Vs Redshift for QSO",{'fontsize':20});


Green filter Vs Red Filter:


plt.figure(figsize=(9,7))
sns.scatterplot(x='g',y='r',data=galaxy,color='b');
plt.title("Green filter Vs Red Filter for Galaxy",{'fontsize':20});


plt.figure(figsize=(9,7))
sns.scatterplot(x='g',y='r',data=star,color='b');
plt.title("Green filter Vs Red Filter for Star",{'fontsize':20});


plt.figure(figsize=(9,7))
sns.scatterplot(x='g',y='r',data=qso,color='b');
plt.title("Green filter Vs Red Filter for QSO",{'fontsize':20});


Target Feature:


plt.figure(figsize=(8,7))
sns.countplot(x='class', data=star_df);
plt.title("Distribution of Target Feature",{'fontsize':20});


Data Splitting:


X = star_df.drop('class',axis=1)
y = star_df['class']

Convert the Categorical Column to Numeric:


from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y = encoder.fit_transform(y)
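
LabelEncoder sorts the class names before assigning integer codes, so the mapping can be read back from its classes_ attribute. A minimal sketch on a hand-written label list (toy_labels is made up for illustration):

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written labels using the same three class names as the dataset.
toy_labels = ['STAR', 'GALAXY', 'QSO', 'GALAXY']

toy_encoder = LabelEncoder()
encoded = toy_encoder.fit_transform(toy_labels)

# classes_ holds the sorted class names; the position of each name is its code.
mapping = {str(name): code for code, name in enumerate(toy_encoder.classes_)}
print(mapping)
print(encoded)
```

The fitted encoder's inverse_transform turns predicted codes back into class names when results need to be presented.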

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)
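
The split above is random and changes on every run. If reproducibility and equal class proportions in both parts matter, train_test_split accepts random_state and stratify. The sketch below assumes made-up arrays (X_toy, y_toy) with a 60/20/20 class mix; it is an optional variation, not what the original notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, two features, three classes in a 60/20/20 mix.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 60 + [1] * 20 + [2] * 20)

# stratify keeps the class mix the same in both parts;
# random_state makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42
)

print(np.bincount(y_tr))  # class counts in the training part
print(np.bincount(y_te))  # class counts in the test part
```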

Data Scaling:


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the training data only, then apply the same transform to the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
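
To see what the scaling step does, here is a small sketch on made-up numbers (toy_train and toy_test are illustrative arrays). The mean and standard deviation are learned from the training rows only and then reused for the test rows, so the test values are not forced to have zero mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numbers: three training rows and one test row, two features each.
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[4.0, 40.0]])

toy_scaler = StandardScaler()
train_scaled = toy_scaler.fit_transform(toy_train)  # learns mean and std from the training rows
test_scaled = toy_scaler.transform(toy_test)        # reuses the training mean and std

print(toy_scaler.mean_)           # per-feature mean of the training rows
print(train_scaled.mean(axis=0))  # approximately 0 for every feature
print(test_scaled)                # test row expressed in training-set units
```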

Model Training:


from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
import math

# Each estimator is paired with a readable name for printing
models = {
    KNeighborsClassifier(n_neighbors=3): 'K-Neighbors Classifier',
    SVC(): 'Support Vector Machine',
    RandomForestClassifier(): 'Random Forest Classifier'
}

# Fit every model on the scaled training data
for m in models.keys():
    m.fit(X_train, y_train)

# score() returns accuracy as a fraction between 0 and 1, so multiply by 100 before flooring
for model, name in models.items():
    print(f"Accuracy Score for {name} is : ", math.floor(model.score(X_test, y_test) * 100), "%")
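
A single train/test split gives one accuracy number per model, and that number moves a little with each random split. One possible way to get a steadier estimate is k-fold cross-validation. The sketch below runs on synthetic data from make_classification (X_toy and y_toy are not the SDSS data) and is only an illustration of the call, not part of the original workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic three-class problem standing in for the real features.
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

toy_model = RandomForestClassifier(n_estimators=50, random_state=0)

# Five folds: each fold is held out once while the model trains on the other four.
scores = cross_val_score(toy_model, X_toy, y_toy, cv=5)
print(f"Mean accuracy: {scores.mean() * 100:.1f}% (std {scores.std() * 100:.1f})")
```

When scaling is part of the workflow, wrapping the scaler and the model in a Pipeline keeps the scaler from seeing the held-out fold.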


Random Forest gives the highest accuracy of the three models on the test data, so we choose it for prediction.

Classification Report for Each Model:

The label encoder assigns the class indices in alphabetical order:
Class 0: Galaxy
Class 1: QSO
Class 2: Star


from sklearn.metrics import classification_report
for model,name in models.items():
    y_pred = model.predict(X_test)
    print(f"Classification Report for : {name}")
    print(classification_report(y_test,y_pred))
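
The classification report summarises precision and recall per class. A confusion matrix shows which classes get mixed up with which. A small sketch with hand-written labels (toy_true and toy_pred are made up, using the 0/1/2 encoding listed above):

```python
from sklearn.metrics import confusion_matrix

# Hand-written labels: 0 = Galaxy, 1 = QSO, 2 = Star.
toy_true = [0, 0, 0, 1, 1, 2, 2, 2]
toy_pred = [0, 0, 1, 1, 1, 2, 2, 0]

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(toy_true, toy_pred, labels=[0, 1, 2])
print(cm)
```

In this toy case, one galaxy is predicted as a QSO and one star is predicted as a galaxy; the diagonal holds the correct predictions.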


Thank You ):

