Prasad Chaskar's other Models Reports


Stellar Classification

Model Overview

About Dataset:
The data consists of 100,000 observations of space taken by the SDSS (Sloan Digital Sky Survey). Every observation is described by 17 feature columns and 1 class column, which identifies it as a star, a galaxy, or a quasar.

  1. obj_ID = Object Identifier, the unique value that identifies the object in the image catalog used by the CAS

  2. alpha = Right Ascension angle (at J2000 epoch)

  3. delta = Declination angle (at J2000 epoch)

  4. u = Ultraviolet filter in the photometric system

  5. g = Green filter in the photometric system

  6. r = Red filter in the photometric system

  7. i = Near Infrared filter in the photometric system

  8. z = Infrared filter in the photometric system

  9. run_ID = Run Number used to identify the specific scan

  10. rerun_ID = Rerun Number to specify how the image was processed

  11. cam_col = Camera column to identify the scanline within the run

  12. field_ID = Field number to identify each field

  13. spec_obj_ID = Unique ID used for optical spectroscopic objects (this means that 2 different observations with the same spec_obj_ID must share the output class)

  14. class = object class (galaxy, star or quasar object)

  15. redshift = redshift value based on the increase in wavelength

  16. plate = plate ID, identifies each plate in SDSS

  17. MJD = Modified Julian Date, used to indicate when a given piece of SDSS data was taken

  18. fiber_ID = fiber ID that identifies the fiber that pointed the light at the focal plane in each observation

Dataset Link:

Import Libraries:

import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

Load Dataset:

star_df = pd.read_csv('star_classification.csv')


Drop Unwanted Columns:

star_df = star_df[['alpha', 'delta', 'u', 'g', 'r', 'i', 'z','class','redshift']]


galaxy = star_df[star_df['class']=='GALAXY']
star = star_df[star_df['class']=='STAR']
qso = star_df[star_df['class']=='QSO']

Alpha Vs Redshift:

plt.title("Alpha Vs Redshift for Galaxy",{'fontsize':20});

plt.title("Alpha Vs Redshift for Star",{'fontsize':20});

plt.title("Alpha Vs Redshift for QSO",{'fontsize':20});
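The plotting calls behind these titles are not shown on this page. A minimal sketch of one way to draw such a per-class scatter plot follows; the helper name `scatter_for_class` and the tiny `demo_df` with made-up values are assumptions for illustration, not the author's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for star_df (illustrative values only).
demo_df = pd.DataFrame({
    "alpha": [135.7, 144.8, 142.2, 338.7, 345.3, 340.9],
    "redshift": [0.63, 0.78, 0.0001, -0.0002, 0.93, 1.42],
    "class": ["GALAXY", "GALAXY", "STAR", "STAR", "QSO", "QSO"],
})

def scatter_for_class(df, class_name, x_col, y_col, title):
    """Scatter y_col against x_col for the rows of one class (hypothetical helper)."""
    subset = df[df["class"] == class_name]
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(subset[x_col], subset[y_col], s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(title, fontsize=20)
    return ax

ax_galaxy = scatter_for_class(demo_df, "GALAXY", "alpha", "redshift",
                              "Alpha Vs Redshift for Galaxy")
```

The same helper could be called with `"g"` and `"r"` as the column names for a filter-versus-filter plot, provided those columns are present in the frame.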

Green filter Vs Red Filter:

plt.title("Green filter Vs Red Filter for Galaxy",{'fontsize':20});

plt.title("Green filter Vs Red Filter for Star",{'fontsize':20});

plt.title("Green filter Vs Red Filter for QSO",{'fontsize':20});

Target Feature:

plt.title("Distribution of Target Feature",{'fontsize':20});
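The code that draws this distribution is not shown here. One possible sketch counts the labels with pandas and draws a bar chart with matplotlib; the original may well have used seaborn instead, and the `demo_classes` series below is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up labels standing in for star_df['class'].
demo_classes = pd.Series(
    ["GALAXY", "GALAXY", "GALAXY", "STAR", "STAR", "QSO"], name="class"
)

# value_counts() returns one count per label, most frequent first.
class_counts = demo_classes.value_counts()

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(class_counts.index.tolist(), class_counts.values)
ax.set_xlabel("class")
ax.set_ylabel("count")
ax.set_title("Distribution of Target Feature", fontsize=20)
```

Looking at the counts before training is useful because a strongly unbalanced target makes plain accuracy harder to interpret.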

Data Splitting:

X = star_df.drop('class',axis=1)
y = star_df['class']

Convert the Categorical Column to Numeric:

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y = encoder.fit_transform(y)
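`LabelEncoder` assigns integer codes in sorted label order, which is why GALAXY, QSO and STAR become 0, 1 and 2. A small sketch on a made-up label list shows how the mapping can be read back from `encoder.classes_`:

```python
from sklearn.preprocessing import LabelEncoder

# Made-up labels in arbitrary order (illustrative only).
demo_labels = ["STAR", "GALAXY", "QSO", "GALAXY"]

demo_encoder = LabelEncoder()
encoded = demo_encoder.fit_transform(demo_labels)

# classes_ holds the original labels; the position of each label is its code.
code_to_name = {code: str(name) for code, name in enumerate(demo_encoder.classes_)}
print(code_to_name)
```

`demo_encoder.inverse_transform(encoded)` converts the codes back to the original strings when predictions need to be reported by name.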

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)
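The split above uses neither a fixed seed nor stratification, so the class mix in the test set and the reported accuracies can change from run to run. A sketch of a reproducible, stratified variant follows; `random_state` and `stratify` are existing parameters of `train_test_split`, while the seed value 42 and the synthetic arrays are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, unbalanced classes 60/20/20.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 60 + [1] * 20 + [2] * 20)

# stratify keeps the class proportions the same in both parts;
# random_state makes the shuffle repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

test_counts = np.bincount(y_te)  # number of test rows per class
print(test_counts)
```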

Data Scaling:

from sklearn.preprocessing import StandardScaler
scalar = StandardScaler()
X_train = scalar.fit_transform(X_train)
X_test = scalar.transform(X_test)
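The scaler is fitted on the training data only and then reused on the test data, so no test-set statistics leak into training. A minimal sketch on synthetic numbers (all values made up) checks what `StandardScaler` does in each step:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features (illustrative only).
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(50, 2))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learns mean/std from train
X_test_scaled = scaler.transform(X_test_demo)        # reuses the train mean/std

# After scaling, each training column has mean ~0 and standard deviation ~1.
train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
print(train_means, train_stds)
```

The test columns will generally not have mean exactly 0 or standard deviation exactly 1, which is expected because they are shifted and scaled by the training statistics.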

Model Training:

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
import math

# Map each estimator to a display name
models = {
    KNeighborsClassifier(n_neighbors=3): 'K-Neighbors Classifier',
    SVC(): "Support Vector Machine",
    RandomForestClassifier(): 'Random Forest Classifier',
}

# Fit every model on the scaled training data
for m in models.keys():
    m.fit(X_train, y_train)

# score() returns accuracy as a fraction in [0, 1], so multiply by 100 before flooring
for model, name in models.items():
    print(f"Accuracy Score for {name} is : ", math.floor(model.score(X_test, y_test) * 100), "%")

Random Forest gives the highest accuracy on the test data, so we choose it for prediction.
Classification Report for Each Model:
Class 0: Galaxy
Class 1: QSO
Class 2: Star

from sklearn.metrics import classification_report
for model, name in models.items():
    y_pred = model.predict(X_test)
    print(f"Classification Report for : {name}")
    print(classification_report(y_test, y_pred))
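By default the report labels its rows 0, 1 and 2. As a sketch, the `target_names` parameter of `classification_report` can print the class names instead; the label lists below are made up, and in the script above `encoder.classes_` could supply the names.

```python
from sklearn.metrics import classification_report

# Names in the same order as the encoded labels 0, 1, 2.
class_names = ["GALAXY", "QSO", "STAR"]

# Made-up true and predicted labels (illustrative only).
y_true_demo = [0, 0, 1, 2, 2, 2]
y_pred_demo = [0, 0, 1, 2, 2, 0]

# Text report with readable row names.
print(classification_report(y_true_demo, y_pred_demo, target_names=class_names))

# The same numbers as a nested dict, convenient for further processing.
report = classification_report(
    y_true_demo, y_pred_demo, target_names=class_names, output_dict=True
)
```

In this toy example one STAR is predicted as GALAXY, which lowers the recall for STAR and the precision for GALAXY while QSO stays unaffected.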

Thank You :)