Hashwanth Gogineni's other Models Reports


Mental illness Prediction


Model Overview

Mental illness


Mental illness, also called mental health disorders, refers to a wide range of conditions that affect your mood, thinking, and behaviour. Examples include depression, anxiety disorders, schizophrenia, eating disorders, and addictive behaviours.


Many people have mental health concerns from time to time. A mental health concern becomes a mental illness when persistent signs and symptoms cause frequent stress and impair your ability to function.


Mental illness can make you miserable and cause problems in your daily life, such as at school, at work, or in relationships. Symptoms can often be managed with a combination of medication and talk therapy (psychotherapy).


 


Mental illness Symptoms


Mental illness can manifest itself in a variety of ways. Its symptoms can affect emotions, thoughts, and behaviours.


Here are some examples of warning signs and symptoms:



  • Feeling sad or low

  • Confusion in thoughts or a loss of concentration

  • Excessive fears or worries, or extreme feelings of guilt

  • Extreme mood swings with highs and lows

  • Withdrawal from friends and activities

  • Significant exhaustion, a lack of energy, or difficulty sleeping

  • Detachment from reality (delusions), paranoia, or hallucinations

  • Inability to deal with day-to-day issues or stress

  • Trouble understanding and relating to situations and people

  • Problems with alcohol or drug use

  • Major changes in eating habits

  • Sex drive changes

  • Excessive anger, hostility, or violence

  • Suicidal thinking


Physical problems, such as stomach discomfort, back pain, headaches, or other unexplained aches and pains, can sometimes be signs of a mental health condition.


Why Mental Illness Prediction?


This project can help tech companies analyze and address their employees' mental health issues, for example by identifying who is likely to need treatment.


 


Dataset


The data comes from a 2014 survey that measured attitudes toward mental health in the workplace and the prevalence of mental health issues.


The dataset contains the following columns:



  • Timestamp

  • Age

  • Gender

  • Country

  • state: If you live in the United States, which state or territory do you live in?

  • self_employed: Are you self-employed?

  • family_history: Do you have a family history of mental illness?

  • treatment: Have you sought treatment for a mental health condition?

  • work_interfere: If you have a mental health condition, do you feel it interferes with your work?

  • no_employees: How many employees does your company or organization have?

  • remote_work: Do you work remotely at least 50% of the time?

  • tech_company: Is your employer primarily a tech company?

  • benefits: Does your employer provide mental health benefits?

  • care_options: Are you familiar with your company's mental health care options?

  • wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?

  • seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?

  • anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?

  • leave: How easy is it for you to take medical leave for a mental health condition?

  • mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?

  • phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?

  • coworkers: Would you be willing to discuss a mental health issue with your coworkers?

  • supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?

  • mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?

  • phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?

  • mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?

  • obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?

  • comments: Any additional comments





Random Forest


A random forest is a machine learning algorithm for solving classification and regression problems.


It uses ensemble learning, a technique for solving complicated problems by combining several classifiers.


Many decision trees make up a 'random forest' algorithm. 


Bagging/bootstrap aggregation is used to train the 'forest' formed by the random forest method. 


Bagging is an ensemble technique that improves accuracy by training each model on a random sample of the training data, drawn with replacement, and then combining the models' predictions.
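
To make the bootstrap idea concrete, here is a minimal sketch using NumPy. The sample size and the seed are arbitrary choices for illustration and are not taken from this project.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_samples = 10
indices = np.arange(n_samples)

# A bootstrap sample draws n_samples row indices WITH replacement,
# so some rows repeat and others are left out ("out-of-bag").
bootstrap_idx = rng.choice(indices, size=n_samples, replace=True)
out_of_bag = np.setdiff1d(indices, bootstrap_idx)

print("bootstrap sample:", bootstrap_idx)
print("out-of-bag rows: ", out_of_bag)
```

In a random forest, each tree would be trained on its own bootstrap sample like this one.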


Random forest algorithm determines the output based on decision tree predictions. 


It predicts by taking the majority vote (for classification) or the average (for regression) of the outputs of the individual trees.
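
As a small illustration of how tree outputs are combined, the sketch below takes a majority vote for classification and a mean for regression. The lists of tree outputs are made-up values, not results from this project.

```python
from collections import Counter

# Hypothetical outputs from five trees for one sample.
tree_votes = ["Yes", "No", "Yes", "Yes", "No"]   # classification
tree_values = [2.0, 3.0, 2.5, 3.5, 4.0]          # regression

# Classification: the most frequent class wins.
majority_class = Counter(tree_votes).most_common(1)[0][0]

# Regression: the forest output is the mean of the tree outputs.
mean_value = sum(tree_values) / len(tree_values)

print(majority_class)  # Yes
print(mean_value)      # 3.0
```

Note that scikit-learn's RandomForestClassifier averages the class probabilities predicted by the trees rather than counting hard votes, but the idea is the same.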


Adding more trees generally makes the predictions more stable, although the gains level off after a point.


The random forest method mitigates some drawbacks of a single decision tree.


It reduces overfitting and usually improves accuracy.


It gives usable predictions without much hyperparameter configuration in packages such as scikit-learn.


 


Why Random Forest?


The following are a few reasons why we should utilize the Random Forest algorithm:



  • It usually takes less training time than many other algorithms, because the trees can be trained in parallel.

  • It predicts output with high accuracy and runs efficiently even on large datasets.

  • Its accuracy can hold up reasonably well when part of the data is missing, provided the missing values are filled in first.


 


Understanding Code


First, let us import the required libraries for the project.


import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score
import joblib
import pickle


And now load the data into the system.


df = pd.read_csv("data.csv")


Also, let us have a look at a few important visualizations of our data.


from collections import Counter

# Count respondents per country and keep the ten most common
country_count = Counter(df['Country'].dropna().tolist()).most_common(10)
country_idx = [country for country, _ in country_count]
country_val = [count for _, count in country_count]

fig, ax = plt.subplots(figsize=(8, 6))
sns.barplot(x=country_idx, y=country_val, ax=ax)
ax.set_title('Top ten countries')
ax.set_xlabel('Country')
ax.set_ylabel('Count')
# Rotate the country names so they do not overlap
plt.setp(ax.get_xticklabels(), rotation=90)


# Distribution of the target column ('treatment')
sns.countplot(x='treatment', data=df)
plt.title('Treatment Distribution')



Coming to the 'Data Preprocessing' part, let us search for missing values in the data.


df.isnull().sum()


As you can see, missing values exist in our data.


df['work_interfere'] = df['work_interfere'].fillna('Don\'t know' )
print(df['work_interfere'].unique())

df['self_employed'] = df['self_employed'].fillna('No')
print(df['self_employed'].unique())

df.drop(["Timestamp", "comments", "state"], axis = 1, inplace = True)

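As you can see, I filled the missing values in 'work_interfere' and 'self_employed', and dropped the 'Timestamp', 'comments' and 'state' features. 'comments' and 'state' have a lot of missing values, and 'Timestamp' only records when a response was submitted.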
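
To decide which columns are worth keeping, it can help to look at the share of missing values per column instead of the raw counts. Below is a minimal sketch on a toy DataFrame; the values are made up and only stand in for the survey data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (values are made up).
toy = pd.DataFrame({
    "Age": [29, 35, 41, 23],
    "state": ["CA", np.nan, np.nan, np.nan],
    "self_employed": ["No", np.nan, "Yes", "No"],
})

# isnull().mean() gives the fraction of missing values in each column.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Columns near the top of this list are candidates for dropping, while columns with only a few gaps can be filled with fillna as above.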

Now let us encode the categorical values to feed the data into the model.


from sklearn import preprocessing

categorical = ['Gender', 'Country', 'self_employed', 'family_history',
'treatment', 'work_interfere', 'no_employees', 'remote_work',
'tech_company', 'benefits', 'care_options', 'wellness_program',
'seek_help', 'anonymity', 'leave', 'mental_health_consequence',
'phys_health_consequence', 'coworkers', 'supervisor',
'mental_health_interview', 'phys_health_interview',
'mental_vs_physical', 'obs_consequence']
for feature in categorical:
    # Fit a separate encoder per column so each column gets its own integer codes
    le = preprocessing.LabelEncoder()
    df[feature] = le.fit_transform(df[feature])

As you can see, I used scikit-learn's LabelEncoder class to convert each categorical column into integer codes.
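
To see what the encoder actually does, here is a small sketch on a made-up list of answers (not the real survey column). It shows the integer code assigned to each label and how to map the codes back.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up answers, similar in spirit to a survey column.
answers = ["Yes", "No", "Don't know", "No"]

le = LabelEncoder()
codes = le.fit_transform(answers)

# classes_ holds the original labels in sorted order;
# the position of a label in classes_ is its integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)
print(le.inverse_transform(codes))
```

Because the loop above reuses the same variable for every column, you would need to store each fitted encoder (for example in a dict keyed by column name) if you want to decode predictions later.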

Let us split the data using the "train_test_split" function into training and testing sets.


Y = df['treatment']
X = df.drop('treatment', axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2)
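
The split above is random, so the results change from run to run. As an optional variation (my suggestion, not part of the original code), train_test_split accepts random_state for reproducibility and stratify to keep the class ratio the same in both sets. A minimal sketch on made-up arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows, 2 features, balanced binary target.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 10 + [1] * 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    stratify=y_demo,    # keep the 50/50 class ratio in both sets
    random_state=42,    # arbitrary seed so the split is repeatable
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))
```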

 


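Finally, let us scale the data before feeding it into a model. Tree-based models such as Random Forest do not strictly need scaled features, but scaling keeps the same preprocessing usable with other models.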


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X.columns)

X_test = pd.DataFrame(scaler.transform(X_test), columns = X.columns)

As you can see, I used the "StandardScaler" class to scale the data. The scaler is fitted on the training set only and then applied to the test set.
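
Here is a tiny sketch of why the scaler is fitted on the training data only: the test data is transformed with the mean and standard deviation learned from the training data, so no information from the test set leaks into preprocessing. The numbers are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data.
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean and std from train
test_scaled = demo_scaler.transform(test)        # reuses the train statistics

print(demo_scaler.mean_)   # mean learned from the training data
print(train_scaled.ravel())
print(test_scaled.ravel())
```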

Now, let us dive deep into the modelling part of the project.


from sklearn.ensemble import RandomForestClassifier


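rf_model = RandomForestClassifier()
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
# Accuracy on the training data (usually optimistic) and on the held-out test data
print(rf_model.score(X_train, y_train))
print(rf_model.score(X_test, y_test))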

I used the "Random Forest" algorithm to solve the problem, through scikit-learn's "RandomForestClassifier" class with its default settings.
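
If you want to check for overfitting or see which features the forest relies on, the sketch below may help. It uses synthetic data from make_classification instead of the survey data, and the parameter values (300 samples, 6 features, 100 trees, seed 0) are arbitrary assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

# A large gap between these two numbers suggests overfitting.
train_acc = model.score(X_tr, y_tr)
test_acc = model.score(X_te, y_te)
print(train_acc, test_acc)

# Impurity-based importances, one value per feature, summing to 1.
importances = model.feature_importances_
print(importances)
```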

Now let us have a look at the model's performance report.


from sklearn.metrics import classification_report
class_names = ['Mental illness Treatment is not required', 'Mental illness Treatment is required']
print(classification_report(y_test, y_pred, target_names=class_names))


The report shows the precision, recall, and F1-score for each class on the test set, which is how we judge how well the model performed.
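
To understand where the numbers in a classification report come from, here is a small sketch that computes precision and recall by hand from a confusion matrix. The labels are made up and are not this model's predictions.

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions (1 = treatment required).
y_true = [0, 0, 1, 1, 1, 0]
y_hat = [0, 1, 1, 1, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found

print(tn, fp, fn, tp)  # 2 1 1 2
print(precision, recall)
```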


Thank you for your time.

