Primary Responsibilities:
- We are looking for a data scientist who is eager to tackle the challenges of extracting insights from vast amounts of EHR data originating from multiple sources
- You will work with a diverse set of Optum researchers, clinicians, and internal and external clients to define projects spanning a range of clinical focus areas and see them through to completion
- You will participate in the development of our end-to-end NLP pipeline, build descriptive and predictive analytics, and produce research validations
- Specifically, as a member of our collaborative team, you will work with structured and unstructured data to explore the data, develop models, and perform research analytics
- You will have the opportunity to work with open-source distributed data processing frameworks, such as Apache Spark, to build scalable machine learning applications and deploy them in production
- The right candidate will have strong prioritization skills and the ability to manage ad hoc requests in parallel with ongoing projects
If you have the experience, curiosity and entrepreneurial spirit to tackle these challenges, we want to talk to you.
You'll be rewarded and recognized for your performance in an environment that will challenge you, give you clear direction on what it takes to succeed in your role, and provide development opportunities for other roles you may be interested in.
Required Qualifications:
- Bachelor’s degree in Science, Technology, Engineering, Math, Linguistics, or an equivalent field
- 3+ years of experience in a hands-on, non-managerial data scientist role, executing data science, analytics, or programming projects (academic projects may be counted toward the required experience; projects need to be clearly outlined in the resume)
- 3+ years of experience in machine learning, predictive modeling, or natural language processing
- 3+ years of experience in Python
- Experience applying best practices when conducting advanced analytics projects
Preferred Qualifications:
- Experience with Pandas, Scikit-learn, TensorFlow, Spark NLP, or spaCy
- Experience with open-source distributed data processing frameworks, such as Spark
- Experience presenting technical results to broad audiences with effective visualizations
- Experience in Biostatistics or a related field
- Experience with SQL or relational databases
- Familiarity with EHR data and standards