Role Overview:
As an MLOps Engineer, you will work collaboratively with Data Scientists and Data Engineers to deploy and operate systems. You’ll help automate and streamline our operations and processes, build and maintain tools for deployment, monitoring, and operations, and troubleshoot and resolve issues in development, testing, and production environments.
Responsibilities:
- Operate and maintain systems supporting the provisioning of new clients, applications, and features.
- Day-to-day monitoring of the Production service delivery environment to ensure all services and applications are operating optimally and SLAs are met.
- Software deployment and configuration management in both QA and Production environments.
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize new modules and build out their deployment pipelines.
- Design, build, and optimize applications’ containerization and orchestration using Docker and Kubernetes on AWS or Azure.
- Automate applications and infrastructure deployments.
- Produce build and deployment automation scripts that integrate services.
- Serve as a subject matter expert on DevOps practices, CI/CD, and configuration management for your assigned engineering team.
Preferred Qualifications:
- Experience with at least one cloud computing platform: Google Cloud, Amazon Web Services, or Azure.
- Experience with ML lifecycle tools such as MLflow or Kubeflow, including experiment tracking and management.
- Experience with big data technologies: Hadoop, Hive, Spark, Kafka.
- Knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch, Keras, MXNet, scikit-learn.
Skills:
- At least 3 years’ experience working with cloud-based services and DevOps concepts, tools, and practices
- Extensive experience with Unix/AIX/Linux environments
- Experience with Kubernetes or Docker Swarm
- Experience working in cross-functional Agile engineering teams
- Familiarity with standard concepts and technologies used in CI/CD build and deployment pipelines
- Experience with scripting and coding in Python and shell
- Experience with configuration management tools such as Chef and Ansible
- Experience with automation servers such as Jenkins, CloudBees, and Travis CI
- Experience with logging tools such as Splunk, Elasticsearch, Logstash, and Kibana
- Experience with monitoring and alerting tools such as Munin, Prometheus, Grafana, Alertmanager, and PagerDuty
- Big data technical stack experience is a plus such as HDFS, Spark, Ambari, ZooKeeper, Kafka
- Excellent written and verbal communication skills
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
Fractal Analytics provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.