Job Description
Details: DevOps Cloud Engineer (Systems Operations Engineer 4)
RTO schedule: 3 days in office (hybrid)
Approved Locations: Irving, TX / Phoenix, AZ / Charlotte, NC 28202
Top Skills - Required Qualifications
• 5-7+ years of experience in DevOps, SRE, Platform Engineering, or Cloud Engineering roles
• Hands-on experience with Harness CD
• Strong experience with Kubernetes/OpenShift, Linux, cloud services and deployment best practices
• Solid understanding of CI/CD workflows and software release automation
SRE & Automation
• Experience applying SRE concepts such as SLIs/SLOs, error budgets, and operational maturity improvements
• Strong automation/scripting skills using Python, Bash, or PowerShell
• Infrastructure as Code experience with Terraform, Ansible, Helm, or equivalent tooling
Observability & Troubleshooting
• Experience with observability tools (Prometheus, Grafana, Splunk, ELK, AppDynamics, etc.)
• Strong troubleshooting skills across container, OS, networking, platform, and cloud technology layers
Preferred Qualifications
• Experience supporting CD platforms at enterprise scale (hundreds of teams, multi-region deployments)
• Experience in cloud-native and hybrid cloud environments (Azure, GCP)
• Familiarity with DevSecOps practices, policy automation frameworks, and governance models
• Experience supporting complex upgrades, platform migrations, or modernization projects
Key Responsibilities:
Platform Ownership & Reliability (SRE):
-Support end-to-end reliability, availability, and performance of the Harness CD platform across non-prod, prod, and BCP environments
-Maintain and report on SLIs, SLOs, error budgets, deployment success rates, and platform health metrics
-Lead incident response, troubleshooting, and RCA for deployment failures, delegate outages, or platform performance issues
-Identify and remediate scaling, performance, and capacity constraints across delegates, pipelines, Kubernetes clusters, and cloud integrations
Automation & Engineering Excellence:
-Develop automation for provisioning, configuration, scaling, upgrades, and maintenance of Harness components
-Build Infrastructure as Code (IaC) using Terraform, Ansible, Helm, or equivalent tools
-Automate common operational tasks including delegate lifecycle, cluster onboarding, secret rotation, and pipeline validation
-Reduce manual work by implementing resilient, repeatable, and self-service automation workflows
DevOps & CI/CD Integration:
-Maintain and enhance Harness integrations with GitHub, Jenkins, Azure DevOps, Kubernetes/OpenShift clusters, and cloud providers
-Ensure an efficient developer experience through well-optimized pipelines and reliable deployment mechanisms
-Partner with DevOps teams to optimize orchestration strategies (blue/green, canary, rolling)
-Work with Security teams to embed DevSecOps controls such as policy enforcement, governance pipelines, and security checks
Observability & Monitoring:
-Implement and maintain monitoring, logging, dashboards, and alerting for all Harness components
-Use Splunk, Prometheus, Grafana, AppDynamics, or similar tools to build actionable alerts
-Detect and escalate issues such as delegate saturation, pipeline slowdowns, API failures, and Kubernetes resource constraints
-Support proactive monitoring to reduce mean time to detection and resolution
Modernization & Continuous Improvement:
-Assist with Harness upgrades, hotfixes, patching, and vendor-recommended lifecycle activities
-Contribute to modernization efforts including containerization, cloud-native deployments, and multi-cloud expansion
-Support resiliency improvements such as BCP validation, backup verification, and BCP readiness
-Evaluate new Harness features, modules, and platform capabilities for enterprise usage
Technical Leadership:
-Act as a technical SME for Harness platform operations and enhancements
-Provide platform guidance, documentation, architecture details, and runbook development
-Partner with senior engineers to improve standards, automation patterns, and operational excellence