Senior Site Reliability Engineer (SRE) with deep expertise in Red Hat OpenShift and infrastructure automation for our banking client

Toronto, ON
  • Number of positions to fill: 1

  • To be discussed
  • Contract position

  • Start date: 1 position to fill as soon as possible

We are seeking an experienced Site Reliability Engineer (SRE) with deep expertise in Red Hat OpenShift and infrastructure automation. The ideal candidate will have hands-on experience deploying, maintaining, and optimizing OpenShift clusters in both on-premises and cloud environments.

This role requires a strong understanding of platform reliability, networking, GitOps practices, and enterprise security standards. The successful candidate will work closely with development and infrastructure teams to ensure seamless CI/CD processes, high availability, and efficient incident response.


Location - Downtown Toronto

Work Mode - Mostly remote, some onsite work

Duration - ASAP to Feb 27, 2026 with possibility of extension


Must-Have

  • 8+ years of experience in infrastructure, DevOps, or SRE roles, including 3+ years focused on OpenShift administration.
  • Proven experience with OpenShift installation, configuration, and lifecycle management in both on-prem and cloud environments.
  • Expertise with Terraform and Ansible for automation and configuration management.
  • Strong hands-on experience with ArgoCD and GitOps workflows.
  • Working knowledge of Red Hat ACM for managing multiple OCP clusters.
  • Proficiency in F5 load balancer configuration and networking fundamentals (DNS, routing, firewalls, subnets).
  • Experience building observability stacks (Prometheus, Grafana, ELK, Alertmanager).
  • Solid understanding of TLS/mTLS, certificate management, and security hardening.
  • Proven track record in incident response, RCA, and postmortem analysis.
  • Experience defining and managing SLIs/SLOs for production services.
  • Familiarity with CI/CD pipelines, Kubernetes-native tools, and container orchestration principles.
  • Strong scripting skills (e.g., Bash, Python, or Go).


Responsibilities

  • Install, configure, upgrade, and administer OpenShift (OCP) clusters in on-premises and cloud environments.
  • Manage OCP internal networking, ingress, egress, and cluster services.
  • Configure and integrate LDAP authentication and access management.
  • Implement TLS and mTLS encryption, and manage the certificate lifecycle for secure communications.
  • Implement GitOps workflows using ArgoCD for continuous delivery and environment consistency.
  • Manage multi-cluster orchestration using Red Hat Advanced Cluster Management (ACM).
  • Automate platform and application provisioning using Terraform and Ansible.
  • Configure and maintain F5 LTM load balancers.
  • Configure and manage DNS, networking, and subnets.
  • Build and manage monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, ELK).
  • Define and enforce SLIs/SLOs and error budgets for services running on OCP.
  • Lead incident response, root cause analysis (RCA), and postmortems.
  • Build automation for self-healing, scaling, and zero-touch operations.
  • Ensure high availability, disaster recovery, and failover strategies are implemented.
  • Secure platform and workloads following enterprise security standards.
  • Support application deployments and CI/CD pipelines on OpenShift.
  • Troubleshoot networking, cluster, and deployment issues end-to-end.
  • Apply SRE best practices to improve reliability, scalability, and performance.
  • Collaborate with development and platform teams to optimize system operations.

Requirements

Education level

Not specified

Years of experience

Not specified

Written languages

Not specified

Spoken languages

Not specified