Int. Site Reliability Engineer to join our Digital Banking client.

Toronto, ON
  • Number of positions to fill: 1

  • To be discussed
  • Contract employment

  • Start date: 1 position to fill as soon as possible

Duration: 6-month contract to start


Location: Hybrid (North York/downtown Toronto) - 2-3 days/week

Job Responsibilities:

  • You’ll maintain production applications and day-to-day operational activities, manage escalations, and adapt established procedures and approaches to specific situations, including 24x7 support and coordination of recovery efforts.
  • You will run the production environment by monitoring availability and taking a holistic view of system health.
  • Lead daily team huddles, assign incidents, and ensure timely closure of all customer escalations and problems.
  • Coach and monitor the team, and help them resolve complex or critical production incidents and problems.
  • You’ll provide investigation and second-level support on client issues, technical issues, system/website outages, and questions from all internal and external applications by maintaining, prioritizing, and directing them to the respective technology groups and vendors.
  • Lead on-call problem escalation and outage recovery efforts, covering not only code fixes in the presentation and integration layers but also infrastructure-level investigation and support where necessary.
  • Lead post-incident technical retrospectives to identify and implement remediation actions.
  • You will improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to improve continually.
  • Participate in defining SLIs, SLOs and SLAs for Enterprise Systems.
  • Gather and analyze metrics from both applications and infrastructure to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Create sustainable systems and services through automation and process improvements.
  • Monitor the health of multiple applications and identify optimization opportunities in a large, complex, continuously growing hybrid environment.

Must haves:

  • 5+ years of experience as a Site Reliability Engineer (SRE)
  • 5+ years of experience maintaining KPIs, managing CI/CD pipelines, and using monitoring and alerting tools
  • Kubernetes, Splunk, Ansible, Dynatrace, Sumo Logic, ServiceNow, PagerDuty
  • Strong background in using Java and GCP


Nice to haves:

  • Previous financial services/banking experience

Requirements

Education level

not specified

Years of experience

not specified

Written languages

not specified

Spoken languages

not specified