Int. Site Reliability Engineer to join our Digital Banking client.

Toronto, ON
  • Number of positions available: 1

  • To be discussed
  • Contract job

  • Starting date: 1 position to fill as soon as possible

Duration: 6-month contract to start


Location: Hybrid (North York/DT Toronto) - 2-3 days/week

Job Responsibilities:

  • You’ll be responsible for maintaining production applications and day-to-day operational activities, managing escalations, and modifying established procedures and approaches to suit specific situations, including 24x7 support and coordination of recovery efforts.
  • You will run the production environment by monitoring availability and taking a holistic view of system health.
  • Lead daily team huddles, assign incidents, and ensure timely closure of all customer escalations and problems.
  • Coach and monitor the team, and help them resolve complex or critical production incidents and problems.
  • You’ll be responsible for providing investigation and second-level support on client issues, technical issues, system/website outages, and questions from all internal and external applications by maintaining, prioritizing, and routing them to the respective technology groups and vendors.
  • Lead on-call problem escalation and outage recovery efforts, not limited to code fixes in the presentation and integration layers, but also providing infrastructure-level investigation and support where necessary.
  • Lead post-incident technical retrospectives to discover and implement remediation actions.
  • You will improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to improve continually.
  • Participate in defining SLIs, SLOs and SLAs for Enterprise Systems.
  • Gather and analyze metrics from both applications and infrastructure to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Create sustainable systems and services through automation and process improvements.
  • Monitor the health of multiple applications and discover opportunities to optimize in a large, complex, continuously growing hybrid environment.

Must haves:

  • 5+ years of experience as a Site Reliability Engineer (SRE)
  • 5+ years of experience maintaining KPIs, managing CI/CD pipelines, and using monitoring and alerting tools
  • Kubernetes, Splunk, Ansible, Dynatrace, Sumo Logic, ServiceNow, PagerDuty
  • Strong background in using Java and GCP


Nice to haves:

  • Previous financial services/banking experience
