Intermediate Site Reliability Engineer to join our Digital Banking client.
S.i. Systems
Toronto, ON
Number of positions available : 1
- Salary To be discussed
Contract job
- Published on June 19th, 2025
Starting date : 1 position to fill as soon as possible
Description
Duration: 6-month contract to start
Location: Hybrid (North York/DT Toronto) - 2-3 days/week
Job Responsibilities:
- You’ll be responsible for maintaining the production applications and day-to-day operational activities, managing escalations, and modifying established procedures and approaches to suit specific situations, including 24x7 support and coordination of recovery efforts.
- You will run the production environment by monitoring availability and taking a holistic view of system health.
- Lead daily team huddles, take responsibility for incident assignment, and ensure timely closure of all customer escalations and problems.
- Coach and monitor the team and help them resolve complex or critical production incidents and problems.
- You’ll be responsible for providing investigation and second-level support on client issues, technical issues, system/website outages, and questions from all internal and external applications by maintaining, prioritizing, and directing them to the respective technology groups and vendors.
- Lead on-call problem escalation and outage recovery efforts, not limited to code fixes in the presentation and integration layers, but also providing infrastructure-level investigation and support where necessary.
- Lead post-incident technical retrospectives to discover and implement remediation actions.
- You will improve the reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to improve continually.
- Participate in defining SLIs, SLOs and SLAs for Enterprise Systems.
- Gather and analyze metrics from both applications and infrastructure to assist in performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Create sustainable systems and services through automation and process improvements.
- Monitor the health of multiple applications and discover opportunities to optimize in a large, complex, continuously growing hybrid environment.
Must haves:
- 5+ years of experience as a Site Reliability Engineer (SRE)
- 5+ years of experience maintaining KPIs, managing CI/CD pipelines, and using monitoring and alerting tools
- Kubernetes, Splunk, Ansible, Dynatrace, Sumo Logic, ServiceNow, PagerDuty
- Strong background in using Java and GCP
Nice to haves:
- Previous financial services/banking experience