Description
As a senior site reliability developer in the Cybersecurity - Data Protection team at National Bank, you act as a specialist in the operational reliability of data protection platforms. This role lets you have a significant impact on the organization through your skills in software engineering, SRE practices and automation.
Your role
• Ensure the reliability, availability and performance of critical data protection platforms throughout their lifecycle.
• Apply SRE principles and practices, including defining and monitoring SLOs/SLIs, managing error budgets and contributing to continuous improvement initiatives.
• Deploy and maintain strong observability practices using specialized tools.
• Participate in major incident management by coordinating interventions, analyzing root causes and producing post‑mortems.
• Collaborate with development, security and platform teams to integrate reliability early in solution design.
• Automate repetitive activities and explore the integration of AI agents to optimize operations.
Your team
The Data Protection Team is composed of specialists working in an agile, proactive and collaborative manner to seize opportunities, enhance technological resilience and evolve operational practices.
Within this sector, you are part of a large and dedicated team reporting to the manager responsible for operational excellence and critical platforms. Our team stands out through its collaborative spirit, openness to innovation and commitment to advancing SRE practices.
We aim to offer you maximum flexibility to support your quality of life, including hybrid work options and a flexible schedule.
National Bank values continuous development and internal mobility. Our personalized training programs, based on learning by doing, allow you to master your role and develop new areas of expertise. Tools such as the Data Academy, language training, the Harvard Learning Center, as well as coaching and mentoring support are available to you at all times.
Requirements
• Hold a bachelor’s degree in computer science or a related field, or possess equivalent hands‑on experience.
• Have 7 to 10 years of relevant experience in SRE, DevSecOps, service reliability or critical platform operations.
• Demonstrate expertise in operating highly available distributed systems and in analyzing their performance.
• Show proven experience with SLO/SLI definition and monitoring, as well as error budget management.
• Master observability and troubleshooting tools such as Datadog, Splunk or equivalents.
• Possess a solid understanding of cloud environments, particularly AWS.