Sr. Cloud Platform Engineer - GPU

What is the opportunity?


Be at the forefront of RBC’s modernization journey by joining the team that delivers on-demand access to GPUs to support machine learning across RBC. In the initial phases, the focus is on supporting Borealis AI, a world-class AI research centre created by RBC that builds AI-enabled products using state-of-the-art machine learning.

As a Cloud Platform Engineer, you would join the team building and operating the product offering with a focus on operational stability, security and customer experience.
Join an innovative technology team that is accelerating cloud-native development at RBC! Take an active leadership role in defining and executing a strategy that brings the benefits of the cloud to the large and diverse RBC community.

This is an opportunity to work with cutting-edge cloud platforms and to gain top-notch experience in moving an enterprise to the cloud. You'll be surrounded by incredibly talented people who are passionate about cloud computing and believe that world-class service is critical to customer success.


What will you do?


  • Build and operate a platform that gives data scientists access to GPUs to train and operationalize their machine learning models via a SLURM datagrid and OpenShift
  • Apply SRE concepts to provide a highly available service, leveraging automation so that the team and platform can scale to meet incoming demand while adhering to strict SLOs
  • Review and refine Cloud team practices and toolsets to drive improvement in the customer experience and success; act as an escalation point for incidents
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Provide mentorship and training to other team members on technologies and processes; drive education and knowledge transfer of design patterns, technical practices, and relevant technologies and tools
  • Drive high standards around incident response practices and policies



What do you need to succeed? 


Must have:


  • 5+ years of experience with development, deployment and support of applications with an active user base; managed through a change management process 
  • 3+ years of experience with development or administration on a cloud platform (e.g. OpenShift, AKS, GKE, Tanzu, Cloud Foundry)
  • Hands-on experience as a Machine Learning Engineer and/or a solid understanding of the model development lifecycle and associated roles
  • Infrastructure automation experience using Terraform and Ansible playbooks
  • Demonstrated knowledge of cloud-native engineering and experience developing or using CI/CD pipelines, shown through referenceable projects
  • Bash/shell scripting knowledge; SCM tools such as GitHub, Subversion or Gerrit


Nice to have:


  • Programming experience in Python, Java or Go
  • Familiarity with API documentation tools such as Swagger or RAML
  • Solid understanding of general networking principles and common protocols
  • Experience with operational and incident management tools such as ServiceNow, PagerDuty and Datadog
  • Monitoring stack: Dynatrace, Prometheus, Grafana, Zabbix
  • Logging stack: ELK, Splunk
  • Agile tools: Jira, Confluence, Gliffy, LucidChart, Mural


What’s in it for you?


We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
  • Leaders who support your development through coaching and managing opportunities
  • Ability to make a difference and lasting impact
  • Work in a dynamic, collaborative, progressive, and high-performing team
  • A world-class training program in financial services
  • 4 weeks' vacation





About RBC
Royal Bank of Canada is Canada’s largest bank, and one of the largest banks in the world, based on market capitalization. We are one of North America’s leading diversified financial services companies, and provide personal and commercial banking, wealth management, insurance, investor services and capital markets products and services on a global basis. We have over 80,000 full- and part-time employees who serve more than 16 million personal, business, public sector and institutional clients through offices in Canada, the U.S. and 37 other countries. For more information, please visit


Join our Talent Community

Stay in the know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and recruitment events that matter to you.

Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at


Inclusion and Equal Opportunity Employment
RBC is an equal opportunity employer committed to diversity and inclusion. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, Aboriginal/Native American status or any other legally protected factors. Disability-related accommodations during the application process are available upon request.


City:  Toronto
Address:  330 Front Street West, 10th Floor
Work Hours/Week:  37.5
Work Environment:  Office
Employment Type:  Permanent
Career Level:  Experienced Hire/Professional
Pay Type:  Salary + Variable Bonus
Required Travel(%):  0-25
Exempt/Non-Exempt:  N/A
People Manager:  No
Application Deadline:  11/12/2021
Req ID:  347577

Ad Code(s):  
