
Senior PySpark Developer

Toronto, ON
  • Number of positions available: 1

  • Salary: to be discussed
  • Contract job

  • Starting date: as soon as possible

Our valued public sector client is seeking a Senior PySpark Developer to support the design, development, and maintenance of modernized data pipelines for a large-scale data modernization initiative!

Initial 5-month contract (until March 31, 2026) with a strong possibility of extension. Remote work arrangement within Canada, full-time, Monday to Friday.

The successful candidate will be responsible for developing, testing, and supporting data ingestion and transformation pipelines using PySpark, Python, and AWS-based technologies, following Agile development practices and CI/CD principles. The developer will work closely with technical and business teams to deliver scalable, high-performance data solutions that support enterprise analytics and reporting.
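
To give a concrete flavour of the day-to-day work, below is a minimal sketch of the kind of PySpark ingestion-and-transformation step described above. All bucket names, paths, and columns are hypothetical.

  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("claims-ingest").getOrCreate()

  # Ingest raw data from S3 (hypothetical bucket and prefix).
  raw = spark.read.parquet("s3://example-bucket/raw/claims/")

  # Transform: cleansing and derivation kept in one small step so it
  # can be covered by automated unit tests.
  cleaned = (
      raw.dropDuplicates(["claim_id"])
         .withColumn("ingested_at", F.current_timestamp())
         .filter(F.col("amount") > 0)
  )

  # Write the curated output back to S3, partitioned for downstream
  # analytics and reporting.
  cleaned.write.mode("overwrite").partitionBy("province").parquet(
      "s3://example-bucket/curated/claims/"
  )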

Responsibilities

  • Design, develop, and maintain large-scale data processing pipelines using PySpark, Spark SQL, and Python.
  • Collaborate with business and technical stakeholders to translate business requirements into technical solutions.
  • Develop modular, reusable, and maintainable code following software development best practices.
  • Implement automated testing frameworks to ensure data quality and reliability.
  • Participate in peer code reviews and apply CI/CD practices using Git-based workflows.
  • Work with Airflow or equivalent orchestration tools for pipeline scheduling and automation.
  • Develop and maintain ETL mappings, documentation, and data flow diagrams.
  • Deploy and monitor data workflows in a cloud-based environment (AWS EMR, Redshift, S3, Lambda).
  • Troubleshoot performance issues and optimize Spark jobs for scalability and efficiency.
  • Ensure compliance with quality assurance and change management procedures.

Must-Have

  • 5+ years of hands-on programming experience in Python and SQL, writing modular, maintainable code.
  • 3+ years of strong experience developing PySpark data pipelines for large-scale data processing.
  • Solid understanding of Spark DataFrames, Spark SQL, and distributed data processing concepts.
  • Practical experience working in AWS Cloud environments (e.g., EMR, Redshift, Lambda).
  • Strong knowledge of MySQL or equivalent relational databases.
  • Proficiency with Git, unit testing, and release automation.
  • Familiarity with Apache Iceberg or similar open table formats.
  • Experience with Airflow or equivalent orchestration frameworks.
  • Excellent problem-solving and troubleshooting skills, with a proactive, collaborative attitude.
  • Strong oral and written communication skills in English.

Nice-to-Have

  • Experience with AWS Cloud Development Kit (CDK) for Python.
  • Familiarity with serverless architecture (AWS Lambda, event-driven design).
  • Exposure to DevOps automation and continuous integration pipelines.
  • Previous experience working on healthcare or public sector data platforms.


