Senior PySpark Developer to support the design, development, and maintenance of modernized data pipelines for a large-scale data modernization initiative
S.i. Systems
Toronto, ON
Number of positions available: 1
Salary: To be discussed
Contract job
Published on November 6th, 2025
Starting date: 1 position to fill as soon as possible
Description
Our valued public sector client is seeking a Senior PySpark Developer to support the design, development, and maintenance of modernized data pipelines for a large-scale data modernization initiative!
Initial 5-month contract (until March 31, 2026) with a strong possibility of extension. Remote work arrangement within Canada, full-time, Monday to Friday.
The successful candidate will be responsible for developing, testing, and supporting data ingestion and transformation pipelines using PySpark, Python, and AWS-based technologies, following Agile development practices and CI/CD principles. The developer will work closely with technical and business teams to deliver scalable, high-performance data solutions that support enterprise analytics and reporting.
Responsibilities
- Design, develop, and maintain large-scale data processing pipelines using PySpark, Spark SQL, and Python.
- Collaborate with business and technical stakeholders to translate business requirements into technical solutions.
- Develop modular, reusable, and maintainable code following software development best practices.
- Implement automated testing frameworks to ensure data quality and reliability.
- Participate in peer code reviews and apply CI/CD practices using Git-based workflows.
- Work with Airflow or equivalent orchestration tools for pipeline scheduling and automation.
- Develop and maintain ETL mappings, documentation, and data flow diagrams.
- Deploy and monitor data workflows in a cloud-based environment (AWS EMR, Redshift, S3, Lambda).
- Troubleshoot performance issues and optimize Spark jobs for scalability and efficiency.
- Ensure compliance with quality assurance and change management procedures.
Must-Have
- 5+ years of hands-on programming experience in Python and SQL, writing modular, maintainable code.
- 3+ years of strong experience developing PySpark data pipelines for large-scale data processing.
- Solid understanding of Spark DataFrames, Spark SQL, and distributed data processing concepts.
- Practical experience working in AWS Cloud environments (e.g., EMR, Redshift, Lambda).
- Strong knowledge of MySQL or equivalent relational databases.
- Proficiency with Git, unit testing, and release automation.
- Familiarity with Apache Iceberg or similar open table formats.
- Experience with Airflow or equivalent orchestration frameworks.
- Excellent problem-solving and troubleshooting skills, with a proactive, collaborative attitude.
- Strong oral and written communication skills in English.
Nice to Have
- Experience with AWS Cloud Development Kit (CDK) for Python.
- Familiarity with serverless architecture (AWS Lambda, event-driven design).
- Exposure to DevOps automation and continuous integration pipelines.
- Previous experience working on healthcare or public sector data platforms.