Software Site Reliability Engineer

Job Summary

CES Corporation is a growing, versatile company in the field of data center infrastructure. Our Intelliflex™ product line is a start-to-finish solution, purpose-built to support today’s data center. It includes modular, integrated systems in both air and liquid immersion cooling technologies. We’re looking for talented Software Site Reliability Engineers to diagnose field issues, find creative solutions and stress test our products so that we can provide high-quality, reliable systems for our clients. The successful candidate will work with a team of driven, skilled professionals with a passion for software engineering.

CES Corporation is located west of Edmonton, in the North View Business Park. Because the systems to be tested include physical hardware, some physical presence will be required, but flexible arrangements can be made to accommodate a hybrid In-Office and Work-From-Home model of work. How much a candidate can work from home will depend on how well they can mock out the hardware in our system and isolate its testing from the rest of the system.

Our team works in a comfortable office adjacent to the shop where our manufacturing team builds our IntelliFlex Modular Data Centers in-house. The standard work week is 40 hours, with occasional flex-time to accommodate customer support. Newcomers will enjoy 3 weeks of vacation per year to start.

Responsibilities and Duties

As a Software Site Reliability Engineer, your duties will include:

  • Working with our lead software engineer to establish baseline quality metrics for our software systems’ Service Level Objectives and design a program to establish practical Service Level Agreements with our customers.
  • Performing data-driven analysis of our systems, both internal and deployed, to identify, mitigate and eliminate defects.
  • Analyzing testing, release and monitoring processes to identify improvements and optimizations and provide feedback to our lead software engineer.

You will be responsible and accountable for:

  • Ensuring the successful ongoing operation and improvement of our systems for our clients. This means:
    • Recommending changes to engineering processes to prevent recurrence of defects.
    • Providing feedback to the lead software engineer and other relevant stakeholders for maintaining and increasing the reliability and availability of our systems, both in hardware and software.
    • Designing and implementing monitoring processes for our software and hardware at our clients’ sites, along with analyzing operational activities of our systems and providing feedback via recommendations to improve their operation.
    • Diagnosing operational problems, designing remediation strategies, and successfully applying remediations.

Your day-to-day activities will include:

  • Designing, documenting, building, deploying and applying any necessary software infrastructure to support the monitoring processes you will define, in that order.
  • Designing, documenting, building, deploying and applying any necessary software infrastructure to support diagnostic and remediation processes you will define, in that order.
  • Assisting in the commissioning of software and hardware packages for deployment to client sites.
  • Performing a variety of DevOps activities to support our internal engineering processes, including but not limited to defining, executing and monitoring release and configuration pipelines.

You will be an essential member of the team that we will rely on for maintaining and upholding our engineering operations.

Qualifications and Skills

This position is open to Canadian citizens and other applicants legally entitled to work in Canada.

Education:

  • A degree in Computer Engineering or Computing Science from an accredited post-secondary educational institution recognized in Canada is required.

Experience (Required):

  • Must have at least 5 years’ experience in the operation and monitoring of systems that integrate both hardware and software components.
  • Must have at least 3 years’ experience with Microsoft Azure.
  • Must already be intimately familiar with the deployment and operation of Kubernetes and Docker.
  • Must be intimately familiar with Unix and Linux-based operating systems, specifically:
    • Understanding how processes and threads work, and Inter-Process Communication between them.
    • Deep knowledge of how to use and write scripts in Bash, Python, or any other language in which you can demonstrate considerable proficiency.
    • Use of systemd to manage the lifecycle, monitoring and diagnostics of system services.
    • Use and deployment of Infrastructure-as-Code tools like ARM templates, Bicep, Terraform, Pulumi and any others for which you can make a convincing case for adoption.
    • Examination of wire protocols at multiple layers of the technological stack, including HTTPS connections, CAN bus inspection, UART timing, etc.
  • Must be well experienced in SSH and WinRM technologies.
    • Must understand the fundamental workings of public and private key encryption.
    • Must be capable of effective remote connection management.
  • Must have experience using Time Series databases and visual graphing tools to analyze data sets.
  • Must have an engineering mindset with a drive for delivering systems of exceptional quality.

Nice-To-Haves (Optional):

  • Experience with web-based user interface testing tools and methodologies.

Our office in North View Business Park is located at 28029 108 Avenue Northwest, Acheson, AB T7X 6P7. The successful candidate will require their own transportation as the business park is not serviced by public transportation.

Salary and Benefits

Our salary determinations are made based on a set of internal guidelines and policies. Our offered salary for this position is determined by proficiency and breadth of capability. As increased proficiency is demonstrated in areas of capability and responsibility, transfer into higher salary bands will be available. Benefits, including health insurance, are provided.

How to Apply:

Apply with a cover letter and resume (one file please) to apply@cescorp.ca. Let us know how your experience and education will benefit CES Corporation. Links to any open-source projects you’ve worked on are of particular interest. Clearly outline pay and vacation expectations in your cover letter. Please note that candidates who lack full-time permanent work experience will not be considered for the position.