NOC / SRE Manager at Tecsys
Job details
Having recognized the advantages of remote work, such as improved employee morale, increased productivity, and positive impacts on both employee wellbeing and the environment, we are proud to be a digital-first company. Our digital-first work environment, combined with our conveniently located offices and collaborative workspaces, provides our team with the freedom and flexibility to work in the most productive way for them.
Tecsys is a fast-growing innovator offering supply chain solutions to customers ranging from industry-leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tecsys could be a good fit for you!
The NOC/SRE Manager is responsible for leading and managing the Network Operations Center (NOC) and Site Reliability Engineering (SRE) teams. This role focuses on maintaining high service reliability, availability, and performance for our products and platform. You will oversee the infrastructure that supports our services, ensuring a seamless experience for our customers through proactive monitoring, incident response, and continuous improvement initiatives.
Key Responsibilities
- Team Leadership & Management: Lead a team of NOC engineers and SREs, providing guidance, mentorship, and career development opportunities. Manage daily assignments and ensure the team operates efficiently and effectively.
- Incident Management: Oversee the incident management process to ensure rapid response and resolution of high-severity incidents. Coordinate with the help desk, technical teams, and stakeholders to manage and resolve incidents effectively.
- Monitoring & Alerting: Ensure high availability of the platform through thoughtful monitoring practices and collaboration with the NOC team. Develop and maintain comprehensive monitoring and alerting strategies to proactively detect and resolve issues.
- Operational Excellence: Drive operational excellence by defining, implementing, and continuously improving processes and technology that promote high standards for service reliability and team efficiency. Establish and enforce best practices for incident management, change management, and problem management.
- Stakeholder Communication: Maintain effective communication with internal and external stakeholders, providing regular updates on the status of ongoing incidents, changes, and improvements. Manage expectations and ensure alignment on priorities and objectives.
- Collaboration with Engineering Teams: Work closely with cloud engineering, support, and security teams to develop and refine processes that are well-documented and executed. Facilitate collaboration to enhance the reliability and security of our platform.
- Project Management: Oversee the planning and execution of infrastructure projects related to NOC/SRE activities. Create and maintain work plans, manage resources, and ensure timely delivery of projects in a fast-paced engineering environment.
- Continuous Improvement: Identify opportunities for automation, optimization, and innovation to enhance the performance and reliability of our systems. Foster a culture of continuous improvement within the NOC/SRE teams.
Qualifications
- Bachelor's degree in a technical discipline or relevant equivalent experience.
- 5+ years of experience managing technical teams in NOC, SRE, DevOps, or Infrastructure domains.
- Proven experience in incident management, monitoring, and service reliability practices.
- Strong organizational, project management, and technical acumen.
- Proficiency in delivering large-scale infrastructure projects and managing cross-functional teams.
- Self-motivated and proactive with strong analytical, troubleshooting, and problem-solving skills.
- Excellent verbal and written communication skills to effectively interact with all levels of the organization.
- Strong facilitation skills, including leading requirements sessions, design meetings, and status updates.
- Knowledge and experience with AWS Cloud Platform.
- Familiarity with tools such as SolarWinds, Datadog, GitLab, Jenkins, Terraform, Ansible, Kubernetes, and Helm is preferred.
- Basic knowledge of Java-based development.
- Experience in a SaaS company is preferred.
- ITIL knowledge is preferred.
- Strong proficiency in both written and verbal English communication, essential for effective correspondence with clients, suppliers, business partners, and colleagues beyond the province of Quebec.
We understand that experience comes in many forms and that careers are not always linear. If you don't meet every requirement in this posting, we still encourage you to apply.
At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team.
Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.
NB: To apply for this position, you must be a Canadian citizen or a permanent resident of Canada, or hold a valid Canadian work permit.
Tecsys provides transformative supply chain solutions that equip our customers to succeed in a rapidly changing omni-channel world. From demand planning to demand fulfillment, Tecsys puts power into the hands of both front-line workers and back-office planners and unshackles business leaders so they can see and manage their supply chains like never before.