Senior Site Reliability Engineer – AJUA (Formerly mSurvey)

Key Responsibilities:
Designing components and frameworks that support the virtualization and orchestration of mSurvey computing infrastructure, from conception and design through testing, deployment and operation
Working on projects that make our network more efficient while sustaining service and component stability, performance and security
Working with our development and system QA teams to develop regression tests and operational monitoring covering new software changes
Troubleshooting, investigating, and remediating service outages and issues
Acting as a mentor and escalation point for junior members of the team
Leading incident response teams as necessary to mitigate and deal with adverse events affecting our infrastructure
Working closely with relevant teams to support application deployments, migration designs and critical network rollouts
Understanding, engineering, and maintaining design dependencies and integrity within client environments, in line with service level expectations
Managing performance, capacity, licensing and patching for applications installed within client environments, and maintaining these within defined standards for specific clients/assets
Administering all production, development, test, and training server environments as well as backup and disaster recovery systems
Working with IT and Security to ensure all servers and endpoints comply with relevant guidelines and regulations
Creating the appropriate documentation for our systems, including architecture and network diagrams and support procedures
You will solve problems of global-scale distributed systems that must evolve with a focus on scale, efficiency, reliability and availability, using your creative abilities and experience with robust systems design.


