Job Descriptions
▪ System Monitoring and Incident Response: SREs are responsible for implementing monitoring solutions to track
system health, performance, and availability. They proactively monitor systems, identify issues, and respond
to incidents promptly to minimize downtime and mitigate impact.
▪ Post-Incident Analysis: SREs lead incident response efforts, coordinate with cross-functional teams, and
conduct post-incident analysis to identify root causes and implement preventive measures.
▪ Continuous Improvement and Reliability Engineering: SREs drive continuous improvement efforts by
identifying areas for enhancement, implementing best practices, and fostering a culture of reliability
engineering. They participate in post-mortems, conduct blameless retrospectives, and drive initiatives to
improve system reliability, stability, and maintainability.
▪ Collaboration and Knowledge Sharing: SREs collaborate closely with software engineers, operations teams,
and other stakeholders to ensure smooth coordination and effective communication. They share knowledge,
provide technical guidance, and contribute to the development of a strong engineering culture.
▪ Support and maintain configuration management for various applications and systems.
▪ Implement comprehensive service monitoring, including dashboards, metrics, and alerts.
▪ Define, measure, and meet key service level objectives (SLOs) covering uptime, performance, incidents, and
chronic problems.
▪ Partner with application and business stakeholders to ensure high-quality product development and release.
▪ Collaborate with the development team to enhance system reliability and performance.
Location
Onsite 100% (choose whichever location is more convenient: 1. Bangkok, 2. Chiang Mai)
Huai Khwang, Bangkok