Service Reliability Engineer

Job Title: Service Reliability Engineer
Location: Cardiff Bay, Wales
Salary: £68,000.00 per annum
Department: Group IT and Security
Reports To: Service Reliability Manager

Role objectives:

• Ensure our products are ready for life in Production
• Embed reliability and supportability as features, across the lifecycle of solution development
• Help to guide our engineering team’s transformation
• Raise the bar for engineering quality
• Deliver higher service availability

Personal qualities:

• Trustworthy and quick-thinking; be one of the smartest people in the room and likeable
• Optimistic & Resilient; breed positivity and don’t give up on the “right thing”
• Leadership & Negotiation; sell not tell, build support and consensus
• Creativity and High standards; develop imaginative solutions without cutting corners
• Fully rounded; experience of dev, support, security, ops, architecture and sales

Day to day the Service Reliability Engineer will:

• Contribute to Service Readiness Reviews
• Apply your domain knowledge and technical expertise, combined with a passion for coaching and developing people
• Influence and mentor a wide range of colleagues on building robust and resilient applications that include self-healing and fault tolerance techniques
• Help to improve the performance, reliability and resilience of our internal and external products
• Help our engineering teams resolve priority issues
• Work with architectural team members to ensure that systems are loosely or fully decoupled, and maintain oversight of how systems relate to each other
• Limit the time spent on operational tasks and automate wherever possible
• Lead the engineering activities that enable root causes to be identified, debugged and resolved to prevent recurrence
• Proactively identify the causes of outages that haven’t yet happened

The Service Reliability Engineer should have:

• A track record of troubleshooting and resolving issues in live production environments and implementing strategies to eliminate them
• Experience in a technical operations support role
• Solutions architecture experience
• Shell scripting experience
• Proficiency in container-based environments, including Docker and Kubernetes
• Experience of automating infrastructure using “as code” tooling
• Strong OS skills across both Windows and Linux
• Solid understanding of relational and NoSQL databases
• Experience of working with hybrid cloud-based infrastructure
• Understanding of infrastructure services including DNS, DHCP, LDAP, virtualization, server monitoring, cloud services (Azure and AWS)
• Fluency in one or more high-level programming languages, such as JavaScript or .NET C#
• Knowledge of continuous integration and continuous delivery, testing methodologies, TDD and agile development methodologies
• Strong ability and enthusiasm to learn new technologies in a short time