DataArt’s SRE Center of Competence successfully develops and provides SRE expertise and solutions for our clients.
We are looking for a Senior SRE specialist who will join our team and provide consulting services to our clients and delivery teams.
The Senior SRE expert will participate in sales/pre-sales and discovery, provide consultancy and architecture reviews, and supervise projects at all stages of development.
We offer an opportunity to grow professionally: lead initiatives, expand your SRE skills and technologies, mentor colleagues, and participate in R&D or PoCs.
Responsibilities
- Collect and analyze metrics, traces, and logs from the environment and the application
- Take part in system design consulting, platform management, and capacity planning
- Analyze requirements and support them from an SRE perspective
- Assist in making decisions regarding the priorities of feature development and reliability improvements based on the current state of the system
- Partner with development teams to improve services through rigorous testing and release procedures
Requirements
- Programming skills in at least one modern programming language
- Experience with containerized environments, Docker, Kubernetes
- Experience managing code, databases, and infrastructure (networking, operating systems, storage)
- Experience with monitoring frameworks (Grafana, Kibana, Prometheus)
- Experience with Infrastructure as Code (IaC) and related tools (e.g. Terraform, CloudFormation)
- Experience with modern CI/CD (e.g. GitHub Actions)
- Experience with a major Cloud Provider (e.g. AWS, GCP, Azure)
- SRE experience within a service development team, including support, troubleshooting, and log analysis, to meet service availability and observability goals
- Experience maintaining Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Good spoken and written English, great communication skills
- Teamwork experience
Nice to have
- Strong knowledge of a scripting language (e.g. Python, Bash)
- Experience with OpenStack
- Strong Linux or Windows system-level analysis capabilities
- Experience optimizing cloud costs and reducing system resource usage by setting clear requirements through efficiency and capacity planning
- Experience with a variety of SaaS operations tools (e.g. uptime, Dynatrace, PagerDuty)
- Experience improving documentation on site reliability measures, either in application documentation or in runbooks, explaining the issues encountered and the solutions implemented
- Experience negotiating within a team or across teams