Lead Site Reliability Engineer
Company: JPMorgan Chase & Co.
Location: Plano
Posted on: April 1, 2026
Job Description:
Assume a critical role in defining the future of a globally
recognized firm and have a direct and significant effect in a realm
tailored for top achievers in site reliability. As a Lead Site
Reliability Engineer at JPMorgan Chase within the Corporate Sector,
you hold a leadership role in your team, demonstrate strong
knowledge across multiple technical domains, and advise others on
the technical and business issues facing them. You take the lead in
conducting resiliency design reviews, break up complex problems
into digestible work for other engineers, act as a technical lead
for medium to large-sized products, and provide advice and
mentoring to other engineers.

Job responsibilities:
Manage incident response to swiftly mitigate business impacts by coordinating cross-functional teams.
Serve as the primary point of contact during major incidents, demonstrating the ability to quickly identify and resolve issues to prevent financial losses.
Participate in 24x7 support coverage as required.
Oversee, track, and validate all changes to the Production and Disaster Recovery environments.
Lead initiatives to enhance the reliability and stability of team applications and platforms, utilizing data-driven analytics to improve service levels.
Document and share knowledge within the organization through internal forums and communities of practice.
Collaborate with team members to identify comprehensive service level indicators and work with stakeholders to establish reasonable service level objectives and error budgets with customers.
Provide ongoing guidance, tools, and solutions to support the firm's growth.
Champion and demonstrate site reliability culture and practices, exerting technical influence throughout the team.
Exhibit a high level of technical expertise in one or more domains, proactively identifying and resolving technology-related bottlenecks.
Strive to become an expert on the applications and platforms under your purview, understanding their interdependencies and limitations.

Required qualifications, capabilities, and skills:
Formal training or certification on software engineering concepts and 5 years of applied experience
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform
Fluency in at least one programming language or framework (e.g., Python, Java Spring Boot, .NET)
Deep knowledge of software applications and technical processes, with emerging depth in one or more technical disciplines
Proficiency and experience in observability, such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Proficiency and experience in cloud platform (AWS) infrastructure and in setting up monitoring and observability for applications migrated to cloud platforms
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
Experience with troubleshooting common networking technologies and issues
Ability to identify and solve problems related to complex data structures, algorithms, and new technologies, and to self-educate on new technology when needed
Ability to expand and collaborate across different levels and stakeholder groups

Preferred qualifications, capabilities, and skills:
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems
Experience building dashboards with products such as Grafana
Prior experience in both Systems Engineering and Software Development
Keywords: JPMorgan Chase & Co., Plano, Lead Site Reliability Engineer, IT / Software / Systems, Plano, Texas