SITE RELIABILITY ENGINEER
Company: Tanisha Systems, Inc.
Location: Plano
Posted on: September 28, 2024
Job Description:
Role: Site Reliability Engineer
Location: Plano, TX
What you will be doing
Sr Site Reliability Engineer with expertise in AWS Cloud
Engineering, 5G RAN Engineering, Network Design and Engineering,
5G Core Engineering.
As an integral part of the Site Reliability and Observability
Engineering team, you will be responsible for understanding how the
network, applications, tools, and processes fit together so that
the Network Operations Center (NOC) can quickly resolve network
events. You will focus on increasing network and service
availability through automation, tools, and processes in your given
area of expertise. You are just as comfortable designing,
implementing, and troubleshooting technical issues escalated from
Tier 1 and Tier 2 in the NOC as you are designing automation and
orchestration solutions. As new tools, products, and services, or
enhancements to existing ones, are introduced into the 5G Network,
you will work closely with product owners to quickly understand and
identify the benefits, drawbacks, and use cases for the NOC, then
work to integrate and operationalize that product or tool into the
NOC Support teams.
In this role, you will:
- Drive solid system architecture, and guide and mentor
well-disciplined code development practices (e.g., repository
procedures for proper code check-out/check-in);
- Manage safe feature branching strategies and version
control;
- Develop a proper workflow for team code review and deliver
well-vetted and tested products.
- Oversee and author application testing procedures; software
deployment packaging and release coordination with customers; and
monitoring of infrastructure, inbound/outbound processes, web
services, and application health;
- Implement feature tracking and bug fixes.
- Define standards that produce enterprise quality software that
is robust, scalable, and maintainable for the entire lifecycle of
the project and business.
- Develop and maintain a catalog of reliability scripts, tools,
and libraries that can be leveraged for common instrumentation,
automation, and operational needs
- Monitor and analyze network performance, providing
automation and orchestration insight for identifying or mitigating
network and service-related events
- Analyze data to diagnose and identify root causes for
network-specific events within our Domain-of-Responsibility
- Act as a Tier 3 escalation for issues from Tier 1 or Tier 2
related to our observability platform
- Collaborate with vendors and internal technical teams to
understand and incorporate technical solutions.
- Define and implement strategies for network automation to
improve operational efficiencies
- Manage a CI/CD pipeline for network development and
testing
- Participate in the documentation of application/network flows
for various support needs
- Provide technical guidance, training and mentorship to members
of the NOC & engineering teams we support with our platforms
- Develop and improve instrumentation for monitoring and logging
the health and availability of services
- Participate in Major Incident bridges that involve multiple
teams/participants and in the resulting formal RCA reports.
The Skills and Experience You Bring
The Sr Site Reliability Engineer leads the solution to any problem
or issue with an automation-first mindset, utilizing a
crawl/walk/run approach towards implementation.
Requirements for the position (Must Haves)
- Bachelor's degree in Computer Science, an IT-related field, or
equivalent experience
- At least 3 years of scripting experience in Python,
JavaScript
- 3+ years of event-driven engineering with a strong preference
for candidates with experience in AIOps using AI/Client
platforms/tools
- 3+ years of experience using source code management, CI/CD,
and automation tools such as Git/GitLab, Terraform, Ansible,
Chef, Puppet, and Jenkins
- 3+ years of experience building CI/CD pipelines, version control,
and system testing with GitLab and Jenkins
- 3+ years of experience with OS-level containerization and
virtualization techniques using Docker, Wind River, VMware,
Kubernetes, and Rancher for microservices deployment
- 3+ years of experience with cloud platforms such as AWS,
Azure, and Google Cloud Platform
- 5+ years of technical, hands-on experience in one or more of
the following areas: AWS Cloud Engineering, 5G ORAN, 5G Core,
and/or Data and Transport Engineering
- A passion for taking ownership of your work and delivering
results
- Habitual code branching, versioning, feature lifecycle
management, testing, packaging and deployments
- Voracious need to document code and catalog data
transformations
- Willingness to learn and teach complex technologies
- Excellent communication skills, and a team player
Preferred complementary skills for the Job
- 5+ years of experience using one or more platforms, such as
Datadog, Grafana, ServiceNow, SolarWinds, Cisco Vitria/Matrix,
Innoeye, and the Atlassian stack (Crucible, Bitbucket, Jira,
Confluence)
- Experience gaining insight from log files with Loki,
Elasticsearch, Prometheus, and Grafana.
- Experience implementing systems tracing with services such as
Tempo, Jaeger, OpenTracing, etc.
- Intermediate understanding of REST APIs, Apache Spark, and
Kafka
Founded in 2002 in Massachusetts, USA, we are a leading provider of
Custom Application Development and end-to-end IT Services to
clients globally. We use a client-centric engagement model that
combines local on-site and off-site resources with the cost, global
expertise, and quality advantages of off-shore operations. We
deliver Custom Application Development, Application Modernization,
Business Process Outsourcing, and Professional IT Services from
office locations in USA and *. We serve clients in Government,
Banking & Financial Markets, Insurance, Healthcare, Retail &
Consumer Goods, Energy & Utilities, Life Sciences, Telecom,
Manufacturing, and Transportation industries around the globe. Our
engagement model provides a flexible operational environment that
empowers our clients with the right levels of control.