PlanoRecruiter Since 2001
the smart solution for Plano jobs

Lead Site Reliability Engineer

Company: McAfee, LLC
Location: Plano
Posted on: May 3, 2021

Job Description:

Job Title: Lead Site Reliability Engineer
Location: US, Texas, Plano

Role Overview:
As the SRE Lead, you will lead a 24/7 SRE team of experienced individuals and will be accountable for maintaining the appropriate service levels (availability, latency, and reliability) to serve our customers' needs and for reducing the friction of managing change, while being strategic about capacity and constantly managing performance. Your responsibilities will include setting team priorities and goals and engaging with DevOps, Engineering, and other teams to understand and support our needs and projects. Every SRE manages the availability, scalability, security, performance, cost, and compliance requirements of our services. You will ensure applications on-boarded to SRE are instrumented for full-stack observability and continuous testing, introduce continuous improvement, and integrate into IT Service Operations. You will also create the strategy for AIOps through AI/ML and NoOps, delivering strategic innovation to improve availability, stability, and resiliency.

Company Overview:
From device to cloud, McAfee provides market-leading cybersecurity solutions for both businesses and consumers. We help businesses orchestrate cyber environments that are truly integrated, where protection, detection, and correction of security threats happen simultaneously. For consumers, McAfee secures your devices against viruses, malware, and other threats, both at home and away. We want to continue to shape the future of cybersecurity by working together to build best-in-class products and solutions.

Lead a 24/7 team of Site Reliability Engineers working on several key services and technologies to support our products in a resilient, scalable, compliant, and sustainable manner.
Initial response and assessment of all operational incidents and requests.
Oversee service operations.
Develop outstanding operational processes and procedures based on the ITIL framework and industry ITSM best practices in delivering services.
Create and manage day-to-day processes including Change Management, Incident Management, and Problem Management.
Work extensively to help reduce the Mean Time to Restore (MTTR) and improve Mean Time to Detect (MTTD).
Develop well-rounded measurements to manage the operational performance of the service provided in delivering product/service support.
Prepare, manage, monitor, and report production service uptime and reliability, and work towards the continuous service improvement plan for recurring incidents.
Work across Engineering and Support teams to ensure we meet our goals for service reliability, availability, and efficiency.
Complete incident retrospectives. Manage the incident lifecycle and work directly with Engineering, DevOps, IT, and other teams on RCA and problem management of high-priority incidents.
Ensure security events and alerts are addressed.
Manage availability and performance of mission-critical services.
Improve automation to prevent problem recurrence, and responses to all non-exceptional service conditions.
Support product engineering teams on SRE-related activities to establish SLAs for all pre-defined activities and provide a high-quality customer experience.
Planning and deployment of patches and product enhancements to our environments.
Conduct readiness reviews before moving changes/deployments into higher environments.
Participate early in the SDLC to ensure reliability is built in from the beginning, and create plans for successful implementations/launches and a smooth transition into the SRE team.
Ensure agreement and coordination with Engineering, project, and release/deployment teams.
Develop productive relationships with business leaders across the organization to identify and remove barriers and ensure application operation and support are meeting expected levels of service, quality, and performance.
Continually evaluate and adopt the latest industry technologies to optimize costs and improve processes.
Provide leadership, strategy, vision, and direction in achieving a robust, flexible, scalable, innovative global service delivery model.
Lead by example, both technically and organizationally, and establish credibility with the quality of the team's technical execution.
Create a culture that supports innovation and creativity while delivering high output in a predictable and reliable way.
Mentor, coach, and develop a globally distributed SRE team.
Define goals and measurements to define success for your team.

About You:
10+ years of software development and/or technical operations experience, and experience running large-scale applications, with a minimum of 2 years of lead experience and a minimum of 3 years of technical architect or lead experience.
You have experience in SRE/DevOps, Infrastructure Engineering, and Systems Engineering.
You have experience defining and implementing highly resilient and reliable applications.
Experience building, maintaining, and operating production systems (> 99.9% SLA) on-premises or in the cloud (AWS).
You will monitor, debug, and perform RCA for any service failure and be involved in the complete development and deployment cycle.
You have an understanding of development, debugging, administration, and automation frameworks: C#/.NET, PowerShell, Python, Ansible.
You have experience with monitoring, logging, APM, and other tools: AppD, ELK, CloudWatch, New Relic, Moogsoft, SolarWinds.
Experience with CI/CD tools: Git, TeamCity, Jenkins, Artifactory, Ansible, Harness, AWS deploy, Octopus, etc.
Experience with container technologies: Kubernetes, Docker.
Experience with both Windows and Linux operating systems.
Strong knowledge of AWS cloud service offerings covering serverless and containerized workloads.
Good to have: ITIL, HDI, AWS, or any other cloud certifications.
Work some non-standard hours to support a global team and programs.

Company Benefits and Perks:
We work hard to embrace diversity and inclusion and encourage everyone at McAfee to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours, and family-friendly benefits to all of our employees.
Pension and Retirement Plans
Medical, Dental and Vision Coverage
Paid Time Off
Paid Parental Leave
Support for Community Involvement

We're serious about our commitment to diversity, which is why McAfee prohibits discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

Job Type: Experienced Hire
Primary Location: US, Texas, Plano
Additional Locations:

Keywords: McAfee, LLC, Plano, Lead Site Reliability Engineer, Other, Plano, Texas
