Supervisor - Server Repair Engineering
Company: Sms Infocomm Corporation
Location: Grapevine, Texas
Posted on: April 1, 2026
Job Description:

Position Overview
We are seeking a senior engineering leader to serve as the Supervisor of AI Server Repair Engineering & Process. This is a foundational role responsible for architecting, defining, and continuously improving the entire technical framework for diagnosing and repairing our complex, high-value AI server infrastructure.

More than a traditional supervisor, you are the lead repair engineer and process owner. You will leverage your deep hardware expertise to develop systematic, data-driven, and scalable repair processes from the ground up. You will not only lead a team of technicians and junior engineers but also act as their primary technical mentor and as the engineering liaison to our core Product Design and Quality teams. Your mission is to transform our repair facility into a center of excellence by embedding engineering discipline into every aspect of our service operations.

Key Responsibilities

1. Process Architecture & Definition (Primary Focus):
* Architect and Author: Design, document, and deploy the end-to-end technical workflow for AI server repair, including detailed Standard Operating Procedures (SOPs), diagnostic flowcharts, decision trees, and work instructions.
* Test Plan Development: Define and validate comprehensive test plans and validation criteria for all repaired components and full systems, ensuring they meet strict performance and reliability standards before being returned to service.
* Tooling & Automation: Identify, develop, and implement diagnostic scripts, software tools, and physical fixtures to improve the accuracy, consistency, and efficiency of the troubleshooting and repair process.
* Process Control: Establish critical control points within the repair process to ensure quality and gather vital failure data.

2. Advanced Engineering Support & Failure Analysis (Primary Focus):
* Technical Authority: Serve as the ultimate escalation point for the most complex hardware failures that elude standard diagnostic procedures.
* Root Cause Analysis (RCA): Lead systematic deep dives into new and recurring failure modes. Perform board-level analysis, interpret schematics, and collaborate with the team to isolate the root cause.
* Engineering Feedback Loop: Act as the primary technical interface between the repair center and core Hardware Engineering/R&D. Consolidate, analyze, and present failure data and RCA findings to influence future product design for improved serviceability and reliability (Design for Serviceability).

3. Operational Leadership & Team Enablement:
* Technical Mentorship: Lead and develop the technical capabilities of the repair team. Provide hands-on training on new products, advanced diagnostic techniques, and established repair processes.
* Enablement, Not Just Delegation: Empower the team by ensuring they have the processes, tools, and knowledge required to succeed. Focus on removing technical roadblocks and fostering an environment of structured problem-solving.
* Performance Management: Set clear technical objectives, manage workflow priorities based on engineering needs, and guide the professional growth of team members.

4. Data-Driven Continuous Improvement:
* Analyze Repair Data: Systematically collect and analyze repair data (failure modes, component usage, test yields) to identify trends and opportunities for process optimization.
* Drive Improvements: Initiate and lead engineering change requests (ECRs) and process improvement projects based on data analysis to enhance repair quality, reduce turnaround time, and lower costs.

Qualifications & Skills

Required Qualifications (Must-Haves):
* Education: Bachelor’s degree in Electrical Engineering, Computer Engineering, Manufacturing Engineering, or a closely related field.
* Experience:
  * 4 years of experience in a technical engineering role such as Test Engineering, Manufacturing Engineering, Hardware Sustaining, or high-level Repair Engineering.
  * Proven track record of developing and documenting technical processes (SOPs, test plans, work instructions) from scratch in a manufacturing or repair environment.
  * 3 years of experience in a technical leadership role, mentoring junior engineers or technicians.
* Technical Expertise:
  * Expert-level ability to read and interpret electronic schematics, board layout files, and product specifications.
  * Strong, hands-on experience with systematic hardware troubleshooting methodologies for complex systems (e.g., servers, networking equipment).
  * Demonstrated proficiency in scripting (Python, Bash, or similar) to automate diagnostic tests and parse data logs.
  * Deep knowledge of server components and architecture, including GPUs, high-speed interconnects (InfiniBand/Ethernet), CPUs, and power systems.

Preferred Qualifications (Nice-to-Haves):
* Master’s degree in Electrical or Computer Engineering.
* Experience with Design for Manufacturability (DFM) or Design for Serviceability (DFS) principles.
* Certification in, and practical application of, Lean Manufacturing or Six Sigma methodologies.
* Experience analyzing failure and yield data.
* Hands-on experience with board-level repair techniques (e.g., soldering, BGA rework) is a strong plus.
Keywords: Sms Infocomm Corporation, Plano , Supervisor - Server Repair Engineering, IT / Software / Systems , Grapevine, Texas