Site Reliability Engineer
Company: TP-Link Systems Inc.
Location: Irvine
Posted on: February 15, 2026
Job Description:
At the forefront of the future of connected living, the TP-Link
Systems Inc. R&D Center in Irvine, Southern California's innovation
hub, spearheads research and development of next-generation
networking, IoT smart home products, and software services. Our team
of passionate engineers is constantly innovating, engineering
solutions that transform the end-user experience with simpler,
smarter, and more reliable connectivity. We're looking for a
passionate and experienced Site Reliability Engineer to join our team
and play a crucial role in ensuring our cloud platform's security,
reliability, scalability, and operational excellence.

About Us:
Headquartered in the United
States, TP-Link Systems Inc. is a global provider of reliable
networking devices and smart home products, consistently ranked as
the world’s top provider of Wi-Fi devices. The company is committed
to delivering innovative products that enhance people’s lives
through faster, more reliable connectivity. With a commitment to
excellence, TP-Link serves customers in over 170 countries and
continues to grow its global footprint. We believe technology
changes the world for the better! At TP-Link Systems Inc., we are
committed to crafting dependable, high-performance products to
connect users worldwide with the wonders of technology. Embracing
professionalism, innovation, excellence, and simplicity, we aim to
assist our clients in achieving remarkable global performance and
enable consumers to enjoy a seamless, effortless lifestyle.

Responsibilities:
- Assist in implementing and operating microservices on cloud-based
  Kubernetes platforms.
- Collaborate with the Cloud Technical Development and DevOps teams to
  deploy services to the Multi-Cloud Platform.
- Conduct load tests and chaos tests to ensure the scalability and
  reliability of microservices.
- Build observability for microservices and cloud platforms like AWS,
  OCI, Azure, and GCP.
- Contribute to writing and executing disaster recovery plans in
  collaboration with the Development and DevOps teams.
- Help analyze and resolve production risks caused by insufficient
  resources, such as node groups, CPU, memory, HPA scheduling, and JVM
  pre-warming.
- Write and maintain scripts for automation using languages like
  Python, Go, or Bash.
- Assist in defining and maintaining KPIs (SLA/SLO/SLI) for all cloud
  microservices, working with development teams to better understand
  the business.
- Create and maintain technical documentation, including architecture
  diagrams, design documents, and standard operating procedures.
- Ensure adherence to security and compliance standards, including
  ISO 27001, SOC 2, and GDPR.
- Participate in incident response efforts to troubleshoot and resolve
  production issues quickly.
- Conduct post-incident analysis to identify root causes and potential
  workarounds or solutions.
- Contribute to product and technology selection, including
  implementation of POCs.
- Be adaptable to change and to evolving processes and tools.
- Participate in mentoring and training less senior members of the
  team.
- Be part of the on-call rotation and provide support after work hours
  and on weekends.
- Other duties as assigned.

Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a
  related field.
- 1-3 years of experience as a Site Reliability Engineer or in a
  related role.
- Proficiency in programming and scripting languages like Java,
  Python, Bash, or PowerShell.
- Hands-on experience in SRE, DevOps, cloud operations, and cloud
  security best practices.
- Basic knowledge of security technologies, including identity and
  access management, network security, application security, and data
  protection.
- Strong problem-solving and analytical skills, with the ability to
  work independently and as part of a team.
- Experience in developing and maintaining technical documentation and
  implementing compliance requirements.

Additional Skills (Preferred):
- Relevant cloud certifications, such as AWS Solutions Architect,
  Azure Solutions Architect Expert, or GCP Professional Cloud
  Architect.
- Experience with container orchestration technologies (e.g.,
  Kubernetes).

Benefits:
- Salary range: $100,000 - $140,000
- Free snacks and drinks, plus lunch provided on Fridays
- Fully paid medical, dental, and vision insurance (partial coverage
  for dependents)
- Contributions to 401(k) funds
- Bi-annual reviews and annual pay increases
- Health and wellness benefits, including free gym membership
- Quarterly team-building events

At TP-Link Systems Inc.,
we are continually searching for ambitious individuals who are
passionate about their work. We believe that diversity fuels
innovation and collaboration and drives our entrepreneurial spirit.
As a global company, we highly value diverse perspectives and are
committed to cultivating an environment where all voices are heard,
respected, and valued. We are dedicated to providing equal
employment opportunities to all employees and applicants, and we
prohibit discrimination and harassment of any kind based on race,
color, religion, age, sex, national origin, disability status,
genetics, protected veteran status, sexual orientation, gender
identity or expression, or any other characteristic protected by
federal, state, or local laws. Beyond compliance, we strive to
create a supportive and growth-oriented workplace for everyone. If
you share our passion and connection to this mission, we welcome
you to apply and join us in building a vibrant and inclusive team
at TP-Link Systems Inc. Please, no third-party agency inquiries. We
are unable to offer visa sponsorship at this time.