IrvineRecruiter Since 2001
the smart solution for Irvine jobs

ProdOps Engineer

Company: iHerb
Location: Irvine
Posted on: November 26, 2022

Job Description:

ProdOps Engineer II

A ProdOps Engineer at iHerb is entrusted and empowered to use their skills to keep the lights on, not only for our development and engineering teams but, as our team grows, for the organization at large. By joining us to take on the daily operation of iHerb's vast infrastructure and services, you will help us enable our customers, partners, and other departments to accelerate iHerb's drive forward as an industry leader. You will also develop broad, fulfilling technical knowledge of many tools, systems, and their integrations. As a level two engineer, you will take extra steps to work with our partners and help them offload their operational demands to our team by developing automation and incident playbooks. You will bring your advanced knowledge of scripting and automation not only to assist our customers but also to help ensure that ProdOps' own processes operate efficiently, reliably, and with the highest availability.

Objectives of this Role

  • Operate the production environment by monitoring availability and having a holistic view of system health, utilization, and environmental changes.
  • Lead in developing proactive monitoring solutions, with a focus on streamlining remediation through response plays, scripting, and automation, whenever possible.
  • Continually improve the availability and reliability of production workloads by measuring system performance and taking steps to optimize and autoscale based on trends and changing demands.
  • Assist engineering and development with formulating playbooks for deployments and upgrades that ProdOps can use for future maintenance, with a focus on automating as much of the process as possible.
  • Contribute to 24x7 operational support, triage, and incident management for IT and beyond, as our influence grows.

Daily Responsibilities

  • Respond to alerts with appropriate urgency and timing.
  • Gather and analyze system metrics to determine whether capacity or configuration changes are needed, and work with the appropriate teams and resources to implement them, always considering opportunities to automate or autoscale.
  • Partner with engineering, development, and other teams to continually improve our operational support offerings and capabilities, and provide demos and use cases for review.
  • Participate in system design and capacity planning, with a focus on operational readiness and relevant, effective monitoring.
  • Open, manage, and escalate incidents based on our capabilities and the playbooks provided by our partners and customers throughout the organization, continually reviewing them to find ways to automate.
  • Manage problems and continually improve our tools, with a focus on automation to reduce repetitive incidents and degraded performance.
  • Communicate with affected departments about upcoming maintenance or open incidents.
  • Develop automated maintenance and incident notifications based on a correlation of alerts, system metrics, and findings from previous incidents.
  • Automate the generation of reports and dashboards using scripting languages and API calls.

Required Skills and Qualifications

  • At least three years of experience in an internet/web operations environment.
  • At least one year of experience working in or closely with a NOC group.
  • Experience with enterprise networking (switching, routing, load balancing, and firewalls).
  • Experience with containers and orchestration (Kubernetes).
  • Experience with enterprise monitoring and status dashboarding solutions (Datadog, Statuspage).
  • Skilled in scripting and process automation components (bash, Python, PowerShell, API, etc.).
  • Skilled in provisioning workloads and infrastructure in cloud environments (AWS, Azure, GCP).
  • In-depth, practical knowledge of DNS.
  • In-depth, practical knowledge of and experience with Windows and Linux server operating systems.
  • Must be able to quickly recognize, understand, and act on alerts from monitoring tools.
  • Must be able to manage and participate in the remediation of incidents of critical severity while maintaining a calm demeanor and attention to detail and process.
  • Must possess excellent verbal and written communication skills, with the ability to write timely status updates for critical incidents.
  • Must be driven and self-starting, with a customer-focused mindset.

Keywords: iHerb, Irvine, ProdOps Engineer, Engineering, Irvine, California
