Platform

Keeping systems online and running 24/7! The purpose of the Platform engineering team is to ensure the availability and health of client infrastructure. We make sure that amazee.io has an effective product to sell and that it stays online and in top condition at all times. We are the team that allows amazee.io to provide the service it offers to clients.

Responsibilities

  • Manage, maintain and monitor all clusters 24/7
  • Manage, maintain and monitor global Lagoon infrastructure 24/7
  • Monitor all production sites 24/7
  • React to infrastructure alerts
  • React to outages reported by clients via the Client Support Team
  • Provide emergency phone support outside office hours
  • Continuously improve the amazee.io platform
  • Coordinate with external partners such as AWS, GCP, Azure and Fastly to ensure stable operations
  • Guarantee platform and website uptime SLAs
  • Coordinate with the Lagoon Team on Lagoon features, releases, and issues
  • Coordinate with and support the amazee.io security team
  • Monitor, analyze, and optimize infrastructure costs with the help of knowledge from the Business Operations Team and tooling from the IT Team
  • Create and update statuspage entries during outages and maintenance
  • Write timely post-mortems for more significant outages

Non-Responsibilities

  • 1st & 2nd level support
  • Direct communication with clients during outages (communication happens via statuspage)
  • Supporting client application issues or requests
  • Maintaining the Lagoon codebase
  • Creating reports on site uptime

Workstream

Roles