All systems operational

Watermelon

100.0% uptime

Chatbot engine

Operational

Database

Operational

App Engine

Operational

Compute Engine

Operational

Messaging API

100.0% uptime

Website Widget

Operational

Chat Widget

Operational

Facebook Messenger

Operational

WhatsApp

Operational

Instagram

Operational

Payments

Operational

Stripe API

Operational

OpenAI

Operational

API

Operational

Apps

100.0% uptime

iOS app

Operational

Mac App

Operational

Windows app

Operational

Notice history

Feb, 2024

Outage of Conversations and Pulse overview
  • Postmortem
    Postmortem

    We want to provide you with a transparent overview of the recent system downtime incident, its resolution, and the preventive measures we've implemented to avoid similar occurrences in the future.

    Incident Timeline:

    Friday 16 February 2024

    • 8:30 AM: System outage detected; monitoring server and metrics.

      • Database CPU utilization peaked at 100%.

      • Overload observed on the API-GPT and token verification servers.

    • 8:45 AM: Rolled back to the previous day's release.

    • 9:00 AM: Issue persisted; continued monitoring and gathering system information. Identified a problem with token handling. One of our developers disabled token refreshes.

    • 10:00 AM: Restored the system to the state of the previous day, undoing all rollbacks.

    • 11:00 AM: Developers continued debugging and investigating potential workarounds or solutions for the token issue.

    • 12:00 PM - 1:00 PM: Developers set up the production environment locally for debugging.

    • 3:15 PM - 5:00 PM: Additional logging deployed within the code.

    • 6:00 PM: Developer identified a discrepancy in database connections within the verification server. Adjusted the database host and deployed changes.

    • 6:30 PM - 7:30 PM: Most functionalities restored, except for the count feature. Developer discovered that tokens were not being sent along with requests.

    Saturday 17 February 2024

    • 10:00 AM: A fix for the count feature was deployed to production. Watermelon was fully up and running again.

    • 10:00 AM - 6:00 PM: Developers continued to monitor the situation.

    Sunday 18 February 2024

    • 9:00 AM - 6:00 PM: Developers continued to monitor the situation. Everything continued to work as expected.

    Root Cause Analysis:

    The system downtime was primarily caused by excessive token refreshing, leading to overload on the verification server. This resulted in a backlog of requests across multiple servers, including the verification, GPT, and total count servers.

    Actions Taken:

    1. Disabled all token refresh functionality except a single main refresh process. As a result, people may see a loading screen when returning to Watermelon in the browser.

    2. Reconnected the verification server to the main database.
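
The idea of keeping a single refresh process can be illustrated with a minimal sketch. This is not Watermelon's actual code: the class name `SingleFlightRefresher` and the `fetch_token` callback (standing in for a call to the verification server) are hypothetical, and the sketch assumes an asyncio-based client. It shows how many simultaneous refresh requests can be coalesced into one request.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class SingleFlightRefresher:
    """Coalesce concurrent token refreshes into one in-flight request."""

    def __init__(self, fetch_token: Callable[[], Awaitable[str]]) -> None:
        # fetch_token is a hypothetical stand-in for the verification server call.
        self._fetch_token = fetch_token
        self._in_flight: Optional["asyncio.Task[str]"] = None

    async def refresh(self) -> str:
        # Start a new refresh only if none is running; otherwise join the existing one.
        if self._in_flight is None or self._in_flight.done():
            self._in_flight = asyncio.ensure_future(self._fetch_token())
        return await self._in_flight


async def demo() -> Tuple[int, List[str]]:
    calls = 0

    async def fake_fetch() -> str:
        # Simulated verification server; counts how often it is actually hit.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return f"token-{calls}"

    refresher = SingleFlightRefresher(fake_fetch)
    # Fifty callers ask for a refresh at the same moment.
    tokens = await asyncio.gather(*(refresher.refresh() for _ in range(50)))
    return calls, list(tokens)
```

Running `asyncio.run(demo())` sends a single request to the simulated server even though fifty callers asked for a refresh, and all fifty callers receive the same token.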

    Preventive Measures:

    To prevent similar incidents in the future, we are:

    • Improving the token refresh functionality. This will decrease the load on the server.
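
One common way to reduce refresh load, shown here purely as an illustration and not as the change Watermelon made, is to refresh a token only shortly before it expires and to add random jitter so that many clients do not refresh at the same instant. The function name `next_refresh_delay` and its parameters are assumptions made for this sketch.

```python
import random
from typing import Optional


def next_refresh_delay(seconds_until_expiry: float,
                       safety_margin: float = 60.0,
                       jitter_fraction: float = 0.1,
                       rng: Optional[random.Random] = None) -> float:
    """Return how long to wait before the next refresh (hypothetical helper).

    The refresh is scheduled safety_margin seconds before expiry, then moved
    earlier by a random share (up to jitter_fraction) of that wait, so clients
    holding tokens with the same expiry spread their refreshes out.
    """
    rng = rng or random.Random()
    # Never wait a negative amount: an almost-expired token refreshes at once.
    base = max(0.0, seconds_until_expiry - safety_margin)
    # Jitter only ever moves the refresh earlier, never past the safety margin.
    return base - base * jitter_fraction * rng.random()
```

With a token that expires in one hour and the default settings, the returned delay falls between 3186 and 3540 seconds; a token with less than 60 seconds left yields a delay of 0.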

    We apologize for any inconvenience this downtime may have caused and assure you that we are committed to maintaining the reliability and performance of our services.

    Thank you for your understanding and continued support.

  • Resolved
    Resolved

    This incident has been resolved.

  • Monitoring
    Update

    We've implemented a fix and are currently monitoring the situation. The count in conversations is up and running again as well. If you have a problem with logging in, please clear your cache and try again.

  • Monitoring
    Monitoring

    We've implemented a fix and are currently monitoring the situation. Everything is running again except the count in conversations.

  • Identified
    Update

    We have found a solution for the problem and are actively working on implementing it.

  • Identified
    Identified

    We have identified this issue and are working on a solution.

  • Investigating
    Update

    We are continuing to investigate this issue.

  • Investigating
    Investigating

    We are currently investigating this incident.

    The Conversations page is not loading any conversations, and the Pulse overview is not showing any chatbots. Pulse and legacy chatbots continue to work on the selected channels. Once the outage has been resolved, the conversations will be visible again.

Dec, 2023 to Feb, 2024