Email Issues (Resolved)
  • Priority - Critical
  • Affecting Server - plesk.aquiss.net
  • We are currently investigating authentication issues with email platforms


    This has been resolved

  • Date Opened - 07/06/2022 16:17
  • Last Updated - 07/06/2022 16:29
Broadband Latency & Packet Loss Issues (Resolved)
  • Priority - Medium
  • Affecting - Broadband
  • We have received multiple reports from customers of Packet Loss across their estate. We have been working with our customers to gather trace routes and evidence of the impact, and in conjunction this was raised with our Network Operations Centre (NOC) to assist with investigations.

    During further investigations our NOC Engineers identified a potential issue across an element of our network causing Packet Loss to customers. In an attempt to stabilise service this element has been removed from the network; however, the results have not been positive.


    Completed Actions

    • P2 incident raised
    • MI Team engaged
    • TSC engaged with customers to gather impact
    • NOC engaged
    • NOC identified Packet Loss across the core
    • MI process invoked
    • 1 x Core link removed from the network in an attempt to alleviate impact
    • 1 x Core link brought back into service
    • Network Engineers Engaged
    • Traffic forced to another leg of the core network
    • 1 x link costed out to force traffic onto the other 4 links
    • Further impacts identified and MI Team reconvened
    • Testing in the LAB commenced
    • Incident conditions replicated in the LAB
    • LAB Testing completed
    • Configuration change implemented at Slo2 to amend traffic priorities
    • Spares ordered

     

     

    Current Action Plan

    Engineers have escalated the incident to Senior Network Engineers to assist with investigations. At this juncture we have identified a potential capacity issue on our Core Network in London. A technical session is ongoing between our Engineering Teams to determine a plan to alleviate customer impact.

    In an attempt to mitigate customer impact, an element of the network was costed out in the expectation that traffic would reroute across the network. However, this did not have the desired effect, and the element was reintroduced to avoid further impact to our customers.

    The Major Incident Team are meeting again at 15:30 to review the outputs and the recovery plan. At this point the root cause is not understood; this will be addressed post incident as part of the Post Incident Review.

    Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process. We are working to effect a full restoration as soon as possible.


    Technical Teams, in conjunction with the Major Incident Team, have continued their investigations and implemented some configuration changes in an attempt to mitigate impact to our customers. At this juncture these have not had the desired impact. The team made one more change at 15:50 and it will take approx. 30 mins to see if this has the desired impact. NOC and Network Engineering Teams continue to monitor the network for stability across our core.

    The Major Incident Manager has requested that the technical session continues and that a contingency plan is provided to the Major Incident Team at 17:30.


    Technical Teams have completed the configuration changes in an attempt to mitigate customer impact; however, due to the drop in traffic volumes we are unable to confirm whether this has had the desired effect. Testing at this juncture confirms that Packet Loss has improved, and the testing completed on the sample of Customer Premises Equipment (CPE) routers indicates no high levels of Packet Loss.

    A Technical Assistance Case (TAC) has been raised with Nokia to assist us with our investigations: 1. to replicate the issue and 2. to provide guidance on best practice to distribute traffic across the core devices. Following the feedback from Nokia our Engineering Teams will review any recommendations and present any required changes via the Emergency Change Advisory Board (ECAB). We have gathered historical data to assist Nokia with their investigations.

    The initial increase in traffic today is thought to be a result of Microsoft Patch Tuesday; however, we are unable to confirm this until we see traffic levels increase on the 12th of May 2022.

    The incident has been placed into an extended monitoring status and in the event of a reoccurrence the Major Incident Team will reconvene.


    NOC Teams have continued to monitor the service overnight and have confirmed that service has stabilised and no further impacts to service have been identified. We have been working with the vendor via a TAC and have received some recommendations from them. Our Technical Teams are working through this feedback in a technical session to review the outputs and recommendations, which will be tested in the LAB before any changes are made in the live environment.

    At this juncture we are confident that the configuration changes implemented during the incident yesterday have stabilised the network and we continue to monitor utilisation graphs to determine service impacts. The Major Incident Team will meet again at 16:15 to review the outputs and technical recommendations from the technical session.

    Our NOC teams continue to monitor traffic graphs throughout the day and will escalate any issues to the Major Incident Team.

    The incident remains in an extended monitoring status and in the event of a reoccurrence the Major Incident Team will reconvene.


    Senior Network Engineers have concluded their investigations and have devised two suggestions to mitigate impact across our network.

    1. Execute a configuration change at Slo2 on the Core network to amend traffic priority settings, as recommended via the TAC case, without the appropriate testing capabilities.

    2. Increase capacity from 40Gb to 50Gb between Lon44 and Slo2, which requires logistics, hardware and Field Resourcing to assist.

    Following the technical recommendations the Major Incident Team approved option 1 in an attempt to stabilise traffic and alleviate impact. This has been implemented and monitoring at this juncture suggests no further impacts to service. NOC Engineers are monitoring the situation post change activity, and the roll back plan has been shared with the NOC in the event we identify issues and have to revert the change.

    At this juncture we have exhausted all avenues of investigation in order to stabilise the network without implementing option 2. The Major Incident Team has been stood down for this evening to allow the teams to rest and focus on the upgrade during Normal Working Hours, without potentially causing further impact and allowing the network a period to stabilise. In the meantime, the Technical Team are working with our suppliers and delivery partner to ensure we have the required spares to commence the upgrade on the 13th May 2022.

    A further call has been scheduled to review the implementation plan and agree the most suitable time to approve option 2 as an Emergency Change on the 13th May 2022.

     

    Next Update: 13/05/2022 10:45

     

  • Date Opened - 11/05/2022 10:30 - 14/05/2022 15:39
  • Last Updated - 13/05/2022 08:45
Broadband Latency Issues (Resolved)
  • Priority - High
  • Affecting - Broadband
  • We are investigating reports of latency on our broadband services.


    15:49 (20th January) - We are still investigating the cause of latency issues being reported by a percentage of our customer base.


    18:55 (20th January) - Core network was tested this evening, but no issues could be highlighted. Therefore, this has been escalated to Cisco to investigate further.


    05:57 (21st January) - This problem continues to be investigated with Cisco, however, no progress has been made overnight. We have requested that this be escalated.


    10:08 (21st January) - Cisco are now actively engaged with the case and working with NOC.


    14:01 (21st January) - Cisco are considering a hardware swap on the fibre termination kit. Remote hands have been dispatched to site.


    14:53 (21st January) - Remote hands have been engaged to carry out remedial work at the data centre in relation to the issue; however, our monitoring indicates that this has not improved the situation. We are continuing to liaise with the hardware vendors for troubleshooting purposes.


    17:09 (21st January) - Following hardware configuration changes, we are seeing services returning to normal.


    18:45 (21st January) - Customer feedback is showing that services have returned to normal. This mirrors the network data we are seeing. We therefore consider this resolved; however, if you experience any unexpected slow speeds, we would recommend a quick reboot of your router to refresh session data.

  • Date Opened - 20/01/2022 14:23
  • Last Updated - 21/01/2022 18:45
Broadband Radius Problems (Resolved)
  • Priority - Critical
  • Affecting System - Radius
  • We are currently investigating issues with RADIUS authentication that are impacting some broadband services.


    Description: Major Incident - Multiple DSL services impacted

    Current Impact: We believe this is affecting all managed DSL circuits as well as PWAN DSL. The service desk is receiving multiple calls.

    Incident Start Time: 2021-10-19 09:03:40

    Major Incident Raised: 2021-10-19 09:41:53

    Major Incident Manager: Stephen Martin

    Location/City: Multiple

    Update Number: 1

    Completed Actions:

    • P1 raised
    • MI process invoked
    • Data Centre hands and eyes engaged
    • Hands and eyes working on recovery with our NOC

    Current Action Plan:
    Our NOC Engineers continue to work with the Hands and Eyes at the Data Centre in an attempt to recover the primary device.

    In parallel our teams are working to bring the secondary device online and manually move traffic in an attempt to recover services. The underlying root cause and mitigating actions will be tracked via the Post Incident Review.

    Next Update: 12:00

  • Date Opened - 19/10/2021 09:41
  • Last Updated - 19/10/2021 11:39
Issues with emails (Inbound and Outbound) (Resolved)
  • Priority - Critical
  • Affecting Server - plesk.aquiss.net
  • We are investigating reports of customers not receiving emails this morning across our hosting network.


    We are pushing through a backlog of circa 120,000 emails that have hit our network since 2.00am and failed to be delivered. This queue will take a while to process fully.


    This has now been resolved and emails are once again flowing in and out of the network.

  • Date Opened - 13/09/2021 10:35
  • Last Updated - 13/09/2021 11:39
PPP / Stale Sessions (Resolved)
  • Priority - Medium
  • Affecting - Broadband
  • We are receiving a high volume of support enquiries where customers appear unable to get online, in some cases getting BTWholesale holding pages, which appear to be linked to stale sessions. We are currently investigating.

    We would recommend that customers who may be affected by a stale session power down their router for 15 mins.


    We believe this should now be resolved.

    If you are still unable to connect, please power down your router for 15 mins before trying again. This will allow any stale sessions to clear.

  • Date Opened - 06/08/2021 08:34
  • Last Updated - 06/08/2021 11:24
Major Incident - Network traffic via Slo2 and Lon5 affected (Resolved)
  • Priority - High
  • Affecting - Broadband
  • We are aware of an ongoing incident affecting a number of services. Our Engineers are conducting preliminary investigations to complete a comprehensive Service Impact Analysis.
     
    We apologise for the inconvenience caused. Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process. Aquiss are working to effect a full restoration as soon as possible.
     
    We will continue to issue regular updates until all services have been restored. Further details will be issued within 15 minutes.


    Update 11:14

    Current Impact: Service restored. Customers may have experienced a brief outage across internet services between 10:23 - 10:27. Our internet peering links in Slo2 and Lon5 experienced a brief flap; however, service has now been restored and our Engineers are conducting a full analysis to identify the root cause of the outage.

    Incident Start Time: 2021-04-15 10:37:49

    Major Incident Raised: 2021-04-15 10:52:36 
      
    Location: Multiple

    Completed Actions:
    • Major Incident Process invoked
    • Service restored
    • Major Incident Team engaged

    Current Action Plan:
    • Monitor services
    • Continue root cause investigations

    Next Update: 12:30pm


    Final 12:19

    Resolution Notes: Aquiss Technical Teams have concluded their investigations and identified the root cause of the issue as an internal process and control failure during investigation activity. Several remedial actions have been identified and these will form part of the Post Incident Review.

  • Date Opened - 15/04/2021 10:38
  • Last Updated - 15/04/2021 12:20
AWS Routing Issues (Resolved)
  • Priority - High
  • Affecting - Broadband
  • We are currently investigating reports of routing issues to websites and services using AWS (Amazon Web Services). This appears to be related to a routing issue being reported off our network within London.

  • Date Opened - 23/03/2021 15:35 - 24/03/2021 13:47
  • Last Updated - 23/03/2021 15:37
BT Planned Work: Nottingham TE (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.

    Therefore, BT will be undertaking maintenance work within the below window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service it should still be considered at risk.

  • Date Opened - 21/01/2021 21:00 - 25/01/2021 12:31
  • Last Updated - 20/01/2021 13:04
BT Planned Work: Sheffield TE (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.

    Therefore, BT will be undertaking maintenance work within the below window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service it should still be considered at risk.

  • Date Opened - 28/01/2021 21:00 - 05/02/2021 10:41
  • Last Updated - 20/01/2021 10:52
BT Planned Work: Leeds TE (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.

    Therefore, BT will be undertaking maintenance work within the above window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service it should still be considered at risk.


    Update : 14/01/2021 15:37

    The above works have been cancelled due to adverse weather conditions in the local area. These works will be rescheduled with the new date communicated out accordingly.

     

  • Date Opened - 14/01/2021 20:30 - 15/01/2021 05:00
  • Last Updated - 14/01/2021 15:37
BT Planned Work: Kingston TE & colindale.core (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, Kingston TE requires both the ATS and UPS to be refreshed. Although only Kingston is directly affected, due to network design customers with services on the Colindale core should also be considered at risk.

    Therefore, BT will be undertaking maintenance work within the above window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service, customers connecting into the adjacent equipment may experience a small period of downtime.


    Update : 08/01/2021 08:05

    This work has been completed successfully.

  • Date Opened - 07/01/2021 20:30 - 08/01/2021 05:00
  • Last Updated - 08/01/2021 08:05
Broadband Slow Speeds and High Latency (Resolved)
  • Priority - Low
  • Affecting - Broadband
  • We are currently investigating reports of packet loss and slow speeds affecting broadband circuits across all regions of the UK. The problem is further complicated by the fact that it appears to affect only a small subset of customers, and this subset keeps changing every few days. The problem appears to be within the Openreach network, affecting network traffic prior to the handover to our network LNSs.

    Further updates will follow as and when they become available and we apologise for any inconvenience caused.

     

    Update : 10/12/2020 18:24

    We are continuing to investigate reports of packet loss. We would ask any affected customers to provide our support desk with supporting evidence in the form of pathping or WinMTR data to allow further troubleshooting.


    Update : 11/12/2020 17:15

    Openreach are continuing to investigate the issue on their side and further updates are expected in the next two hours.


    Update : 12/12/2020 10:28

    Openreach advise that packet loss was observed when testing and that there is a potential issue on their core network. Further troubleshooting will be required.

    We will update once we have further information from them.


    Update : 12/12/2020 12:07

    Openreach advise that the packet loss does not appear to occur overnight, therefore this has been passed to their day shift to investigate today.  Intermittent loss has been seen again this morning so they are continuing to trace the root cause of the issue.

    Our escalation point is aiming to get an update to us by 1400 today, so we will aim to update via this post shortly thereafter.


    Update : 12/12/2020 15:27

    Openreach have advised that the intermittent packet loss that was seen at 0800 this morning has not been present since, which has hindered progress with their fault finding.

    Investigations continue and Openreach expect to be able to provide a further update by around 1700 today.


    Update : 13/12/2020 08:30

    Openreach have advised that their 3rd line team are not observing any packet loss whilst testing. We have been advised that they will continue to troubleshoot out of hours, however, they believe that the packet loss may not present itself until Monday morning during peak hours. We have stressed to them that all efforts must be made by them to resolve this before Monday.


    Update : 14/12/2020 08:13

    Openreach advise that their 2nd line and 3rd line teams have been monitoring their network for signs of packet loss. In the last 48 hours no packet loss has been observed. As a result, they have been unable to replicate the issue outside core hours.

    Openreach’s 3rd line team are monitoring the network this morning for when the packet loss reappears, so that they can ascertain where in the network the issue lies and resolve it.

    A further update will be provided following further troubleshooting.


    Update : 14/12/2020 15:23

    Openreach advise they have seen no further packet loss today and our monitoring confirms this. Openreach are continuing to investigate and have set up additional monitors to assist with this. We will provide a further update once we have any more information.


    Update : 15/12/2020 15:26

    Our monitoring has detected an increase in packet loss, indicating that the issue has returned. We have once again referred this back to Openreach for investigation.


    Update : 16/12/2020 12:53

    We have once again, around the same time of day, started to get reports of slow speeds and packet loss. However, this time, it seems to be a different portion of our customer base than seen previously. We have once again referred this back to Openreach for investigation.


    Update : 16/12/2020 18:59

    We do understand that customers who are affected are looking for fix times, which at present we simply can't give. The unusual, floating nature of this problem within the Openreach network is making it challenging to pinpoint a cause; however, senior Openreach engineers are currently engaged with our Network Operations team and will continue to investigate.


    Update : 16/12/2020 20:38

    Following further conversations between Openreach senior engineers and our Network Operations team, we saw a change in how traffic is entering our network at around 8.00pm, which resulted in the problems disappearing instantly. All lines we are monitoring saw a similar response at around 8.00pm, with packet loss disappearing. Early customer reports arriving at our support desk this evening confirm that things appear to have returned to normal service.

    As the packet loss seems to occur predominantly during daytime hours, it is likely to be tomorrow morning before any further update can be provided and the results of this evening's efforts confirmed.


    Update : 17/12/2020 11:48

    An emergency change is currently being planned in order to assist with troubleshooting the issues currently being seen. This is being discussed at present so that it can be decided when best to carry this out. A further update will be posted at 1600 or upon receipt of further information.


    Update : 17/12/2020 21:51

    We are confident that this situation now appears to have been resolved following emergency work and changes performed this evening. We have lowered the priority of this case to Low and will monitor for 24 hours before closing the case.

  • Date Opened - 10/12/2020 15:20
  • Last Updated - 18/12/2020 12:06
Emergency Work: DSL Platform (Resolved)
  • Priority - Medium
  • Affecting - Broadband
  • Please be aware that we will be carrying out remedial work on the traffic shapers within the DSL broadband network. No downtime is expected to services, however broadband services should be considered at risk for the duration of this work.

    We apologise for the short notice and any inconvenience this may have caused.

  • Date Opened - 17/12/2020 18:00 - 17/12/2020 21:00
  • Last Updated - 17/12/2020 21:42