Broadband Outage (Resolved)
  • Priority - Critical
  • Affecting - Broadband Services
  • We are currently aware of a loss of service for some broadband customers. This follows planned work carried out by one of our suppliers that failed overnight.

    We have been advised that during this work, major core hardware failed, resulting in new hardware needing to be dispatched. This is not expected to arrive on site until 8.00am.

    We have been advised that loss of broadband services for those affected is expected to last until 11am.

    We have expressed our extreme shock at how this planned work has been managed and will be seeking answers.

    Regards

    Martin Pitt
    Managing Director

     



    We are seeing some customers coming back online.

    We recommend that all affected customers power down their routers for at least 10 minutes before trying again, as this may clear the fault in some cases, though not for everyone.

    This fault remains open and we are being advised by upstream suppliers that they still expect issues to persist until at least 11am (subject to change).

     


     

    We are being advised that normal services are expected to be restored by 1.00pm. Engineering teams from the upstream supplier are on site and presently carrying out final checks on the replacement hardware.

    Customers may see a brief network drop in the next couple of hours as primary routes are restored.

     


     

    All services should be restored as expected. 

    If you believe you may still have a service-impacting issue, we would first ask you to switch your router off for 10 minutes.

    If rebooting does not resolve your issue, we would kindly ask that you raise a case directly with our support desk via https://support.aquiss.net 

    We would like to offer our apologies for today's events, which were outside of our control; please be assured we have requested a full report.

     

    Regards 

    Martin Pitt
    Managing Director
    Aquiss Limited

  • Date Opened - 21/08/2024 05:04 - 21/08/2024 13:33
  • Last Updated - 21/08/2024 13:33
Global IT Issues (Resolved)
  • Priority - Medium
  • Affecting - Service Operations
  • We are monitoring reports of global IT problems, reported to be caused by a problem with CrowdStrike, a security solution that operates within Windows operating systems and is often used by business clients, systems and internal networks, all of which can have an indirect impact on all of us.

    All systems at Aquiss are fine and operating as expected. We have completed business continuity checks with all our suppliers to confirm that everyone is currently operating as expected.

    We are, however, aware of reports that a number of websites are experiencing issues, such as problems taking payments and bookings, so Aquiss customers should be mindful that problems may continue for a few days. Aquiss has no control over these third-party services.

  • Date Opened - 19/07/2024 10:24 - 20/07/2024 07:07
  • Last Updated - 19/07/2024 10:31
PPP Cannot Connect Issues (South Coast) (Resolved)
  • Priority - Critical
  • Affecting - CityFibre Broadband Circuits
  • We are investigating a regional issue (along the south coast) affecting customers who are unable to connect across the CityFibre network.


    This has now been resolved.

  • Date Opened - 01/06/2024 14:58
  • Last Updated - 03/06/2024 10:42
Planned Work: Telehouse DSL Platform (Resolved)
  • Priority - Medium
  • Affecting - Broadband Services
  • During the below reported window, we will be performing service impacting maintenance on the Telehouse DSL core.

    At the start of the maintenance window, customers connected to the Telehouse side of the platform will be forcibly disconnected and will be allowed to reconnect to the Interxion infrastructure for the duration of the work. Once the rack migration has been completed, sessions will be rebalanced across the platform by performing additional disconnections from the Interxion side.

    For the avoidance of doubt, all broadband services connected to the Telehouse side of the platform at the start of the window will experience a brief outage while sessions are torn down and reconnected. All services connected to the Interxion side of the platform may experience a brief outage towards the end of the window when rebalancing occurs.

    We apologise for any inconvenience this may cause.

  • Date Opened - 13/06/2024 00:01 - 19/07/2024 10:31
  • Last Updated - 28/05/2024 16:32
Planned Maintenance : Interxion DC Rack Migration (Resolved)
  • Priority - Medium
  • Affecting - Broadband Services
  • Due to our Datacentre supplier carrying out works to replace the CRAC (Aircon) units, we will be performing a rack migration of the Interxion DSL platform in order to protect our services and prevent unplanned outages.

    At the start of the maintenance window, sessions connected to the Interxion side of the platform will be forcibly disconnected and will be allowed to reconnect to the Telehouse infrastructure for the duration of the work. Once the rack migration has been completed, sessions will be rebalanced across the platform by performing additional disconnections from the Telehouse side.

    For the avoidance of doubt, all DSL services connected to the Interxion side of the platform at the start of the window will experience a brief outage while sessions are torn down and reconnected. All services connected to the Telehouse side of the platform may experience a brief outage towards the end of the window when rebalancing occurs.

    We apologise for any inconvenience this may cause.


    This work has been completed

  • Date Opened - 27/04/2024 00:01 - 27/04/2024 07:01
  • Last Updated - 27/04/2024 07:01
Broadband Latency Increases (Resolved)
  • Priority - High
  • Affecting - Broadband Circuits
  • We are investigating reports of increased latency (and in turn reduced speeds) that started at around 8.00pm.

  • Date Opened - 04/04/2024 20:00 - 05/04/2024 09:53
  • Last Updated - 04/04/2024 21:00
Broadband Session Drops (Resolved)
  • Priority - Medium
  • Affecting - Broadband Connections
  • We are currently investigating an unexpected drop in broadband connections.

    If you are unable to get back online, we recommend powering down your router at the mains for 5 minutes before restarting.

  • Date Opened - 04/04/2024 13:03 - 04/04/2024 17:36
  • Last Updated - 04/04/2024 13:08
Broadband Slow Performance (Resolved)
  • Priority - Medium
  • Affecting - Openreach LNSs
  • We are currently investigating slower than expected broadband performance via Openreach circuits.

    At present the sample size of reports is small, with no geographical pattern or common network path.


    Further customer reports are arriving into our support channels.

    An open dialogue with all network parties involved is currently taking place to investigate cases further.


    LNS4 has been highlighted as having issues and is currently being reviewed by NOC.

    Customers experiencing slow speeds are advised to reboot their routers, which will re-establish the connection via alternative network equipment.


    We are still dealing with reports of slow speeds for a small selection of customers.

    Measures and tests performed on our side show this to be a problem within the BT Wholesale network, before traffic passes to network assets we can influence.

    This has been escalated within BT Wholesale, who are investigating.


    Whilst we have not received any meaningful updates from BT Wholesale, customer feedback indicates that speeds for those affected seem to be returning to normal.

    Whilst we take this as a positive sign, the lack of updates from our wholesale supplier, whether on progress, fixes or whether parties are still investigating, is not acceptable to us. Please be assured we have expressed our concerns accordingly.

    We therefore still consider services to be at risk at this time.


    Following continued reports from customers that services are back to normal, we are closing this case.

    We once again apologise to customers who had service disruption.

  • Date Opened - 20/03/2024 10:43 - 21/03/2024 15:25
  • Last Updated - 21/03/2024 15:25
Planned Work: Core Router Software Upgrade (Resolved)
  • Priority - High
  • Affecting - Broadband
  • During the reported time period, software upgrades to core routers in London will be performed.

    All directly connected services, such as broadband, may experience a number of small outages, up to 3 hours in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.

    Please DO NOT factory reset your router during this time.

  • Date Opened - 12/03/2024 22:00 - 15/03/2024 07:31
  • Last Updated - 11/03/2024 11:38
Planned Work: Core Router Software Upgrade (Resolved)
  • Priority - High
  • Affecting - Broadband
  • During the reported time period, software upgrades to core routers in London will be performed.

    All directly connected services, such as broadband, will experience a number of outages of up to 3 hours in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.

    Please DO NOT factory reset your router during this time.

  • Date Opened - 13/03/2024 22:00 - 15/03/2024 07:31
  • Last Updated - 11/03/2024 10:12
Planned Work (Resolved)
  • Priority - High
  • Affecting - Broadband
  • During the reported time period, software upgrades to core routers at Sovereign House, London will be performed.

    All directly connected services, such as broadband, will experience a number of outages of up to 4 hours in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.

    Please DO NOT factory reset your router during this time.

  • Date Opened - 20/02/2024 22:00 - 21/02/2024 17:16
  • Last Updated - 14/02/2024 08:16
Planned Work: Core Router Software Upgrade (Resolved)
  • Priority - High
  • Affecting - Broadband Services
  • During the published window listed below, we will carry out software upgrades on our core router at Telehouse East, London. All directly connected services will experience a number of outages of up to 30 mins in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.

  • Date Opened - 01/02/2024 22:00 - 05/02/2024 09:15
  • Last Updated - 01/02/2024 08:36
Planned Work: Core Router Software Upgrade (Resolved)
  • Priority - High
  • Affecting - Broadband Services
  • During the above window, we will carry out software upgrades on our core router at Telehouse East, London. All directly connected services will experience a number of outages of up to 30 mins in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.


    This work is progressing well


    This work has been completed

  • Date Opened - 31/01/2024 23:00 - 01/02/2024 06:00
  • Last Updated - 01/02/2024 08:34
Broadband Services at Risk (Resolved)
  • Priority - Low
  • Affecting - Broadband Services
  • We are currently monitoring a loss of power within London that could have an impact on broadband services; we therefore consider these services at risk whilst we investigate further.

  • Date Opened - 04/11/2023 16:54 - 05/11/2023 08:43
  • Last Updated - 04/11/2023 16:55
Planned Work: Core Router Software Upgrade (Resolved)
  • Priority - Medium
  • Affecting - Broadband Services
  • During the above window, we will carry out software upgrades on our core router at Sovereign House, London. All directly connected services will experience a number of outages of up to 4 hours in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.

  • Date Opened - 31/10/2023 22:00 - 04/11/2023 17:34
  • Last Updated - 17/10/2023 23:30
Loss of routing (Resolved)
  • Priority - Medium
  • Affecting - Broadband Services
  • We are investigating loss of connectivity across broadband services.


    This appears to be linked to planned work, between Tuesday 17/10/2023 22:00 and Wednesday 18/10/2023 06:00, that we had not been informed about.

  • Date Opened - 17/10/2023 22:53 - 18/10/2023 08:00
  • Last Updated - 17/10/2023 23:28
Planned Work: Core Router Software Upgrade (Resolved)
  • Priority - Medium
  • Affecting - Broadband Services
  • During the above window, we will carry out software upgrades on our core router at Interxion, London. All directly connected services will experience a number of outages of up to 4 hours in total. Any traffic that would usually be routed through this node will take an alternative path for the duration of the work.

    We apologise in advance for any inconvenience caused.

  • Date Opened - 24/10/2023 22:00 - 26/10/2023 09:08
  • Last Updated - 17/10/2023 23:27
Leased Line Services (Resolved)
  • Priority - Critical
  • Affecting - Leased Lines
  • We are aware of an ongoing incident affecting a number of services. Our Engineers are conducting preliminary investigations to complete a comprehensive Service Impact Analysis.

    We apologise for the inconvenience caused. Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process.


    Update Number: 1

    Current Impact: 1345 Services across Metro and Wholesale Services

    Incident Start Time: 22:00 15/08/2023

    Major Incident Manager: Ashley Wilson

    Next Update: 11:00

    Summary Update:

    Following planned maintenance, equipment at LON55 appears to have suffered a hardware failure. The line card will not reboot, so all Wholesale and Metro services which pass over LON55 are without service.

    Replacement equipment has been dispatched to site and is due to arrive by 10am; once it has arrived it will be configured and installed. Engineers are on site and awaiting the arrival of the equipment.

    Completed Actions:

    • Major Incident Team engaged.

    • NOC Engaged

    • Replacement Equipment Ordered

    • ETA for Equipment 10am

    • Engineers Engaged

    • Engineers Arrived on site


    Update Number: 2

    Current Impact: 1345 Services across Metro and Wholesale Services

    Major Incident Manager: Ashley Wilson

     

    Next Update: 14:00

     

    Summary Update:

    Following planned maintenance, equipment at LON55 appears to have suffered a hardware failure. The line card will not reboot, so all Wholesale and Metro services which pass over LON55 are without service.

    Replacement equipment has been dispatched to site and was due to arrive by 10am. This has now been delayed, and escalations have been raised with our supplier for a new estimated arrival time for the equipment; it is believed the courier is held up in traffic and should arrive in due course.

    The engineer on site has commenced preparation: everything has been backed up from the old device, the old equipment has been disconnected and the cabling labelled, ready for the replacement once it arrives.

     

    Completed Actions:

    • Major Incident Team engaged.

    • NOC Engaged

    • Replacement Equipment Ordered

    • ETA for Equipment 10am

    • Engineers Engaged

    • Engineers Arrived on site.

    • Site preparation Completed.


    Update Number: 3

    Current Impact: 1345 Services across Metro and Wholesale Services

    Major Incident Manager: Ashley Wilson

     

    Next Update: 17:30

     

    Summary Update:

    Hardware arrived at site at 11:45 and has now been fitted in place. The NOC are currently applying the relevant software updates before they apply the final configuration.

    The software updates are estimated to take 4 hours and the remaining configuration around 3 hours. We will provide the next update after the software has been successfully updated.

     

    Completed Actions:

    • Major Incident Team engaged.

    • NOC Engaged

    • Replacement Hardware Ordered

    • ETA for Hardware 10am

    • Engineers Engaged

    • Engineers Arrived on site.

    • Site preparation Completed. 

    • Hardware arrived at site 11:45

    • Hardware fitted in to site.


    Current Impact: We believe all services are restored.

    Incident Resolution Time: 13:27 16/08/2023

    Major Incident Manager: Ashley Wilson

     

    Resolution Notes: 

    Following planned maintenance, equipment at LON55 suffered a hardware failure. The line card would not reboot, so all Wholesale and Metro services which pass over LON55 were without service.

    The hardware has been replaced, the software updated and the configuration applied; service has now been restored.

    If customers continue to experience issues, please reboot your equipment. If issues persist following a reboot, please contact the Aquiss support team to report them.

     

    Completed Actions:

    • Major Incident Team engaged.

    • NOC Engaged

    • Replacement Hardware Ordered

    • ETA for Hardware 10am

    • Engineers Engaged

    • Engineers Arrived on site.

    • Site preparation Completed. 

    • Hardware arrived at site 11:45

    • Hardware fitted in to site.

    • Software update applied 

    • Configuration applied 

  • Date Opened - 16/08/2023 02:23
  • Last Updated - 16/08/2023 14:06
Broadband Latency & Packet Loss Issues (Resolved)
  • Priority - Low
  • Affecting - Broadband
  • We are investigating reports of packet loss on broadband circuits that started at 10:45am today.


    We have just witnessed an unexpected large-scale drop of PPP sessions across the Openreach network. We are investigating, but we believe this is linked to the packet loss reports we have been seeing from earlier today.


    We continue to receive reports of packet loss from customers. These seem to come and go, not affecting everyone at the same time, and are appearing at different LNS handoffs. This is still being investigated.


    We have not seen any further reports of issues for the past couple of hours, so we are lowering the priority to Low. We continue to investigate this with our upstream suppliers.


    Other than one single report during the evening that could be related, our support desk was silent overnight. We continue to monitor, but at this stage we believe the problem is/was within BT Wholesale.


    We have started to get a handful of further reports, since midday, regarding similar experiences to yesterday. At present there is no pattern and for the most part it's not affecting the same group of customers as yesterday.


    All monitoring is showing that the reported problems have resolved themselves.

  • Date Opened - 05/07/2023 11:33 - 08/07/2023 07:29
  • Last Updated - 07/07/2023 14:39
Datacentre Migration (Resolved)
  • Priority - High
  • Affecting Server - plesk.aquiss.net
  • We would like to inform you about our plans to migrate all our UK hosting services to a new state-of-the-art data centre located in Pendlebury, Manchester during May 2023.

    At Aquiss, your experience while using our services is our utmost priority. To honour our ongoing commitment to consistently improve our service offerings and provide innovative hosting solutions, we have chosen to move all our UK servers to a new, modern, purpose-built facility. We have been at our current location at Teledata since 2008, and they can no longer provide the capacity and flexibility necessary to cater to our customers' future needs.

    Our new data centre boasts many improved features such as redundant network connectivity, redundant UPS, standby generators, redundant cooling, 24/7 on-site security, VESDA and fire suppression. The facility is also ISO 27001 certified.

     

    When will the migration take place?
    The migration is scheduled for May 18th (23:00) until May 19th 2023 (06:00). We will inform you about the specific overnight maintenance windows for each of your services soon via email. During these windows, there will be an outage as we physically move our servers the 16 miles between the two data centres.

    We want to assure you that we are taking all necessary precautions to ensure a smooth transition between the two locations. We will be transporting the servers in special shock-mounted server racks to ensure that they remain physically safe and secure throughout the move. Additionally, we will conduct full backups of all systems before the physical move takes place to minimise the risk of data loss.

    Will IP addresses change?
    No. Your assigned IP addresses for hosting packages will be the same before and after the migration. This means there are also no DNS changes and no DNS propagation period.
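
    If you would like to verify this for yourself, a minimal resolution check along the following lines can be run once before and once after the maintenance window; the addresses printed should be identical. This is only an illustrative sketch in Python, using the plesk.aquiss.net name from this notice as an example host.

        import socket

        # Resolve a hosted name and print its current address(es).
        # Run before and after the window and compare the output;
        # no DNS or IP changes are involved in this migration.
        host = "plesk.aquiss.net"  # example name; any hosted domain works the same way
        addresses = sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
        print(host, "->", ", ".join(addresses))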


    Will I need to reconfigure anything?
    No. Your hosting packages and service will be moved exactly as they are, so they will be the same when they reach the new location as when they're shut down at the old location.


    How much downtime will I experience?
    We're expecting less than 4 hours downtime per server.

     

    We've made every effort to minimise the downtime as much as practically possible. Servers take a long time to shut down (up to 30 minutes), and that time can be variable depending on the workload of each one. We are only moving a small number of servers at a time to ensure that the migration proceeds as expeditiously as possible.

  • Date Opened - 18/05/2023 23:00 - 22/05/2023 08:21
  • Last Updated - 05/05/2023 17:14
Broadband Speed Issues (Resolved)
  • Priority - Low
  • Affecting - Broadband
  • We are getting a few reports of slow performance on broadband services that started circa 13:00.

    Whilst our local core network appears fine, we are investigating possible issues off our network.

  • Date Opened - 21/11/2022 13:55 - 22/11/2022 07:51
  • Last Updated - 21/11/2022 13:57
Broadband Issues (Resolved)
  • Priority - Low
  • Affecting - Broadband
  • We are currently investigating unusual reports of webpages failing to load for some customers.


    Location/City: Nationwide

    Completed Actions:

    - Reports of circuit impact into the TSC
    - TSC engaged NOC for initial investigations
    - NOC confirmed an issue seen on active monitoring
    - MI process engaged
    - MI accepted
    - Internal Bridge call scheduled
    - NOC investigations ongoing with several examples of affected circuits provided from information gathering by TSC
    - Further impact to Consumer circuits discovered and acknowledged
    - NOC investigations determined an issue within the core network emanating from a specific location
    - NOC contacted hardware supplier and raised a Priority 1 case
    - All logs provided to hardware supplier for analysis
    - Internal Bridge call convened
    - Conference call between NOC and hardware supplier convened
    - Following discussions between NOC and our hardware supplier, there have been developments on this incident in regards to restoration.
    - It has been found that the origin point of the issue is on a line card situated within a core network device.
    - Soft clear of card performed without success
    - Full remote reboot of card performed, which was successful for a period of approx. 30 mins before the issue manifested again
    - Further internal call held with NOC and Hardware Supplier to agree next steps
    - Escalation made to Hardware Supplier to confirm part availability and Engineer ETA
    - Part sourcing resolved
    - Engineer details confirmed and will be collecting at 0700.
    - Access request to DC confirmed
    - Issue with retrieving parts from location resolved
    - Engineer attended Slough DC
    - Engineer has completed card swap successfully
    - Testing and checks completed
    - BGP reenabled
    - Network stability confirmed


    Current Action Plan:

    A Major Incident has been declared as of 12:25 following the confirmation of an issue affecting our core network. Initial impact assessment has determined that both leased line circuits and standard broadband circuits are affected.

    Currently, we are investigating the issue alongside the NOC team and are working to restore service as quickly as possible. An internal bridge call is being held at 12:45 to discuss the current impact and determine a plan of action for restoring service.

    We apologise for the disruption and hope to have a further update soon.


    Current Action Plan:

    At this juncture, it has been determined that several tunnels are currently flapping following initial investigations by NOC.

    We are working to determine why the tunnels are flapping so that restoration work can begin. At this point we are unable to provide an estimated time of repair; however, once one is available we will provide it as soon as possible.

    We apologise for the disruption and hope to have a further update soon.

    Next Update:
    15:30


    Current Action Plan:

    NOC have been investigating further and have determined that the issue is emanating from a specific location within the core network. As such, this device has been checked and the hardware supplier has been contacted directly for a Priority 1 case to be raised with them. During this time, NOC have attempted to alleviate some issues across the core network by performing a workaround to reroute traffic away from the suspected affected device.

    All of the requisite logs have been provided and NOC are working alongside the hardware supplier to determine our next course of action. It is still unclear when full service can be restored; however, we will endeavour to provide an estimate as soon as we have one available.

    We apologise for the disruption and hope to have a further update soon.

    Next Update:
    17:30


    Current Action Plan:

    Following discussions between NOC and our hardware supplier, there have been developments on this incident in regards to restoration.

    It has been found that the origin point of the issue is a line card situated within a core network device. During these investigations, two major actions have been taken in an attempt to restore service: the first was a soft clear of the line card itself, which appeared to restore service for a short period before the issue resurfaced; the second was a full remote reboot of the affected line card.

    During a period of monitoring, an issue has resurfaced following the full reboot; this is currently being investigated on a conference call between NOC and our hardware supplier. Depending on the results of these investigations, the next course of action would be for engineers to attend the site of the affected line card and perform a physical reseat of the card itself.

    Further to this, we are also considering a line card replacement should the issue resurface following a short monitoring period after the line card reseat.

    Next Update:
    08:30


    Current Action Plan:

    Testing and checks in progress before moving BGP traffic back to original routing

    Next Update:
    10:30


    Current Action Plan:

    NOC Engineers have advised they are seeing network instability issues and are currently investigating.

    Initial assessment indicates that, after traffic was reintroduced following repairs, stability was observed to deteriorate. This recovery step has been reverted and investigation continues with Engineers and the Network Equipment Supplier's TAC.

    Next Update:
    15:00


    Current Action Plan:

    Engineers have arrived on site and will be completing the card swap within the next 30 minutes

    There remains no customer service impact that we are aware of, with services either taking alternative routes around the network or utilising their designed service resiliency at this location.

    Next Update:
    21:00


    Current Action Plan:

    Current service impact - None, all resilient ports are back in service

    Network impact - None, restored to previous state.

    Monitoring will now commence for 24 hours, after this time period the 2 costed out links will be brought back into service sequentially under controlled engineer conditions.

    Further update will be posted prior to commencement of work to bring the 2 links back into service

    Next Update:
    12:00 Sunday 24th July


    Current Action Plan:

    Monitoring continues

    Current Service impact - None, all resilient ports are back in service

    Current Network impact - None, restored to previous state.

    The two core internal links which are still costed out will be reintroduced this evening under controlled conditions.

    Next Update
    20:00

  • Date Opened - 22/07/2022 12:32 - 25/07/2022 08:12
  • Last Updated - 24/07/2022 12:58
Email Issues (Resolved)
  • Priority - Critical
  • Affecting Server - plesk.aquiss.net
  • We are currently investigating authentication issues with email platforms


    This has been resolved

  • Date Opened - 07/06/2022 16:17
  • Last Updated - 07/06/2022 16:29
Broadband Latency & Packet Loss Issues (Resolved)
  • Priority - Medium
  • Affecting - Broadband
  • We have received multiple reports from customers of packet loss across their estates. We have been working with these customers to gather traceroutes and evidence of the impact, and in conjunction this has been raised with our Network Operations Centre (NOC) to assist with investigations.

    During further investigations our NOC Engineers identified a potential issue with an element of our network causing packet loss to customers. In an attempt to stabilise service this element has been removed from the network; however, the results have not been positive.


    Completed Actions

    • P2 incident raised
    • MI Team engaged
    • TSC engaged with customers to gather impact
    • NOC engaged
    • NOC identified Packet Loss across the core
    • MI process invoked
    • 1 x Core link removed from the network in an attempt to alleviate impact
    • 1 x Core link brought back into service
    • Network Engineers Engaged
    • Traffic forced to another leg of the core network
    • 1 x link costed out to force traffic onto the other 4 links
    • Further impacts identified and MI Team reconvened
    • Testing in the LAB commenced
    • Incident conditions replicated in the LAB
    • LAB Testing completed
    • Configuration change implemented at Slo2 to amend traffic priorities
    • Spares ordered

     

     

    Current Action Plan

    Engineers have escalated the incident to Senior Network Engineers to assist with investigations. At this juncture we have identified a potential capacity issue on our Core Network in London. A technical session is ongoing between our Engineering Teams to determine a plan to alleviate customer impact.

    In an attempt to mitigate customer impact, an element of the network was costed out in the expectation that traffic would reroute across the network; however, this did not have the desired effect and the element was reintroduced to avoid further impact to our customers.

    The Major Incident Team are meeting again at 15:30 to review the outputs and the recovery plan. At this point the root cause is not understood; however, this will be addressed post-incident as part of the Post Incident Review.

    Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process. We are working to effect a full restoration as soon as possible.


    Technical Teams, in conjunction with the Major Incident Team, have continued their investigations and implemented some configuration changes in an attempt to mitigate impact to our customers. At this juncture these have not had the desired effect. The team made one further change at 15:50, and it will take approximately 30 minutes to see whether this has the desired effect. The NOC and Network Engineering teams continue to monitor the network for stability across our core.

    The Major Incident Manager has requested that the technical session continue and that a contingency plan be provided to the Major Incident Team at 17:30.


    Technical Teams have completed the configuration changes in an attempt to mitigate customer impact; however, due to the drop in traffic volumes we are unable to confirm whether they have had the desired effect. Testing at this juncture confirms that packet loss has improved, and testing completed on a sample of Customer Premises Equipment (CPE) routers indicates no high levels of packet loss.

    A Technical Assistance Case (TAC) has been raised with Nokia to assist with our investigations: 1) to replicate the issue, and 2) to provide guidance on best practice for distributing traffic across the core devices. Following the feedback from Nokia, our Engineering Teams will review any recommendations and present any required changes via the Emergency Change Advisory Board (ECAB). We have gathered historical data to assist Nokia with their investigations.

    The initial increase in traffic today is thought to be a result of Microsoft Patch Tuesday, which may have caused an increase in traffic volumes; however, we are unable to confirm this until we see whether traffic levels increase on the 12th of May 2022.

    The incident has been placed into an extended monitoring status and in the event of a reoccurrence the Major Incident Team will reconvene.


    NOC Teams have continued to monitor the service overnight and have confirmed that service has stabilised and no further impacts to service have been identified. We have been working with the vendor via the TAC and have received some recommendations from them; our Technical Teams are working through this feedback in a technical session to review the outputs and recommendations, which will be tested in the LAB before any changes are made in the live environment.

    At this juncture we are confident that the configuration changes implemented during the incident yesterday have stabilised the network and we continue to monitor utilisation graphs to determine service impacts. The Major Incident Team will meet again at 16:15 to review the outputs and technical recommendations from the technical session.

    Our NOC teams continue to monitor traffic graphs throughout the day and will escalate any issues to the Major Incident Team.

    The incident remains in an extended monitoring status and in the event of a reoccurrence the Major Incident Team will reconvene.


    Senior Network Engineers have concluded their investigations and have devised two suggestions to mitigate impact across our network.

    1. Execute a configuration change at Slo2 on the Core network to amend traffic priority settings, as recommended via the TAC case, albeit without the appropriate testing capabilities.

    2. Increase capacity from 40Gb to 50Gb between Lon44 and Slo2, which requires logistics, hardware and Field Resourcing to assist.

    Following the technical recommendations, the Major Incident Team approved option 1 in an attempt to stabilise traffic and alleviate impact. This has been implemented, and monitoring at this juncture suggests no further impacts to service. NOC Engineers are monitoring the situation post-change, and the rollback plan has been shared with the NOC in the event that we identify issues and have to revert the change.

    At this juncture we have exhausted all avenues of investigation to stabilise the network without implementing option 2. The Major Incident Team have been stood down for this evening to allow the teams to rest and to focus on the upgrade during Normal Working Hours, without potentially causing further impact and allowing the network a period to stabilise. However, the Technical Team are working with our suppliers and delivery partner to ensure we have the required spares to commence the upgrade on the 13th May 2022.

    A further call has been scheduled to review the implementation plan and agree the most suitable time to approve option 2 as an Emergency Change on the 13th May 2022.

     

    Next Update: 13/05/2022 10:45

     

  • Date Opened - 11/05/2022 10:30 - 14/05/2022 15:39
  • Last Updated - 13/05/2022 08:45
Broadband Latency Issues (Resolved)
  • Priority - High
  • Affecting - Broadband
  • We are investigating reports of latency on our broadband services.


    15:49 (20th January) - We are still investigating the cause of latency issues being reported by a percentage of our customer base.


    18:55 (20th January) - Core network was tested this evening, but no issues could be highlighted. Therefore, this has been escalated to Cisco to investigate further.


    05:57 (21st January) - This problem continues to be investigated with Cisco, however, no progress has been made overnight. We have requested that this be escalated.


    10:08 (21st January) - Cisco are now actively engaged with the case and working with NOC.


    14:01 (21st January) - Cisco are considering a hardware swap on the fibre termination kit. Remote hands have been dispatched to site.


    14:53 (21st January) - Remote hands have been engaged to carry out remedial work at the data centre in relation to the issue however our monitoring indicates that this has not improved the situation. We are continuing to liaise with the hardware vendors for troubleshooting purposes.


    17:09 (21st January) - Following hardware configuration changes, we are seeing services returning to normal.


    18:45 (21st January) - Customer feedback is showing that services have returned to normal. This mirrors the network data we are seeing. We therefore consider this resolved; however, should you experience any unexpected slow speeds, we would recommend a quick reboot of your router to refresh session data.

  • Date Opened - 20/01/2022 14:23
  • Last Updated - 21/01/2022 18:45
Broadband Radius Problems (Resolved)
  • Priority - Critical
  • Affecting System - Radius
  • We are currently investigating issues with radius authentication that is impacting some broadband services.


    Description: Major Incident - Multiple DSL services impacted

    Current Impact: We believe this will be affecting any managed DSL circuits as well as PWAN DSL; multiple calls are coming into the desk.

    Incident Start Time: 2021-10-19 09:03:40

    Major Incident Raised: 2021-10-19 09:41:53

    Major Incident Manager: Stephen Martin

    Location/City: Multiple

    Update Number: 1

    Completed Actions:

    • P1 raised
    • MI process invoked
    • Data Centre hands and eyes engaged
    • Hands and eyes working on recovery with our NOC

    Current Action Plan:
    Our NOC Engineers continue to work with the Hands and Eyes at the Data Centre in an attempt to recover the primary device.

    In parallel, our teams are working to bring the secondary device online and manually move traffic in an attempt to recover services. The underlying root cause and mitigating actions will be tracked via the Post Incident Review.

    Next Update: 12:00

  • Date Opened - 19/10/2021 09:41
  • Last Updated - 19/10/2021 11:39
Issues with emails (Inbound and Outbound) (Resolved)
  • Priority - Critical
  • Affecting Server - plesk.aquiss.net
  • We are investigating reports of customers not receiving emails this morning across our hosting network.


    We are pushing through a backlog of circa 120,000 emails that have hit our network since 2.00am and had failed to be delivered. This queue will take a while to process fully.


    This has now been resolved and emails are once again flowing in and out of the network.

  • Date Opened - 13/09/2021 10:35
  • Last Updated - 13/09/2021 11:39
PPP / Stale Sessions (Resolved)
  • Priority - Medium
  • Affecting - Broadband
  • We are receiving a high volume of support enquiries where customers appear unable to get online, in some cases seeing BT Wholesale holding pages, which appear to be linked to stale sessions. We are currently investigating.

    We would recommend that customers in possible stale-session situations power down their routers for 15 minutes.


    We believe this should now be resolved.

    If you are still unable to connect, please power down your router for 15 minutes before trying again. This will allow any stale sessions to clear.

  • Date Opened - 06/08/2021 08:34
  • Last Updated - 06/08/2021 11:24
Major Incident - Network traffic via Slo2 and Lon5 affected (Resolved)
  • Priority - High
  • Affecting - Broadband
  • We are aware of an ongoing incident affecting a number of services. Our Engineers are conducting preliminary investigations to complete a comprehensive Service Impact Analysis.

    We apologise for the inconvenience caused. Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process. Aquiss are working to effect a full restoration as soon as possible.
     
    We will continue to issue regular updates until all services have been restored. Further details will be issued within 15 minutes.


    Update 11:14

    Current Impact: Service restored. Customers may have experienced a brief outage across internet services between 10:23 - 10:27. Our internet peering links in Slo2 and Lon5 experienced a brief flap; however, service has now been restored and our Engineers are conducting a full analysis to identify the root cause of the outage.

    Incident Start Time: 2021-04-15 10:37:49

    Major Incident Raised: 2021-04-15 10:52:36 
      
    Location: Multiple

    Completed Actions:
    • Major Incident Process invoked
    • Service restored
    • Major Incident Team engaged

    Current Action Plan:
    • Monitor services
    • Continue root cause investigations

    Next Update: 12:30pm


    Final 12:19

    Resolution Notes: Aquiss Technical Teams have concluded their investigations and identified the root cause of the issue as an internal process and control failure during investigation activity. Several remedial actions have been identified and these will form part of the Post Incident Review.

  • Date Opened - 15/04/2021 10:38
  • Last Updated - 15/04/2021 12:20
AWS Routing Issues (Resolved)
  • Priority - High
  • Affecting - Broadband
  • We are currently investigating reports of routing issues to websites and services using AWS (Amazon Web Services). This appears to be related to a routing issue being reported off our network within London.

  • Date Opened - 23/03/2021 15:35 - 24/03/2021 13:47
  • Last Updated - 23/03/2021 15:37
BT Planned Work: Nottingham TE (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.

    Therefore, BT will be undertaking maintenance work within the below window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service it should still be considered at risk.

  • Date Opened - 21/01/2021 21:00 - 25/01/2021 12:31
  • Last Updated - 20/01/2021 13:04
BT Planned Work: Sheffield TE (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.

    Therefore, BT will be undertaking maintenance work within the below window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service it should still be considered at risk.

  • Date Opened - 28/01/2021 21:00 - 05/02/2021 10:41
  • Last Updated - 20/01/2021 10:52
BT Planned Work: Leeds TE (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.

    Therefore, BT will be undertaking maintenance work within the above window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service it should still be considered at risk.


    Update : 14/01/2021 15:37

    The above works have been cancelled due to adverse weather conditions in the local area. These works will be rescheduled with the new date communicated out accordingly.

     

  • Date Opened - 14/01/2021 20:30 - 15/01/2021 05:00
  • Last Updated - 14/01/2021 15:37
BT Planned Work: Kingston TE & colindale.core (Resolved)
  • Priority - Low
  • Affecting - Core Network
  • Due to ongoing improvements to the power infrastructure within the BT POPs, Kingston TE requires both the ATS and UPS to be refreshed. Although only Kingston is directly affected, due to network design customers with services on the Colindale core should also be considered at risk.

    Therefore, BT will be undertaking maintenance work within the above window, which will require the power feeds to be moved whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service, customers connecting into the adjacent equipment may experience a small period of downtime.


    Update : 08/01/2021 08:05

    This work has been completed successfully.

  • Date Opened - 07/01/2021 20:30 - 08/01/2021 05:00
  • Last Updated - 08/01/2021 08:05
Broadband Slow Speeds and High Latency (Resolved)
  • Priority - Low
  • Affecting - Broadband
  • We are currently investigating reports of packet loss and slow speeds affecting broadband circuits across all regions of the UK. The problem is further complicated by the fact that it appears to affect only a small subset of customers, and this subset keeps changing every few days. The problem appears to be within the Openreach network, affecting traffic prior to the handover to our network LNSs.

    Further updates will follow as and when they become available and we apologise for any inconvenience caused.

     

    Update : 10/12/2020 18:24

    We are continuing to investigate reports of packet loss; however, we would request that any affected customers provide our support desk with supporting evidence in the form of pathping or WinMTR data to allow further troubleshooting.
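
    As a rough illustration only (not a replacement for the pathping or WinMTR output requested above), a basic loss check can also be scripted around the system ping command; the probe count and the 8.8.8.8 test target below are purely example values.

        import platform
        import subprocess

        # Send a fixed number of pings and print the summary, so the
        # sent/received/lost figures can be pasted into a support case.
        target = "8.8.8.8"   # example target only; any stable destination will do
        count = 50           # number of probes to send
        count_flag = "-n" if platform.system() == "Windows" else "-c"
        result = subprocess.run(["ping", count_flag, str(count), target],
                                capture_output=True, text=True)
        print(result.stdout)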


    Update : 11/12/2020 17:15

    Openreach are continuing to investigate the issue on their side and further updates are expected in the next two hours.


    Update : 12/12/2020 10:28

    Openreach advise that packet loss was observed when testing and that there is a potential issue on their core network. Further troubleshooting will be required.

    We will update once we have further information from them.


    Update : 12/12/2020 12:07

    Openreach advise that the packet loss does not appear to occur overnight, therefore this has been passed to their day shift to investigate today.  Intermittent loss has been seen again this morning so they are continuing to trace the root cause of the issue.

    Our escalation point is aiming to get an update to us by 1400 today, so we will aim to update via this post shortly thereafter.


    Update : 12/12/2020 15:27

    Openreach have advised that the intermittent packet loss that was seen at 0800 this morning has not been present since, which has hindered progress with their fault finding.

    Investigations continue and Openreach expect to be able to provide a further update by around 1700 today.


    Update : 13/12/2020 08:30

    Openreach have advised that their 3rd line team are not observing any packet loss whilst testing. We have been advised that they will continue to troubleshoot out of hours, however, they believe that the packet loss may not present itself until Monday morning during peak hours. We have stressed to them that all efforts must be made by them to resolve this before Monday.


    Update : 14/12/2020 08:13

    Openreach advise that their 2nd line and 3rd line teams have been monitoring their network for signs of packet loss. In the last 48 hours no packet loss has been observed, and as such they have been unable to replicate the issue outside of core hours.

    Openreach’s 3rd line team are monitoring the network this morning for when the packet loss reappears, to ascertain where in the network the issue lies so it can be resolved.

    A further update will be provided following further troubleshooting.


    Update : 14/12/2020 15:23

    Openreach advise they have seen no further packet loss today and our monitoring confirms this. Openreach are continuing to investigate and have setup additional monitors to assist this. We will provide a further update once we have any more information.


    Update : 15/12/2020 15:26

    We have started to see packet loss return after our monitoring has detected an increase. We have once again referred this back to Openreach for investigation.


    Update : 16/12/2020 12:53

    We have once again, around the same time of day, started to get reports of slow speeds and packet loss. However, this time, it seems to be a different portion of our customer base than seen previously. We have once again referred this back to Openreach for investigation.


    Update : 16/12/2020 18:59

    We do understand that customers who are affected are looking for fix times, which at present we simply cannot give. The very nature of this unusual, floating problem within the Openreach network is making it challenging to pinpoint a cause; however, senior Openreach engineers are engaged with our Network Operations team and will continue to investigate.


    Update : 16/12/2020 20:38

    Following further conversations between Openreach senior engineers and our Network Operations team, we have seen a change in how traffic is entering our network at around 8.00pm, which has resulted in the problems disappearing instantly. All lines we are monitoring saw a similar response at around 8.00pm, with packet loss disappearing. Early reports from customers arriving at our support desk this evening confirm that things appear to have returned to normal service.

    As the packet loss seems to occur predominantly during daytime hours, it is likely to be tomorrow morning before any further update can be provided and this evening's efforts can be confirmed.


    Update : 17/12/2020 11:48

    An emergency change is currently being planned in order to assist with troubleshooting the issues currently being seen. This is being discussed at present so that it can be decided when best to carry this out. A further update will be posted at 1600 or upon receipt of further information.


    Update : 17/12/2020 21:51

    We are confident that this situation now appears to have been resolved following emergency work and changes performed this evening. We have lowered the priority of this case to Low and will monitor for 24 hours before closing the case.

  • Date Opened - 10/12/2020 15:20
  • Last Updated - 18/12/2020 12:06
Emergency Work: DSL Platform (Resolved)
  • Priority - Medium
  • Affecting - Broadband
  • Please be aware that we will be carrying out remedial work on the traffic shapers within the DSL broadband network. No downtime to services is expected; however, broadband services should be considered at risk for the duration of this work.

    We apologise for the short notice and any inconvenience this may cause.

  • Date Opened - 17/12/2020 18:00 - 17/12/2020 21:00
  • Last Updated - 17/12/2020 21:42