- Priority - Low
- Affecting - Broadband
-
We are getting a few reports of slow performance on broadband services that started circa 13:00.
Whilst our local core network appears fine, we are investigating possible issues off our network.
- Date Opened - 21/11/2022 13:55 - 22/11/2022 07:51
- Last Updated - 21/11/2022 13:57
- Priority - Low
- Affecting - Broadband
-
We are currently investigating unusual reports of webpages failing to load for some customers.
Location/City: Nationwide
Completed Actions:
- Reports of circuit impact into the TSC
- TSC engaged NOC for initial investigations
- NOC confirmed an issue seen on active monitoring
- MI process engaged
- MI accepted
- Internal Bridge call scheduled
- NOC investigations ongoing, with several examples of affected circuits provided by TSC from their information gathering
- Further impact to Consumer circuits discovered and acknowledged
- NOC investigations determined an issue within the core network emanating from a specific location
- NOC contacted hardware supplier and raised a Priority 1 case
- All logs provided to hardware supplier for analysis
- Internal Bridge call convened
- Conference call between NOC and hardware supplier convened
- Following discussions between NOC and our hardware supplier, there have been developments on this incident regarding restoration.
- It has been found that the origin point of the issue is on a line card situated within a core network device.
- Soft clear of card performed without success
- Full remote reboot of card performed, which was successful for a period of approx. 30 mins before the issue manifested again
- Further internal call held with NOC and Hardware Supplier to agree next steps
- Escalation made to Hardware Supplier to confirm part availability and Engineer ETA
- Part sourcing resolved
- Engineer details confirmed; the engineer will be collecting the part at 07:00
- Access request to DC confirmed
- Issue with retrieving parts from location resolved
- Engineer attended Slough DC
- Engineer has completed card swap successfully
- Testing and checks completed
- BGP reenabled
- Network stability confirmed
Current Action Plan:
A Major Incident has been declared as of 12:25 following the confirmation of an issue affecting our core network. Initial impact assessment has determined that both leased line circuits and standard broadband circuits are affected.
Currently, we are investigating the issue alongside the NOC team and are working to restore service as quickly as possible. An internal bridge call is being held at 12:45 to discuss the current impact and determine a plan of action for restoring service.
We apologise for the disruption and hope to have a further update soon.
Current Action Plan:
At this juncture, following initial investigations by NOC, it has been determined that several tunnels are currently flapping.
We are working to determine why the tunnels are flapping so that restoration works can begin. At this point we are unable to provide an estimated time of repair; however, once one is available we will provide it as soon as possible.
We apologise for the disruption and hope to have a further update soon.
Next Update:
15:30
Current Action Plan:
NOC have been investigating further and have determined that the issue is emanating from a specific location within the core network. As such, this device has been checked and the hardware supplier has been contacted directly for a Priority 1 case to be raised with them. During this time, NOC have attempted to alleviate some issues across the core network by performing a workaround to reroute traffic around the suspected affected device.
All of the requisite logs have been provided and NOC are working alongside the hardware supplier to determine our next course of action. As yet, it is still unclear when full service can be restored; however, we will endeavour to provide an estimate as soon as we have one available.
We apologise for the disruption and hope to have a further update soon.
Next Update:
17:30
Current Action Plan:
Following discussions between NOC and our hardware supplier, there have been developments on this incident regarding restoration.
It has been found that the origin point of the issue is a line card situated within a core network device. During these investigations, two major actions have taken place in an attempt to restore service. The first was to perform a soft clear on the line card itself, which appeared to restore service for a short period before the issue resurfaced. The second was a full remote reboot of the affected line card.
During a period of monitoring, an issue has surfaced following the full reboot, which is currently being investigated on a conference call between NOC and our hardware supplier. Following the results of these investigations, the next course of action would be to engage engineers to attend the site of the affected line card in order to perform a physical reseat of the card.
Further to this, we are also considering a line card replacement should the issue resurface following a short monitoring period after the line card reseat.
Next Update:
08:30
Current Action Plan:
Testing and checks in progress before moving BGP traffic back to original routing
Next Update:
10:30
Current Action Plan:
NOC Engineers have advised they are seeing network instability issues and are currently investigating.
Initial assessment indicates that network stability deteriorated after traffic was reintroduced following the repairs. This recovery step has been reverted and investigations continue with our Engineers and the Network Equipment Supplier's TAC.
Next Update:
15:00
Current Action Plan:
Engineers have arrived on site and will be completing the card swap within the next 30 minutes
There remains no customer service impact that we are aware of, with services either taking alternative routes round the network or utilising their designed service resiliency at this location.
Next Update:
21:00
Current Action Plan:
Current service impact - None, all resilient ports are back in service
Network impact - None, restored to previous state.
Monitoring will now commence for 24 hours, after this time period the 2 costed out links will be brought back into service sequentially under controlled engineer conditions.
Further update will be posted prior to commencement of work to bring the 2 links back into service
Next Update:
12:00 Sunday 24th July
Current Action Plan:
Monitoring continues
Current Service impact - None, all resilient ports are back in service
Current Network impact - None, restored to previous state.
The two core internal links which are still costed out will be reintroduced this evening under controlled conditions.
Next Update:
20:00
- Date Opened - 22/07/2022 12:32 - 25/07/2022 08:12
- Last Updated - 24/07/2022 12:58
- Priority - Critical
- Affecting Server - plesk.aquiss.net
-
We are currently investigating authentication issues with email platforms
This has been resolved
- Date Opened - 07/06/2022 16:17
- Last Updated - 07/06/2022 16:29
- Priority - Medium
- Affecting - Broadband
-
We have received multiple reports from customers reporting Packet Loss across their estate. We have been working with our customers to gather traceroutes and evidence of the impact, and in conjunction this was raised with our Network Operations Centre (NOC) to assist with investigations.
During further investigations our NOC Engineers identified a potential issue across an element of our network causing Packet Loss to customers. In an attempt to stabilise service this has been removed from the network; however, the results have not been positive.
Completed Actions:
• P2 incident raised
• MI Team engaged
• TSC engaged with customers to gather impact
• NOC engaged
• NOC identified Packet Loss across the core
• MI process invoked
• 1 x Core link removed from the network in an attempt to alleviate impact
• 1 x Core link brought back into service
• Network Engineers Engaged
• Traffic forced to another leg of the core network
• 1 x link costed out to force traffic onto the other 4 links
• Further impacts identified and MI Team reconvened
• Testing in the LAB commenced
• Incident conditions replicated in the LAB
• LAB Testing completed
• Configuration change implemented at Slo2 to amend traffic priorities
• Spares ordered
Current Action Plan:
Engineers have escalated the incident to Senior Network Engineers to assist with investigations. At this juncture we have identified a potential capacity issue on our Core Network in London. A technical session is ongoing between our Engineering Teams to determine a plan to alleviate customer impact.
In an attempt to mitigate customer impact, an element of the network was costed out and it was anticipated traffic would reroute across the network; however, this didn't have the desired effect and it was reintroduced to alleviate further impact to our customers.
The Major Incident Team are meeting again at 15:30 to review the outputs and review the recovery plan. At this point the root cause is not understood, however, this will be addressed post incident as part of the Post Incident Review.
Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process. We are working to effect a full restoration as soon as possible.
Technical Teams in conjunction with the Major Incident Team have continued their investigations and implemented some configuration changes in an attempt to mitigate impact to our customers. At this juncture this has not had the desired impact. The team have made one more change at 15:50 and it will take approx. 30 mins to see if this has the desired impact. NOC and Network Engineer Teams continue to monitor the network for stability across our core.
The Major Incident Manager has requested the technical session continues and a contingency plan is provided to the Major Incident Team at 17:30
Technical Teams have completed the configuration changes in an attempt to mitigate customer impact; however, due to the drop in traffic volumes we are unable to confirm whether this has had the desired effect. Testing at this juncture confirms that Packet Loss has improved and the testing completed on the sample of Customer Premises Equipment (CPE) routers indicates no high levels of Packet Loss.
A Technical Assistance Case (TAC) has been raised with Nokia to assist us with our investigations: 1. to replicate the issue and 2. to provide guidance on best practice for distributing traffic across the core devices. Following the feedback from Nokia, our Engineering Teams will review any recommendations and present any required changes via the Emergency Change Advisory Board (ECAB). We have gathered historical data to assist Nokia with their investigations.
The initial increase in traffic today is thought to be a result of Microsoft Patch Tuesday, which may have caused an increase in traffic volumes; however, we are unable to confirm this until we see traffic levels increase on the 12th of May 2022.
The incident has been placed into an extended monitoring status and in the event of a reoccurrence the Major Incident Team will reconvene.
NOC Teams have continued to monitor the service overnight and have confirmed that service has stabilised and no further impacts to service have been identified. We have been working with the vendor via a TAC case and have received some recommendations from them; our Technical Teams are working through the feedback in a technical session to review the outputs and recommendations to test in the LAB before making any changes in the live environment.
At this juncture we are confident that the configuration changes implemented during the incident yesterday have stabilised the network and we continue to monitor utilisation graphs to determine service impacts. The Major Incident Team will meet again at 16:15 to review the outputs and technical recommendations from the technical session.
Our NOC teams continue to monitor traffic graphs throughout the day and will escalate any issues to the Major Incident Team.
The incident remains in an extended monitoring status and in the event of a reoccurrence the Major Incident Team will reconvene.
Senior Network Engineers have concluded their investigations and have devised two suggestions to mitigate impact across our network.
1. Execute a configuration change at Slo2 on the Core network to amend traffic priority settings, as recommended via the TAC case, albeit without the appropriate testing capabilities.
2. Increase capacity from 40Gb to 50Gb between Lon44 - Slo2, which requires logistics, hardware and Field Resourcing to assist.
Following the technical recommendations, the Major Incident Team approved option 1 in an attempt to stabilise traffic and alleviate impact. This has been implemented and monitoring at this juncture suggests no further impact to service. NOC Engineers are monitoring the situation post change activity and the roll back plan has been shared with the NOC in the event we identify issues and have to revert the change.
At this juncture we have exhausted all avenues of investigation to stabilise the network without implementing option 2. The Major Incident Team have been stood down for this evening to allow the teams to rest and focus on the upgrade during Normal Working Hours, without potentially causing further impact, and to allow the network a period to stabilise. However, the Technical Team are working with our suppliers and delivery partner to ensure we have the required spares to commence the upgrade on the 13th May 2022.
A further call has been scheduled to review the implementation plan and agree the most suitable time to approve option 2 as an Emergency Change on the 13th May 2022.
Next Update: 13/05/2022 10:45
- Date Opened - 11/05/2022 10:30 - 14/05/2022 15:39
- Last Updated - 13/05/2022 08:45
- Priority - High
- Affecting - Broadband
-
We are investigating reports of latency on our broadband services.
15:49 (20th January) - We are still investigating the cause of latency issues being reported by a percentage of our customer base.
18:55 (20th January) - Core network was tested this evening, but no issues could be highlighted. Therefore, this has been escalated to Cisco to investigate further.
05:57 (21st January) - This problem continues to be investigated with Cisco, however, no progress has been made overnight. We have requested that this be escalated.
10:08 (21st January) - Cisco are now actively engaged with the case and working with NOC.
14:01 (21st January) - Cisco are considering a hardware swap on the fibre termination kit. Remote hands have been dispatched to site.
14:53 (21st January) - Remote hands have been engaged to carry out remedial work at the data centre in relation to the issue however our monitoring indicates that this has not improved the situation. We are continuing to liaise with the hardware vendors for troubleshooting purposes.
17:09 (21st January) - Following hardware configuration changes, we are seeing services returning to normal.
18:45 (21st January) - Customer feedback is showing that services have returned to normal. This mirrors the network data we are seeing. We therefore consider this resolved; however, if you experience any unexpected slow speeds, we would recommend a quick reboot of your router to refresh session data.
- Date Opened - 20/01/2022 14:23
- Last Updated - 21/01/2022 18:45
- Priority - Critical
- Affecting System - Radius
-
We are currently investigating issues with radius authentication that is impacting some broadband services.
Description: Major Incident - Multiple DSL services impacted
Current Impact: We believe this is affecting any managed DSL circuits as well as PWAN DSL; we are getting multiple calls into the desk
Incident Start Time: 2021-10-19 09:03:40
Major Incident Raised: 2021-10-19 09:41:53
Major Incident Manager: Stephen Martin
Location/City: Multiple
Update Number: 1
Completed Actions:
• P1 raised
• MI process invoked
• Data Centre hands and eyes engaged
• Hands and eyes working on recovery with our NOC
Current Action Plan:
Our NOC Engineers continue to work with the Hands and Eyes at the Data Centre in an attempt to recover the primary device.
In parallel our teams are working to bring the secondary device online and manually move traffic in an attempt to recover services. The underlying root cause and mitigating actions will be tracked via the Post Incident Review.
Next Update: 12:00
- Date Opened - 19/10/2021 09:41
- Last Updated - 19/10/2021 11:39
- Priority - Critical
- Affecting Server - plesk.aquiss.net
-
We are investigating reports of customers not receiving emails this morning across our hosting network.
We are pushing through a backlog of circa 120,000 emails that have hit our network since 2:00am and failed to be delivered. This queue will take a while to process fully.
This has now been resolved and emails are once again flowing in and out of the network.
- Date Opened - 13/09/2021 10:35
- Last Updated - 13/09/2021 11:39
- Priority - Medium
- Affecting - Broadband
-
We are receiving a high volume of support enquiries from customers who appear unable to get online, in some cases seeing BTWholesale holding pages, which appear to be linked to stale sessions. We are currently investigating.
We would recommend that customers who may have a stale session power down their routers for 15 minutes.
We believe this should now be resolved.
If your connection is still unable to connect, please power down your router for 15 minutes before trying again. This will allow any stale sessions to clear.
- Date Opened - 06/08/2021 08:34
- Last Updated - 06/08/2021 11:24
- Priority - High
- Affecting - Broadband
-
We are aware of an ongoing incident affecting a number of services. Our Engineers are conducting preliminary investigations to complete a comprehensive Service Impact Analysis.
We apologise for the inconvenience caused. Our main priority is to deliver the levels of service that our customers deserve and as such we have invoked our Major Incident Process. Aquiss are working to effect a full restoration as soon as possible.
We will continue to issue regular updates until all services have been restored. Further details will be issued within 15 minutes.
Update 11:14
Current Impact: Service restored. Customers may have experienced a brief outage across internet services between 10:23 - 10:27. Our internet peering links in Slo2 and Lon5 experienced a brief flap; however, service has now been restored and our Engineers are conducting a full analysis to identify the root cause of the outage.
Incident Start Time: 2021-04-15 10:37:49
Major Incident Raised: 2021-04-15 10:52:36
Location: Multiple
Completed Actions:
• Major Incident Process invoked
• Service restored
• Major Incident Team engaged
Current Action Plan:
• Monitor services
• Continue root cause investigations
Next Update: 12:30pm
Final 12:19
Resolution Notes: Aquiss Technical Teams have concluded their investigations and identified the root cause of the issue as an internal process and control failure during investigation activity. Several remedial actions have been identified and these will form part of the Post Incident Review.
- Date Opened - 15/04/2021 10:38
- Last Updated - 15/04/2021 12:20
- Priority - High
- Affecting - Broadband
-
We are currently investigating reports of routing issues to websites and services using AWS (Amazon Web Services). This appears to be related to routing issues being reported off our network within London.
- Date Opened - 23/03/2021 15:35 - 24/03/2021 13:47
- Last Updated - 23/03/2021 15:37
- Priority - Low
- Affecting - Core Network
-
Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.
Therefore, BT will be undertaking maintenance work within the below window, which will require the power feeds to be moved whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service, it should still be considered at risk.
- Date Opened - 21/01/2021 21:00 - 25/01/2021 12:31
- Last Updated - 20/01/2021 13:04
- Priority - Low
- Affecting - Core Network
-
Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.
Therefore, BT will be undertaking maintenance work within the below window, which will require the power feeds to be moved whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service, it should still be considered at risk.
- Date Opened - 28/01/2021 21:00 - 05/02/2021 10:41
- Last Updated - 20/01/2021 10:52
- Priority - Low
- Affecting - Core Network
-
Due to ongoing improvements to the power infrastructure within the BT POPs, the above TE requires both the ATS and UPS to be refreshed.
Therefore, BT will be undertaking maintenance work within the above window, which will require the power feeds to be moved whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service, it should still be considered at risk.
Update : 14/01/2021 15:37
The above works have been cancelled due to adverse weather conditions in the local area. These works will be rescheduled with the new date communicated out accordingly.
- Date Opened - 14/01/2021 20:30 - 15/01/2021 05:00
- Last Updated - 14/01/2021 15:37
- Priority - Low
- Affecting - Core Network
-
Due to ongoing improvements to the power infrastructure within the BT POPs, Kingston TE requires both the ATS and UPS to be refreshed. Although only Kingston is directly affected, due to network design customers with services on the Colindale core should also be considered at risk.
Therefore, BT will be undertaking maintenance work within the above window which will require the power feeds to be moved, whilst an ATS & UPS are swapped within the rack. Although the core network router is not expected to experience any loss of service, customers connecting into the adjacent equipment may experience a small period of downtime.
Update : 08/01/2021 08:05
This work has been completed successfully.
- Date Opened - 07/01/2021 20:30 - 08/01/2021 05:00
- Last Updated - 08/01/2021 08:05
- Priority - Low
- Affecting - Broadband
-
We are currently investigating reports of packet loss and slow speeds affecting broadband circuits across all regions of the UK. The problem is further complicated by the fact that it appears to affect only a small subset of customers; however, this subset keeps changing every few days. The problem appears to be within the Openreach network, affecting traffic prior to the handover to our network LNSs.
Further updates will follow as and when they become available and we apologise for any inconvenience caused.
Update : 10/12/2020 18:24
We are continuing to investigate reports of packet loss; however, we would request that any affected customers provide our support desk with supporting evidence in the form of pathping or WinMTR data to allow further troubleshooting.
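For customers who would prefer to script this evidence gathering, the short Python sketch below is offered purely as a hypothetical illustration (it is not an Aquiss tool, and the target hostname is a placeholder): it runs the system ping a number of times and prints the loss summary line, which is roughly the headline figure a pathping or WinMTR run provides. Raw pathping or WinMTR output remains the preferred format.

    # Hypothetical example only - not an Aquiss tool. Runs the system ping a number
    # of times against a target and prints the loss summary line, roughly the
    # headline figure a pathping or WinMTR run would show.
    import platform
    import re
    import subprocess

    def packet_loss_summary(target: str, count: int = 50) -> str:
        # Windows ping takes -n for the count; Linux/macOS use -c.
        flag = "-n" if platform.system() == "Windows" else "-c"
        result = subprocess.run(
            ["ping", flag, str(count), target],
            capture_output=True, text=True, check=False,
        )
        # Both the Windows and Linux ping summaries contain the word "loss".
        match = re.search(r"^.*loss.*$", result.stdout,
                          flags=re.MULTILINE | re.IGNORECASE)
        return match.group(0).strip() if match else result.stdout

    if __name__ == "__main__":
        # Replace the target with whichever host our support desk asks you to test.
        print(packet_loss_summary("bbc.co.uk"))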
Update : 11/12/2020 17:15
Openreach are continuing to investigate the issue on their side and further updates are expected in the next two hours.
Update : 12/12/2020 10:28
Openreach advise that packet loss was observed when testing and that there is a potential issue on their core network. Further troubleshooting will be required.
We will update once we have further information from them.
Update : 12/12/2020 12:07
Openreach advise that the packet loss does not appear to occur overnight, therefore this has been passed to their day shift to investigate today. Intermittent loss has been seen again this morning, so they are continuing to trace the root cause of the issue. Our escalation point is aiming to get an update to us by 1400 today, so we will aim to update via this post shortly thereafter.
Update : 12/12/2020 15:27
Openreach have advised that the intermittent packet loss that was seen at 0800 this morning has not been present since, which has hindered progress with their fault finding. Investigations continue and Openreach expect to be able to provide a further update by around 1700 today.
Update : 13/12/2020 08:30
Openreach have advised that their 3rd line team are not observing any packet loss whilst testing. We have been advised that they will continue to troubleshoot out of hours, however, they believe that the packet loss may not present itself until Monday morning during peak hours. We have stressed to them that all efforts must be made by them to resolve this before Monday.
Update : 14/12/2020 08:13
Openreach advise that their 2nd line and 3rd line teams have been monitoring their network for signs of packet loss. In the last 48 hours no packet loss has been observed, and as such they have been unable to replicate the issue outside of core hours.
Openreach's 3rd line team are monitoring the network this morning for when the packet loss reappears, to ascertain where in the network the issue lies so that it can be resolved.
A further update will be provided following further troubleshooting.
Update : 14/12/2020 15:23
Openreach advise they have seen no further packet loss today and our monitoring confirms this. Openreach are continuing to investigate and have set up additional monitoring to assist with this. We will provide a further update once we have any more information.
Update : 15/12/2020 15:26
Our monitoring has detected an increase in packet loss, and we are starting to see the issue return. We have once again referred this back to Openreach for investigation.
Update : 16/12/2020 12:53
We have once again, around the same time of day, started to get reports of slow speeds and packet loss. However, this time, it seems to be a different portion of our customer base than seen previously. We have once again referred this back to Openreach for investigation.
Update : 16/12/2020 18:59
We do understand that customers who are affected are looking for fix times, which at present we simply can't give. The very nature of this unusual, floating problem within the Openreach network is making it challenging to pinpoint a cause; however, senior Openreach engineers are currently engaged with our Network Operations team and will continue to investigate.
Update : 16/12/2020 20:38
Following further conversations with Openreach senior engineers and our Network Operations team, we have seen a change in how traffic is entering our network at around 8.00pm, which has resulted in the problems disappearing instantly. At around 8.00pm, all of the lines we are monitoring saw a similar response, with packet loss disappearing. Early reports from customers arriving into our support desk this evening confirm things appear to have returned to normal service.
As the packet loss seems to occur predominantly during daytime hours, it is likely to be tomorrow morning before any further update can be provided and this evening's efforts can be confirmed.
Update : 17/12/2020 11:48
An emergency change is currently being planned in order to assist with troubleshooting the issues currently being seen. This is being discussed at present so that it can be decided when best to carry this out. A further update will be posted at 1600 or upon receipt of further information.
Update : 17/12/2020 21:51
We are confident that this situation now appears to have been resolved following emergency work and changes performed this evening. We have lowered the priority of this case to Low and will monitor for 24 hours before closing the case.
- Date Opened - 10/12/2020 15:20
- Last Updated - 18/12/2020 12:06
- Priority - Medium
- Affecting - Broadband
-
Please be aware that we will be carrying out remedial work on the traffic shapers within the DSL broadband network. No downtime is expected to services, however broadband services should be considered at risk for the duration of this work.
We apologise for the short notice and any inconvenience this may have caused.
- Date Opened - 17/12/2020 18:00 - 17/12/2020 21:00
- Last Updated - 17/12/2020 21:42