Enter Service Status
All Systems Operational
Enter Cloud Suite - Operational
ECS IT-MIL1 - Operational
ECS DE-FRA1 - Operational
ECS NL-AMS1 - Operational
Network - Operational
Hosting Services - Operational
CloudUP / SelfServer - Operational
Customer Care Team - Operational
Scheduled Maintenance
Dear Customer,
in order to improve network availability and bandwidth between nodes, fix security vulnerabilities and known bugs, and enlarge the Block Storage pool, we are planning a major update of our entire infrastructure.

To minimize downtime and the impact of the update operations, the update will be carried out over four different nights in the Italy Milano 1 Region.
The Amsterdam and Frankfurt regions will follow, and you’ll receive specific notice beforehand.

================

Here’s what we are going to do:

Block Storage - we’ll increase the number of Ceph SATA PGs (placement groups) to accommodate more consistent growth. The new SSD-type pool will not be affected at all. During this operation, because of the internal cluster rebalancing across drives, you may be unable to access volumes mounted on VMs. This kind of operation can take a long time to run, so we’ll split it over four different nights to avoid long unavailability windows.

Networking - we’ll apply some L3 availability improvements that require us to migrate your vRouter to another gateway node. This will cause a 30 to 60 second loss of Internet connectivity for your tenant.

Compute - we’ll update kernels and drivers (especially the network ones) to increase stability and gain 5x the bandwidth when transferring data between VMs running on different compute nodes.

================

During this planned maintenance (# 3 out of 4) we’ll proceed as follows:

NIGHT 3/4 - April 25th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 1/2. Downtime per node: 30 minutes.

NIGHT 4/4 - April 27th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.
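
For readers who track maintenance windows in UTC, here is a minimal sketch that converts one of the CEST windows above. It assumes the fixed UTC+2 offset stated in the schedule and takes the year 2017 from the incident dates on this page. The helper name window_to_utc is hypothetical, and a window whose end hour is not later than its start hour is assumed to cross midnight.

```python
from datetime import datetime, timedelta, timezone

# CEST modelled as a fixed UTC+2 offset, as stated in the schedule.
CEST = timezone(timedelta(hours=2), "CEST")


def window_to_utc(year, month, day, start_hour, end_hour):
    """Return (start, end) in UTC for a window given in CEST hours.

    If end_hour is not after start_hour, the window is assumed to
    cross midnight and to end on the following calendar day.
    """
    start = datetime(year, month, day, start_hour, tzinfo=CEST)
    end = datetime(year, month, day, end_hour, tzinfo=CEST)
    if end <= start:
        end += timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


# Night 3/4 block storage window: 11PM - 2AM CEST, starting April 25th.
start, end = window_to_utc(2017, 4, 25, 23, 2)
print(start.isoformat(), "->", end.isoformat())
# 2017-04-25T21:00:00+00:00 -> 2017-04-26T00:00:00+00:00
```

The 2AM - 7AM windows listed under the same night presumably fall on the following calendar day, so pass that date explicitly when converting them.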

=======================

We apologise for any inconvenience. Feel free to contact us with any questions at support@entercloudsuite.com.
Best regards,

The Enter Cloud Suite Team
Posted on Apr 5, 15:32 CEST
Dear Customer,
in order to improve network availability and bandwidth between nodes, fix security vulnerabilities and known bugs, and enlarge the Block Storage pool, we are planning a major update of our entire infrastructure.

To minimize downtime and the impact of the update operations, the update will be carried out over four different nights in the Italy Milano 1 Region.
The Amsterdam and Frankfurt regions will follow, and you’ll receive specific notice beforehand.

================

Here’s what we are going to do:

Block Storage - we’ll increase the number of Ceph SATA PGs (placement groups) to accommodate more consistent growth. The new SSD-type pool will not be affected at all. During this operation, because of the internal cluster rebalancing across drives, you may be unable to access volumes mounted on VMs. This kind of operation can take a long time to run, so we’ll split it over four different nights to avoid long unavailability windows.

Networking - we’ll apply some L3 availability improvements that require us to migrate your vRouter to another gateway node. This will cause a 30 to 60 second loss of Internet connectivity for your tenant.

Compute - we’ll update kernels and drivers (especially the network ones) to increase stability and gain 5x the bandwidth when transferring data between VMs running on different compute nodes.

================

During this planned maintenance (# 4 out of 4) we’ll proceed as follows:

NIGHT 4/4 - April 27th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.

=======================

We apologise for any inconvenience. Feel free to contact us with any questions at support@entercloudsuite.com.
Best regards,

The Enter Cloud Suite Team
Posted on Apr 5, 15:34 CEST
Past Incidents
Apr 25, 2017

No incidents reported today.

Apr 24, 2017

No incidents reported.

Apr 23, 2017

No incidents reported.

Apr 22, 2017

No incidents reported.

Apr 21, 2017
Resolved - Dear all,
Based on the IPMI logs, a power supply failure shifted the whole power feed onto the secondary power supply, which unexpectedly turned out to be faulty as well, leaving the node unpowered. This is all the more strange because both power supplies are now working properly.
The investigation that followed led us to believe that a hardware failure is misleading the hardware power management into considering both power supplies faulty at once, and we have involved our hardware provider in the search for the root cause.
In the meantime we'll replace the parts involved one at a time and test the system thoroughly before putting it back into production this weekend.

We are trying to work around some limitations in Neutron Kilo that have not yet allowed us to apply integrated procedures for the vRouter migration, or to backport them from Mitaka.
We are writing some software to fix this and to handle events like this in a more timely manner.
We think we can soon reduce the expected downtime to a few seconds instead of minutes, but this is just a temporary fix. In the meantime we'll work on a "zero ping loss" node failure management system, but this also involves upgrading to Neutron Mitaka, where HA functionality is natively available, as we expect to do in the next couple of months.
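
If you want to check from your side whether a vRouter migration cost you seconds or minutes, one option is to probe your instance at a fixed interval and summarise the gaps. The following is a minimal sketch of the summarising step only. The function name outage_durations and the sample probe log are hypothetical, and how you collect the probes (ping, TCP connect, HTTP check) is up to you.

```python
def outage_durations(samples):
    """Return the length in seconds of each outage in a probe log.

    `samples` is a time-ordered list of (timestamp_seconds, reachable)
    pairs. An outage starts at the first failed probe and ends at the
    next successful one. An outage still open at the end of the log
    is not reported, because its length is not yet known.
    """
    outages = []
    down_since = None
    for ts, reachable in samples:
        if not reachable and down_since is None:
            down_since = ts  # first failed probe of a new outage
        elif reachable and down_since is not None:
            outages.append(ts - down_since)  # connectivity restored
            down_since = None
    return outages


# Hypothetical log: one probe per second, three probes lost in a row.
samples = [(0, True), (1, True), (2, False), (3, False),
           (4, False), (5, True), (6, True)]
print(outage_durations(samples))  # [3]
```

The measured length can be off by up to one probe interval, so a one-second interval is only precise enough to tell "a few seconds" apart from "minutes".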

We apologize for the inconvenience.
Best Regards,

Enter Cloud Suite Staff
Apr 21, 18:41 CEST
Monitoring - All vRouters have been migrated to another node. All instances are back online. We are investigating the failure.
Apr 20, 12:38 CEST
Identified - Dear all,
we are experiencing an L3 network node failure in the Germany region. Some vRouters may be down, so some tenants may have lost Internet connectivity. We'll proceed to migrate them to another node.
We are investigating and will be back soon with updates.
Regards,

Enter Cloud Suite Staff
Apr 20, 12:10 CEST
Completed - The update was successful and the maintenance, after a significant monitoring window, can be closed.
Apr 21, 18:22 CEST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 20, 23:00 CEST
Update - Dear all,
because the first upgrade step progressed faster than expected, we are slightly changing the schedule for tonight's operations.

Here's the new plan:

STEP 2/4 - April 20th
11PM - 7AM CEST (UTC+2) -> E1 and E2-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.

As you can see, the block storage and L3 network operations were already completed in Step 1 on April 18th, and have consequently been removed from the schedule. This allows us to bring the rollout of compute upgrades forward a little. The overall maintenance window remains unchanged.
We hope this doesn't inconvenience you too much.
Best regards,

Enter Cloud Suite Staff
Apr 20, 11:12 CEST
Scheduled - Dear Customer,
in order to improve network availability and bandwidth between nodes, fix security vulnerabilities and known bugs, and enlarge the Block Storage pool, we are planning a major update of our entire infrastructure.

To minimize downtime and the impact of the update operations, the update will be carried out over four different nights in the Italy Milano 1 Region.
The Amsterdam and Frankfurt regions will follow, and you’ll receive specific notice beforehand.

================

Here’s what we are going to do:

Block Storage - we’ll increase the number of Ceph SATA PGs (placement groups) to accommodate more consistent growth. The new SSD-type pool will not be affected at all. During this operation, because of the internal cluster rebalancing across drives, you may be unable to access volumes mounted on VMs. This kind of operation can take a long time to run, so we’ll split it over four different nights to avoid long unavailability windows.

Networking - we’ll apply some L3 availability improvements that require us to migrate your vRouter to another gateway node. This will cause a 30 to 60 second loss of Internet connectivity for your tenant.

Compute - we’ll update kernels and drivers (especially the network ones) to increase stability and gain 5x the bandwidth when transferring data between VMs running on different compute nodes.

================

During this planned maintenance (# 2 out of 4) we’ll proceed as follows:

STEP 2/4 - April 20th
11PM - 3AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
12AM - 2AM CEST (UTC+2) -> L3 Network update. VM loss of Internet reachability up to 60 seconds.
2AM - 7AM CEST (UTC+2) -> E1 and E2-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.

NIGHT 3/4 - April 25th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 1/2. Downtime per node: 30 minutes.

NIGHT 4/4 - April 27th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.

=======================

We apologise for any inconvenience. Feel free to contact us with any questions at support@entercloudsuite.com.
Best regards,

The Enter Cloud Suite Team
Apr 5, 15:30 CEST
Apr 19, 2017
Completed - Dear all,
we're completing the last operations needed to consider this first upgrade step finished.
The Ceph rebalancing is taking longer than expected, and we estimate it will complete around 11AM today. Nevertheless, since early this morning block storage operations have returned to a normal state and, apart from a temporary, randomly increased disk latency, we don't see any parameter outside its normal range.
This latency effect is fading as time goes by.

In the meantime, let us share some feedback on this upgrade.

Block Storage - We are now running Ceph version "J" (Jewel), the latest LTS ("Long-Term Support") version. The upgrade included an increase in the cluster's PGs (Placement Groups) that will allow us to extend the cluster up to hundreds of TB per region without resizing the cluster again in the meantime. Also, we are preparing to switch from 2x replication of your data to a more secure 3x replication: the current replica count may still be sufficient, but as we increase the number of drives, the statistical risk of a single drive (that is, an OSD) failure also rises, and we want to keep a fair timeframe and safety margin when dealing with common failures such as mechanical ones.
The benefits of this update will be in performance (IOPS) and in drive access latency, starting from a minimum improvement of 10%. More details will follow once we have real-time usage stats.
The bad news is that we have to apologize for the block storage unavailability that left all volumes completely stuck between 1:30AM and 3:30AM. This was not expected, and such behaviour never showed up during our tests in staging. We're investigating why this happened and how to avoid it in the future.
The good news, though, is that the whole cluster upgrade and PG increase was performed in a single run, so we are canceling all the block storage activities planned for the next steps. So, fortunately, you don't have to prepare for further downtime.
WARNING: volumes created after 8PM on April 18th cannot be attached to VMs that already existed and were running at that time. These VMs need a reboot to mount these volumes.
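
To illustrate the reasoning behind the planned move from 2x to 3x replication, here is a rough sketch of the arithmetic. It assumes that drives fail independently and uses a purely illustrative 2% annual failure rate per drive. Neither figure nor the drive counts come from our cluster, and the function names are hypothetical.

```python
def p_any_drive_fails(n_drives, annual_failure_rate):
    """Probability that at least one of n independent drives fails in a year."""
    return 1 - (1 - annual_failure_rate) ** n_drives


def usable_capacity_tb(raw_tb, replicas):
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


AFR = 0.02  # assumed annual failure rate per drive, for illustration only

# The chance of seeing at least one failure grows with the drive count.
for n in (50, 100, 200):
    print(n, "drives:", round(p_any_drive_fails(n, AFR), 3))

# The price of the extra copy: the same raw space holds less data.
print(usable_capacity_tb(300, 2), usable_capacity_tb(300, 3))  # 150.0 100.0
```

With more drives, a failure somewhere in the cluster becomes close to a certainty over time, so a third copy buys margin while a failed OSD is being recovered, at the cost of one third of the usable space compared with 2x.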

Networking - we have replaced the network nodes, and the migration of your vRouter and DHCP server was almost flawless. Some of you may not have noticed even a single lost packet. Some VPNaaS instances may have to be restarted; we're currently working on this issue. You can now benefit from improved availability of the DHCP servers, and we are ready to improve the availability of vRouters as well.

Computing - We have started the upgrade of a small batch of compute nodes as planned, and there are no significant updates on this activity. On the already updated nodes you may benefit from increased bandwidth on your VMs, up to 9x or 10x the previous performance, both to the Internet and to other VMs, as long as both are on updated nodes.

That's it for today! Feel free to contact us for any question or issue at support@entercloudsuite.com.
Have a great day,

Enter Cloud Suite Staff
Apr 19, 10:41 CEST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 18, 23:00 CEST
Scheduled - Dear Customer,
in order to improve network availability and bandwidth between nodes, fix security vulnerabilities and known bugs, and enlarge the Block Storage pool, we are planning a major update of our entire infrastructure.

To minimize downtime and the impact of the update operations, the update will be carried out over four different nights in the Italy Milano 1 Region.
The Amsterdam and Frankfurt regions will follow, and you’ll receive specific notice beforehand.

================

Here’s what we are going to do:

Block Storage - we’ll increase the number of Ceph SATA PGs (placement groups) to accommodate more consistent growth. The new SSD-type pool will not be affected at all. During this operation, because of the internal cluster rebalancing across drives, you may be unable to access volumes mounted on VMs. This kind of operation can take a long time to run, so we’ll split it over four different nights to avoid long unavailability windows.

Networking - we’ll apply some L3 availability improvements that require us to migrate your vRouter to another gateway node. This will cause a 30 to 60 second loss of Internet connectivity for your tenant.

Compute - we’ll update kernels and drivers (especially the network ones) to increase stability and gain 5x the bandwidth when transferring data between VMs running on different compute nodes.

================

During this planned maintenance (# 1 out of 4) we’ll proceed as follows:

STEP 1/4 - April 18th
11PM - 3AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
12AM - 3AM CEST (UTC+2) -> L3 Network update. VM loss of Internet reachability up to 60 seconds.
4AM - 8AM CEST (UTC+2) -> E1 and E2-class Compute nodes upgrade and reboot, step 1/2. Downtime per node: 30 minutes.

STEP 2/4 - April 20th
11PM - 3AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
12AM - 2AM CEST (UTC+2) -> L3 Network update. VM loss of Internet reachability up to 60 seconds.
2AM - 7AM CEST (UTC+2) -> E1 and E2-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.

NIGHT 3/4 - April 25th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 1/2. Downtime per node: 30 minutes.

NIGHT 4/4 - April 27th
11PM - 2AM CEST (UTC+2) -> Resize of Block Storage Cluster. Potential volume performance degradation.
2AM - 7AM CEST (UTC+2) -> E3-class Compute nodes upgrade and reboot, step 2/2. Downtime per node: 30 minutes.

=======================

We apologise for any inconvenience. Feel free to contact us with any questions at support@entercloudsuite.com.
Best regards,

The Enter Cloud Suite Team
Apr 5, 15:28 CEST
Apr 17, 2017

No incidents reported.

Apr 16, 2017

No incidents reported.

Apr 15, 2017

No incidents reported.

Apr 14, 2017

No incidents reported.

Apr 13, 2017

No incidents reported.

Apr 12, 2017

No incidents reported.

Apr 11, 2017

No incidents reported.