Cloud Migration and Availability Index

Infrastructure Migration

The Cloud Transition

The Cloud may not be in-your-face, but it is pervasive and is gradually taking over many aspects of our traditional IT systems. Companies are not yet making wholesale transitions from existing data-centres and on-premise assets to the Cloud. However, when infrastructure reviews occur, whether to upgrade or to add new resources, the Cloud beckons. Questions about total cost of ownership (TCO), scalability, time-to-market, etc. will influence decision makers. For each one of these, the Cloud offers a compelling alternative. It is likely that in the next two decades, only a few companies will still maintain their infrastructure on premise.

The Status Quo - On-premise Deployment Design
Let us assume then that ACME plc has made a decision. Business has been persuaded, either by hype or fundamentals, that the Cloud is the strategic target. Architectural leadership has been mobilised and a decision taken to draw up a roadmap for Cloud adoption. What next? In this article, we look at four primary considerations that architects must carefully examine when migrating to the Cloud. These are: sizing, failover, scaling and access. Everything else builds on the foundation that is synthesised from these four dimensions.

Sizing: What Specification of Infrastructure Should be Provisioned

Statistics are invaluable. Node sizing should be sympathetic to the existing usage profile. It may be okay to guess at first, but it saves a great deal of time to know in advance. For each Cloud instance, the node(s) provisioned should be selected to meet the latency and throughput required to support 120% of anticipated production load. The sizing could be either singular or plural. Singular, as in one node with enough capacity to bear all load; or plural, i.e. a number of nodes that can, between them, satisfy demand. Either way, the baseline should exceed the present need.
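The sizing rule can be sketched in a few lines of Python. The 120% headroom comes from the text; the function name, the load figure and the per-node capacities are hypothetical values chosen purely for illustration.

```python
import math

HEADROOM_PERCENT = 120  # provision for 120% of anticipated production load


def nodes_required(peak_load_rps: int, node_capacity_rps: int) -> int:
    """Return how many identical nodes cover the load plus headroom."""
    if peak_load_rps < 0 or node_capacity_rps <= 0:
        raise ValueError("load must be non-negative and capacity positive")
    # Integer arithmetic for the target avoids floating-point surprises.
    target = peak_load_rps * HEADROOM_PERCENT
    # At least one node is always provisioned, even for a trivial load.
    return max(1, math.ceil(target / (100 * node_capacity_rps)))


# Singular sizing: one large node (hypothetical capacity) bears all load.
singular = nodes_required(peak_load_rps=1000, node_capacity_rps=1500)
# Plural sizing: several smaller nodes satisfy the same demand between them.
plural = nodes_required(peak_load_rps=1000, node_capacity_rps=400)
print(singular, plural)
```

The same function serves both sizing styles; only the capacity of the chosen node type changes.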

Resizing in the Cloud may be quick and easy, but the decision making might not be so. If in doubt, over-provision. It is easy to downsize later, and the organisation avoids the risk of loss of business due to performance or availability problems. Default sizing is simple, i.e. geography localised and singular. But there could be exceptional scenarios where geographic distribution must be combined with plural sizing. More about that later.

Failover: How is System Failure Mitigated

Given proper sizing, as above, the next dimension to consider is failure and recovery. If or when a properly sized machine fails, what happens next? Let’s take the simple approach first and we will revisit this later. There should be a distribution of node(s) across Cloud locations, so that the failure of one node does not result in service unavailability. Service recovery should occur in a different Cloud location. This reduces the likelihood of contagion from the original failure location while maintaining service continuity. An interesting aspect of failure management is implicit resilience, i.e. what measure of interruption can our infrastructure handle?

The complement of the nodes that provide a service across Cloud location(s) is a resource group. The group resilience is the count of simultaneous failures that can be managed while maintaining SLAs. The higher the count, the larger the number of nodes and Cloud locations involved. Resiliency has a price tag though. More machines (virtual) will multiply cost and increase the percentage of idle/redundant resources in the Cloud platform.
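As a rough illustration of the price of resilience, the sketch below assumes identical nodes and that each failure takes out exactly one node, so a group that must survive k simultaneous failures needs k nodes beyond its baseline. The function names and zone labels are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def group_size(base_nodes: int, resilience: int) -> int:
    """Nodes needed to keep base capacity after `resilience` node failures."""
    return base_nodes + resilience


def redundant_share(base_nodes: int, resilience: int) -> float:
    """Fraction of the group that sits idle while nothing has failed."""
    return resilience / group_size(base_nodes, resilience)


def spread(total_nodes: int, locations: list) -> dict:
    """Place nodes round-robin so that no location is favoured."""
    return dict(Counter(islice(cycle(locations), total_nodes)))


size = group_size(base_nodes=3, resilience=2)
print(size, redundant_share(3, 2))
print(spread(size, ["zone-a", "zone-b", "zone-c"]))
```

Raising the resilience count increases both the group size and the idle share, which is the cost trade-off described above.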

Scaling: How are Additional Resources Provisioned

As resource demand grows organically, or due to unexpected spikes, infrastructure should respond, automagically! Traditionally, scaling was a bureaucratic and technical journey. With Cloud, scaling is merely a change of configuration. Where singular sizing has been used, another node of the same size could be added. This is horizontal scaling. Adding more nodes to singular sized nodes would multiply capacity. It is linear, but there is no guarantee of a commensurate increase in demand or resource usage. There is an alternative design that is more efficient: programmatic vertical scaling. A simple algorithm can be applied to automatically scale resources up or down, by a fraction rather than a multiple.
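One possible shape for such an algorithm is sketched below. It is not taken from any Cloud platform: the thresholds, the 25% step and the minimum capacity are all assumed values that would need tuning against real usage statistics.

```python
def next_capacity(
    current_units: float,
    utilisation: float,
    step: float = 0.25,   # assumed: resize by a quarter of current capacity
    high: float = 0.80,   # assumed: scale up above 80% utilisation
    low: float = 0.30,    # assumed: scale down below 30% utilisation
    floor: float = 1.0,   # assumed: never shrink below one unit
) -> float:
    """Grow or shrink capacity by a fraction rather than a multiple."""
    if utilisation < 0:
        raise ValueError("utilisation cannot be negative")
    if utilisation > high:
        return current_units * (1 + step)
    if utilisation < low:
        return max(floor, current_units * (1 - step))
    return current_units  # within the comfortable band: leave as-is


print(next_capacity(8, 0.90))  # busy: 8 units grow by a quarter
print(next_capacity(8, 0.10))  # quiet: 8 units shrink by a quarter
```

Compared with adding a whole node of the same size, which doubles a singular deployment, the fractional step keeps provisioned capacity closer to actual demand.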

Cloud platforms record a raft of events about the resources deployed. Customers can tap into these events to scale in response to demand. On AWS, CloudWatch alarms can trigger a Lambda function, which in turn effects a rolling upgrade on EC2 nodes, upscaling node size before autoscaling. By leveraging statistics for baseline sizing and monitoring demand, we can guarantee day zero availability and a decent response in infrastructure provisioning, increasing capacity as demand grows and shrinking it if or when spikes even out.
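The decision step of such a Lambda function might look like the sketch below. It assumes the alarm reaches the function through an SNS topic, so the alarm details arrive as a JSON string under Records[0].Sns.Message. The instance-size ladder, the current_size parameter and the reading of an OK state as "demand has subsided" are illustrative assumptions, and the actual resize of the EC2 node through the AWS SDK is deliberately left out.

```python
import json

# Hypothetical ladder of instance sizes, smallest first.
SIZE_LADDER = ["t3.small", "t3.medium", "t3.large", "t3.xlarge"]


def next_size(current: str, direction: int) -> str:
    """Move one rung up (+1) or down (-1), staying within the ladder."""
    index = SIZE_LADDER.index(current) + direction
    index = min(max(index, 0), len(SIZE_LADDER) - 1)
    return SIZE_LADDER[index]


def handler(event, context=None, current_size="t3.medium"):
    """Decide the target node size from an SNS-delivered CloudWatch alarm."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    state = message["NewStateValue"]
    # ALARM: demand is high, step up. OK: spike has evened out, step down.
    direction = {"ALARM": 1, "OK": -1}.get(state, 0)
    return {
        "alarm": message["AlarmName"],
        "target_size": next_size(current_size, direction),
    }


def sample_event(state: str) -> dict:
    """Build a minimal stand-in event for local experiments."""
    body = json.dumps({"AlarmName": "cpu-high", "NewStateValue": state})
    return {"Records": [{"Sns": {"Message": body}}]}


print(handler(sample_event("ALARM")))
```

Keeping the decision separate from the resize call makes the logic easy to exercise locally before it is wired to real alarms.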

Access: How do Clients Connect to Cloud Services

The fourth dimension is access. As on-premise, so also with Cloud. There is no value in having resources that are locked away from everyone and everything. Our clients need access to our Cloud based services, so also partners involved in our service chain. Unless we are migrating all at once, it is likely that we will also need access to some on-premise infrastructure. Our design must provide the access paths and levels, as well as the constraints that keep authorised clients within band and everyone else out. To achieve this we would use such facilities as the Virtual Private Network (VPN), the load balancer, firewalls and others. Beyond the basics of who’s in and who’s out though, there is a service that we must provide to clients and partners.

The key here is to be simple and unobtrusive; placing minimal burdens on clients, partners and our applications/services.

By default we would use load balancers to decouple clients from service providers. Cloud load balancers spread requests among available service providers. They are not geography specific and simplify access and security for clients and service providers. Our Cloud landscape is elegant and uncomplicated, with singular entry points for each service. One consideration could however force radical change to this default: Geographic Affinity (GA). Geographic affinity is a requirement to pin clients to a specific physical/geographical service provider. It could be zonal or regional. GA is often driven by regulatory, localisation, performance or security concerns.

But some GA drivers can be conflicting. For example, performance (latency sensitive applications) might be a problem where localisation (regional) is required. Invariably, GA tilts our architecture towards plurality of nodes and complications in managing performance and synchronisation of state. Architects must balance needs that sometimes conflict, to avoid creating virtual silos in the Cloud.
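The difference between the default and a GA-constrained design can be shown with a small routing sketch. The endpoint URLs and region codes are placeholders, and the choice to fail rather than fall back when no pinned provider exists is an assumption suited to regulatory drivers; a performance-driven GA might prefer a fallback.

```python
# Hypothetical endpoints: one singular entry point plus regional providers.
DEFAULT_ENDPOINT = "https://service.example.com"
PINNED_ENDPOINTS = {
    "eu": "https://eu.service.example.com",
    "us": "https://us.service.example.com",
}


def resolve_endpoint(client_region: str, requires_affinity: bool) -> str:
    """Return the entry point a client should use."""
    if not requires_affinity:
        # Default design: one load-balanced entry point for everyone.
        return DEFAULT_ENDPOINT
    try:
        # GA design: the client is pinned to its own region's provider.
        return PINNED_ENDPOINTS[client_region]
    except KeyError:
        raise LookupError(f"no pinned provider for region {client_region!r}")


print(resolve_endpoint("eu", requires_affinity=False))
print(resolve_endpoint("eu", requires_affinity=True))
```

Every entry added to the pinned map is another group of nodes to size, fail over and keep in sync, which is how GA pushes the landscape towards plurality.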

Cloud Deployment Design
Cloud Chaos

The Availability Index

So far we have been working forwards from an existing status quo to a target architecture. We have also adopted an exclusively technical perspective. It would be better to take a business perspective and approach our context top down. We should ask: what infrastructure is needed to support our business vision, now and into the near future? What level of availability is enough to provide service that exceeds client needs? In asking these questions, we encounter a new concept: “the Granularity of Perception”. This can be described as the number of microseconds, milliseconds, seconds, minutes, or more that impacts our service(s), as perceived by clients. Simply put: how slowly can we blink before our clients start to notice that our eyes have moved? As this number (granularity) increases, the required level of availability decreases. The table below provides a rough guide, with descriptions.

Availability Index    Description
1                     Cluster enabled, auto recovery, no fail 24×7, latency intolerant, high frequency, geography affinity
3                     Cluster enabled, auto recovery, no fail 24×7, latency intolerant, medium frequency
5                     Cluster enabled, auto failover, business hours, latency tolerant, low frequency
7                     Non clustered, manual failover, business hours, latency tolerant, low frequency

The goal of architects should be to design a Cloud platform that delivers a granularity that is finer than the perception of clients. Using the table above as a guide, architects should play out scenarios with the service portfolio against each index, starting with the least and working up to the highest. Once the required availability index is determined, it should be relatively easy to identify the dimensions to support it.
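Playing out a service against each index can be sketched as a first-fit search. The capability labels below are one reading of the table, not part of it, and the sketch assumes that each index offers everything the next, less demanding, index offers. "Least" is interpreted here as the least demanding profile (index 7).

```python
# Capabilities assumed to be added at each index, least demanding first.
ADDED_AT_INDEX = [
    (7, set()),  # non clustered, manual failover, business hours
    (5, {"clustered", "auto_failover"}),
    (3, {"auto_recovery", "24x7", "low_latency", "medium_frequency"}),
    (1, {"high_frequency", "geo_affinity"}),
]


def required_index(needs: set) -> int:
    """Return the least demanding index whose profile covers `needs`."""
    offered = set()
    for index, added in ADDED_AT_INDEX:
        offered |= added  # capabilities accumulate towards index 1
        if needs <= offered:
            return index
    raise ValueError(f"no index covers: {sorted(needs - offered)}")


# Hypothetical services and the needs established from business scenarios.
print(required_index(set()))                     # internal batch report
print(required_index({"clustered"}))             # office-hours portal
print(required_index({"24x7", "low_latency"}))   # customer-facing API
print(required_index({"geo_affinity"}))          # regulated regional service
```

Services that land on the same index share a logical profile, which is what allows them to be grouped and provisioned from a common baseline.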

Conclusion

As organisations embark on the journey of digital transformation, one early change is often Cloud adoption. This is because the Cloud provides a catalysing medium in which many solutions are easier and quicker to provision. In moving from on-premise/data-centre resources to the Cloud, architects must resist the temptation to simply lift-and-shift. Rather, the digital transformation journey should re-examine the fitness-for-purpose of existing solutions, platforms and infrastructure. There is a famous quote by Jack Welch, former CEO of General Electric. He said, “If the rate of change on the outside exceeds the rate of change on the inside, then the end is near.” In a rapidly evolving globalised economy, business agility is becoming a precondition for survival.

The availability index is a simple, logical, technology-agnostic technique for conceptual reasoning about a Cloud landscape.  Determination of the availability index helps to reveal shared profiles for similar subsystems.  The profiles are logical and help estimate the resources required to support a genre of subsystem.  Each logical profile can then be mapped to specific Cloud infrastructure and captured as customisable templates.  The logical profiles provide architects with a starting point for solution designs.  The infrastructure templates serve as a baseline for DevOps teams.  Each artefact is likely to go through a number of evolutions.  However, it is vital that both views are kept in sync at all times.

Organisations that leverage this approach will see a marked improvement in the consistency of infrastructure designs.  Other benefits include faster turnaround of solutions, and systems that balance technical capability with business needs and aspirations. Architecture teams that leverage the availability index position their organisations for superior agility and competitiveness in the global economy.


Oyewole, Olanrewaju J (Mr.)
Internet Technologies Ltd.
lanre@net-technologies.com
www.net-technologies.com

AWS vs Oracle Cloud – My Experience

Oracle Cloud

Preamble

Having had a good impression from my use of Amazon Web Services (AWS), I decided to take a look at Oracle Cloud. AWS is of course the market leader but Oracle, Microsoft et al have joined the fray with competing services. What follows is nothing like a Gartner report; rather, it is my personal experience of the Oracle and AWS services, as an end user. AWS is already well known and is perhaps the benchmark by which all others are presently measured. This article maintains that same perspective. The narrative is of my experience of Oracle Cloud and how it compares with AWS. To keep this brief, I will only mention the cons.

Setting Up

As with every online service, Cloud or otherwise, you need to sign up and configure an account with the provider. The Oracle Cloud account set up involved more steps than the AWS one. The telling difference though was in the time it took for the account to be available for use. Whereas the AWS account was ready to go within a day, it took about 3 working days for my Oracle account.

The second difference was in communication. AWS sent me one email with all I needed to get started; Oracle sent me about 5 emails, each with useful information. I had some difficulty logging on to the Oracle service at first. But this was because I thought, wrongly, that the first 3 emails I had received contained all that I needed to log in. The 4th email was the one I needed and with it in hand, login was easy and I could start using the services – the 5th email was for the VPN details.

Oracle Cloud Services

Having set up the account, I was now able to login and access the services. I describe my experience under four headings: user interface, service granularity, provisioning and pricing.

User Interface

I will describe the interface usability in three aspects: consistency, latency, and reliability. First is consistency. On logging in to the Oracle Cloud, there is an icon on the top left hand corner that brings up the dashboard. This is similar to AWS. However, clicking that same button in the default, database, storage and SOA home pages results in a different list of items in the dashboard display. This can be confusing as users may think they have done something wrong, or lost access to one of their cloud services. The second, latency, is also a big problem with some in-page menus. Response time for drop-down lists can be painfully slow and there is no visual indicator (hourglass) of background processing. In extreme cases, latency becomes an issue of reliability. There were times when in-page menus simply failed to display despite several clicks and page refreshes.

Service Granularity

The area of concern is IaaS, and the two services I had issue with were compute and storage. The choice of OS, RAM, CPU, etc. available when selecting a compute image is quite limited: mostly Oracle Linux, with a chained increase in RAM and CPU specifications. When creating storage, it appears that there is only one “online” storage class available – standard container. The usage of the terms “container” and “application container” was a bit confusing in this context. This is especially so when trying to create storage for a database that will be used by Oracle SOA Cloud.

Provisioning

Provisioning is the fulfilment of the request to create a service package (IaaS, PaaS or SaaS). The time it took to create a database (PaaS) instance was in excess of 30 minutes. Perhaps this was due to the time of day and high concurrency. Nevertheless, given that I could run a script to do the same on my laptop within 10 minutes, one would expect equal or better from the cloud. The delay is even longer with creation of the Oracle SOA Cloud instance; this took well over 2 hours to complete. Scripted creation of instances would be much quicker on a modest PC. For cloud services, images should provide even quicker initialisation of instances from templates.

Pricing

This could be the elephant in the room.  Whereas options were few and insufficient in IaaS, the array of pricing for almost identical PaaS offerings was long and rather confusing.  Unlike AWS, there are only two schedules: hourly or monthly.  There are no options to reserve or bid for capacity.  Finally, even the lowest prices are quite high from the perspective of an SME.  The granularity of billing needs to be reduced or the composition of IaaS and PaaS should give greater flexibility.  Either way, entry prices need to be attractive to a larger segment of the market.

Summary

A great impediment to comparison is the short trial period allowed for Oracle Cloud services.  The 30 day allowance is insufficient, except for those with a prepared plan and a dedicated resource.  Such an exercise would in itself amount to no more than benchmarking, leaving little room for gaining a real feel for the services.

We should set aside the latency issues in setup, user interface and provisioning. These are easy problems to resolve and it is likely that Oracle will fix these very soon. The output of provisioning for Oracle-specific PaaS and SaaS services was very good and compares favourably with AWS. One advantage of the Oracle PaaS is the simple configuration of requisite security to gain access for the first time. This meant that the PaaS services were very quickly available for use without further tweaking of settings. The shortcoming, as previously stated, is that provisioning can take quite a while. Overall, the use of the Oracle PaaS was seamless and integration with on-premise resources was easy. The only exception was JDeveloper, which could not integrate directly with Oracle SOA Cloud instances.

Competition

AWS has the benefit of early entry and has a far richer set of services.  But the feature set is not an issue since Oracle is not into Cloud as an end, but as a means to extend the availability of existing products and services.  However, even in the limited subset where there is an overlap, AWS provides finer granularity of services/features, better interface usability, and a much more alluring pricing model.

Oracle has fought many corporate and technology battles over the years.  The move to Cloud space is yet another frontier and changes will be needed in the following areas.

  • Open up the options and granularity of IaaS offerings
  • Address significant latencies in provisioning PaaS services
  • Revise the pricing model to accommodate SMEs
  • Totally refresh the flow and performance of web pages

The Cloud has arrived, like the Internet, underestimated initially, but it promises likewise to revolutionise IT.  This market will certainly benefit from competition and I surely hope that Oracle will take up the gauntlet and offer us a compelling alternative to AWS – horizontal and vertical.

God bless!


Impressive Amazon Web Services: First Glance

Impressive Amazon Web Services (AWS)

Amazon Web Services: Background

Amazon Web Services (AWS) has revolutionised the way we view IT provisioning. AWS makes so many things easier, and often cheaper too. The benefits scale from the SME right up to corporates; no segment is left out. Complexity is abstracted away, and with a little effort, large and/or complex systems can be built with a few clicks and some configuration.

Architecture

We decided to take a quick look and see just how much AWS could offer low-budget SMEs, using our company’s existing platform as the subject. We have one Oracle database and a handful of MySQL databases; an application server and a Web Server fronting for the application server and several CMS-driven sites. The application server runs Java web services that use data from the Oracle database. The web server hosts the pages for the Java application. It also serves a number of WordPress/PHP sites that run on data from the MySQL databases. The logical view is illustrated in the diagram below:

AWS nettech Logical View

We could map the logical view to one-to-one service units in AWS, or rationalise the target resources used. AWS provides services for web and application computation (EC2), shell scripting (OpsWorks), data (RDS), and static web and media storage (S3), along with other useful features such as Elastic IP, Lambda and IAM. So, we have the option to map each of the logical components to an individual AWS service. This would give us the most flexible deployment and unrivalled non-functional requirement (NFR) guarantees of security, availability and recoverability. However, there would be a cost impact, increased complexity, and there could be issues with performance.

Solutions

Going back to our business case and project drivers, cost reduction is highlighted. After some consideration, two deployment options were produced (below). With cost reduction as the driver, we chose the consolidated view. The Web, application and data components were targeted at the EC2 instance as they all require computation facilities. All the media files were targeted at the S3 bucket. The database data files could have been located on the S3 bucket but for the issue of latency, and costs that would accumulate from repeated access.

AWS nettech physical view

The media files were targeted to the S3 bucket due to their number/size (several GB). The separation ensures that the choice of EC2 instance is not unduly influenced by storage requirements. The consolidated view allows us to taste-and-see; starting small and simple. Over time we will monitor, review and if need be, scale-up or scale-out to address any observed weaknesses.

Migration

Having decided on the target option, the next thing was to plan the migration from the existing production system.  An outline of the plan follows:

  1. Copy resources (web, application, data, media) from production to a local machine – AWS staging machine
  2. Create the target AWS components  – EC2 and S3, as well as an Elastic IP and the appropriate IAM users and security policies
  3. Transfer the media files to the S3 bucket
  4. Initialise the EC2 instance and update it with necessary software
  5. Transfer the web, application and data resources to the EC2 instance
  6. Switch DNS records to point at the new platform
  7. Evaluate the service in comparison to the old platform

AWS nettech physical view III

Implementation

The time arrived to actualise the migration plan.  A scripted approach was chosen as this allows us to verify and validate each step in detail before actual execution.  Automation also provided a fast route to the status quo ante, should things go wrong.  Once again we had the luxury of a number of options:

  • Linux tools
  • Ansible
  • AWS script (Chef, bash, etc.)

Given the knowledge that we had in-house and the versatility of the operating system (OS) of the staging machine, Linux was chosen. The detailed migration plan was to be automated using a combination of AWS command line interface (CLI) tools for Linux, shell scripts, and the in-built ssh and scp tools. Further elaboration of the migration plan into an executable schedule produced the following outline:

  1. Update S3 Bucket
     1. Copy all web resources (/var/www) from the staging machine to the S3 bucket
  2. Configure EC2 Instance
     1. Install necessary services: apt update, Apache, Tomcat, PHP, MySQL
     2. Add JK module to Apache, having replicated required JK configuration files from staging machine
     3. Enable SSL for Apache … having replicated required SSL certificate files
     4. Fix an incorrect default value in Apache’s ssl.conf
     5. Configure group for ownership of web server files www
     6. Configure file permissions in /var/www
     7. Replicate MySQL dump file from staging machine
     8. Recreate MySQL databases, users, tables, etc.
     9. Restart services: MySQL, Tomcat, Apache
     10. Test PHP then remove the test script …/phpinfo.php
     11. Install the S3 mount tool
     12. Configure the S3 mount point
     13. Change owner, permissions on the S3 mounted directories and files – for Apache access
     14. Replicate application WAR file from staging machine
     15. Restart services: MySQL, Tomcat, Apache
  3. Finalise Cutover
     1. Update DNS records at the DNS registrar and flush caches
     2. Visit web and application server pages
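The original scripts were shell based and are not reproduced here. As a loose illustration of the verify-before-execute approach, the Python sketch below only builds and prints the commands for two of the steps: the copy of /var/www to S3 using the AWS CLI sync command with its --dryrun flag, and the replication of files with scp. The bucket name, host, user and file names are placeholders, not the values used in the project.

```python
import shlex


def s3_sync_command(source_dir: str, bucket: str, dry_run: bool = True) -> list:
    """Build the AWS CLI command that copies a directory tree to S3."""
    command = ["aws", "s3", "sync", source_dir, f"s3://{bucket}"]
    if dry_run:
        # With --dryrun the CLI only reports what it would transfer.
        command.append("--dryrun")
    return command


def scp_command(local_path: str, user: str, host: str, remote_path: str) -> list:
    """Build the scp command that replicates a file to the EC2 instance."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


# Placeholder values for illustration only.
plan = [
    s3_sync_command("/var/www", "example-media-bucket"),
    scp_command("dump.sql", "ubuntu", "ec2.example.com", "/tmp/dump.sql"),
    scp_command("app.war", "ubuntu", "ec2.example.com", "/tmp/app.war"),
]

for command in plan:
    # Review the printed plan first; each list could later be passed to
    # subprocess.run(command, check=True) to execute it for real.
    print(shlex.join(command))
```

Printing the plan before running it mirrors the intent of validating each step in detail before actual execution.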

Anonymised scripts here: base, extra

A few observations are worthy of note, regarding the use of S3.  AWS needs to make money on storage.  It should therefore not be surprising that updates to permissions/ownership, in addition to the expected read/write/update/delete, count towards usage.  Access to the S3 mount point from the EC2 instance can be quite slow.  But there is a workaround: use aggressive caching in the web and application servers.  Caching also helps to reduce the ongoing costs of repeated reads to S3 since the cached files will be hosted on the EC2 instance.  Depending on the time of day, uploads to S3 can be fast or very slow.
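The article does not say which caching mechanism was configured in the web and application servers, so the sketch below is not that configuration. It only illustrates the read-through idea in Python: the first request for a file pays the cost of the slow S3 mount, and later requests are served from a copy on the instance's local disk. The class name and paths are hypothetical, and cache invalidation is left out.

```python
import shutil
import tempfile
from pathlib import Path


class ReadThroughCache:
    """Serve files from fast local disk, filling from a slow mount on a miss."""

    def __init__(self, slow_root: str, cache_root: str):
        self.slow_root = Path(slow_root)    # e.g. the S3 mount point
        self.cache_root = Path(cache_root)  # e.g. a directory on the instance
        self.misses = 0

    def read(self, relative_path: str) -> bytes:
        cached = self.cache_root / relative_path
        if not cached.exists():
            # Miss: one slow (and billable) read, then keep a local copy.
            self.misses += 1
            cached.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(self.slow_root / relative_path, cached)
        return cached.read_bytes()


# Temporary directories stand in for the S3 mount and the local cache.
with tempfile.TemporaryDirectory() as slow, tempfile.TemporaryDirectory() as fast:
    (Path(slow) / "logo.png").write_bytes(b"image-bytes")
    cache = ReadThroughCache(slow, fast)
    cache.read("logo.png")  # first read goes to the slow mount
    cache.read("logo.png")  # second read is served locally
    print(cache.misses)
```

Only the first read touches the slow store, which is how caching reduces both latency and the ongoing cost of repeated reads.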

Change Management

The cut-over to the new AWS platform was smooth. The web and application server resources were immediately accessible with very good performance for the application server resources.  Performance for the web sites with resources on S3 was average.  Planning and preparation took about two days.  The implementation of the scripts for migration automation took less than 30 minutes to complete.  This excludes the time taken to upload files to the S3 bucket and update their ownership and permissions.  Also excluded is the time taken to manually update the DNS records and flush local caches.

Overall, this has been a very successful project and it lends great confidence to further adoption of more solutions from the AWS platform.

The key project driver, cost-saving, was realised, with savings of about 50% in comparison with the existing dedicated hosting platform.  Total cost of ownership (TCO) improves comparatively as time progresses.  The greatest savings are in the S3 layer, and this might also improve with migration to RDS and Lightsail.

In the next instalment, we will be looking to extend our use of AWS from the present IaaS to PaaS. In particular, we will compare the provisioning and usability of AWS and Oracle Cloud for database PaaS. Have a look here in the next couple of weeks for my update on that investigation.

 

