
Designing and Planning GCP Solutions for Business Requirements

  • Business Use Case & Product Strategy
  • Cost Optimization
  • Dovetail with Application Design
  • Integration with External Systems
  • Movement of Data
  • Security
  • Measuring Success
  • Compliance and Observability

Business requirements dictate technical requirements implicitly. From statements like:

  • EHR Healthcare provides B2B services to various entities, including vendors, insurance providers, and network directories.
  • Different entities will need to have varying levels of access to read and modify records and information. This implies the need for a robust access control system.
  • Given the nature of their work, EHR Healthcare needs to ensure that their services are always available. High availability is thus a core business requirement.
  • Some of the information that entities will access is regulated, so compliance with relevant data protection and privacy laws is a must.
  • Confidentiality is crucial since EHR Healthcare deals with sensitive health data.
  • The company wants to track the number and type of data accessed and gain insights into trends. This suggests a need for a comprehensive analytics solution.
  • Different entities involved possess varying levels of expertise, which might require the development of user-friendly interfaces or provision of training for the effective use of EHR Healthcare’s systems.

::: tip Minimal Effort Predictions Cloud AutoML is a Google Cloud tool that lets developers train custom machine learning models with minimal effort, making model training easier and faster without requiring deep ML expertise. :::

  • A publicly exposed API or set of APIs needs to be developed to facilitate interactions between various entities.
  • Access restrictions must be applied at the API level to adhere to the varying access rights of different entities.
  • There will be involvement of legacy systems due to insurance entities. This implies the need for systems integration or migration strategies.
  • Redundant infrastructure is required to ensure high availability and continuous operation of the services.
  • Data lifecycle management must be implemented, considering regulation, insights, and access controls.
  • Given the nature of their work, EHR Healthcare needs to employ Cloud Machine Learning to build insight models faster than they can be planned and built. This indicates a requirement for machine learning capabilities in their infrastructure.
  • Mountkirk Games develops and operates online video games. They need a robust and scalable solution to handle high scores and player achievements.
  • They aim to collect minimal user data for personalizing the gaming experience, complying with data privacy regulations.
  • The solution must be globally available to cater to their worldwide player base.
  • They seek low latency to ensure a smooth and responsive gaming experience.
  • Mountkirk Games expresses interest in Managed services which can automatically scale to meet demand.
  • A globally available high score and achievement system is needed to keep track of player progress and milestones.
  • User data needs to be collected and processed in a manner that is privacy-compliant and secure.
  • The system must provide low latency to ensure a seamless gaming experience, which may require a global distribution of resources.
  • Managed services can be used to handle automatic scaling, reducing the overhead of manual resource management.

::: tip Business to Technical Requirements When designing a new project, while collecting and studying business requirements, you’ll have to translate those into technical requirements. You’ll find that there’s not a one-to-one relationship: one technical solution may meet two business requirements, while one business requirement might require several solutions. :::

  • TerramEarth manufactures heavy equipment for the construction and mining industries. They want to leverage their extensive collection of IoT sensor data to improve their products and provide better service to their customers.
  • They aim to move their existing on-premises data infrastructure to the cloud, indicating a need for a comprehensive and secure cloud migration strategy.
  • IoT data needs to be ingested and processed in real-time. This involves creating a robust pipeline for data ingestion from various IoT devices, and real-time data processing capabilities.
  • A robust data analytics solution is needed to derive insights from the sensor data. This requires the deployment of big data analytics tools that can process and analyze large volumes of sensor data.
  • A migration plan is needed to move existing data and systems to the cloud.
  • This involves choosing the right cloud services for storage, computation, and analytics, and planning the migration process to minimize downtime and data loss.

::: tip Extract, Transform, Load It is what it says: it takes large volumes of data from different sources, transforms it into usable data, and makes the results available somewhere for retrieval by others.

Cloud Data Fusion handles these tasks for data scientists and makes it easy to transfer data between various data sources. It offers a simple drag-and-drop interface for connecting to different data sources, transforming and cleaning data, and loading it into a centralized data warehouse. Cloud Data Fusion is a cost-effective solution for businesses that need to quickly and easily integrate data from multiple sources. :::

  • The Helicopter Racing League (HRL) organizes and manages helicopter races worldwide. They aim to enhance the spectator experience by providing real-time telemetry and video feed for each race.
  • HRL wants to archive all races for future viewing on demand. This will allow fans and analysts to revisit past races at their convenience.
  • A robust data analytics solution is required to gain insights into viewer behavior and preferences. This will help HRL understand their audience better and make data-informed decisions to improve the viewer experience.
  • The solution must be highly available and scalable to handle spikes during race events. This is essential to ensure a seamless live streaming experience for viewers, regardless of the number of concurrent viewers.
  • Real-time data processing capability is needed to handle race telemetry data. This involves setting up a system that can ingest and process high volumes of data in real time.
  • A scalable video streaming solution is needed to broadcast races worldwide. This system must be capable of handling high video quality and large volumes of concurrent viewers without degradation of service.
  • Archival storage is needed for storing race videos for on-demand viewing. This involves choosing a storage solution that is cost-effective, secure, and capable of storing large volumes of video data.
  • An analytics solution is needed for analyzing viewer behavior and preferences. This requires the deployment of data analytics tools that can process and analyze viewer data to provide actionable insights.

Business requirements will affect application design when applications are brought into the cloud. In every set of requirements, stated or unstated, will be the desire to reduce cost.

  • Licensing Costs
  • Cloud computing costs
  • Storage
  • Network Ingress and Egress Costs
  • Operational Personnel Costs
  • 3rd Party Services Costs
  • Penalty costs for missed SLAs
  • Inter-connectivity charges

These contribute to the Total Cost of Ownership (TCO) of a cloud project.

Google has a set of managed services, like Cloud SQL, which remove the low-level work of running these services yourself.

Some of these include:

  • Compute Engine
    • Virtual machines running in Google’s data center.
  • Cloud Storage
    • Object storage that’s secure, durable, and scalable.
  • Cloud SDK
    • Command-line tools and libraries for Google Cloud.
  • Cloud SQL
    • Relational database services for MySQL, PostgreSQL, and SQL Server.
  • Google Kubernetes Engine
    • Managed environment for running containerized apps.
  • BigQuery
    • Data warehouse for business agility and insights.
  • Cloud CDN
    • Content delivery network for delivering web and video.
  • Dataflow
    • Streaming analytics for stream and batch processing.
  • Operations
    • Monitoring, logging, and application performance suite.
  • Cloud Run
    • Fully managed environment for running containerized apps.
  • Anthos
    • Platform for modernizing existing apps and building new ones.
  • Cloud Functions
    • Event-driven compute platform for cloud services and apps.
  • And dozens more.

To see an exhaustive list, please see My List of All GCP Managed Services

::: tip Reducing Latency on Image Heavy Applications Google Cloud CDN is a content delivery network that uses Google’s global network of edge locations to deliver content to users with low latency. It is a cost-effective way to improve the performance of your website or web application by caching static and dynamic content at the edge of Google’s network. Cloud CDN can also be used to deliver content from your own servers, or from a content provider such as a CDN or a cloud storage service.

Using Google’s Cloud CDN in combination with multi-regional storage will reduce load time. :::

Many times when computing needs are considered, certain services with lower availability requirements than others can benefit from reduced-level services. If a job can have its processing paused during peak times but can otherwise run normally, it can be preempted.

Reduced level services:

  • Preemptible Virtual Machines
  • Spot VMs
  • Standard Networking
  • Pub/Sub Lite
  • Durable Reduced Availability Storage

Preemptible VMs are shut down after at most 24 hours, and Google can preempt (stop) them at any time when it needs the capacity back. You can write a robust application by setting it up to detect the preemption notice, checkpoint its work, and resume when capacity becomes available again. These VMs cost roughly 60-90% less than their standard counterparts.

Preemptible VMs also get discounted pricing on attached disks and GPUs. A managed instance group will replace a preempted VM after it is stopped, including at the 24-hour limit. Preemptible VMs can be combined with other services to reduce the overall cost of using those services with VMs.
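As a concrete illustration of detecting preemption, here is a minimal sketch that polls the Compute Engine metadata server's documented `preempted` flag from inside the VM. The work function and polling cadence are hypothetical placeholders, not a prescribed pattern.

```python
# Sketch: polling the Compute Engine metadata server to detect preemption.
# Assumes this runs on the preemptible/Spot VM itself and `requests` is installed.
import time
import requests

METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
HEADERS = {"Metadata-Flavor": "Google"}

def is_preempted() -> bool:
    """Return True once Compute Engine has marked this instance as preempted."""
    resp = requests.get(METADATA_URL, headers=HEADERS, timeout=2)
    resp.raise_for_status()
    return resp.text.strip() == "TRUE"

def run_batch_step():
    """Placeholder for one small, checkpointable unit of work (hypothetical)."""
    time.sleep(1)

while not is_preempted():
    run_batch_step()
# At this point, flush checkpoints quickly; the VM will be stopped shortly.
```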

::: warning Live Migration Preemptible and Spot VMs are not eligible for live migration. :::

Spot VMs are the next generation of preemptible virtual machines. Spot VMs are not automatically restarted, but they can run for longer than 24 hours. Spot VMs can be set to be stopped or deleted on preemption. With a managed instance group of Spot VMs, you can set the VMs to be deleted and replaced when resources are available.

Premium Tier networking is the default; Standard Tier networking is a lower-performing option. With Standard Tier networking, Cloud Load Balancing provides only regional load balancing, not global load balancing, and Standard Tier traffic is not covered by the global SLA.

Pub/Sub is extremely scalable, but Pub/Sub Lite can be used where a lower, more cost-effective level of service is acceptable.

Pub/Sub comes with features such as parallelism, automatic scaling, global routing, and regional and global endpoints.

Pub/Sub Lite is less durable and less available than Pub/Sub. Messages can only be replicated to a single zone, while Pub/Sub has multizonal replication within a region. Pub/Sub Lite users also have to manage resource capacity themselves.

But if it meets your needs, Pub/Sub Lite is 80% cheaper.
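For reference, publishing to standard Pub/Sub with the `google-cloud-pubsub` client looks roughly like the sketch below; the project and topic names are made up. Pub/Sub Lite uses a separate client library and requires you to provision topic capacity yourself, which is part of why it is cheaper.

```python
# Sketch: publishing a message with standard Pub/Sub (google-cloud-pubsub).
# "my-project" and "device-telemetry" are hypothetical placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "device-telemetry")

future = publisher.publish(topic_path, b'{"device": "sensor-1", "temp": 21.5}')
print("Published message id:", future.result())  # blocks until the publish succeeds
```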

App Engine Standard allows scaling down to zero, though the trade-offs are that you can only use a fixed set of language runtimes, can write only to /tmp in Java, and cannot write to the local filesystem in Python. Standard apps run in a restricted sandbox: access to some services is limited, the runtime cannot be modified, and background processes are not allowed, though background threads are.

Durable Reduced Availability (DRA) buckets have an SLA of 99% availability instead of the 99.9% or greater availability of standard buckets. Storage operations are divided into Class A and Class B operations:

| API | Class A ($0.10* / 10,000 ops) | Class B ($0.01* / 10,000 ops) |
| --- | --- | --- |
| JSON | storage.*.insert | storage.*.get |
| JSON | storage.*.patch | storage.*.getIamPolicy |
| JSON | storage.*.update | storage.*.testIamPermissions |
| JSON | storage.*.setIamPolicy | storage.*AccessControls.list |
| JSON | storage.buckets.list | storage.notifications.list |
| JSON | storage.buckets.lockRetentionPolicy | Each object change notification |
| JSON | storage.notifications.delete | |
| JSON | storage.objects.compose | |
| JSON | storage.objects.copy | |
| JSON | storage.objects.list | |
| JSON | storage.objects.rewrite | |
| JSON | storage.objects.watchAll | |
| JSON | storage.projects.hmacKeys.create | |
| JSON | storage.projects.hmacKeys.list | |
| JSON | storage.*AccessControls.delete | |
| XML | GET Service | GET Bucket (when retrieving bucket configuration or when listing ongoing multipart uploads) |
| XML | GET Bucket (when listing objects in a bucket) | GET Object |
| XML | POST | HEAD |

* DRA Pricing

Sort your data along a spectrum of most frequent to infrequent use. Spread your data along the following:

  • Memory Caching
  • Live Database
  • Time-series Database
  • Object Storage
    • Standard
    • Nearline
    • Coldline
    • Archive
  • Onprem, Offline storage

Objects have a storage class of standard, nearline, coldline, or archive. The storage class of an individual object can be changed, but only in the direction of less frequent access; you cannot move an object to a more frequent-use class this way. A sketch of a lifecycle configuration that automates this demotion follows the table below.

standard -> nearline -> coldline -> archive

| storage class | standard | nearline | coldline | archive |
| --- | --- | --- | --- | --- |
| accessed at least once per | week | month | quarter | year |
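Below is a minimal sketch of automating this demotion with the `google-cloud-storage` client's lifecycle helpers; the bucket name and age thresholds are assumptions chosen to mirror the table above.

```python
# Sketch: demoting objects to colder storage classes as they age.
# "example-archive-bucket" and the age thresholds are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-archive-bucket")

# Lifecycle rules can only move objects toward colder (less frequent) classes.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()  # persist the new lifecycle configuration
```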

::: tip Time series data Time series data is a type of data that is collected over time. This data can be used to track trends and patterns over time. Time series data can be collected manually or automatically. Automatic time series data collection is often done using sensors or other devices that collect data at regular intervals. This data can be used to track the performance of a system over time, or to predict future trends. These are examples of time-series data:

  • MRTG graph data
  • SNMP polled data
  • Everything a fitbit records
  • An EKG output

Time series data is best stored in Bigtable, which handles this workload better than BigQuery or Cloud SQL. :::
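As a sketch of what a time-series write to Bigtable can look like with the `google-cloud-bigtable` client: the instance, table, and column family names are hypothetical, and the row key simply illustrates the common entity-plus-timestamp keying pattern.

```python
# Sketch: writing one time-series sample to Bigtable.
# Instance, table, column family, and device id are placeholders.
import datetime
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("metrics-instance").table("sensor-readings")

now = datetime.datetime.utcnow()
row_key = f"device-42#{now:%Y%m%d%H%M%S}".encode()  # entity#timestamp key

row = table.direct_row(row_key)
row.set_cell("readings", "temperature", b"21.5", timestamp=now)
row.commit()
```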

Once we have these requirements, our minds already start placing the need in the right product, though we may be provisionally thinking about it. The same thing should be happening when you think of dependencies.

Let’s review the business needs of our use cases.

Business requirements dictate technical requirements implicitly. From statements like:

  • EHR Healthcare provides B2B services to various entities, vendors, insurance providers, network directories, etc.
  • Different entities have different access rights to read and edit records and information.
  • Different entities possess varying levels of expertise.
  • The services must always be up and running.
  • Some information accessed by entities is regulated.
  • Confidentiality is of utmost importance.
  • The company wishes to track the number and type of data accessed to gain insights into trends.
  • They will need to publicly expose an API or set of them.
  • Access restrictions must be applied at the API level.
  • There will be legacy systems involved because of insurance entities.
  • Infrastructure redundancy is necessary.
  • Data Lifecycles must consider regulation, insights, and access controls.
  • Cloud Machine Learning can be leveraged to build insight models faster than they can be planned and built.

::: tip Cloud Dataflow Cloud Dataflow is a cloud-based data processing service for batch and streaming data. It is a fully managed, serverless service designed to handle large data sets with high throughput and low latency, and it scales automatically to meet the needs of your application. It is a cost-effective solution that lets you pay only for the resources you use. :::
  • The Helicopter Racing League (HRL) organizes and manages helicopter races worldwide.
  • HRL wants to enhance the spectator experience by providing real-time telemetry and video feed for each race.
  • HRL wants to archive all races for future viewing on demand.
  • A robust data analytics solution is needed to gain insights into viewer behavior and preferences.
  • The solution must be highly available and scalable to handle spikes during race events.
  • Real-time data processing capability is needed to handle race telemetry data.
  • A scalable video streaming solution is needed to broadcast races worldwide.
  • Archival storage is needed for storing race videos for on-demand viewing.
  • An analytics solution is needed for analyzing viewer behavior and preferences.
  • The solution must be highly available and scalable to handle traffic spikes during races.

::: tip Service Level Objectives Business requirements typically demand these common types of SLOs:
  • High Availability SLO Always accessible.
  • Durability SLO Always kept.
  • Reliability SLO Always meeting workloads.
  • Scalability SLO Always fitting its workloads.

:::

  • Mountkirk Games develops and operates online video games.
  • They need a solution to handle high scores and player achievements.
  • They need to collect minimal user data for personalizing the gaming experience.
  • The solution must be globally available and provide low latency.
  • They are interested in Managed services which can automatically scale.
  • A globally available high score and achievement system is needed.
  • User data needs to be collected and processed in a privacy-compliant manner.
  • The system must provide low latency for a smooth gaming experience.
  • Managed services can be used to handle automatic scaling.

::: tip Global Up-to-Date Data Cloud Spanner is the best option for SQL-based global record storage with a high-consistency SLO. :::
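To make that concrete, here is a minimal sketch of reading a leaderboard from Cloud Spanner with the `google-cloud-spanner` client; the instance, database, table, and column names are assumptions for illustration only.

```python
# Sketch: reading a globally consistent high-score list from Cloud Spanner.
# "games-instance", "leaderboard", and the HighScores table are hypothetical.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("games-instance").database("leaderboard")

with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT PlayerId, Score FROM HighScores ORDER BY Score DESC LIMIT 10"
    )
    for player_id, score in rows:
        print(player_id, score)
```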

  • TerramEarth manufactures heavy equipment for the construction and mining industries.
  • They want to leverage their vast trove of IoT sensor data to improve their products and provide better service to their customers.
  • They want to move their existing on-premises data infrastructure to the cloud.
  • IoT data needs to be ingested and processed in real-time.
  • A robust data analytics solution is needed to derive insights from the sensor data
  • A migration plan is needed to move existing data and systems to the cloud.

::: tip Cloud Dataproc Cloud Dataproc is a cloud-based platform for processing large data sets. It is designed to be scalable and efficient, and to handle data processing workloads of all types. Cloud Dataproc is based on the open-source Apache Hadoop and Apache Spark platforms, and provides a simple, cost-effective way to process and analyze data in the cloud. :::

Business requirements help us know what platforms to connect and how they will work. Those same requirements will tell us what data is stored, how often, for how long, and who and what workloads have access to it.

What is the distance between where the data is stored and where it is processed? What volume of data will be moved between storage and processing during an operation or set of operations? Are we using stream or batch processing?

The first question’s answer influences both read and write times and the network costs associated with transferring the data. Creating replicas in regions nearer to the point of processing will decrease read times, but will only decrease network costs in a ‘replicate once, read many times’ situation. Using storage solutions with a single write host will not improve replication times.

The second question’s answer influences time and cost as well. On a long enough timeline, all processes fail: build shorter-running processes and design robust processes that can reconnect and resume.

The third question’s answer and future-plans answer will influence how you perform batch processing. Are you going to migrate from batch to stream?

| Style | Pros | Cons |
| --- | --- | --- |
| Batch | tolerates latency, on-time data | interval updates, queue buildup |
| Stream | realtime | late/missing data |

::: tip If using VMs for batch processing, use preemptible VMs to save money. :::
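To illustrate the batch-versus-stream trade-off in code, the sketch below shows how the same Apache Beam transform (the programming model behind Dataflow) can read either a bounded Cloud Storage source in batch mode or an unbounded Pub/Sub topic in streaming mode. The bucket path, topic, and filter are hypothetical.

```python
# Sketch: one Beam pipeline body, run either as batch or as streaming.
# Paths, topic names, and the ERROR filter are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(streaming: bool):
    opts = PipelineOptions(streaming=streaming)
    with beam.Pipeline(options=opts) as p:
        if streaming:
            events = p | beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        else:
            events = p | beam.io.ReadFromText("gs://my-bucket/events/*.json")
        (events
         | beam.Map(lambda e: e.decode() if isinstance(e, bytes) else e)
         | beam.Filter(lambda line: "ERROR" in line)  # hypothetical transform
         | beam.Map(print))
```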

At what point does data lose business value? With email, the answer is never: people want their past emails, and they want all their backed-up emails delivered. But other kinds of data, like last year’s deployment errors, lose value as they become less actionable.

You’ll have to design processes for removing less valuable data from persistent storage and either moving it to archival locations or deleting it. How long each set of data is stored will have a great effect on an architectural design.

The volumes of data, and how they will scale up when business goals are met or exceeded, need to be planned for; otherwise there will be a dreaded redesign and unnecessary iterations.

Storage related managers will need to know the volume and frequency of data storage and retrieval so they can plan for their duties and procedures which touch your design.

::: tip Factors of Volume and Load The main factors that affect volume are the number of data generators or sensors. If you consider each process that can log as a sensor, the more you log the higher your volume in Cloud Logging, the higher the processing costs in BigQuery and so forth.

  • Number of hosts
  • Number of logging processes
  • Network Connectivity
  • Verbosity Configuration

:::

Many businesses are under regulatory constraints. For example, “Mountkirk” receives payment via credit cards, so they must be PCI compliant, and financial services laws apply to their receipt of payments.

  • Health Insurance Portability and Accountability Act (HIPAA) is United States legislation that provides data privacy and security regulations for safeguarding medical information.
  • General Data Protection Regulation (GDPR) is a set of regulations that member states of the European Union must implement in order to protect the privacy of digital data.
  • The Sarbanes-Oxley (SOX) Act contains a number of provisions designed to improve corporate governance and address corporate fraud.
  • Children’s Online Privacy Protection Act (COPPA) is a U.S. law that requires website operators to get parental consent before collecting children’s personal information online.
  • Payment Card Industry Data Security Standard (PCI DSS) is a set of security standards designed to protect cardholders’ information.
  • Gramm-Leach-Bliley Act (GLBA) is designed to protect consumers’ personal financial information held by financial institutions.

::: tip Compliance TLDR In the United States

  • SOX regulates financial records of corporate institutions.
  • HIPAA regulates how US companies protect consumer access to, and the privacy of, medical data.
  • PCI DSS is a standard for taking credit cards which processing underwriters may require an e-commerce vendor to abide by.

In Europe

  • GDPR regulates information stored by companies operating in Europe for its protection and privacy.

:::

When we know what regulations apply to our workload, it is easier to plan our design accordingly. Regulations can apply to an industry, like healthcare, or to a jurisdiction, like the State of California. Operating within a jurisdiction means you’ll have to research your industry’s governance and what it may be subject to.

Regulations on data slant toward protecting the consumer and giving them greater rights over their information and who it is shared with. You can review privacy policies per country at Privacy Law By Country

Architects not only need to comply with these laws, but kindle the spirit of the law within themselves: that of protecting the consumer. Architects need to analyze each part of their design and ask themselves how the consumer is protected when something goes wrong.

Access controls need to cascade in such a way that permissions start restrictive and are then opened, not the other way around. Data needs to be encrypted at rest and in transit, and potentially in memory. Networks need firewalls, and systems need verification of breaches through logging. One can use the Identity-Aware Proxy and practice Defense in Depth.

The Sarbanes-Oxley (SOX) Act aims to put controls on data that make tampering more difficult. I worked for a SOX-compliant business, IGT PLC, and we had to take escrow of code, making the versions of code we deployed immutable so they could be audited. In this case, tampering with the data was made more difficult by using an escrow step in the data processing flows. Other businesses might need to store data for a certain number of years while also keeping it immutable or having some other condition applied to it.

IS, information security, infosec, or cybersecurity is the practice or discipline of keeping information secure. Secured information as a business need comes from the need for confidentiality, the need for freedom from tampering, and availability. Unavailable systems are generally secure: no one can remotely compromise a computer, for instance, that has no network interface.

Businesses need to limit access to data so that only the legal, ethical and appropriate parties can read, write, or audit the data. In addition to compliance with data regulations, competing businesses have a need to keep their information private so that competitors cannot know their trade secrets, plans, strategies, and designs.

Google Cloud offers several options for meeting these needs. Encryption at rest and in transit is a good start. Memory encryption using N2D compute instances and Shielded VMs makes the system far less compromisable.

Other offerings include Google Secret Manager and Cloud KMS, which keep Google from reading the data except in the least-access cases you allow. When using customer-supplied keys, the keys are stored outside of Google’s own key management infrastructure.

Protected networks keep data confidential. Services also can be configured for maximum protection. For instance, consider these apache configuration directives:

# Hide Apache version and OS details in the Server response header
ServerTokens Prod
# Do not append server version details to error pages
ServerSignature Off
<Directory /opt/app/htdocs>
    # Disable automatic directory index listings
    Options -Indexes
</Directory>
# Do not leak inode/size metadata in ETag headers
FileETag None

Similar directives in other services’ configurations keep your software versions and system software confidential. In fact, setting ServerSignature to Off and ServerTokens to Prod is a PCI DSS requirement.

Determine how the methods of authentication and authorization can compromise confidentiality.

::: tip Dealing with Inconsistent Message Delivery Cloud Pub/Sub is a messaging service that allows applications to exchange messages in a reliable and scalable way. It is a fully managed service that can be used to build applications that require high throughput and low latency.

If applications are working synchronously, decouple them and have the reporters interact with a third service that is always available and that autoscales. :::

Data Integrity is required by regulations which focus on making data tamper-proof, but normally is simply a business requirement. You need your records to be consistent and reflect reality. Data Integrity is also about keeping it in that state.

Ways to promote and increase data integrity in Google Cloud include using tools like Cloud Data Loss Prevention (DLP) and data encryption. You should also enforce least privilege, use strong data encryption methods, and use access control lists.
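As an illustration of one of these tools, here is a minimal sketch of inspecting a string with the Cloud DLP client before persisting it; the project id, sample text, and chosen infoTypes are assumptions.

```python
# Sketch: scanning text for sensitive data with Cloud DLP before storage.
# "my-project" and the infoTypes shown are placeholders.
import google.cloud.dlp_v2 as dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project"

response = client.inspect_content(
    request={
        "parent": parent,
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "US_SOCIAL_SECURITY_NUMBER"}]
        },
        "item": {"value": "Contact me at jane@example.com"},
    }
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```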

Colocate report data instead of drawing on active data. That way, if data is tampered with, discrepancies will exist directly within the app. The search for these discrepancies can be automated into its own report.

DDoS attacks, ransomware, disgruntled administrators, and bad-faith actors threaten the availability of data.

You can combat ransomware with a well-hardened Infrastructure as Code (IaC) pattern: cull resources whose availability has been degraded, and restore their data and stateful information from trusted disaster recovery provisions.

When designing a project, design around these scenarios to ensure a business can survive malicious activity: a project that can not only survive a malicious attack but also continue to be available during one.

::: tip Keeping Data Entirely Secret Cloud KMS is a cloud-based key management system that allows you to manage your cryptographic keys in a secure, centralized location. With Cloud KMS, you can create, use, rotate, and destroy cryptographic keys, as well as control their permissions. Cloud KMS is integrated with other Google Cloud Platform (GCP) services, making it easy to use your keys with other GCP products.

When you manage the encryption keys Google uses to encrypt your data, the data is kept secret from anyone who doesn’t have access to decrypt it, which requires access to use those keys. :::
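For example, here is a minimal sketch of pointing a Cloud Storage bucket at a customer-managed Cloud KMS key so that new objects are encrypted with it by default; the project, key ring, key, and bucket names are placeholders.

```python
# Sketch: setting a customer-managed (CMEK) default key on a bucket.
# All resource names below are hypothetical.
from google.cloud import storage

kms_key = (
    "projects/my-project/locations/us-central1/"
    "keyRings/app-keyring/cryptoKeys/bucket-key"
)

client = storage.Client()
bucket = client.get_bucket("confidential-data-bucket")
# The bucket's Cloud Storage service agent must have Encrypter/Decrypter on this key.
bucket.default_kms_key_name = kms_key  # new objects are encrypted with this key
bucket.patch()
```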

As businesses move to agile continuous integration and deployment, they want to see reports of deployments going well, development costs decreasing, and the speed of development therefore increasing. Amid all of this they want to measure the overall success of an endeavor so they can correctly support the resources which will increase the bottom line.

::: tip Continuous Integration & Delivery The benefit of CI/CD to business requirements is that it enables smaller, incremental, trunk-based development. This shortens the feedback loop, reduces risks to services during deployment, increases the speed of debugging, and isolates feature sets to known risks. :::

The first of two important measurements is the Key Performance Indicator (KPI). The other is Return on Investment (ROI). A KPI measures the value of some portion of business activity and can be used as a sign that things are going well and an effort is achieving its objectives. A KPI for an automation team of reliability engineers might be a threshold percentage of failed deployments to successful ones.

Cloud migration projects have KPIs which the project manager can use to gauge the progress of the overall migration. Another KPI might be having a set of databases migrated to the cloud and no longer in use on premises. KPIs are particular to a project’s own needs.

::: tip Improving SQL Latency Export unaccessed data older than 90 days from the database and prune those records. Store these exports in Google Cloud Storage in Coldline or Archive class buckets. :::

Operations departments will use KPIs to determine if they are handling the situations they set out to address. Product support teams can use KPIs to determine if they are helping their customers use their product to the degree that meets the business objectives. Cloud Architects will need to know which KPIs will be used to measure the success of the project being designed. They help the architect understand what takes priority and what motivates decision-makers to invest in a project or business effort.

::: tip Total Cost of Ownership When Managers and Directors Only Compare Infrastructure Costs Calculate the TCO of legacy projects against planned cloud projects. Calculate the potential ROI with regard to the TCO of the investment. Use this wider scope to compare the true cost of running legacy projects or forgoing cloud migrations. :::

Return on investment is the measure of how much of a financial investment pays off. ROI is a percentage that measures the difference between the business before and after the investment: the profit or loss after an investment divided by the cost of the investment. So:

$ROI=\left(\frac {investment\ value-cost\ of\ investment} {cost\ of\ investment} \right) \times 100$

Let’s work this out for a one-year period. Host U Online bought $3,000 in network equipment and spent $6,000 to migrate to fiber. The total cost of investing in fiber was $9,000. They began reselling their fiber internet to sublets in the building. In one year they acquired six customers totalling $12,000 per month. A year’s revenue from the investment is $144,000.

$\left(\frac{144000 - 9000}{9000}\right) \times 100 = 1500\%$

This is a real scenario I orchestrated for a real company. Our return on investment, the ROI, was a tremendous 1500%.
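The same arithmetic, as a tiny sketch using the figures from the example above:

```python
# Sketch: the ROI calculation from the Host U Online example.
def roi_percent(investment_value: float, cost_of_investment: float) -> float:
    return (investment_value - cost_of_investment) / cost_of_investment * 100

equipment, migration = 3_000, 6_000
yearly_revenue = 6 * 2_000 * 12          # six customers totalling $12,000/month
print(roi_percent(yearly_revenue, equipment + migration))  # -> 1500.0
```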

In a cloud migration project, the investment costs include the costs of Google Cloud services and infrastructure, personnel costs, and vendor costs. You should include expenses saved in the value of the investment.

::: tip Reducing Costs When designing for cost reduction, strongly consider the reduced-level services discussed earlier, such as preemptible and Spot VMs, Standard Tier networking, and Pub/Sub Lite. :::

The goals and concepts that the organization places high value upon will be underlying the KPIs and ROI measures.

  • Understanding the sample requirements word for word
  • Knowing the meanings of business terms like TCO, KPI, ROI
  • Learn about what Google services are for what use cases
  • Understanding managing data
  • Understanding how compliance with law can affect the architecture of a solution
  • Understand the business impetus behind the aspects of security pertaining to business requirements
    • Confidentiality
    • Integrity
    • Availability
  • Understand the motives behind KPIs

List of All Managed Google Cloud Platform (GCP) Services

| Service | Type | Description |
| --- | --- | --- |
| AutoML Tables | AI and Machine Learning | Machine learning models for structured data |
| Recommendations AI | AI and Machine Learning | Personalized recommendations |
| Natural Language AI | AI and Machine Learning | Entity recognition, sentiment analysis, language identification |
| Cloud Translation | AI and Machine Learning | Translate between two languages |
| Cloud Vision | AI and Machine Learning | Understand contents of images |
| Dialogflow Essentials | AI and Machine Learning | Development suite for voice to text |
| BigQuery | Analytics | Data warehousing and analytics |
| Batch | Compute | Fully managed batch jobs at scale |
| VMware Engine (GCVE) | Compute | Running VMware workloads on GCP |
| Cloud Datalab | Analytics | Interactive data analysis tool based on Jupyter Notebooks |
| Data Catalog | Analytics | Managed and scalable metadata management service |
| Dataproc | Analytics | Managed Hadoop and Spark service |
| Dataproc Metastore | Analytics | Managed Apache Hive |
| Cloud Composer | Analytics | Data workflow orchestration service |
| Cloud Data Fusion | Analytics | Data integration and ETL tool |
| Dataflow | Analytics | Stream and batch processing |
| Cloud Spanner | Database | Global relational database |
| Cloud SQL | Database | Regional relational database |
| Cloud Deployment Manager | Development | Infrastructure-as-code service |
| Cloud Pub/Sub | Messaging | Messaging service |
| Bigtable | Storage | Wide-column NoSQL database |
| Cloud Data Transfer | Storage | Bulk data transfer service |
| Cloud Memorystore | Storage | Managed cache service using Redis or Memcached |
| Cloud Storage | Storage | Managed object storage |
| Cloud Filestore | Storage | Managed shared files via NFS or mount |
| Cloud DNS | Networking | Managed DNS with API for publishing changes |
| Cloud IDS | Networking | Intrusion detection system |
| Cloud Armor Managed Protection Plus | Networking | DDoS protection with Cloud Armor’s AI adaptive protection |
| Service Directory | Networking | Managed service registry |
| Cloud Logging | Operations | Fully managed log aggregator |
| AI Platform Neural Architecture Search (NAS) | AI Platform | AI search |
| AI Platform Training and Prediction | AI Platform | Managed training and prediction for ML models |
| Notebooks | AI Platform | JupyterLab environment |
| Apigee | API Management | API gateway security and analysis |
| API Gateway | API Management | API gateways |
| Payment Gateway | API Management | Integration with real-time payment systems like UPI |
| Issuer Switch | API Management | User transactor deployment |
| Anthos Service Mesh | Hybrid/Multi-Cloud | Divide up GKE traffic into workloads and secure them with Istio |
| BigQuery Omni | Analysis | Use BigQuery to query other clouds |
| BigQuery Data Transfer Service | Analysis | Migrate data to BigQuery |
| Database Migration Service | Storage | Fully managed migration service |
| Migrate to Virtual Machines | Migration | Migrate workloads at scale into Google Cloud Compute Engine |
| Cloud Data Loss Prevention | Security and Identity | Discover, classify, and protect your most sensitive data |
| Cloud HSM | Security and Identity | Fully managed hardware security module |
| Managed Service for Microsoft Active Directory (AD) | Identity & Access | Managed service for Microsoft Active Directory |
| Cloud Run | Serverless Computing | Run serverless containers |
| Cloud Scheduler | Serverless Computing | Cron job scheduler |
| Cloud Tasks | Serverless Computing | Distributed task orchestration |
| Eventarc | Serverless Computing | Event rules between GCP services |
| Workflows | Serverless Computing | Reliably execute sequences of operations across APIs or services |
| IoT Core | Internet of Things | Collect, process, analyze, and visualize data from IoT devices in real time |
| Cloud Healthcare | Healthcare and Life Sciences | Send, receive, store, query, transform, and analyze healthcare and life sciences data |
| Game Servers | Media and Gaming | Deploy and manage dedicated game servers across multiple Agones clusters |

Designing and Planning Solutions in Google Cloud with GCP Architecture

  • Business Use Case & Product Strategy
  • Cost Optimization
  • Dovetail with Application Design
  • Integration with External Systems
  • Movement of Data
  • Planning decision trade-offs
  • Build, buy, modify, deprecate
  • Measuring Success
  • Compliance and Observability

Collecting & Reviewing Business Requirements


Architects begin by collecting business requirements and other required information. Architects are always solving design problems for the current unique mix of particular business needs, so every design is different. Because of this, you cannot reuse a previous design as a template, even if it solved a similar use case.

This unique mix of business requirements and needs is what we’ll call the Operational Topology. An Architect begins their work by making a survey of this landscape.

The peaks and valleys, inlets and gorges of this topological map include things like:

  • Pressure to reduce costs.
  • Speeding up the rate at which software is changed and released.
  • Measuring Service Level Objectives(SLOs)
  • Reducing incidents and recovery time.
  • Improving legal compliance.

::: tip Incident An incident is a period of time where SLOs are not met. Incidents are disruptions in which a service’s availability becomes degraded. :::

The use of Managed services places certain duties on specialized companies who can reduce the cost of management by focusing on that discipline’s efficiency. This enables your business to consolidate its focus on its trade and products.

Managed services remove from an engineering team’s focus concerns such as provisioning, initial configuration, traffic volume increases, upgrades, and more. If planned properly, this will reduce costs, but those projections need to be verified. Workloads need to be separated by their availability requirements. Workloads that don’t need highly available systems can use preemptible instances. Pub/Sub Lite trades availability for cost. Auto-scaling and scaling down to zero, for instance, enable cost savings in tools like Cloud Run and App Engine Standard. Compute Engine Managed Instance Groups will scale up with load and back down to their set minimum when that load subsides.

We want to accelerate all development to a speed of constant innovation, the CI/CD singularity. This is what success means. Again, using managed services enables this by letting developers and release engineers focus on things other than infrastructure management. The services Google hosts, manages, and offers allow developers without domain expertise in those fields to use those services.

Continuous Integration and Deployment enable quick delivery of minor changes so that reviews can be quick and tracked work can be completed like lightning. Automated testing and reporting can be built into these delivery pipelines so that developers can release their own software and get immediate feedback about what it is doing in development and integration environments.

However, sometimes there are tacit business requirements that prevent you from using one of these solutions on every asset a business needs to maintain. You may be tasked to architect solutions around an ancient monolithic service which cannot be delivered to production in an agile manner. Planning to get out of this situation is your job and selling that plan to decision makers is also your goal. You have to believe in your designs and be an optimist that these specifications are all that is needed to meet the Operational Topology.

You may break apart the giant macroservice into microservices, but even if you do, that’s the future; what do you do now? Do you rip and replace, meaning rebuild the app from scratch? Do you lift and shift, bringing the macroservice onto Compute Engine while moving to microservices later? Finally, you could convert to microservices as you move it into the cloud, striking a hybrid between the two. Business requirements will point the way to the correct solution every time without fail.

An application’s requirements surrounding how available it needs to be to those whom it serves are called Service Level Objectives. Accounting systems might not need to be running except during business hours, while bill pay applications that customers use will need to always be available. These two different systems used by two different audiences need two different Service Level Objectives.

SLOs specify things like uptime and page load time. These events are recorded within Cloud Logging. When they are not met, alerts can be created with Cloud Monitoring. The data points in these logs are called Service Level Indicators (SLIs). An SLO is a formal definition of a threshold which SLIs need to stay compliant with.

When a service becomes unavailable or degraded, an incident has occurred. A business’s response to an incident may vary from company to company, but for the most part, every company has some sort of response system.

Collecting metrics and log entries along the way reduces the time it takes to recover from incidents because it illuminates the state of each part of the system when the error occurred. The first thing a reliability engineer does is look at logs on a problematic system. If one can see all logs from all components in one place at the same time, one can better put together a complete story rather than having to revise the story continually as information about the problem is discovered.

The big five regulations most architects have to worry about are:

  • Health Insurance Portability and Accountability Act (HIPAA), a healthcare regulation
  • Children’s Online Privacy Protection Act (COPPA), a privacy regulation
  • Sarbanes-Oxley Act (SOX), a financial reporting regulation
  • Payment Card Industry Data Security Standard (PCI DSS), a data protection standard for credit card processing
  • General Data Protection Regulation (GDPR), a European Union privacy regulation

Compliance with these means controlling who has access to read and change the regulated data, how and where it is stored, and how long it must be retained. Architects track and write schemes of controls which meet these regulations.

Capital expenditures are funds used to purchase or improve fixed assets, such as land, buildings, or equipment. This type of spending is typically used to improve a company’s long-term prospects, rather than for day-to-day operations. Because of this, capital expenditures can be a significant financial decision for a business, and one that should not be made lightly.

Compliance also means the implementation of controls on the access, storage, and lifecycle of sensitive data.

Digital transformation is the process of using digital technologies to create new or improved business processes, products, and services. It can be used to improve customer experience, operational efficiency, and competitive advantage. In order to be successful, digital transformation must be driven by a clear strategy and executed with careful planning and execution.

Governance is the process by which organizations are directed and managed. It includes the creation and implementation of policies, the setting of goals, and the monitoring of progress. Good governance is essential for the success of any organization, as it ensures that resources are used efficiently and effectively. There are four main principles of good governance: accountability, transparency, participation, and inclusiveness. Accountability means that those in positions of authority are held accountable for their actions. Transparency means that information is readily available and accessible to those who need it. Participation means that all stakeholders have a say in decision-making. Inclusiveness means that all voices are heard and considered. These principles are essential for the success of any organization.

A key performance indicator (KPI) is a metric used to evaluate the success of an organization or individual in achieving specific goals. KPIs are often used in business to track progress and compare performance against objectives. While there are many different KPIs that can be used, some common examples include measures of sales, profitability, productivity, customer satisfaction, and safety.

A line of business (LOB) is a group of products or services that are related to each other. Businesses often have multiple lines of business, each with its own set of customers, products, and services. For example, a company that sells both cars and trucks would have two lines of business: automotive and commercial vehicles. Lines of business can be created for different reasons. Sometimes, businesses create lines of business to take advantage of different market opportunities. Other times, businesses create lines of business to better serve their customers’ needs. Lines of business can be a helpful way for businesses to organize their products and services. By creating lines of business, businesses can more easily target their marketing and sales efforts.

Operational expenditures are the costs associated with running a business on a day-to-day basis. They can include everything from rent and utilities to payroll and inventory costs. For many businesses, operational expenditures are the largest category of expenses. Managing operational expenditures is a key part of running a successful business. Careful planning and budgeting can help keep costs under control and ensure that the business is able to generate enough revenue to cover all of its expenses. Operational expenditures can have a major impact on a business’s bottom line. Therefore, it is important to carefully track and manage these costs. Doing so can help ensure that the business is able to remain profitable and continue to grow.

An operating budget is a financial plan that details how a company will generate and spend revenue over a specific period of time. The operating budget is important because it ensures that a company has the resources it needs to meet its operational goals. The budget also provides a way to track actual results against desired outcomes.

A service level agreement (SLA) is a contract between a service provider and a customer that specifies the nature and quality of the service to be provided. The SLA will typically include a description of the service to be provided, the standards that the service must meet, the customer’s responsibilities, and the service provider’s obligations. The SLA may also specify the remedies available to the customer if the service provider fails to meet the agreed-upon standards.

Service-level indicators (SLIs) are performance metrics that help organizations measure and track the quality of their services. SLIs can be used to track the performance of individual service components, as well as the overall performance of the service. Common service-level indicators include uptime, response time, and error rates. By tracking SLIs, organizations can identify service problems early and take steps to improve the quality of their services.

Service-level objectives (SLOs) are a key component of any effective service-level management (SLM) program. SLOs help ensure that services are delivered in a consistent and predictable manner, and help identify and track the key performance indicators (KPIs) that are most important to the success of the business.

SLOs should be designed to meet the specific needs of the business, and should be based on a thorough understanding of the customer’s requirements. They should be realistic and achievable, and should be reviewed and updated on a regular basis.

An effective SLM program will help to ensure that services are delivered in a timely and efficient manner, and that customer expectations are met or exceeded.

Technical requirements specify the characteristics that a system or component must have in order to be able to perform its required functions. These include requirements such as atomicity, consistency, reliability, and durability. Atomicity refers to the ability of a system to guarantee that a transaction is either completed in its entirety or not at all. Consistency refers to the ability of a system to maintain data integrity. Reliability refers to the ability of a system to perform its required functions correctly and consistently. Durability refers to the ability of a system to maintain data integrity in the face of failures.

Functional requirements are the specific capabilities that a system must have in order to perform its intended functions. For example, a compute requirement might be the ability to process a certain amount of data within a certain time frame, while a storage requirement might be the need for a certain amount of space to store data. Network requirements might include the need for certain bandwidth or the ability to connect to certain types of devices. All of these requirements must be taken into account when designing a system.

Requirements can be grouped according to the cloud offerings that meet them. Compute Engine, App Engine, Kubernetes Engine, Cloud Run, and Cloud Functions all solve unique use cases. It is foreseeable that all of your requirements are going to fall along these lines when it comes to processing data requests, responding to requests, and delivering content and interfaces. If not, another Google product will represent a functional-needs subset.

Similarly, storage options are plentiful. One or more of them will meet your needs. Is your data structured, unstructured, or relational? What latency requirements do you have? Group your requirements together and look at how the offerings meet those needs. If you are only appending dumps of data somewhere, you can choose a better option for that.

How many instances or nodes will you need? That number will affect how big your subnets will need to be. Can firewall rules be scoped to service accounts? Do you have multiple workloads that you can sort into different groups to which the rules correspond?

Do you need DNS peering to enable hybrid-cloud networking between your VPC and your on-premises networks? These are questions an architect asks. You have to take the company’s existing subnets into account so that you can avoid collisions. So is auto mode or custom mode subnetting right for your project?

How is hybrid connectivity accomplished: Cloud VPN, which has high security but low throughput? Or will Dedicated Interconnect and Partner Interconnect be used at higher cost for greater throughput?

Nonfunctional requirements are those that specify system characteristics such as availability, reliability, scalability, durability, and observability. They are often expressed as quality attributes or service level agreements. Functional requirements define what the system does, while nonfunctional requirements define how the system behaves. Nonfunctional requirements are important because they ensure that the system will meet the needs of its users.

  • Availability
  • Reliability
  • Scalability
  • Durability
  • Observability

There are many factors to consider when determining the availability requirements for a system. The first is the required uptime, which is the percentage of time that the system must be operational. For example, a system with a required uptime of 99% must be operational for at least 99% of the time. Other factors include the reliability of the components, the redundancy of the system, and the response time to failures. Availability requirements are often specified in terms of uptime and downtime, which are the amounts of time that the system is operational and unavailable, respectively.
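A quick way to make an uptime percentage tangible is to convert it into allowed downtime; the sketch below does so for a few common SLO targets.

```python
# Sketch: translating an availability percentage into allowed downtime per year.
def allowed_downtime_hours_per_year(availability_percent: float) -> float:
    return (1 - availability_percent / 100) * 365 * 24

for slo in (99.0, 99.9, 99.99):
    print(f"{slo}% -> {allowed_downtime_hours_per_year(slo):.2f} hours/year")
# 99.0% -> 87.60, 99.9% -> 8.76, 99.99% -> 0.88 hours per year
```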

Reliability requirements are those that specify how often a system or component must perform its required functions correctly. They are typically expressed as a percentage or a probability, and they may be specified for a single function or for the system as a whole. Reliability requirements are important because they help ensure that a system will be able to meet its operational objectives. Related to Availability, Reliability is the same requirement under the pressure of business load.

Scalability requirements are those that dictate how well a system can cope with increased loads. They are typically expressed in terms of throughput, response time, or capacity. For example, a system that can handle twice the number of users without any degradation in performance is said to be scalable.

Scalability is a key consideration in the design of any system, be it a website, an application, or a network. It is especially important in the case of web-based systems, which are often subject to sudden and unexpected spikes in traffic. A system that is not scalable will quickly become overloaded and unable to cope, leading to a poor user experience and potential loss of business. Scalability requirements often are linked to Reliability factors.

In order for a product to be considered durable, it must be able to withstand repeated use and exposure to the elements without showing signs of wear and tear. This means that the materials used to construct the product must be of high quality and able to withstand regular use. Additionally, the product must be designed in a way that minimizes the likelihood of damage. For example, a durable product might have reinforced seams or be made from waterproof materials. Ultimately, the durability of a product is a key factor in determining its overall quality and usefulness.

Durability in the cloud is the ability to retrieve data placed there in the future. This means not losing volumes, files, objects and the immediate replacability and reproducibility of any resources that are not functioning correctly.

Observability requirements are those that enable a system to be monitored and its performance to be assessed and internal states to be known. They are typically concerned with aspects such as the availability of data, the ability to detect and diagnose faults, and the ability to predict future behavior. In many cases, these requirements will need to be trade-offs between conflicting goals, such as the need for timely data versus the need for comprehensive data.

Features in Google Cloud for Securing Virtual Machines(VMs)

Shielded VMs use verification of hardware IDs and chips to defend against Linux bootkits and rootkits and provide self-healing security features such as integrity monitoring and healing.

It uses Secure Boot, Virtual trusted platform module(vTPM)-enabled Measured Boot, and Integrity monitoring.

You can monitor your VMs in a few ways with Shielded VMs:

  • You can monitor the boot integrity of Shielded VMs with Cloud Monitoring.
  • You can automatically take action on integrity failures with Cloud Functions.

Confidential VMs use encryption-in-use and encrypt data in memory. You provision this type of VM with an N2D machine type:

  • n2d-standard-2
  • n2d-standard-4
  • n2d-standard-8
  • n2d-standard-16
  • n2d-standard-32
  • n2d-standard-48
  • n2d-standard-64
  • n2d-standard-80
  • n2d-standard-96
  • n2d-standard-128
  • n2d-standard-224

VPC Service Controls can define perimeters around sets of services within a VPC and limit access to them. Traffic that crosses a perimeter is governed by ingress and egress rules. This affords us the following benefits:

  • Unauthorized networks with stolen credentials are blocked
  • Data exfiltration is blocked
  • A safety net for misconfigured, over-permissive IAM policies
  • Honeypot perimetering and additional monitoring
  • Perimeters can be extended to on-premises networks
  • Context-aware access to resources


Comparison of Google Cloud Database Options

There are many pros to using Bigtable, including the ability to handle large amounts of data, the flexibility to scale up or down as needed, and support for a variety of data types. Additionally, Bigtable is designed to be highly available and can provide near-real-time access to data, with resizing without downtime, simple administration, and high scalability.

BigQuery is a very powerful tool that can handle large amounts of data very efficiently. It is also easy to use and has a lot of features that make it a great choice for data analysis. On the downside, BigQuery can be expensive to use, and it can be challenging to get started if you are not familiar with it.

Google Cloud SQL is fully managed, flexible, automatically replicated across multiple zones, encrypted at rest and in transit, and automatically updated.

Cloud Spanner uses TrueTime to provide externally consistent reads and writes across multiple regions. If your data needs to be consistent and cannot wait for replication, Cloud Spanner is the clear choice.

Running a database cluster on Compute Engine VMs, you take all the management upon yourself. If you select the wrong compute sizes, either too big or too small, you run the risk of rising costs or falling performance.

| Product | Relational | Structured | Unstructured | Heavy R/W | Low Latency | Global Consistency |
| --- | --- | --- | --- | --- | --- | --- |
| Bigtable | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 |
| BigQuery | 🟢 | 🟢 | 🟢✝ | 🔴✝✝ | 🔴 | 🔴 |
| Cloud Firestore | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 |
| Firebase Realtime Database | 🟢 | 🟢 | 🟢✝ | 🔴✝✝ | 🔴 | 🟢 |
| Cloud SQL | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 |
| Cloud Spanner | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 |
| Compute VM | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 |

| Symbol | Meaning |
| --- | --- |
| 🟢 | Yes |
| 🔴 | No |
| ✝ | Semi-unstructured data with the JSON type |
| ✝✝ | Read / append only |