PBA Zone : Requirements Environment : Systemic Quality Requirements Area
One area where Architecture puts a particularly large amount of focus is systemic qualities, also known as non-functional qualities. Specifically, these are qualities that are pervasive throughout the solution but are not event-driven capabilities focused on providing a particular observable function to the solution’s actors (users, systems, etc…).
For example, a functional requirement might be to provide authentication and authorization services for a self-identified user into the given solution.
The systemic quality might be to provide strong confidentiality, integrity and non-repudiation during all phases of communication between the systems and networks of the solution. This security-oriented systemic quality might be implemented with DRM, IPsec tunneling, and a role-based access control system (like AD/Kerberos).
If the solution had a functional requirement that authorization models be based on past dynamic activity (example: exposure to other information, or having a “need to know” designation), then it might require that an additional mandatory access control system be implemented.
Systemic qualities are usually divided into two main areas based on time.
These are often called Manifest and Evolutionary qualities. Manifest simply means the capability a solution should demonstrate at a given point in time. This is similar to a manifest in .NET or a manifest in Longhorn (what and how you want a .NET application or operating system to currently demonstrate). An example of a manifest requirement might be 5000 concurrent transactions/second (part of a performance requirement). Another example might be that the solution operates within four nines (99.99%) availability (52.56 minutes of downtime/year). Manifest qualities are critical because they must demonstrate immediate capability to the stakeholders. Often, failure to meet the acceptable metrics of a Manifest quality results in a violation of an operational service agreement (which directly impacts service level agreements) in the enterprise.
Evolutionary simply means what the solution is designed to demonstrate in the future without breaking or redesigning the solution. This is an important element to design for, as change has proven to be a constant. Architects must be capable of architecting systems that adapt to projected (and, many times, unprojected) systemic needs in future time frames. A typical example is scalability: does the solution (by tier, by layer) have mechanisms to scale to larger projected performance requirements within its projected lifespan without significant redesign or breaking the solution?
Furthermore, Manifest and Evolutionary requirements often impact each other. For example, horizontally scaling an application tier increases the number of concurrent pooled connections required of the resource tier (example: a database). If the database architecture was not designed to handle these future concurrent connections, adding more application servers horizontally might degrade the performance of the overall solution, because the database cannot handle the increased pooled connections.
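The interaction between application-tier scaling and database connections can be made concrete with simple arithmetic. The following is a minimal sketch, not a sizing method: the function names and every figure used (pool size, connection limit, reserved connections) are hypothetical and chosen only for illustration.

```python
def pooled_connections(app_servers: int, pool_size_per_server: int) -> int:
    """Worst-case connections the application tier can open against the database."""
    return app_servers * pool_size_per_server


def max_app_servers(db_max_connections: int,
                    pool_size_per_server: int,
                    reserved_connections: int = 0) -> int:
    """Largest application tier the database can serve before connections run out.

    reserved_connections models connections kept aside for administration
    or monitoring (an assumption of this sketch).
    """
    usable = max(db_max_connections - reserved_connections, 0)
    return usable // pool_size_per_server


if __name__ == "__main__":
    # Hypothetical figures: 8 application servers, 50 pooled connections each,
    # and a database that accepts at most 300 connections.
    demand = pooled_connections(app_servers=8, pool_size_per_server=50)
    limit = 300
    print(f"demand={demand}, limit={limit}, over_limit={demand > limit}")
    print("servers supported:", max_app_servers(limit, 50, reserved_connections=20))
```

Under these assumed numbers the application tier would request more connections than the database accepts, which is the situation the paragraph above warns about.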
This brings up an important point: Architects must design an ecosystem based on the systemic quality requirements (both manifest and evolutionary) while understanding the impact of these decisions on the overall IT ecosystem.
Perspective Capture Questions:
What are the performance requirements for the solution?
Performance: Provides Expected Responsiveness (accurate, consistent & predictable)
Performance is very important to IT architecture. This systemic quality is the solution’s ability to provide the expected responsiveness to the expected external entities.
Note: “actor” and “external entity” are terms for something outside the solution which interacts with the solution.
An Actor (a term from the Unified Modeling Language, used in a model called a use case diagram) is a user type which interacts with the solution. This user type could be a customer (utilizing a browser on the internet) or another solution or system which interacts with the solution (like an authentication system, credit card transaction system, inventory management system, etc…). External Entity is a term usually used by infrastructure architects to describe any system which is outside the scope of control of the solution but has some sort of relationship (unidirectional or bidirectional) with the given solution.
Performance is concerned with generating the expected response to actors/external entities under calculated pressure. Calculated pressure is usually defined in two ways:
· # of Concurrent Connections
This measurement is often focused on how many users are utilizing the solution at a given point in time. It examines how many session states are managed on the solution at the same time. However, it should be noted that most user interactions are not executed simultaneously on the solution. There are usually pause periods in between execution events in the system.
· # of Simultaneous Connections
This measurement is often focused on the number of simultaneous executable activities the solution or system is experiencing at the same time.
However, while these metrics are focused on external users, they often do not examine the load generated by external systems/services. One connection from an external system could generate tremendous load on the solution’s systems if not planned for carefully.
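The difference between concurrent and simultaneous connections can be approximated from the pause periods mentioned above. The sketch below is a rough steady-state estimate, not a measurement technique: it assumes every session alternates between executing a request (`service_time_s`) and pausing (`think_time_s`), and both parameter names and the sample figures are hypothetical.

```python
def estimated_simultaneous(concurrent_sessions: int,
                           service_time_s: float,
                           think_time_s: float) -> float:
    """Estimate how many sessions are executing at any instant.

    Assumes each session spends service_time_s executing and think_time_s
    pausing, so the fraction of time a session is busy is
    service_time_s / (service_time_s + think_time_s).
    """
    cycle = service_time_s + think_time_s
    if service_time_s < 0 or think_time_s < 0 or cycle == 0:
        raise ValueError("times must be non-negative and not both zero")
    busy_fraction = service_time_s / cycle
    return concurrent_sessions * busy_fraction


if __name__ == "__main__":
    # Hypothetical workload: 1000 sessions, 0.5 s per request, 9.5 s pause.
    print(estimated_simultaneous(1000, 0.5, 9.5))
```

With these assumed figures only about one session in twenty is executing at a given instant, which is why the two measurements should be stated separately in a requirement.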
The solution must also deliver performance that is accurate, consistent and predictable:
· Accurate Performance:
Each transaction is correctly calculated and executed according to a given pressure portfolio.
Pressure portfolios can contain (but are not limited to) given load stress types, actor type interactions, time of day/week/year, time period, maintenance period, etc… It is important to understand a solution’s potential range of pressure portfolios.
· Consistent Performance:
Multiple instantiations of a specific actor/external entity type experience similar transaction performance behavior based on the functional requirements of the business. There should be little to no variability between actor/entity instantiation types.
The word “instantiation” simply means a specific system or person of an actor/external entity type. For example, a customer might be an actor, but “Betty shopping for shoes on Sunday April 21st 2002 at 7:20 pm EST on a shoe e-commerce web-site” is an instantiation of the actor “customer.” Actors are abstract representations interacting with the Solution, while Instantiations are unique real examples of the abstraction interacting with the Solution at a given point in time.
· Predictable Performance:
Administrators can predict the performance of the solution to specific actor types in a given portfolio of pressure conditions at a given point in time.
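One possible way to put numbers on consistency and predictability is to summarize response-time samples collected from several instantiations of the same actor type. This is only a sketch under stated assumptions: the nearest-rank percentile, the coefficient of variation as a consistency measure, and the `max_cv` threshold are illustrative choices, not metrics prescribed by the text.

```python
import math
import statistics


def percentile(samples_ms, pct):
    """Nearest-rank percentile of response times (pct in the range 0 < pct <= 100)."""
    if not samples_ms or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def coefficient_of_variation(samples_ms):
    """Population standard deviation divided by the mean (0 means no variability)."""
    return statistics.pstdev(samples_ms) / statistics.fmean(samples_ms)


def is_consistent(samples_ms, max_cv):
    """True when variability between instantiations stays within max_cv."""
    return coefficient_of_variation(samples_ms) <= max_cv


if __name__ == "__main__":
    # Hypothetical response times (ms) seen by five customers under one pressure portfolio.
    observed = [100, 120, 110, 130, 500]
    print("median:", percentile(observed, 50))
    print("95th percentile:", percentile(observed, 95))
    print("consistent at 10% CV:", is_consistent(observed, 0.10))
```

A single slow instantiation barely moves the median but dominates the high percentile, so a consistency requirement is usually better stated against a percentile or a variability bound than against an average.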
What are the scalability requirements for the solution?
This quality is associated with a solution’s ability to adjust to a more challenging projected pressure portfolio in the future without breaking or significantly redesigning the solution. Constraints that can be interpreted as “breaking the system” can include cost, time and people constraints (skill sets, team member availability, etc…) of the organization as well as technical design constraints. An example of a scalability model that could break a solution is when a business/domain tier is designed to scale horizontally in the future but no scalability planning was done for the resource tier (the database). Adding more application servers increases the number of pooled connections to the given database. If the database machine was not designed to expand vertically or horizontally to handle the increased connections, it will need to be completely replaced in order to accommodate the increase in pooled connections coming from the application servers.
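A scalability requirement of this kind can be explored by projecting load over the solution’s lifespan and checking when the resource tier becomes the constraint. The sketch below assumes compound annual growth and a fixed throughput per application server; those assumptions, the function names, and all figures are hypothetical.

```python
import math


def projected_load(current_peak_tps: float, annual_growth_rate: float, years: int) -> float:
    """Peak load after `years`, assuming compound annual growth."""
    return current_peak_tps * (1 + annual_growth_rate) ** years


def servers_needed(load_tps: float, tps_per_server: float) -> int:
    """Application servers required to carry the load (rounded up)."""
    return math.ceil(load_tps / tps_per_server)


def first_year_over_limit(current_peak_tps, annual_growth_rate, tps_per_server,
                          pool_size_per_server, db_max_connections, lifespan_years):
    """First year in which pooled connections exceed the database limit, else None."""
    for year in range(lifespan_years + 1):
        servers = servers_needed(
            projected_load(current_peak_tps, annual_growth_rate, year), tps_per_server)
        if servers * pool_size_per_server > db_max_connections:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical: 1000 tps today, 50% yearly growth, 500 tps per server,
    # 50 pooled connections per server, database limit of 300 connections.
    print(first_year_over_limit(1000, 0.5, 500, 50, 300, lifespan_years=5))
```

If the returned year falls inside the projected lifespan, the resource tier needs its own scaling plan before the application tier is allowed to grow.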
What are the reliability requirements for the solution?
Reliability: the ability of a single component to operate without experiencing a fault, error or failure.
This metric is often focused on the ability of a solution component to operate consistently and predictably without experiencing any fault, error or failure.
The common metric associated with reliability is MTBF (mean time between failures).
Reliability of specific components (hardware, software, network structures, etc…) is critical for architects to understand.
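As a small illustration of the MTBF metric, the sketch below estimates MTBF from observed operating time and combines component MTBFs for a chain in which any single failure stops the whole chain. The second function assumes independent components with constant failure rates, which is a simplification; the function names and figures are hypothetical.

```python
def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Observed MTBF: operating time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_hours / failure_count


def series_mtbf_hours(component_mtbfs) -> float:
    """MTBF of components in series, assuming independent constant failure rates.

    Failure rates (1 / MTBF) add up, so the combined MTBF is the
    reciprocal of the summed rates.
    """
    return 1.0 / sum(1.0 / m for m in component_mtbfs)


if __name__ == "__main__":
    # Hypothetical: one year of operation (8760 h) with 4 failures.
    print(mtbf_hours(8760, 4))
    # Hypothetical: two components, each with an MTBF of 1000 h.
    print(series_mtbf_hours([1000, 1000]))
```

The second result shows why component-level reliability matters to architects: a chain of components fails more often than any one of its members.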
What are the resiliency requirements for the solution?
Resiliency: transactions execute with few or no errors when the solution experiences a fault in its components.
This metric is often focused on the ability of a solution to provide an expected, predictable and consistent performance metric when it encounters fault activity. This means that when the system experiences a fault, the solution executes the current transactions at an expected error rate (the expectation is usually very low or zero). This often helps architects focus on the solution’s ability to provide each transaction with integrity and accuracy when experiencing problems.
While solutions architects focus on code-level design to increase functional resiliency, architects must focus on the systemic level (non-functional components) to continue to provide the expected resiliency under a given pressure portfolio when component fault activity is experienced.
What are the availability requirements for the solution?
Availability: Provides expected performance even when the solution really breaks
Architects utilize this systemic quality metric to describe the solution’s ability to provide the expected performance levels under a given pressure portfolio even when certain components of the solution (a system, service, etc…) cease to operate normally or cease to provide required capability to the overall solution. This can be caused by the component or system generating a terminal error when experiencing a fault, or by maintenance activity bringing down the system or component.
Inherent Availability: MTBF/(MTBF+MTTR) of the solution
Operational Availability: uptime / (uptime+downtime) of the solution
MTBF: Mean Time Between Failures
MTTR: Mean Time to Repair
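The two availability definitions above translate directly into code. This is a minimal sketch of those formulas; the function names and the sample figures are hypothetical.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Inherent availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def operational_availability(uptime_hours: float, downtime_hours: float) -> float:
    """Operational availability: uptime / (uptime + downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


if __name__ == "__main__":
    # Hypothetical component: fails every 999 h on average, takes 1 h to repair.
    print(f"{inherent_availability(999, 1):.3%}")
    # Hypothetical year: 8751.24 h up, 8.76 h down (planned and unplanned).
    print(f"{operational_availability(8751.24, 8.76):.3%}")
```

Inherent availability depends only on failure and repair characteristics, while operational availability counts all observed downtime, so the second figure is the one that normally appears in a service agreement.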
Often availability is measured (or cited) as the number of nines the solution can support.
A common example of a solution’s measured availability, expressed as yearly downtime for a given percentage of operational availability:
| desired uptime | yearly downtime |
| 98% | 7.3 days |
| 98.5% | 5.475 days |
| 99% | 3.65 days |
| 99.5% | 1.825 days (43.8 hours) |
| 99.9% | 8.76 hours |
| 99.99% | 52.56 minutes |
| 99.999% | 5.256 minutes |
| 99.9999% | 31.536 seconds |
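The downtime figures can be reproduced from the availability percentage. The sketch below assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (98, 99, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {yearly_downtime_minutes(pct):.3f} minutes/year")
```

For example, 99.99% works out to about 52.56 minutes per year and 99.999% to about 5.256 minutes per year.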
It is important to note that Availability is often directly related to the service recoverability of a service set (external and internal) that the solution is dependent on.
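The effect of service dependencies on availability can be estimated by multiplying the availabilities of every service the solution needs in order to operate. This sketch assumes the services fail independently and that the solution is unavailable whenever any one of them is; the function name and figures are hypothetical.

```python
import math


def series_availability(availabilities) -> float:
    """Availability of a solution that needs every listed service to be up.

    Assumes independent failures; each value is a fraction between 0 and 1.
    """
    values = list(availabilities)
    if any(not 0 <= a <= 1 for a in values):
        raise ValueError("availabilities must be fractions between 0 and 1")
    return math.prod(values)


if __name__ == "__main__":
    # Hypothetical: the solution depends on three services, each 99.9% available.
    print(f"{series_availability([0.999, 0.999, 0.999]):.4%}")
```

Three dependencies at three nines each leave the solution at roughly 99.7%, so a solution cannot promise more availability than its dependencies collectively deliver.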
What are the operational manageability requirements for the solution?
Serviceability
This is often associated with repair oriented maintenance (pro-active and reactive diagnose and fix) and upgradeability of the solution’s components. Often, it is associated with the amount of work (time, complexity, cost) required for a given activity on the solution.
Maintainability
This is often associated with non-repair oriented maintenance (pro-active and reactive incremental patching, adjustments, etc…) and upgradeability of the solution’s components. Often, it is associated with the amount of work (time, complexity, cost) required for a given activity on the solution.
Manageability
This is often associated with the solution’s collective ability to be holistically managed by the organization’s operational environment. This can include the cohesion of the solution with the organization’s IT environment (people, processes and technologies) to increase maintainability and serviceability. It can also include:
· an organization’s ability to measure the effectiveness and efficiency of a solution
· an organization’s ability to predict future systemic quality requirements of the solution
· an organization’s ability to monitor and understand the current granular activities of the solution and its decomposed elements. Solutions with strong manageability characteristics demonstrate proactive models, processes and tools to understand projected future performance characteristics of the solution.
What are the interoperability requirements for the solution?
Interoperability: (to interoperate with diverse services or systems)
Think of interoperability as the degree of usability (ease of use) for diverse external entities/services/systems which are projected to interface with the given solution. It is important to understand that a solution can never be completely interoperable with everything. Architects must make discriminating decisions about which diverse types the solution should interoperate with now and in the projected future. Many architects align architectures with common standard communication protocols or interfaces for critical external interaction components of the solution. In the past, many architects have promoted loosely coupled, standardized communication mechanisms between systems. Of course, architects must analyze the impact of interoperability decisions on other systemic quality areas of the solution as well as the operational environment of the datacenter.
What are the extensibility requirements for the solution?
Extensibility
This quality is often associated with requirements to increase functional capabilities that the solution provides to a given set of actors/external entities. This might also pertain to increasing the diversity of actors / external entities to the solution.
However, it is important for architects to consider the impact of extensibility strategies on other systemic qualities (especially performance, scalability, availability and security).
What are the adaptability requirements for the solution?
Adaptability
This quality is often associated with requirements to change existing functional and systemic quality capabilities that the solution provides to a given set of actors/external entities. This might also pertain to increasing the diversity of actors / external entities to the solution.
What are the reusability requirements for the solution?
Reusability
This quality is focused on decomposable services within the solution being usable by other solutions today and in the future.
Of course, architects must be sensitive to the ramifications of reusability on systemic qualities of the overall solution and its systemic quality impact on other solutions. Furthermore, it is important for architects to analyze the impact of reusable strategies on the operational and organizational environment.
What are the usability requirements for the solution?
Usability:
This is usually considered a functional perspective focusing on the ease of use for specific human actors based on actor/external entity events and point in time measurements.
Non-functional (systemic) levels of usability focus on three areas:
· Accurate Usability
Expected Usability levels are accurately delivered to different actors/external entities given a specific performance portfolio.
· Consistent Usability
Expected Usability levels are consistently delivered to similar actor/external entity types given a specific performance portfolio.
· Predictable Usability
Administrators can accurately predict a given level of usability for a given actor/external entity type given a specific performance pressure portfolio.
What are the accessibility requirements for the solution?
Accessibility:
This functional attribute is often associated with usability for the physically impaired. However, from a technical perspective, think of accessibility as ease of use for actors / external entities under given constrained conditions of the actor/external entity utilizing the same interface to the solution.
Also, when another system/service operates under constrained conditions utilizing the same service interface, it also expects the solution to operate effectively. This is another example of service accessibility.
What are the security requirements for the solution?
Security:
This quality is usually focused on the solution’s ability to provide the expected confidentiality, integrity and availability of the solution’s services to expected actors/external entities as well as unexpected actors/external entities. This includes measuring Vulnerabilities, Threats, Risks, and Countermeasures for various statistically probable and operationally significant scenarios. This can include administrative, technical (which can include system, storage, communication and application controls) and physical controls over company data, assets and services.
What are the recoverability requirements for the solution?
Recoverability:
For system service zone areas, it focuses on the service zone’s ability (speed, accuracy, consistency and predictability) to provide expected performance after a terminal service level failure has been experienced.
Service Zone Area: sometimes called tier areas. Examples can be the client (ex: rich application, browser), presentation services (ex: web servers), business or domain (ex: application servers), and resource (ex: database, directories, flat files or even other external services etc…). This can also mean operational service zones. Examples can include Authentication/Authorization services, DNS Services, DHCP Services, File Management Services, Monitoring Services, etc…
Terminal Service Zone Area Failure: When a specific service zone area simply stops performing at expected levels under a given pressure portfolio. An example might be when the presentation service zone (example: the web servers) stops executing effectively.
What are the portability requirements for the solution?
Portability
This quality is focused on the ability of the solution to move functional capability from a given executable platform type to a different expected executable platform type. This is often associated with the flexibility of higher level system service areas to execute on diverse lower level system service areas.
Examples:
· Application Component’s ability to execute on different Application Service Stacks.
· Application Service Stack’s ability to execute on different Operating System Services
· Operating System Service’s ability to execute on different physical processing components
· A physical processing component’s ability to execute on different physical chassis environments
Application Component: groupings of application code into organized, manageable structures which an application service stack and/or an operating system service can deploy, execute, monitor and allocate resources to.
Application Service Stack: The specific libraries, application systems, language structures, etc… which support the application component on the operating system service.
Operating System Service: usually associated with a specific operating system. This infrastructural layer provides the executable base platform (and maybe other libraries, support services, etc…) for the application component and application service stack to be executed, instrumented/monitored, provisioned, and patched.
Physical Processing Component: This is considered the physical apparatus on which the operating system executes. However, many have divided this into two areas: processing systems (the CPU, memory, disk I/O capacity, network capacity, etc…) and the physical chassis which supports one or more processing systems. The physical chassis can contain power supplies, CPU, memory, disk I/O slots, cooling systems, monitoring systems, etc…
What makes each System Service Area more or less portable is directly related to the level of coupling and component standardization attached to other System Service Areas within each decomposed tier. There are often unique trade-offs between highly coupled, optimized systems and loosely coupled, highly portable systems.
What are the testability requirements for the solution?
Testability:
This quality is often associated with a solution’s ability to demonstrate accurate, consistent, and predictable testing scenarios and results. This also applies to the comprehensiveness of the testing capability of the solution as well as the relative ease (amount of work, skill, process, etc…) with which the tests can be conducted.
This is a critical aspect of any solution. Data Center teams must be capable of easily testing multiple combinations of capability requirements (functional and non-functional) of the solutions to find real breaking point constraints.
What are the buildability requirements for the solution?
Buildability:
This quality is often associated with the inherent complexity of the system’s requirements, the number and complexity of external system dependencies, and the amount of custom code or integration vs. pre-fabricated or COTS components used. If the data center team needs to hire a couple of doctoral candidates to build components of the data center solution, then this aspect could negatively impact the data center environment and the operational team’s effectiveness, and increase the brittleness of overall operations.
What are the understandability requirements for the solution?
Understandability:
The ratio of work required to understand the solution/component to the degree to which the system’s internals can be understood and acted upon in a timely manner.
While many would rate this as a given aspect of any solution, it is surprising how many enterprise solutions contain designs of which the architectural, design and support teams have only a limited understanding years later. When too much of the understandability is associated with a couple of heroes within a company, the solution usually has a high probability of eventually damaging the organization after a couple of years.
What are the architectural balance requirements for the solution?
Balance:
This has been associated with a couple of different balancing areas of IT architecture:
· Matching investment into a specific systemic and operational quality requirement with the benefit experienced from that requirement
· Balancing (or reducing) the reliance on one specific technology, process, or person/team to increase the overall success of the solution over its projected lifespan
· A commonly observed example of lack of balance:
a. Over-investing in one systemic quality strategy (at a specific layer at a specific tier) while largely ignoring other systemic quality needs throughout the rest of the solution’s system service areas and tiers.
Example: When your security measures create a self-inflicted denial of service attack:
When we decide to provide confidential communication, we often have several technical options from which we can choose. But what happens if we choose everything possible?
Example:
Encryption on the wire (encrypted Ethernet), encryption at the TCP/IP level (example: IPsec), and encryption at multiple application layers (SSL and WS-Security).
· How would the expected performance be impacted by such a choice? Do duplicate cascading encryption models always make the communication more secure (especially if the reliability of the communication is depreciated)? Could this generate a self inflicted denial of service attack on your own user community (actors/external entities) with significant performance degradation if the infrastructure cannot handle the overhead? Can the business afford the physical and execution resources required to support this design for the expected performance capabilities? These are questions architects must ask themselves when cons