Wednesday 26 October 2011

Managing IT Capacity – Make it lean and green


No one now doubts the wisdom of going 'green' - reducing the environmental impact of IT on the world. Before long the IT industry will have overtaken the airline industry as a source of carbon emissions.

Most organizations now have initiatives underway to reduce their carbon footprint, adopting more responsible policies to lessen any detrimental impact of IT on the world.

Strategies range from the quick and simple such as 'think of the environment before printing this e-mail' to the longer term and more complex such as virtualizing your server estate.

New technologies can go a long way towards helping companies meet their green initiatives, but only if they are effectively managed; otherwise the benefits to both the company and the environment are squandered.

Capacity management has a role to play here, helping you implement green strategies that optimize your infrastructure and maximize the green savings to be made.

Failure to implement sound, sustainable strategies will result in spiraling costs or poorly performing infrastructure, with the inevitable impact on your business goals.

We're all trying to go green in an IT context that is becoming ever more complex. From a server perspective, everyone now accepts that by virtualizing our vast ranks of under-utilized servers, we can do more with less: reduce power consumption, reduce data center space, reduce the air conditioning required and more. This promises a 'double bubble' of benefit: lower costs and lower carbon footprint. Fewer servers also mean less staff time spent managing them. Your business benefits as you save time, save energy and save money.
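
As a rough illustration of that 'double bubble', here is a minimal back-of-envelope Python sketch; every figure in it (server count, wattages, consolidation ratio, PUE, electricity price) is an assumption for illustration only, to be replaced with your own measured values.

# Back-of-envelope estimate of the energy and cost saved by consolidating
# under-utilized physical servers onto fewer virtualization hosts.
# All figures are illustrative assumptions, not measured values.

PHYSICAL_SERVERS = 100        # existing under-utilized servers
AVG_SERVER_WATTS = 400        # assumed average draw per physical server
CONSOLIDATION_RATIO = 10      # assumed VMs per virtualization host
HOST_WATTS = 600              # assumed draw per (busier) virtualization host
PUE = 1.8                     # assumed Power Usage Effectiveness (covers cooling and overheads)
COST_PER_KWH = 0.10           # assumed electricity price per kWh
HOURS_PER_YEAR = 24 * 365

def annual_kwh(servers, watts_each):
    """Facility-level annual energy for a set of servers, including cooling overhead."""
    return servers * watts_each * PUE * HOURS_PER_YEAR / 1000.0

hosts_needed = -(-PHYSICAL_SERVERS // CONSOLIDATION_RATIO)   # ceiling division
saved_kwh = annual_kwh(PHYSICAL_SERVERS, AVG_SERVER_WATTS) - annual_kwh(hosts_needed, HOST_WATTS)

print(f"Hosts after consolidation: {hosts_needed}")
print(f"Energy saved per year:     {saved_kwh:,.0f} kWh")
print(f"Cost saved per year:       ${saved_kwh * COST_PER_KWH:,.0f}")

Even with conservative assumptions, the saving comes from two directions at once: fewer machines drawing power, and less cooling overhead on top of them.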

Let’s face it ‘Managed Capacity’ sounds much more like an approach that fits with a Green agenda than ‘Unmanaged Capacity’.

We’re going to be speaking at the Green IT Expo in London on November 1st, as we’re passionate about managing IT resources to ensure you deploy the capacity you need, when you need it, minimizing money, people and carbon costs.



Andrew Smith
Chief Sales & Marketing Officer


Monday 24 October 2011

Cloud Computing - Complexity, Cost, and Capacity Management

Computer systems have always been complex.  The range of useful work computers can do is extended by layering complexity on top of the basic machine.  Current “Cloud Computing” capabilities are no exception.  Regardless of complexity, the costs of systems must be understood and reported to enable business planning and management.

So what are the different perspectives that need to be understood, reported on and normalized to enable comparison and calculation of unit costs?  The business likes to look at total costs, irrespective of how their service is provided.  They are right to do this – what happens under the covers to deliver that service shouldn’t be their concern.  They just want to know that the service level they require is being provided and what that costs per business transaction or process.

On the systems side, it used to be relatively simple.  Internal systems were the norm.  We had to account for costs of hardware, software, floor space, power, air conditioning, ancillary costs such as insurance and of course, staff costs.  As applications and services became more interlinked and disparate in implementation, it became ever harder to compare and calculate costs for a particular service delivered to users.

Outsourcing and now the Cloud have added yet more levels of complexity.  On one level it seems simple: we pay a cost for an outsourced provision (application, hardware, complete data center or whatever).  In practice it becomes ever more difficult to isolate costs.  Service provision from outside our organization is often offered at different tiers of quality (Gold, Silver, Bronze etc).  These have different service levels, and different levels of provision, for example base units of provision and overage costs that vary and make direct comparison awkward.

Increasingly the model is to mix all of these modes of service provision: for example, hybrid Cloud implementations featuring internal and external Cloud provision plus existing in-house services, all combined to deliver what the user needs.

Each facet of systems use can be monitored and accounted for in terms of resource utilization and, ultimately, dollar costs.  However, overly detailed data quickly adds volume and cost, becomes unwieldy, and delays analysis and reporting, while overly simplified data weakens analysis and adversely impacts the quality of decision support.  The monitoring points and the level of detail at which data is collected are driven by trade-offs between cost, utility and performance, and those choices are both detailed and dynamic.  Frequently, though, data collection is minimized and aggregated to a level which obscures the detail needed to make some decisions.  For example, CPU metrics aggregated to 5-minute periods and suitable for capacity planning are not very useful for understanding CPU resource consumption for individual transactions, a performance engineering concern.
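
To make the aggregation point concrete, here is a tiny Python illustration using invented per-second CPU figures: a 30-second burst that a performance engineer would need to see all but vanishes in the 5-minute average.

# Hypothetical per-second CPU utilization for one 5-minute (300-second) period:
# mostly idle, with a 30-second burst within the interval.
import statistics

per_second_cpu = [5] * 270 + [95] * 30

print(f"5-minute average: {statistics.mean(per_second_cpu):.0f}%  (fine for capacity planning)")
print(f"Peak second:      {max(per_second_cpu)}%  (what the performance engineer needed to see)")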

A distinction perhaps needs to be made between different types of costs.  We might need to move towards calculating regular on-going fixed costs for users, supplemented by variable costs based on changing circumstances.  To my mind this is a little like having the running costs of your car covered by a general agreement (free servicing for 3 years) with those qualifying criteria any insurance business likes to slip in (assumes no more than 20,000 miles per year average motoring, standard personal use, does not include consumables such as tires, wiper blades).  If we go outside the qualifying criteria, we have to pay for individual issues to be fixed. 
Cloud in particular lends itself to costing IT services based on a fixed charge plus variable costs dependent on usage.
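
As a rough sketch of that model (and of why the Gold/Silver/Bronze comparison mentioned above is awkward), here is a minimal Python example; the tier names, fixed fees, included units and overage rates are all invented for illustration.

# Fixed monthly charge plus variable overage for usage beyond the included base units.
# Tier definitions and usage figures are hypothetical.

def monthly_charge(tier, units_used):
    """Fixed fee plus overage cost for any usage above the tier's included units."""
    overage_units = max(0, units_used - tier["included_units"])
    return tier["fixed_fee"] + overage_units * tier["overage_rate"]

silver = {"fixed_fee": 2000.0, "included_units": 10000, "overage_rate": 0.25}
gold   = {"fixed_fee": 5000.0, "included_units": 40000, "overage_rate": 0.15}

for usage in (8000, 25000, 60000):
    print(f"{usage:>6} units   Silver: {monthly_charge(silver, usage):>9,.2f}"
          f"   Gold: {monthly_charge(gold, usage):>9,.2f}")

At low volumes the cheaper fixed fee wins; past the crossover point the higher tier does, which is why a like-for-like comparison needs your actual usage profile rather than the rate card alone.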

Going back to those complex systems - we need to ensure we normalize our view of services across this complex modern IT model, a fancy way of saying we must compare apples with apples.  The key is being able to define a transaction from a business perspective and relate it to the IT processing that underlies that definition. 
Application Transaction Management tools such as the Sharepath software distributed by Metron enable you to get this transaction visibility across diverse infrastructures, internal and external. 
Capacity Management data capture and integration tools like Metron’s Athene then allow you to relate underpinning resource metrics to those transactions, at least for systems where you are able or allowed to measure those resources. 
This brings us to a last key point about external providers, outsourcers or Cloud providers.  You need to ensure that they provide you with that resource level data for their systems or the ability for you to access your own systems and get that data.  Once you have the numbers, calculating costs per transaction is just math.  Without the numbers, you can’t calculate the cost. 
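
To show just how simple that math is once the numbers are available, here is a minimal Python sketch; the service components, monthly costs and transaction volume are hypothetical placeholders for figures taken from your own monitoring and billing data.

# Cost per business transaction = total attributed monthly cost / transactions measured.
# All values below are invented for illustration.

monthly_cost = {
    "internal web farm": 12000.00,
    "external cloud database": 18500.00,
    "outsourced batch service": 4200.00,
}
transactions_per_month = 1500000    # business transactions measured end to end

total_cost = sum(monthly_cost.values())
print(f"Total monthly cost:   {total_cost:,.2f}")
print(f"Cost per transaction: {total_cost / transactions_per_month:.4f}")
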
Business increasingly wants to see the costs of their services, so make sure you demand access to the information you need from your suppliers, as well as from your own organization. Then you can put together a comprehensive and realistic view of costs for services in today’s multi-tiered internal and external application world.

GE Guentzel
Consultant
http://www.metron-athene.com/

Friday 21 October 2011

Often capacity managers feel they’re in a never-ending battle to show their value


You can be successful 99% of the time with your predictions and analysis, but in the 1% of the time when you’re not, business users begin to doubt you.  What do you do in this circumstance besides throw paper at them?

One thing that might help is to have a defined capacity management process. 

A well-defined process helps you gain the confidence of the business users where your projections are concerned.  It allows you to fall back on the information, show them the process you are following and convince them that it is not hit and miss.  That 1% is almost always down to issues with the quality of the data you have to work with, the timeliness of receiving that data and more.  Having a defined, visible process shows that it is not the process itself that is at fault.  Much of the time the reaction from the business users comes about because the unexpected, in many cases, means an increase in cost.

Along with the challenges we’re already facing in the community, the business users are now asking how “our” transactions are responding.  As you ask them “what do you mean by a transaction?”, you get “when the person fills out a form on the screen and hits enter, how do we know everything is working fine?”  Now we all know what it means when they say “fine”.  It means “I don’t want to hear from my people that their applications are running slow and they can’t get their work done.”  This is where an addition to your capacity management process is needed: Application Transaction Monitoring.

Application Transaction Monitoring gives you the ability not only to monitor the capacity of your servers within the enterprise, but also to determine what is happening from the application’s point of view. This monitoring is all the more critical these days as IT environments become ever more complex.

What Application Transaction Monitoring brings to the table is the ability to monitor an application transaction every step of the way throughout its lifecycle.  Marrying this information to the server data gives you both breadth and depth of visibility into the IT environment, as the sketch below illustrates.
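
As a minimal sketch of what that marrying-up can look like, the Python below joins invented end-to-end transaction response times to invented host CPU figures by time interval; real tooling works at far finer granularity, but the principle is the same.

# Join transaction-level measurements (breadth) to server-level data (depth) by interval.
# All measurements below are invented for illustration.
from collections import defaultdict
from statistics import mean

transactions = [("09:00", 1.2), ("09:00", 1.4), ("09:05", 3.8), ("09:05", 4.1)]   # (interval, response s)
server_cpu = {"09:00": 45, "09:05": 92}                                           # (interval, host CPU %)

by_interval = defaultdict(list)
for interval, response in transactions:
    by_interval[interval].append(response)

for interval in sorted(by_interval):
    print(f"{interval}  avg response {mean(by_interval[interval]):.1f}s   host CPU {server_cpu.get(interval, 'n/a')}%")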

Now back to the question that I began with: what do you do when the business users say they don’t have confidence in your information?  Show them you have a well-defined and visible capacity management process.  Show them that this process has both depth, from business data down to technical resource levels, and breadth, covering the application end to end from their perspective.  By having both of these, you are going to be able to show them both sets of information and talk them down off the ledge.  Any issues should move from ‘you got the numbers wrong’ to ‘well, we can see how your process should work, what input do you need to make it work successfully for us?’

Are we as capacity managers going to be 100% accurate? No, but by showing that we are gathering and analyzing all the information that is available, the business users will walk away with an understanding that we have given them the best results possible.

Now when they leave your office, you can throw the paper at the door.

Charles Johnson
Principal Consultant

Wednesday 19 October 2011

VMware vSphere Performance Management Challenges and Best Practices

 
Ever wanted to know what affects CPU Performance in vSphere? What a World is? How and why ESX uses memory reclamation techniques? Or why it is recommended to install VMware Tools?


I’m running a free-to-attend webinar tomorrow which will identify and highlight the key performance management challenges within a vSphere environment, whilst providing best practice guidelines, key metrics for monitoring and recommendations for implementation.
I’ll be focusing on the key resource areas within a virtualized environment, such as CPU, Memory, Storage and Network, and will also provide an introduction to virtualization performance challenges, along with some further information on VM performance and virtualizing applications.

  • Challenges of x86 virtualization
    • Four levels of privilege (Ring 0 – 3)
    • Hardware – Intel VT-x and AMD-V
    • Software – Binary Translation
    • Memory Management – MMU
    • Default Monitor Mode
  • CPU Performance Management
    • What is a World?
    • CPU Scheduling – why is it important to understand how it works?
    • SMP and Ready Time (see the sketch at the end of this post)
    • NUMA Aware
    • What affects CPU Performance?
    • Host CPU Saturation?
    • Causes and resolutions
    • Increasing VM efficiency
    • Timer interrupts and large memory pages
    • ESX Host pCPU0 High Utilization – why is this bad?
  • Memory Performance Management
    • Memory reclamation – how and why?
    • What do I monitor and what does it mean?
    • Troubleshooting tips
    • vSwp file placement guidelines
  • Networking
    • Reducing CPU load using TCP Off-Load and Jumbo Frames
    • What is NetQueue and how will it benefit me?
    • What to monitor to identify Network Performance Problems
  • Storage
    • Setting the correct LUN Queue Depth
    • Key metrics – what to monitor
    • Identifying key factors of Storage Response Time
    • Overview of Best Practices
  • Virtual Machine
    • Selecting the right guest operating system – why does this matter?
    • VM Timekeeping
    • Benefits of installing VMware Tools
    • SMP – Only when required
    • NUMA Server Considerations
  • Applications
    • Can my application be virtualized?

Why not come along and discover these and many more VMware vSphere performance management challenges and best practices: http://www.metron-athene.com/training/webinars/index.html
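
As a small taster of the level of detail we’ll go into on the ready time topic, here is a minimal Python sketch that turns the raw cpu.ready.summation counter into a ready-time percentage. It assumes the standard 20-second real-time sampling interval; the VM names, sample values and the 5% rule-of-thumb threshold are purely illustrative.

# Convert cpu.ready.summation (milliseconds of ready time accumulated over one
# sampling interval) into an average ready-time percentage per vCPU.
# Sample data and the 5% threshold are illustrative assumptions.

SAMPLE_INTERVAL_SECONDS = 20    # vSphere real-time statistics interval

def ready_percent(ready_ms, vcpus=1):
    """Average CPU ready % per vCPU for one sampling interval."""
    return ready_ms / (SAMPLE_INTERVAL_SECONDS * 1000 * vcpus) * 100

samples = [("vm-app01", 350, 1), ("vm-db01", 4200, 4), ("vm-web02", 9800, 2)]   # (VM, ready ms, vCPUs)

for name, ready_ms, vcpus in samples:
    pct = ready_percent(ready_ms, vcpus)
    flag = "  <-- investigate CPU scheduling contention" if pct > 5 else ""
    print(f"{name}: {pct:.1f}% ready{flag}")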

Jamie Baker
Principal Consultant