May 4, 2012May 4, 2012 by eknowledger

CLR 4.5: .Net Framework Kernel Improvements

In this post I’ll go through some of the enhancements and improvements done by the CLR team as part of the performance improvements in .Net 4.5. In most cases developers will not have to do anything different to take advantage of the new stuff, it will just works whenever the new framework libraries are used.

Improved Large Object heap Allocator

I’ll start by the most “asked-for” feature from the community – compaction on LOH. As you may know the .Net CLR has a very tough classification regime for its citizens(objects) any object that is greater than or equal to 85.000 bytes considered to be special(large Object) and needs different treatment and placement, that’s why the CLR allocate a special heap for those guys and then the poor guys lives in a different heap frankly called Small Object Heap (SOH).

SOH Allocations and Garbage Collections

The main difference between these two communities are basically around how much disturbance the CLR imposes in each society. When a collection happen on SOH the CLR clear the unreachable objects and reorganize the objects again (all citizens of the SOH) to compact the space and group all the free space at the beginning of the heap. this decreases the chances of fragmentation in the heap and it can only happen because the citizens of SOH are lighter and easier to move around. on the other hand when collection occur on LOH ( Only occur during Gen 2 Collection) the CLR doesn’t com[pact the memory cause it’s very expensive process to move these large object around.You see where this is leading, now we have to deal with fragmentation issues in LOH and from time to time a Out of Memory Exceptions.

LOH Allocations and Garbage Collections

Because the LOH is not compacted, memory management happens in a classical way. where the CLR keeps a free list of available blocks of memory. When allocating a large object, the runtime first looks at the free list to see if it will satisfy the allocation request. When the GC discovers adjacent objects that died, it combines the space they used into one free block which can be used for allocation.

Well the CLR team didn’t make the decision yet to compact LOH which is understandable because of the cost of that operation, on the other hand they improved the method by which the CLR manages the free lists, therefore making more effective use of fragments. So now the CLR will revisit the memory fragments “Free Slots” that earlier allocation didn’t use..

Also in Server GC mode, the runtime balances LOH allocations between each heap. Prior to .NET 4.5, we only balanced the SOH.

Many of the citizens of LOH are similar in nature which lends itself to the idea of Object Pools that can essentially reduce the LOH fragmentation.

Background mode for Server GC

Couple of years ago I blogged about a new Background GC in Workstation mode in CLR 4.0 the main idea is while executing Gen 2 collection, CLR checks at well defined points if Gen0/Gen1 has been requested, if true then Gen2 is paused until the lower collection runs to completion then it resumes. Read more about Background mode for Workstation GC.

New CLR supports Background mode for Server GC which introduces support for concurrent collections to server collector. It minimizes long blocking collections while continuing to maintain high application throughput.

In the above schematic diagram notice the following:

As in .Net 4.0 whenever Gen 0 or Gen 1 happen the managed threads allocating objects on the heap are paused until the collection is done.
Gen 2 will now pauses on Server mode too (As in Client/Workstation mode) during Gen 0 and Gen 1 giving it priority to finish first.
Gen 2 as usual runs in the background while the managed threads still allocating objects on the SOH heap.

Auto NGEN

.Net Framework prior to v4.5 install both the IL and NGened images for all the assemblies to the dev machines which consumes double the space required or more just for the framework. The CLR team conducted a research to find out what are the most commonly used assemblies out of the framework and named those the “Frequent Set” this set of assemblies only NGened now which saves a lot of space and dramatically decrease the framework disk footprint.

Now the perfectly good question that present itself is What about performance when non-NGened assemblies are used?!

The CLR team introduce a new replacement to the ngen engine called Auto NGen Maintenance Task. When you create a new application uses non-NGened assemblies or it’s not NGened itself here is what happen:

The user runs the application
Every time the application run it creates a new type of logs called “Assembly Usage Logs” in the AppData windows directory.
Now the new Auto NGen Maintenance Task takes advantage of a new feature in Windows 8 called “Automatic Maintenance” where a background efficient performance enhancement jobs can run in the background while the user is no heavily using the machine.
The Auto NGen Maintenance Task goes through the Logs from all the managed apps the user ran and creates a picture of what assemblies are frequently used by the user and are not NGened and which ones are NGened and not used frequently.
Then the task NGen these assemblies on fly and removes the NGen images not used (This will be known as Reclaiming Native Images Process).
The next time the app runs it gain some boost after it uses the NGened images.

Some notes about Auto NGen process

The assembly must targets the .NET Framework 4.5 Beta or later.
The Auto NGen runs only on Windows 8
For Desktop apps the Auto NGen applies only to GAC assemblies
For Metro styles apps Auto NGen applies to all assemblies
Auto NGen will not remove not used rooted native images (Images NGened by the developers). Basically the Auto NGen removes only images created by the same process read more about reclaiming native images here.

More Information

Related Posts

Hope this Helps,

Ahmed

April 27, 2012 by eknowledger

CLR 4.5: Managed Profile Guided Optimization (MPGO)

Again this performance enhancement (or technology I would say ) is targeting the application startup time more focused on the large desktop mammoth though! the new technology Microsoft introducing with .Net 4.5 is not going to be brand new for you if you are a C++ developer.

The PGO build process in C++

The PGO build process in C++

In C++ shipped with .Net 1.1 a multi-step compilation known as Profile Guided Optimization (PGO) as an optimization technique can be used by the C++ developer to enhance the startup time of the application. by running the application though some common scenarios (exercising your app) with data collection running in the background then used the result of that for optimized compilation of the code. you can read more about PGO here.

As a matter of fact Microsoft has been using this technology since .Net 2.0 to generate optimized ngen-ed assemblies (eating their own dog food) and now they are releasing this technology for you to take advantage of in your own managed applications.

Crash Course to .Net Compilation

In .Net when you compile your managed code it really gets interpreted into another language (IL) and is not compiled to binaries will be running on the production machine, this happen at the run time and referred to as Just-In-Time (JIT) Compilation. The alternative way to that is to pre-compile your code prior to execution at runtime using NGEN tool (shipped with .Net).

Generating these native images to the production processor usually speed up lunch time of the application especially large desktop apps that require a lot of JITting now there are not JIT compilation required at all.

Why you want to use NGen?

If it’s not obvious yet here are some good reasons:

NGen typically improves the warm startup time of applications, and sometimes the cold startup time as well
NGen also improves the overall memory usage of the system by allowing different processes that use the same assembly to share the corresponding NGen image among them

Exercise your Assemblies!

The Managed Profile Guided Optimization (MPGO) technology can improve the startup and working set (memory usage) of managed applications by optimizing the layout of precompiled native images.By organize the layout of your native image so most frequently called data are located together in a minimal number of disk pages in order to optimize each request for a disk page of image data to include a higher density of useful image data for the running program which in return will reduce the number of page requests from disk. if you think more about it, mostly it will be useful to machines with mechanical disks but if you have already moved to the Solid State Drive Nirvana the expected performance improvement will probably will be unnoticeable.

How it Works?

As I mentioned before it’s a very similar process to PGO from C++ compiler, where you exercise your assemblies, here are a simplified steps of the process:

Compile and Build your large desktop application to generate the normal IL assemblies.
Run MPGO tool on the IL assemblies built before.
Now exercise a variety of representative user scenarios within your application.(not too little not too much you don’t want the tool be confused about how to organize the layout of your image).
MPGO will store the profile created from your “training” with each IL assembly trained.
Now when you use NGEN tool on the trained assemblies it will genreate an optimized native image.

How to use MPGO

you can find MPGO in the following directory:

C:\program files(x86)\microsoft visual studio 11.0\team tools\performance tools\mpgo.exe

Run VS command as administrator and execute MPGO with the correct parameters

MPGO -scenario “c:\profiles\WinApp.exe” -OutDir “C:\profiles\optimized” -AssemblyList “c:\profiles\WinApp.exe”

Exercise some most used scenarios on your application. then close the application
The IL with profiles will be generated in the output dir
Run Ngen on the IL+Profiles

NGEN “C:\profiles\optimized\WinApp.exe”

Run the optimized application.

Final Words

MPGO is very cool innovative technology but the results of it depends highly on the human factor, what scenarios you will exercise? and how you’ll run it? the profile gets created so you might end up running it multiple times with different scenarios before your get the expected lunch time you’re looking for.

More Information:

Hope this Helps,

Ahmed

April 26, 2012April 26, 2012 by eknowledger

CLR 4.5: Multicore Just-in-Time (JIT)

Well, technically it’s not a new CLR it’s just an update to CLR 4.0 libraries, .Net framework 4.5 will replace your old CLR libraries with a brand new bits but the runtime is technically the same. running a GetSystemVersion() function on .Net 4.0 CLR OR .Net 4.5 CLR gives the same version! “v4.0.30319”

System.Runtime.InteropServices.RuntimeEnvironment.GetSystemVersion()

The new CLR takes advantage of the revolution of multicore processor to speed up the application compilation process, since the number of new machines equipped with multicore increased in the last few years more users can benefit from this feature.

The current startup routine for managed apps as depicted blow, the application starts, the the JIT compiler kicks in compiling all the methods into binaries before the program starts to executes. you can see how this can be a very lengthy process for big server applications with tens of dependencies on managed libraries.

.Net 4.5 introduce the concept of Parallel JIT Compilation, where a background thread runs on a separate processor core taking care of the JIT Compilation while the main execution thread running on a different core. In ideal scenario the JIT Compilation thread gets ahead of the main thread execution thread so whenever a method is required it is already Compiled. The question now which methods are required for the startup? in order for the background thread to compile to be ready for a faster startup. That leads us to the new ProfileOptimization type.

Optimization Profilers

This is a new type introduced in .Net 4.5 to improve the startup performance of application domains in applications that require the just-in-time (JIT) compiler by performing background compilation of methods that are likely to be executed, based on profiles created during previous compilations.

Notice ProfileOptimization requires a multicore machine to take advantage of its algorithms otherwise the CLR will ignore it on single core machines.The idea is fairly simple the ProfileOptimization runs in the background of your managed application AppDomain identifying the methods that are likely to be executed at the startup time of the application, so the next time the application runs it uses the profile created from the previous run to optimize the JIT compilation process by compile those methods to be ready when its needed.

The ProfileOptimization can be created simply by setting the profile root directory location (directory must exist!) and then start the profiler and save it to a file. so with two simple methods calls you can create profile files used by your application on the next run for faster startup.

There are something simple things to understand about the profiler quoted from MSDN

Profile optimization does not change the order in which methods are executed. Methods are not executed on the background thread; if a method is compiled but never called, it is simply not used. If a profile file is corrupt or cannot be written to the specified folder (for example, because the folder does not exist), program execution continues without optimization profiling.

Here’s a quick look of the TestProfile file created by the ProfileOptimization runtime type for the previous sample application.

JIT-compiling using multiple cores is enabled by default in ASP.NET 4.5 and Silverlight 5, so you do not need to do anything to take advantage of this feature. If you want to disable this feature, make the following setting in the Web.config file:

More Information:

MSDN: ProfileOptimization Class

Hope this helps,

Ahmed

April 24, 2012 by eknowledger

Silent Overflow in CLR

.Net, OLD BLOG, Software development
.Net, C#, CLR, Development, IL, Software
Leave a comment

Note: This blog post transferred from my OLD BLOG and was originally posted in 2007.

One of the things you want to avoid in your arithmetic operations on primitive types is the Silent Overflow , for example :

as you can see the value of Y is not the desired value , this is because the illegal operation on 2 different primitive types . what just happened is known as Silent Overflow , this kind of silent overflow handled in each programming language in a different way , language like C# consider this kind of overflow is not an error that’s why it allow it by default . other language like VB always consider this kind of overflow as an error and throw exception .

What’s matter here is the desired behavior the program need , in most scenarios you will need this kind of silent overflow not to happen in your code or you will get strange unexpected behaviors .

Fortunately the common language runtime (CLR) offers IL instructions that make it very easy task for each .Net language compiler to choose the desired behavior

*add*	adds two values together, with no overflow checking.
*add.ovf*	adds two values together.However,it throws System.OverflowException if an overflow occurs.
*sub*	subtraction with no overflow checking.
sub.ovf	subtract two values , throw System.OverflowException if an overflow occurs.
*mul*	multiplication with no overflow checking.
*mul.ovf*	multiply two values , throw System.OverflowException if an overflow occurs.
*conv*	convert value with no overflow checking.
*conv.ovf*	convert value , throw System.OverflowException if an overflow occurs.

Now let’s take a closer look on how C# compiler take advantage of this set of IL instructions . first, by default C# compiler doesn’t use overflow checking instructions to run the code faster , this means that the add,subtract,multiply and conversion instruction in C# will not include overflow checking

as you can see the C# compiler generated IL uses the add instruction as it the default behavior for C# to generate silent overflow , generally you can turn that off by compile your code using the /checked+ compiler switch , which tell the C# compiler to uses the safe version of the add with overflow check , add.ovf which will prevent this kind of silent overflow and throw OverflowException if overflow occur.

Sometimes you don’t want to turn the overflow globally for the whole application , rather you want to enable it only in some places in your code that’s when checked and unchecked C# operators come into the picture .

Simply checked operator tells the C# compiler to use the safe IL instruction for this operation , and the unchecked use the normal IL instructions (which is the normal behavior in C#).

now let’s modify our simple example to use the checked operator :

using the checked operator in our example generates OverflowException because the C# Compiler generated the IL using the safe add instruction add.ovf as you can see .

C# also offers checked and unchecked statements for wider scope of overflow checking but still determined by the developer inside his/her code .

same result as you can see , OverflowException has been thrown and the generate IL uses add.ovf , just the IL get larger because the C# compiler generated more (no operation instruction nop) which you can get ride of by generating more optimized version of your app.