Saturday, April 9, 2016

Scale up and scale out with .NET framework


Recently I came across a question about the ability of .NET to scale compared to other frameworks. Here’s my answer. I hope you’ll find it useful.

While the scalability of an application is mostly determined by the way in which the code is written, the framework / platform that is being used can significantly influence the amount of effort required to produce an application that scales gracefully to many cores (scale up) and many machines (scale out).

Before joining Microsoft, I was part of a team that built a distributed, mission-critical Command and Control system using .NET technologies (almost exclusively). The applications that make up the system are deployed on powerful servers; they are all stateful and massively concurrent, with strict throughput requirements. We were able to build a quality, maintainable, production-ready system in less than 3 years (which is record time for such a system).

Over the past 6 years at Microsoft, I have worked on hyperscale services deployed on thousands of machines, all of which were written using .NET tech. In most cases, unless there’s a good reason not to, .NET is used for the front ends (ASP.NET MVC), mid-tier and backend services.

Here’s how .NET empowers applications that need to scale up and scale out.

Scale up:

Building a reliable, high-performance application that scales gracefully on many-core hardware is hard. One of the challenges associated with scalability is finding the most efficient way to control the concurrency of the application. Practically, we need to figure out a way to divide the work and distribute it among the threads such that we put the maximum number of cores to work. Another challenge that many developers struggle with is synchronizing access to shared mutable state (to avoid data corruption, race conditions and the like) while minimizing contention between the threads.

Concurrency Control

Take the concurrency/throughput analysis below, for example. Note how throughput peaks at a concurrency level of 20 (threads) and degrades when the concurrency level exceeds 25.

So how can a framework help you maximize throughput and improve the scalability characteristics of your application?

It can make concurrency control dead simple. It can include tools that allow you to visualize, debug and reason about your concurrent code. It can have first-class language support for writing asynchronous code. It can include best-in-class synchronization and coordination primitives and collections.

In the past 13 years I’ve been following the incremental progress that the developer division has made in this space, and it has been a fantastic journey.

The first version of .NET introduced the managed Thread Pool to provide a convenient way to run asynchronous work. The Thread Pool amortizes the cost of creating and destroying threads (according to Joe Duffy, it costs ~200,000 cycles to create a thread) by using a heuristic ‘thread injection and retirement algorithm’ that determines the optimal number of threads by looking at the machine architecture, the rate of incoming work and the current CPU utilization.
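To make this concrete, here is a minimal sketch of handing work to the managed Thread Pool instead of creating threads by hand. The class and method names (ThreadPoolSketch, RunOnPool) are made up for illustration, and the "work" is just a counter increment standing in for something real.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Hypothetical helper: queues `items` work items to the managed
    // thread pool and waits until all of them have finished.
    public static int RunOnPool(int items)
    {
        int completed = 0;
        using (var done = new CountdownEvent(items))
        {
            for (int i = 0; i < items; i++)
            {
                // The pool decides which thread runs this callback
                // and how many threads exist at any moment.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref completed); // stand-in for real work
                    done.Signal();
                });
            }
            done.Wait(); // block until every work item has signalled
        }
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine(RunOnPool(100));
    }
}
```

Note that the code never says how many threads to use; that decision is left to the pool.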

.NET 4.0 introduced the TPL (Task Parallel Library), which includes many features that enable an application to scale better. It supports per-worker-thread local queues (to reduce contention on the global queue) with work-stealing capabilities, as well as concurrency-level tuning (setting the number of tasks that are allowed to run in parallel).
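As a rough illustration of concurrency-level tuning, the sketch below caps a parallel loop with ParallelOptions.MaxDegreeOfParallelism. The names TplSketch and SumOfSquares are hypothetical, and summing squares is only a placeholder for CPU-bound work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TplSketch
{
    // Hypothetical helper: sums the squares of 1..n in parallel while
    // capping the number of concurrently running tasks.
    public static long SumOfSquares(int n, int maxParallelism)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        long total = 0;

        Parallel.For(1, n + 1, options,
            () => 0L,                                      // per-task local accumulator
            (i, loopState, local) => local + (long)i * i,  // no shared state touched here
            local => Interlocked.Add(ref total, local));   // one synchronized merge per task

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000, Environment.ProcessorCount));
    }
}
```

Each task accumulates into its own local value and touches the shared total only once at the end, which keeps contention low as the degree of parallelism grows.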

In .NET 4.5, the async and await keywords were introduced, making asynchrony a first-class language feature and using compiler magic to make code that looks synchronous run asynchronously. As a result, we get all the advantages of asynchronous programming with a fraction of the effort.
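Here is a small sketch of what that looks like in practice. The methods FetchLengthAsync and TotalLengthAsync are invented for this example, and Task.Delay stands in for a real network or disk call.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncSketch
{
    // Hypothetical I/O-bound call; Task.Delay simulates the wait.
    public static async Task<int> FetchLengthAsync(string name)
    {
        await Task.Delay(50); // no thread is blocked while waiting
        return name.Length;
    }

    // Starts all calls first, then awaits them together.
    public static async Task<int> TotalLengthAsync(string[] names)
    {
        int[] lengths = await Task.WhenAll(names.Select(n => FetchLengthAsync(n)));
        return lengths.Sum();
    }

    public static void Main()
    {
        // Blocking here only because this is a console entry point.
        int total = TotalLengthAsync(new[] { "alpha", "beta", "gamma" })
            .GetAwaiter().GetResult();
        Console.WriteLine(total);
    }
}
```

The code reads top to bottom like synchronous code, yet the three simulated calls overlap instead of running one after another.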

Consistency / Synchronization

Although more and more code now runs in parallel, protecting shared mutable data from concurrent access (without killing scalability) is still a huge challenge. Some applications can get away relatively easily by sharing only immutable objects, or by using lock-free or low-lock synchronization and coordination primitives and collections (e.g. ConcurrentDictionary), which eliminate the need for explicit locks almost entirely. However, in order to achieve greater scalability there’s no escape from using fine-grained locks.
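For the simpler cases, a concurrent collection is often enough. The sketch below counts words from many threads using ConcurrentDictionary without a single lock statement in user code; WordCountSketch and Count are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WordCountSketch
{
    // Hypothetical helper: counts word occurrences from many threads
    // without any explicit lock in user code.
    public static ConcurrentDictionary<string, int> Count(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            // The add-or-update is applied atomically per key; the update
            // delegate may run more than once under contention.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });
        return counts;
    }

    public static void Main()
    {
        var counts = Count(new[] { "a", "b", "a", "c", "a", "b" });
        Console.WriteLine(counts["a"]);
    }
}
```

This works because each update touches a single key. Once an operation has to change several pieces of shared state together, you are back to coordinating with locks.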

In an attempt to provide a solution for mutable in-memory data sharing that on the one hand scales, and on the other hand is easier to use and less error prone than fine-grained locks, the team worked on Software Transactional Memory (STM) support for .NET that would have eased the tension between lock granularity and concurrency. With STM, instead of using multiple locks of various kinds to synchronize access to shared objects, you simply wrap all the code that accesses those objects in a transaction and let the runtime execute it atomically and in isolation by doing the appropriate synchronization behind the scenes. Unfortunately, this project never materialized.

As far as I know, the .NET team was the only one that even made a serious effort to make fine-grained concurrency simpler to use in a non-functional language.

Speaking of functional languages, F# is a great choice for building massively concurrent applications. Since values in F# are immutable by default, sharing state and avoiding locks is much easier. F# also integrates seamlessly with the .NET ecosystem, which gives you access to all the third-party .NET libraries and tools (including TPL).

Scale out:

Say that you are building a stateless website/service that needs to scale to support millions of users: you can deploy your ASP.NET application as a Web Site or Cloud Service to the Microsoft Azure public cloud (and soon Microsoft Azure Stack for on-premises) and run it on thousands of machines. You get automatic load balancing inside the data center, and you can use Traffic Manager to load balance requests across data centers. All with very little effort.

If you are building a stateful service (or a combination of stateful and stateless), you can use Azure Service Fabric, which allows you to deploy and manage hundreds or thousands of .NET applications on a cluster of machines. You can scale your cluster up or down easily, knowing that the applications scale according to the available resources.

Note that you can use the above with non-.NET applications, but most of the tooling and libraries are optimized for .NET.

Wednesday, March 11, 2015

Software Interview Nightmares

The following is from my Quora answer to “Has Cracking the Coding Interview made it more difficult for recruiters to evaluate software dev…”

The problem with the hiring process in many software companies (especially the big ones) is that instead of being optimized to establish diverse teams of smart, creative and passionate engineers, it is heavily optimized to filter out the Secretly Terrible Engineers, for the most part by using the same old, back-to-basics algorithms questions, irrespective of the candidate's experience and background. Unfortunately, as recognized by many (including Google and Facebook), this interview strategy results in a strikingly high false-negative ratio.

Now, can a company come up with an alternative system with a much lower false-negative ratio without compromising the hiring bar and increasing the false-positive (mis-hire) ratio? Yes, it can! My extended team and I have been doing just that for the past 4 years.

The honest truth is that very few of us (if any) have the wisdom to assess the programming skills of experienced engineers in a 45-minute algorithms quiz performed under extremely stressful conditions.

Trust me, I've been asking these questions for years. I even documented all the questions (and answers) that I've asked and been asked; it’s all right here: Get Ready for Software Interview. As an interviewer, algorithms questions are the easiest to ask. You don’t have to sweat or think too much. You know the questions and answers by heart, and you are always surprised to see engineers with great experience struggle when they try to solve these questions with their back against the whiteboard.

It never really felt quite right, and the more I interviewed, the more I realized that I simply can't evaluate years of experience using a 45-minute quiz. That definitely wasn't the way I wanted to be evaluated.

We can do better. We should do better. Hiring managers must know better. Instead of settling for these awful odds, they should refresh their interview strategy. They should hand-pick the interviewers based on the candidate's experience and current set of skills. They should guide the interviewers to focus less on algorithms on the whiteboard (enough already!) and more on good old one-on-one software-related conversations.

I believe that a good interviewer, with a background similar to the interviewee's, should be able to assess the quality of the latter simply by talking with him/her for 15-30 minutes. Talk about his/her past experience, look at the code that he/she has written, and check if he/she became an expert in the areas that he/she worked on. Every interviewer who is asked to participate in an interview loop should carefully read the candidate's resume and make sure that he/she can ask coding questions related to the candidate's past experience. If a technical interviewer doesn't have the appropriate experience to ask these questions (no shame in that), he/she should step out of the loop instead of defaulting to the good old, one-size-(doesn't)-fit-all algorithms quiz. I mean, it makes little sense to base an interview on algorithms questions when interviewing an engineer who spent the last 5 years building web applications focusing on HTML and JavaScript (and then rationalize the no-hire decision on his/her lacking 'core' engineering skills). Yet it happens all the time.

Some of the best developers I know have degrees in Electronics, Physics and Art. They have been developing software since puberty. They are passionate about it. It’s their hobby. They would work for free. Some of them might not know what the Big O of merge sort is (god forbid!), but they have been rockstars in every company that they worked at, the kind of talent that you don’t want to miss. Dare to add those people to your mix and you’ll get a diverse (=more productive) team that creates great products that appeal to a wider range of users.

If you search for topics currently asked in software interviews, you'll find the following: binary search, tree traversal (pre/in/post), sorting algorithms (merge/quick/and some O(n^2) ones), recursion/iteration, graph search, dynamic programming, breadth-first search, depth-first search, stacks, queues, hashtables, heaps, priority queues, and linked lists (single/doubly/circular).
We expect all candidates to solve these questions on the spot, under pressure, irrespective of their past experience.

The problem is that most developers don’t get a chance to implement even half of the above in their daily work. They reuse existing algorithms or services encapsulated nicely in most modern frameworks/platforms. So, you are optimizing your interview to find the 5% that implement these algorithms in their daily work, plus the 5% of college grads that just finished the 'Introduction to Algorithms' class, and maybe another 20% that spent a couple of months preparing for these interviews.
The rest are at a serious disadvantage. They might or might not pass your tests. It’s more than likely that hiding in this group are the 10x multipliers, the ones that can ramp up quickly on the most complex code-base, the ones that write beautiful and maintainable code, the ones that can design, the ones that can test, the ones that can lead, the ones that stop at nothing and get things done, the ones that make the difference between successful and unsuccessful projects. Isn’t that what you are looking for?!

Having said all that, I have no fantasy that interviewers will stop using the whiteboard so extensively any time soon. It's just too easy. Plus, software engineering is spread across so many areas (web, mobile, SQL, OO, concurrency, distributed systems, cloud, big data, etc.) that algorithms seem like the only common denominator. Except that they aren't.