Expectantly, you’d assume that Microsoft had built in a function to calculate a median of a set of values. Expect less! Or, rather none. There is no built-in function in T-SQL to calculate a median. I do not know the reasoning behind not making one, but that’s what it is.
In case you don’t know what a median is, it is the value right in the middle of your set of ordered values. So, if you have an array of nine integers, it will be the fifth element. In the case when the number of elements is even, it is the value evenly distributed between the bordering elements in the middle. The median is usually used in statistics applications.
To give you an example, if we have a result set of five elements from a query:
1,3,5,9,11 – your median is going to be
5 because it is exactly in the middle of the set. And if we take an even set of
1,3,5,9, then your median is going to be
(3+5)/2 = 4.
Now, how would you get one on T-SQL if there is no built-in function to do that?
If you understand what a median is, a quite simple solution can implement it. All you have to do is get the values from both halves of the dataset, both closest to the middle, and take an average of them.
Let’s say we have conditions we want to enforce, such as each each number in the table has an
Active column, and we only want to get the median of active numbers.
Your complete set of active numbers can be selected like this:
SELECT Number FROM MyTable WHERE Active = 1
But you still need to split it in half as you want to get the median value. So, why don’t we take a half of it and sort it with ascending order and then take another half sorted by descending order?
SELECT TOP 50 PERCENT Number FROM MyTable WHERE Active = 1 ORDER BY Number ASC
SELECT TOP 50 PERCENT Number FROM MyTable WHERE Active = 1 ORDER BY Number DESC
Your two sets are now going to look like:
+-----+------+ | ASC | DESC | +-----+------+ | 1 | | | 3 | | | | 3 | | | 1 | +-----+------+
We can now select the largest number from our first subset, the one in ascending order and the least one from the descending subset – all we have to do is split them in half, and we’ll have the median:
SELECT Median = ( ( SELECT TOP 50 PERCENT Number FROM MyTable WHERE Active = 1 ORDER BY Number ASC ) AS OrderByMin + ( SELECT TOP 50 PERCENT Number FROM MyTable WHERE Active = 1 ORDER BY Number DESC ) AS OrderByMax ) / 2.
Note the dot after the digit 2 at the end. That will indicate that it is a floating point value rather than an integer. If you don’t use it, in cases when you have an even number of rows in your dataset, you will have a ceiling or a floor of what actually could be a 2.8 or 2.3.
Till’ next post and hope this helps someone 🙂