Calculating Statistical Values in T-SQL: Mean (Average), Median, Mode and Range

Sometimes you may need to calculate statistical values based on a set of values you have in a table. Let’s start with explaining what these values represent, in case you don’t already know. For the purposes of this post, we will use the following set of numbers: 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5.

So, let's start with creating a temporary table and populating it with this set:

IF OBJECT_ID('tempdb..#Vals') IS NOT NULL DROP TABLE #Vals;

CREATE TABLE #Vals
(
	PK INT PRIMARY KEY IDENTITY(1, 1),
	Number INT NOT NULL
);

INSERT INTO #Vals
VALUES (1), (1), (1), (1), (1), (2), (2), (2), (2), (3), (3), (3), (4), (4), (5);

-- Our calculations will be inserted here

DROP TABLE #Vals;

1. Mean (Same as Average)
The mean, or the average, of a set of numbers is calculated as the sum of all numbers in the set divided by the count of numbers in the set. So, in our example, it's (1 + 1 + 1 + 1 + 1 + 2 + 2 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 15 = 35 / 15 = 2.(3)

T-SQL has a built-in function to calculate the mean called AVG() which, as you probably figured, stands for average. Now, let's get it:

SELECT AVG(Number) FROM #Vals;
SELECT AVG(CAST(Number AS FLOAT)) FROM #Vals;

Notice how the first query returns 2 – that is because Number is declared as INT so the result is rounded down. To get a true average, we need to cast it to FLOAT (or any other floating-point type).

2. Median
The median is the value in the middle of the set, when that set is ordered. In our example there is an odd number of records so we can pick the one right in the middle: 2 (8th item). If there is an even number of items in the set, the median is calculated as the average of the two numbers in the middle. For example, if our set was 1, 1, 2, 2, then the median is (1 + 2) / 2 = 1.5

There is no built-in function to calculate a median in T-SQL and there are a few ways to do that. Here is a couple of them:

One way to do that is to split your set in two (select 50%) and take the average of your top values for ascending and descending order (same record if you have an odd count of numbers):

SELECT Median =
	(
		(
			SELECT TOP 1 Number 
			FROM
				(
					SELECT TOP 50 PERCENT Number FROM #Vals ORDER BY Number ASC
				) AS T 
			ORDER BY Number DESC
		) + 
		(
			SELECT TOP 1 Number 
			FROM
				(
					SELECT TOP 50 PERCENT Number FROM #Vals ORDER BY Number DESC
				) AS T
			ORDER BY Number ASC
		)
	) / 2;

The code above looks, of course, somewhat cumbersome. Here's a different way, using the ROW_NUMBER() function:

SELECT Median = AVG(Number)
FROM 
	(SELECT 
			Number, 
			rnAsc = ROW_NUMBER() OVER(ORDER BY Number ASC),
			rnDesc = ROW_NUMBER() OVER(ORDER BY Number DESC)
	FROM #Vals
	) T
WHERE rnAsc BETWEEN rnDesc - 1 AND rnDesc + 1

The former is a more elegant way, obviously.

3. Mode
Mode is defined as the value that occurs the most in the set. So, this is pretty simple – all we have to do is group by number and order in descending order:

SELECT TOP 1 Mode = Number
FROM #Vals
GROUP BY Number
ORDER BY COUNT(Number) DESC;

4. Range
Range is nothing more than your minimum and maximum values in the set so, this is yet another very simple query:

SELECT 
		[Min] = MIN(Number),
		[Max] = MAX(Number)
	FROM #Vals;
This entry was posted in MSSQL and tagged , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *