Sunday 22 April 2018

Excel function working like SQL group by + count(distinct *)?


Suppose I have an Excel sheet with below data



CODE (COL A) | VALUE (COL B)
==============================
A01 | 10
A01 | 20
A01 | 30
A01 | 10
B01 | 30
B01 | 30

Is there an Excel function working like ..




SELECT CODE, count (Distinct *) FROM TABLE GROUP BY CODE


CODE | Distinct Count of Value
===================================
A01 | 3
B01 | 1

or, better yet, can I have an Excel formula pasted in column C to get something like this:


 
CODE (COL A) | VALUE (COL B) | DISTINCT VALUE COUNT WITH MATCHING CODE (COL C)
===============================================================================
A01 | 10 | 3
A01 | 20 | 3
A01 | 30 | 3
A01 | 10 | 3
B01 | 30 | 1
B01 | 30 | 1

I know I can use pivot table to get this result easily. However due to reporting requirements I have to append the "distinct count" column to the Excel sheet, hence pivot table is not an option.


My last resort is to use Excel macros (which are fine), but before that I would like to learn whether Excel functions can accomplish this kind of task.



Answer



Enter this formula in cell C2, assuming you have data in rows 2 through 7,


=SUMPRODUCT(($A$2:$A$7=A2)  /  COUNTIFS($B$2:$B$7, $B$2:$B$7, $A$2:$A$7, $A$2:$A$7))

and drag it down.


How it works:


When SUMPRODUCT is given a list of scalar arguments, it works like SUM, but it will take an array as an argument without special array entry.


The array is populated with zeroes for records that don't match the CODE value in column A. For those that match, the array is populated with 1/(the number of records that have the same A and B values as this record). So, for example, there are two records that have A=A01 and B=10, so for those two records 1/2 (½) is entered in the array. Think of this as a kind of weighting for duplicate values. Whenever these values are summed, the sum for each unique B value is 1 (in the example, the two records would sum ½+½=1). This gives the count of distinct records.


Full example using your example data:


For any record with A=A01, the formula would return the sum of {½,1,1,½,0,0}=3.
For any record with A=B01, the formula would return the sum of {0,0,0,0,½,½}=1.


No comments:

Post a Comment

Where does Skype save my contact's avatars in Linux?

I'm using Skype on Linux. Where can I find images cached by skype of my contact's avatars? Answer I wanted to get those Skype avat...