Recent Posts

Calculate CUMULATIVE SUM (running total) using SQL / Hive query

Calculate CUMULATIVE SUM using SQL / Hive query | APDaga | DumpBox
SUM is another commonly used function in SQL after the COUNT. SUM is used to get the sum/addition of the records in the table.
It is very simple to use the SUM function in SQL / Hive query. But, How to calculate Cumulative Sum / Running Total?


[Def] CUMULATIVE SUM :

A cumulative sum is a sequence of partial sums of a given sequence.

For example, the cumulative sums of the sequence {a,b,c,...}, are  a, a+b, a+b+c, ....






Here, we will be demonstrating the calculation of cumulative sum using SQL query using the following problem statement.

Let's consider a tables "EMPLOYEE" as shown below:
BEGIN TRANSACTION;

/* Create a table called EMPLOYEE */
CREATE TABLE EMPLOYEE(ID integer, SALARY integer);

/* Create few records in this table */
INSERT INTO EMPLOYEE VALUES(1,10);
INSERT INTO EMPLOYEE VALUES(2,20);
INSERT INTO EMPLOYEE VALUES(3,30);
INSERT INTO EMPLOYEE VALUES(4,40);
INSERT INTO EMPLOYEE VALUES(5,50);
COMMIT;

/* Display all the records from the table */
SELECT * FROM EMPLOYEE;
Table created: EMPLOYEE
ID| SALARY 
----------
1 | 10
2 | 20
3 | 30
4 | 40
5 | 50

EXPECTED OUTPUT :

We need to perform a cumulative sum on the column SALARY of the table employee. 
We want to get the output as:
ID | SALARY | CUMULATIVE_SUM   // REMARK
----------------------------
1  | 10     | 10               // 10
2  | 20     | 30               // 10+20
3  | 30     | 60               // 10+20+30
4  | 40     | 100              // 10+20+30+40
5  | 50     | 150              // 10+20+30+40+50




■ INNER JOIN LOGIC :

In order to get the result, as shown above, we need to perform an inner join on the same table.

INNER JOIN keyword returns the records that have matching values in both the tables. In this case, we are joining the same table "EMPLOYEE" with itself.

The condition for joining these tables will be (A.ID >= B.ID)

Inner join on (A.ID >= B.ID) condition is shown in below image.

Inner Join on the Table | SQL / Hive Query | APDaga | DumpBox


■ VIDEO :









■ QUERY :

SELECT A.ID, A.SALARY, SUM(B.SALARY)
FROM EMPLOYEE as A
INNER JOIN EMPLOYEE as B ON (A.ID>=B.ID)
GROUP BY A.ID, A.SALARY;

OUTPUT :

ID | SALARY | CUMULATIVE_SUM
----------------------------
1  | 10     | 10            
2  | 20     | 30            
3  | 30     | 60            
4  | 40     | 100           
5  | 50     | 150           

That's it. We got the expected output. 


--------------------------------------------------------------------------------

Click here to see solutions for all Machine Learning Coursera Assignments.
&
Click here to see more codes for Raspberry Pi 3 and similar Family.
&
Click here to see more codes for NodeMCU ESP8266 and similar Family.
&
Click here to see more codes for Arduino Mega (ATMega 2560) and similar Family.


Feel free to ask doubts in the comment section. I will try my best to answer it.
If you find this helpful by any mean like, comment and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks & Regards,
-Akshay P Daga


No comments