Recent Posts

Difference in COUNT(*) vs COUNT(1) vs COUNT(col_name) in SQL / Hive query

Difference in COUNT(*) vs COUNT(1) vs COUNT(col_name) in SQL / Hive query | APDaga's DumpBox
In SQL query, "COUNT" is the most used function. It is used for taking the count of the records.

We can pass various parameters in the count function. It's output changes as per the parameters passed in it.

We will discuss the difference in the output of count(*)count(1) and count(col_name).

Let's consider a table as shown below:
BEGIN TRANSACTION;

/* Create a table called NAMES */
CREATE TABLE STUDENTS(ID integer, MARKS text);

/* Create few records in this table */
INSERT INTO STUDENTS VALUES(1,10);
INSERT INTO STUDENTS VALUES(2,15);
INSERT INTO STUDENTS VALUES(null,20);
INSERT INTO STUDENTS VALUES(3,25);
INSERT INTO STUDENTS VALUES(null,18);
INSERT INTO STUDENTS VALUES(4,22);
COMMIT;

/* Display all the records from the table */
SELECT * FROM STUDENTS;

Table created: STUDENTS
ID   | MARKS
-----+------
1    | 10
2    | 15
null | 20
3    | 25
null | 18
4    | 22





■ COUNT(*) :

Let's check count(*) operation on the above table: STUDENTS.
SELECT count(*) from STUDENTS;
Output:
6
count(*output = Total number of records in the table including null values.



■ COUNT(1) :

Let's check count(1) operation on the above table.
SELECT count(1) from STUDENTS;
Output:
6
count(1output = Total number of records in the table including null values.


NOTE :

▪ The output of count(*) and count(1) is same but the difference is in the time taken to execute the query.


▪ count(1) is faster/optimized than count(*) because:

    ▯ count(*) has to iterate through all the columns,
             But
 count(1) iterates through only one column.

    ▯ Check the time difference between count(*) and count(1) on big data-set.

▪ Always try to use count(1) instead of count(*). Since, count(1) performs better and saves computation effort & time.





COMMON CONFUSION :

▪ Many people think that the number "1" in the count(1) indicates the first column of the table. 
But it is not correct.

▪ 1 is just a hardcoded value. You can use any other hardcoded number or string instead of 1.
The output will be the same.
count(0), count(2), count(-1), count("A"), count("APDaga"), etc



■ COUNT(col_name) :

Let's check count(col_name) operation on the above table.

On first column : ID
SELECT count(ID) from STUDENTS;
Output:
4
count(IDoutput = Total number of entries in the column "roll_no" excluding null values.

On the second column : MARKS
SELECT count(MARKS) from STUDENTS;
Output:
6  
count(MARKSoutput = Total number of entries in the column "Marks" excluding null values.


■ VIDEO :







■ SUMMARY :

  1. count(*) :
    output = total number of records in the table including null values.
  2. count(1) :
    output = total number of records in the table including null values.
    [ Faster than count(*) ]
  3. count(col_name) :
    output = total number of entries in the column "col_name" excluding null values.

--------------------------------------------------------------------------------

Click here to see solutions for all Machine Learning Coursera Assignments.
&
Click here to see more codes for Raspberry Pi 3 and similar Family.
&
Click here to see more codes for NodeMCU ESP8266 and similar Family.
&
Click here to see more codes for Arduino Mega (ATMega 2560) and similar Family.


Feel free to ask doubts in the comment section. I will try my best to answer it.
If you find this helpful by any mean like, comment and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks & Regards,
-Akshay P Daga

No comments