Difference in COUNT(*) vs COUNT(1) vs COUNT(col_name) in SQL / Hive query

byAkshay Daga (APDaga) -February 19, 2019

3

Difference in COUNT(*) vs COUNT(1) vs COUNT(col_name) in SQL / Hive query | APDaga's DumpBox

In SQL query, "COUNT" is the most used function. It is used for taking the count of the records.

We can pass various parameters in the count function. It's output changes as per the parameters passed in it.

We will discuss the difference in the output of count(*), count(1) and count(col_name).

Recommended SQL Courses:

Udemy: SQL - MySQL for Data Analytics and Business Intelligence

LinkedIn: Master SQL for Data Science

edX: Databases: Advanced Topics in SQL

edureka: SQL Essentials Training & Certification

Coursera: Learn SQL Basics for Data Science

Eduonix: SQL Queries

Let's consider a table as shown below:

BEGIN TRANSACTION;

/* Create a table called NAMES */
CREATE TABLE STUDENTS(ID integer, MARKS text);

/* Create few records in this table */
INSERT INTO STUDENTS VALUES(1,10);
INSERT INTO STUDENTS VALUES(2,15);
INSERT INTO STUDENTS VALUES(null,20);
INSERT INTO STUDENTS VALUES(3,25);
INSERT INTO STUDENTS VALUES(null,18);
INSERT INTO STUDENTS VALUES(4,22);
COMMIT;

/* Display all the records from the table */
SELECT * FROM STUDENTS;

Table created: STUDENTS

ID   | MARKS
-----+------
1    | 10
2    | 15
null | 20
3    | 25
null | 18
4    | 22

■ COUNT(*) :

Let's check count(*) operation on the above table: STUDENTS.

SELECT count(*) from STUDENTS;

Output:

count(*) output = Total number of records in the table including null values.

■ COUNT(1) :

Let's check count(1) operation on the above table.

SELECT count(1) from STUDENTS;

Output:

count(1) output = Total number of records in the table including null values.

NOTE :

▪ The output of count(*) and count(1) is same but the difference is in the time taken to execute the query.

▪ count(1) is faster/optimized than count(*) because:

▯ count(*) has to iterate through all the columns,
But count(1) iterates through only one column.

▯ Check the time difference between count(*) and count(1) on big data-set.

▪ Always try to use count(1) instead of count(*). Since, count(1) performs better and saves computation effort & time.

COMMON CONFUSION :

▪ Many people think that the number "1" in the count(1) indicates the first column of the table.

But it is not correct.

▪ 1 is just a hardcoded value. You can use any other hardcoded number or string instead of 1.
The output will be the same.

count(0), count(2), count(-1), count("A"), count("APDaga"), etc

■ COUNT(col_name) :

Let's check count(col_name) operation on the above table.

On first column : ID

SELECT count(ID) from STUDENTS;

Output:

count(ID) output = Total number of entries in the column "roll_no" excluding null values.

On the second column : MARKS

SELECT count(MARKS) from STUDENTS;

Output:

count(MARKS) output = Total number of entries in the column "Marks" excluding null values.

■ VIDEO :

■ SUMMARY :

count(*) :
output = total number of records in the table including null values.
count(1) :
output = total number of records in the table including null values.
[ Faster than count(*) ]
count(col_name) :
output = total number of entries in the column "col_name" excluding null values.

--------------------------------------------------------------------------------

Click here to see solutions for all Machine Learning Coursera Assignments.

&

Click here to see more codes for Raspberry Pi 3 and similar Family.

&

Click here to see more codes for NodeMCU ESP8266 and similar Family.

&

Click here to see more codes for Arduino Mega (ATMega 2560) and similar Family.

Feel free to ask doubts in the comment section. I will try my best to answer it.

If you find this helpful by any mean like, comment and share the post.

This is the simplest way to encourage me to keep doing such work.

Thanks & Regards,

-Akshay P Daga

Tags: Q&A SQL

3 Comments

Configuration Expert22 June 2021 at 22:44
I was just testing this on a table with 68,689,336 rows of data on a SQL Server 2008 R2 instance and COUNT(*) and COUNT(1) perform basically the same. There is a 30 ms difference, but with repeated runs, the difference seems to switch as to which one performs better. I used the hint MAXDOP 1 to ensure that I wouldn't be hit by parallel processing causing performance differences.
How large does the data set need to be for COUNT(*) and COUNT(1) to give a noticeable performance difference?
ReplyDelete
Replies

Add comment

Difference in COUNT(*) vs COUNT(1) vs COUNT(col_name) in SQL / Hive query

■ COUNT(*) :

■ COUNT(1) :

NOTE :

COMMON CONFUSION :

■ COUNT(col_name) :

■ VIDEO :

■ SUMMARY :

3 Comments

Contact form