The SQL AVG() function calculates the average value of a numeric column in a table. It is commonly used to find the average of a set of values, such as prices, scores, or quantities. Here’s an overview of the SQL AVG() function:
Syntax:
SELECT AVG(column_name) AS average_value FROM table_name;
column_name: The name of the numeric column for which you want to calculate the average.
table_name: The name of the table containing the column.
Example: Suppose you have a table named sales with a column named amount, and you want to find the average amount of sales:
SELECT AVG(amount) AS average_sales FROM sales;
Result: The AVG() function returns a single value: the average of the non-NULL values in the specified column. NULL values are ignored, so they count toward neither the sum nor the number of values. The function returns NULL only when there are no rows to average or when every value in the column is NULL.
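The NULL rules above can be checked with a minimal sketch. It assumes Python's standard sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. SQLite's AVG() also skips NULLs, but it always returns a floating-point value. The helper average_amount and its one-column sales table are hypothetical.

```python
import sqlite3


def average_amount(rows):
    """Return AVG(amount) over a throwaway in-memory sales table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (amount REAL)")
        conn.executemany("INSERT INTO sales (amount) VALUES (?)", [(r,) for r in rows])
        # AVG() skips NULLs (None in Python) and yields NULL only when
        # there is no non-NULL value to average.
        (avg,) = conn.execute("SELECT AVG(amount) FROM sales").fetchone()
        return avg
    finally:
        conn.close()


print(average_amount([10.0, None, 20.0]))  # 15.0: the NULL is skipped, (10 + 20) / 2
print(average_amount([]))                  # None: no rows at all
print(average_amount([None, None]))        # None: only NULL values
```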
Aggregate Function: AVG() is an aggregate function in SQL, which means it operates on a set of rows and returns a single result. It calculates the average value across all rows that meet the conditions specified in the WHERE clause (if present).
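Data Type: The result type depends on the type of the column being averaged. In SQL Server, integer columns (TINYINT, SMALLINT, INT) return INT and BIGINT returns BIGINT, so the fractional part is truncated: the average of 1 and 2 is returned as 1. DECIMAL(p, s) columns return DECIMAL(38, s) with the scale raised to at least 6, and FLOAT or REAL columns return FLOAT. To get a fractional average from an integer column, convert the values first, for example AVG(CAST(column_name AS DECIMAL(10, 2))). Other database systems differ; PostgreSQL and MySQL, for example, return a fractional (numeric or decimal) result for integer input.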
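SQLite cannot reproduce SQL Server's integer result, so the sketch below models the rule in plain Python instead. It is an illustration under stated assumptions, not SQL Server code: the hypothetical helper int_avg_truncated skips NULLs (None) and truncates the quotient toward zero, as T-SQL integer division does, while the hypothetical exact_avg uses fractions.Fraction to show the true mean for comparison.

```python
from fractions import Fraction


def int_avg_truncated(values):
    """Model of AVG() over an INT column in SQL Server (hypothetical helper).

    NULLs (None) are skipped, and the integer quotient is truncated
    toward zero, as T-SQL integer division does.
    """
    vals = [v for v in values if v is not None]
    if not vals:
        return None  # no non-NULL input -> NULL
    total, count = sum(vals), len(vals)
    quotient = abs(total) // count
    return quotient if total >= 0 else -quotient


def exact_avg(values):
    """Exact mean of the non-NULL values, for comparison (hypothetical helper)."""
    vals = [v for v in values if v is not None]
    return Fraction(sum(vals), len(vals)) if vals else None


scores = [90, 95, 85, 92]
print(int_avg_truncated(scores))  # 90: 362 / 4 = 90.5, truncated
print(float(exact_avg(scores)))   # 90.5
```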
Usage:
AVG() is commonly used in statistical analysis, reporting, and data exploration to calculate the mean value of a dataset.
It can be combined with SQL clauses such as WHERE, GROUP BY, and HAVING: WHERE filters rows before they are averaged, GROUP BY computes a separate average for each group, and HAVING filters the groups by their aggregate values.
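Here is a minimal sketch of AVG() combined with WHERE, GROUP BY, and HAVING. It again assumes Python's sqlite3 module with an in-memory SQLite database in place of SQL Server. The sales table with region and amount columns, and its rows, are hypothetical example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 100.0), ("North", 300.0), ("South", 50.0),
     ("South", 70.0), ("South", 500.0)],
)

# WHERE filters rows before averaging, GROUP BY averages per region,
# and HAVING then keeps only the groups whose average exceeds 100.
query = """
    SELECT region, AVG(amount) AS avg_amount
    FROM sales
    WHERE amount < 400
    GROUP BY region
    HAVING AVG(amount) > 100
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('North', 200.0)]
conn.close()
```

The row with amount 500 is removed by WHERE before averaging. South (average 60) is then dropped by HAVING.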
Overall, the SQL AVG() function is a simple and versatile tool for calculating the mean of numeric data, making it easier to analyze and interpret numeric values in large datasets.
SQL Server AVG() function: ALL vs. DISTINCT
In SQL Server, the AVG() function accepts an optional ALL or DISTINCT keyword before its argument. The two differ in how duplicate values are handled, which can change the average that is returned:
ALL:
When ALL is used with AVG(), it includes all values, including duplicates, in the calculation of the average.
It is the default behavior of the AVG() function if neither ALL nor DISTINCT is specified.
If there are duplicate values in the column, each occurrence is counted separately in the average calculation.
Example:
SELECT AVG(ALL column_name) AS average_value FROM table_name;
DISTINCT:
When DISTINCT is used with AVG(), it only considers distinct values in the column for the average calculation.
It eliminates duplicate values from the calculation, ensuring that each distinct value contributes only once to the average.
Example:
SELECT AVG(DISTINCT column_name) AS average_value FROM table_name;
When to Use Each:
Use ALL when you want to include all values in the average calculation, including duplicates. This is useful when each occurrence of a value should contribute to the average independently.
Use DISTINCT when you want to calculate the average based on unique values only, excluding duplicates. This is useful when you’re interested in the average value across distinct entities or when you want to eliminate redundancy in the calculation.
In summary, choose between ALL and DISTINCT based on whether you want to include or exclude duplicates from the average calculation, respectively.
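The difference can be seen on three hypothetical values: 10, 10, and 40. This sketch assumes Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. Plain AVG(x) is used for the ALL case, since ALL is the default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(10,), (10,), (40,)])

# AVG(x) behaves like AVG(ALL x): both 10s count, (10 + 10 + 40) / 3.
avg_all = conn.execute("SELECT AVG(x) FROM t").fetchone()[0]
# AVG(DISTINCT x) collapses the duplicates first: (10 + 40) / 2.
avg_distinct = conn.execute("SELECT AVG(DISTINCT x) FROM t").fetchone()[0]
print(avg_all, avg_distinct)  # 20.0 25.0
conn.close()
```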
SQL Server AVG() with GROUP BY example
Let’s create a scenario with two tables: products and sales. The products table contains the ID and name of each product. The sales table records sales transactions, including the product ID, the quantity sold, and the sale amount. We’ll then use the AVG() function with GROUP BY to calculate the average sale amount for each product.
Here’s the T-SQL code to create the tables and insert sample data:
-- Create the products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);
-- Insert sample data into the products table
INSERT INTO products (product_id, product_name)
VALUES
(1, 'Product A'),
(2, 'Product B'),
(3, 'Product C');
-- Create the sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
quantity_sold INT,
sale_amount DECIMAL(10, 2)
);
-- Insert sample data into the sales table
INSERT INTO sales (sale_id, product_id, quantity_sold, sale_amount)
VALUES
(1, 1, 10, 100.00),
(2, 1, 5, 50.00),
(3, 2, 8, 120.00),
(4, 2, 12, 180.00),
(5, 3, 15, 200.00);
Now, let’s use the AVG() function with GROUP BY to calculate the average sale amount for each product:
SELECT p.product_id, p.product_name, AVG(s.sale_amount) AS avg_sale_amount
FROM products p
JOIN sales s ON p.product_id = s.product_id
GROUP BY p.product_id, p.product_name
ORDER BY p.product_id;
Output:
product_id | product_name | avg_sale_amount
-------------------------------------------
1 | Product A | 75.000000
2 | Product B | 150.000000
3 | Product C | 200.000000
In this output:
Each row represents a product.
avg_sale_amount shows the average sale amount for each product.
The result is calculated by averaging the sale amounts for each product using the AVG() function along with GROUP BY to group the sales data by product.
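To reproduce this result outside SQL Server, the following sketch runs the same tables, sample rows, and query through Python's sqlite3 module against an in-memory SQLite database, an assumption made for illustration. SQLite accepts these type names, but its AVG() returns a floating-point value. The averages therefore print as 75.0, 150.0, and 200.0 rather than with SQL Server's DECIMAL scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same DDL and sample rows as the T-SQL above; SQLite accepts these type names.
conn.executescript("""
    CREATE TABLE products (product_id INT PRIMARY KEY, product_name VARCHAR(100));
    INSERT INTO products (product_id, product_name)
    VALUES (1, 'Product A'), (2, 'Product B'), (3, 'Product C');
    CREATE TABLE sales (sale_id INT PRIMARY KEY, product_id INT,
                        quantity_sold INT, sale_amount DECIMAL(10, 2));
    INSERT INTO sales (sale_id, product_id, quantity_sold, sale_amount)
    VALUES (1, 1, 10, 100.00), (2, 1, 5, 50.00), (3, 2, 8, 120.00),
           (4, 2, 12, 180.00), (5, 3, 15, 200.00);
""")

# One row per product: AVG() runs separately over each product's sales.
rows = conn.execute("""
    SELECT p.product_id, p.product_name, AVG(s.sale_amount) AS avg_sale_amount
    FROM products p
    JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.product_id, p.product_name
    ORDER BY p.product_id
""").fetchall()
for row in rows:
    print(row)
conn.close()
```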
AVG() With a DISTINCT Clause
Let’s create a scenario with a table named students that contains students and their scores in different subjects. We’ll then use the AVG() function with DISTINCT to calculate the average of the distinct score values, so that a score achieved by more than one student is counted only once.
Here’s the T-SQL code to create the table and insert sample data:
-- Create the students table
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100),
subject VARCHAR(50),
score INT
);
-- Insert sample data into the students table
INSERT INTO students (student_id, student_name, subject, score)
VALUES
(1, 'Alice', 'Math', 90),
(2, 'Bob', 'Science', 85),
(3, 'Charlie', 'Math', 95),
(4, 'David', 'English', 80),
(5, 'Eve', 'Science', 90),
(6, 'Frank', 'Math', 85),
(7, 'Grace', 'English', 75),
(8, 'Hannah', 'Science', 88),
(9, 'Ian', 'Math', 92),
(10, 'Jack', 'English', 78);
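Now, let’s use the AVG() function with DISTINCT to calculate the average of the distinct score values. Because score is an INT column, SQL Server would return a truncated integer average, so each score is cast to DECIMAL before averaging:
SELECT AVG(DISTINCT CAST(score AS DECIMAL(5, 2))) AS average_score
FROM students;
Output:
average_score
-------------
85.375000
In this output:
The AVG() function calculates the average of the score column after each score is converted to DECIMAL, so the fractional part of the result is kept.
The DISTINCT keyword ensures that each score value is counted only once. The scores 90 and 85 each appear twice, so only the eight distinct values 75, 78, 80, 85, 88, 90, 92, and 95 are averaged: 683 / 8 = 85.375.
Without DISTINCT, all ten scores would be averaged instead (858 / 10 = 85.8). Note that DISTINCT applies to the score values, not to subjects; to average the scores per subject, use GROUP BY subject.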
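The arithmetic can be checked with a sketch that loads the same ten scores into an in-memory SQLite database through Python's sqlite3 module, a stand-in for SQL Server assumed for illustration. SQLite's AVG() returns a floating-point value even for INTEGER columns, so no cast is needed to see the fractional results.

```python
import sqlite3

scores = [("Alice", "Math", 90), ("Bob", "Science", 85), ("Charlie", "Math", 95),
          ("David", "English", 80), ("Eve", "Science", 90), ("Frank", "Math", 85),
          ("Grace", "English", 75), ("Hannah", "Science", 88), ("Ian", "Math", 92),
          ("Jack", "English", 78)]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE students (student_id INTEGER PRIMARY KEY,
                student_name TEXT, subject TEXT, score INTEGER)""")
# student_id is filled in automatically (1 to 10) by SQLite.
conn.executemany(
    "INSERT INTO students (student_name, subject, score) VALUES (?, ?, ?)", scores)

avg_all, avg_distinct = conn.execute(
    "SELECT AVG(score), AVG(DISTINCT score) FROM students").fetchone()
print(avg_all)       # 858 / 10 = 85.8 (all ten scores)
print(avg_distinct)  # 683 / 8 = 85.375 (the duplicate 90 and 85 count once)
conn.close()
```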
We can also use the AVG() function with a CASE expression to calculate the average score for each subject. Here’s how you can do it:
SELECT
subject,
AVG(CASE WHEN subject = 'Math' THEN score ELSE NULL END) AS avg_math_score,
AVG(CASE WHEN subject = 'Science' THEN score ELSE NULL END) AS avg_science_score,
AVG(CASE WHEN subject = 'English' THEN score ELSE NULL END) AS avg_english_score
FROM students
GROUP BY subject
ORDER BY subject;
Output:
subject | avg_math_score | avg_science_score | avg_english_score
---------------------------------------------------------------
English | NULL | NULL | 77
Math | 90 | NULL | NULL
Science | NULL | 87 | NULL
In this query:
We use a CASE expression within the AVG() function to include only the scores of one subject in each average.
The CASE expression checks the subject column. If the subject matches the specified subject ('Math', 'Science', 'English'), it returns the score; otherwise, it returns NULL, which AVG() ignores.
The GROUP BY clause groups the results by the subject column, so each row holds the average for one subject, and ORDER BY makes the row order predictable.
Because score is an INT column, SQL Server truncates each average to an integer: Math is 362 / 4 = 90.5, Science is 263 / 3 ≈ 87.67, and English is 233 / 3 ≈ 77.67. Cast score to DECIMAL inside the CASE expression to keep the fractional part.
In each row, the columns for the other two subjects are NULL: within that group the CASE expression returns NULL for every row, and the average of only NULL values is NULL. A subject with no scores at all would likewise be shown as NULL.
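The same CASE technique also works without GROUP BY. In that form it returns all three subject averages side by side in a single row. This is a minimal sketch, assuming Python's sqlite3 module with an in-memory SQLite database in place of SQL Server. The student names and IDs are omitted because the query does not use them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO students (subject, score) VALUES (?, ?)", [
    ("Math", 90), ("Science", 85), ("Math", 95), ("English", 80), ("Science", 90),
    ("Math", 85), ("English", 75), ("Science", 88), ("Math", 92), ("English", 78)])

# Without GROUP BY, each CASE picks out one subject's scores and AVG() ignores
# the NULLs produced for every other row, so all three averages land in one row.
row = conn.execute("""
    SELECT
        AVG(CASE WHEN subject = 'Math' THEN score END) AS avg_math_score,
        AVG(CASE WHEN subject = 'Science' THEN score END) AS avg_science_score,
        AVG(CASE WHEN subject = 'English' THEN score END) AS avg_english_score
    FROM students
""").fetchone()
print(row)  # Math 362 / 4 = 90.5, Science 263 / 3, English 233 / 3
conn.close()
```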
Additional Resources