Sorting rows¶
Try me¶
If you want to try this tutorial, you can use the Classic model database. This is a sample database provided by MySQL Tutorial. You can also download it from here.
The classic model is represented below:

The classic model is a sample database that represents a classic car company. The database contains 8 tables:
Customers: stores customer’s data.
Employees: stores employee’s data.
Offices: stores sales office data.
OrderDetails: stores sales order line items for each sales order (i.e. each product sold within a sales order).
Orders: stores sales orders placed by customers.
Payments: stores payments made by customers based on their accounts.
ProductLines: stores product line data. A product line is like a type of product. Each product line can have various products.
Products: stores product data.
ORDER BY¶
Sometimes we want to control the order in which the results are displayed. It is often easier to explore data and analyse the results when data is sorted in a meaning way. This is when ORDER BY clause comes into play.
The syntax of the ORDER BY clause is:
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 ASC|DESC, column2 ASC|DESC, ... ASC|DESC
That is, we basically specify the columns in which we want to order the results and specify if we want the results in ascending or descending order.
Let us see some examples extracted from the classic_model database. This is the first ten orders:
SELECT * FROM Orders
LIMIT 10;
Will return:
orderNumber |
orderDate |
requiredDate |
shippedDate |
status |
comments |
customerNumber |
|---|---|---|---|---|---|---|
10100 |
2003-01-06 |
2003-01-13 |
2003-01-10 |
Shipped |
null |
363 |
10101 |
2003-01-09 |
2003-01-18 |
2003-01-11 |
Shipped |
Check on availability. |
128 |
10102 |
2003-01-10 |
2003-01-18 |
2003-01-14 |
Shipped |
null |
181 |
10103 |
2003-01-29 |
2003-02-07 |
2003-02-02 |
Shipped |
null |
121 |
10104 |
2003-01-31 |
2003-02-09 |
2003-02-01 |
Shipped |
null |
141 |
10105 |
2003-02-11 |
2003-02-21 |
2003-02-12 |
Shipped |
null |
145 |
10106 |
2003-02-17 |
2003-02-24 |
2003-02-21 |
Shipped |
null |
278 |
10107 |
2003-02-24 |
2003-03-03 |
2003-02-26 |
Shipped |
Difficult to negotiate with customer. We need more marketing materials |
131 |
10108 |
2003-03-03 |
2003-03-12 |
2003-03-08 |
Shipped |
null |
385 |
10109 |
2003-03-10 |
2003-03-19 |
2003-03-11 |
Shipped |
Customer requested that FedEx Ground is used for this shipping |
486 |
Note that we use the LIMIT 10 clause to limit to 10 results.
We can for instance order the result by customerNumber as:
SELECT * FROM Orders
ORDER BY customerNumber
LIMIT 10;
Will return the values ordered by customer:
orderNumber |
orderDate |
requiredDate |
shippedDate |
status |
comments |
customerNumber |
|---|---|---|---|---|---|---|
10298 |
2004-09-27 |
2004-10-05 |
2004-10-01 |
Shipped |
null |
103 |
10123 |
2003-05-20 |
2003-05-29 |
2003-05-22 |
Shipped |
null |
103 |
10345 |
2004-11-25 |
2004-12-01 |
2004-11-26 |
Shipped |
null |
103 |
10346 |
2004-11-29 |
2004-12-05 |
2004-11-30 |
Shipped |
null |
112 |
10278 |
2004-08-06 |
2004-08-16 |
2004-08-09 |
Shipped |
null |
112 |
10124 |
2003-05-21 |
2003-05-29 |
2003-05-25 |
Shipped |
Customer very concerned about the exact color of the models. There is high risk that he may dispute the order because there is a slight color mismatch |
112 |
10120 |
2003-04-29 |
2003-05-08 |
2003-05-01 |
Shipped |
null |
114 |
10342 |
2004-11-24 |
2004-12-01 |
2004-11-29 |
Shipped |
null |
114 |
10347 |
2004-11-29 |
2004-12-07 |
2004-11-30 |
Shipped |
Can we deliver the new Ford Mustang models by end-of-quarter? |
114 |
10125 |
2003-05-21 |
2003-05-27 |
2003-05-24 |
Shipped |
null |
114 |
Note that by default, ORDER BY returns the results in ascending order from the lowest value to the highest value. We can reverse the order, applying a descending order with the keyword DESC:
SELECT * FROM Orders
ORDER BY customerNumber DESC;
orderNumber |
orderDate |
requiredDate |
shippedDate |
status |
comments |
customerNumber |
|---|---|---|---|---|---|---|
10138 |
2003-07-07 |
2003-07-16 |
2003-07-13 |
Shipped |
null |
496 |
10179 |
2003-11-11 |
2003-11-17 |
2003-11-13 |
Cancelled |
Customer cancelled due to urgent budgeting issues. Must be cautious when dealing with them in the future. Since order shipped already we must discuss who would cover the shipping charges. |
496 |
10360 |
2004-12-16 |
2004-12-22 |
2004-12-18 |
Shipped |
null |
496 |
10399 |
2005-04-01 |
2005-04-12 |
2005-04-03 |
Shipped |
null |
496 |
10243 |
2004-04-26 |
2004-05-03 |
2004-04-28 |
Shipped |
null |
495 |
10207 |
2003-12-09 |
2003-12-17 |
2003-12-11 |
Shipped |
Check on availability. |
495 |
10186 |
2003-11-14 |
2003-11-20 |
2003-11-18 |
Shipped |
They want to reevaluate their terms agreement with the VP of Sales |
489 |
10213 |
2004-01-22 |
2004-01-28 |
2004-01-27 |
Shipped |
Difficult to negotiate with customer. We need more marketing materials |
489 |
10149 |
2003-09-12 |
2003-09-18 |
2003-09-17 |
Shipped |
null |
487 |
10219 |
2004-02-10 |
2004-02-17 |
2004-02-12 |
Shipped |
null |
487 |
We can apply sorting in multiple columns simultaneously. Imagine we want to sort orders first by customerNumber, and then, for each customer, we want to see the orders by order date in descending order, so that we see most recent orders first:
SELECT * FROM Orders
ORDER BY customerNumber, orderDate DESC;
orderNumber |
orderDate |
requiredDate |
shippedDate |
status |
comments |
customerNumber |
|---|---|---|---|---|---|---|
10345 |
2004-11-25 |
2004-12-01 |
2004-11-26 |
Shipped |
null |
103 |
10298 |
2004-09-27 |
2004-10-05 |
2004-10-01 |
Shipped |
null |
103 |
10123 |
2003-05-20 |
2003-05-29 |
2003-05-22 |
Shipped |
null |
103 |
10346 |
2004-11-29 |
2004-12-05 |
2004-11-30 |
Shipped |
null |
112 |
10278 |
2004-08-06 |
2004-08-16 |
2004-08-09 |
Shipped |
null |
112 |
10124 |
2003-05-21 |
2003-05-29 |
2003-05-25 |
Shipped |
Customer very concerned about the exact color of the models. There is high risk that he may dispute the order because there is a slight color mismatch |
112 |
10347 |
2004-11-29 |
2004-12-07 |
2004-11-30 |
Shipped |
Can we deliver the new Ford Mustang models by end-of-quarter? |
114 |
10342 |
2004-11-24 |
2004-12-01 |
2004-11-29 |
Shipped |
null |
114 |
10223 |
2004-02-20 |
2004-02-29 |
2004-02-24 |
Shipped |
null |
114 |
10125 |
2003-05-21 |
2003-05-27 |
2003-05-24 |
Shipped |
null |
114 |
Aggregate functions¶
Aggregate functions are functions that operate on a set of values across different records. For instance, the function average returns the average of the value passed as argument. We can use this function in our query as with data type functions. Look at the example below:
SELECT AVG(amount)
FROM payments WHERE customerNumber = 103;
Will return:
|-------------+
| avg(amount) |
|-------------+
| 7438.120000 |
|-------------+
We have used a select query with the AVG function to calculate the average of the payment amount, and the result is the average amount paid among all payments registered in the table. Each database engine provides a different set of aggregate functions. The table below is extracted from the official MySQL documentation and describes some of the most relevant aggregate functions available:
Name |
Description |
|---|---|
AVG() |
Return the average value of the argument |
COUNT() |
Return a count of the number of rows returned |
GROUP_CONCAT() |
Return a concatenated string |
MAX() |
Return the maximum value |
MIN() |
Return the minimum value |
STD() |
Return the population standard deviation |
SUM() |
Return the sum |
VARIANCE() |
Return the population standard variance |
Normally, aggregate functions ignore NULL values.
Note that most of the functions provide statistical properties of the values passed as parameters, and consequently, they make sense primarily for numeric arguments.
For temporal values such as datetime or timestamp, MAX and MIN provide the minimum and maximum date values. The SUM() and AVG() aggregate functions do not work with temporal values, and you should use other data type functions to convert to a numeric value first. Examples:
# Calculate the sum of seconds in a timestamp column
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name;
# Calculate the sum of days in a datetime column
SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name;
AVG([DISTINCT] expr)¶
Returns the average value of expr. The DISTINCT option can be used to return the average of the distinct values of expr.
If there are no matching rows, AVG() returns NULL. Example, calculate the average payment amount across all payments:
SELECT AVG(amount)
FROM payments
COUNT(expr)¶
Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.
If there are no matching rows, COUNT() returns 0.
SELECT COUNT(*)
FROM student
COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.
COUNT(DISTINCT) on the other hand returns a count of the number of rows with different non-NULL expr values. If there are no matching rows, COUNT(DISTINCT) returns 0.
GROUP_CONCAT()¶
This function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values. The full syntax is as follows:
GROUP_CONCAT([DISTINCT] expr [,expr ...]
[ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name ...]]
[SEPARATOR str_val])
For instance,
SELECT
GROUP_CONCAT(comments SEPARATOR ';')
FROM Orders;
As shown in the full syntax, duplicated values can be eliminated using the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword. The default separator between values in a group is comma (,). To specify a separator explicitly, use SEPARATOR followed by the string literal value that should be inserted between group values. To eliminate the separator altogether, specify SEPARATOR ‘’.
MAX()¶
Returns the maximum value of expr.
If there are no matching rows, MAX() returns NULL.
SELECT MIN(amount), MAX(amount)
FROM Payments;
MIN()¶
Returns the minimum value of expr.
If there are no matching rows, MIN() returns NULL.
SELECT MIN(amount), MAX(amount)
FROM Payments;
STD()¶
Returns the population standard deviation of expr. STD() is a synonym for the standard SQL function STDDEV_POP(), provided as a MySQL extension.
If there are no matching rows, STD() returns NULL.
SELECT MIN(amount), MAX(amount), STD(amount)
FROM Payments;
SUM()¶
Returns the sum of expr. If the return set has no rows, SUM() returns NULL. The DISTINCT keyword can be used to sum only the distinct values of expr.
If there are no matching rows, SUM() returns NULL.
VARIANCE()¶
Returns the population standard variance of expr. VARIANCE() is a synonym for the standard SQL function VAR_POP(), provided as a MySQL extension.
If there are no matching rows, VARIANCE() returns NULL.
GROUP BY¶
The GROUP BY clause groups rows according to the provided conditions to group the result of the aggregate functions to group the results into subsets. For instance, in the example below, we calculated the average amount across all payments, but imagine that we are interested in knowing the average amount per customer. Without grouping the query:
SELECT
customerNumber,
amount
FROM payments
ORDER BY customerNumber LIMIT 10;
Returns:
Customer Number |
amount |
|---|---|
103 |
6066.78 |
103 |
14571.44 |
103 |
1676.14 |
112 |
14191.12 |
112 |
32641.98 |
112 |
33347.88 |
114 |
45864.03 |
114 |
82261.22 |
114 |
7565.08 |
114 |
44894.74 |
With grouping, we can create groups and for each group, the SQL engine will perform the aggregate functions, showing a different record for each group. For instance, imagine we would like to get the average payment amount, the total number of payments, and the maximum paid amount for every customer. We could group the records of payments by customer, and then perform the grouping operations, as illustrated in the image below:

To implement this pipeline, we could modify the clause above using GROUP BY as:
SELECT
customerNumber,
avg(amount) AS average,
count(*) AS total,
max(amount) AS max
FROM payments
GROUP BY customerNumber
ORDER BY customerNumber LIMIT 10;
Which returns the following result:
customerNumber |
average |
total |
max |
|---|---|---|---|
103 |
7438.120000 |
3 |
14571.44 |
112 |
26726.993333 |
3 |
33347.88 |
114 |
45146.267500 |
4 |
82261.22 |
119 |
38983.226667 |
3 |
49523.67 |
121 |
26056.197500 |
4 |
50218.95 |
124 |
64909.804444 |
9 |
111654.40 |
128 |
18984.440000 |
4 |
33820.62 |
129 |
22236.853333 |
3 |
26248.78 |
131 |
35879.980000 |
3 |
50025.35 |
141 |
55056.844615 |
13 |
120166.58 |
With this query, we have gained insights on the statistics of payments per customer, calculating the average, the total number of payments and the maximum amount.
Note that in combination with GROUP BY we need to first introduce the GROUP BY clause and then the ORDER BY clause, otherwise, the engine will raise an error.
We can group by more than one column, for instance, imagine we would like to know how many employees of each title are there in each country. We could use the following query:
SELECT
o.country,
e.jobTitle,
count(*) as total
FROM employees e
INNER JOIN offices o ON (e.officeCode = o.officeCode)
GROUP BY country, jobTitle;
Will return:
country |
jobTitle |
total |
|---|---|---|
USA |
President |
1 |
USA |
VP Sales |
1 |
USA |
VP Marketing |
1 |
USA |
Sales Manager (NA) |
1 |
USA |
Sales Rep |
6 |
France |
Sale Manager (EMEA) |
1 |
France |
Sales Rep |
4 |
Japan |
Sales Rep |
2 |
Australia |
Sales Manager (APAC) |
1 |
Australia |
Sales Rep |
3 |
UK |
Sales Rep |
2 |
Note that we have used an INNER JOIN clause to join the employees and the offices tables and then use A GROUP BY clause with the columns country and jobTitle to obtain the expected result.
HAVING¶
The HAVING clause is very similar to the WHERE house, in the sense that it is used to filter out results. The main difference is that WHERE cannot work on calculated aggregated functions, that is why HAVING is introduced. For instance:
SELECT
customerNumber,
avg(amount) AS average,
count(*) AS total,
max(amount) AS max
FROM payments
GROUP BY customerNumber
HAVING total > 5
ORDER BY customerNumber LIMIT 10;
Note that we introduced the HAVING clause to filter customers that have not made at least 5 payments. Yhe query will result in the following:
customerNumber |
average |
total |
max |
|---|---|---|---|
124 |
64909.804444 |
9 |
111654.40 |
141 |
55056.844615 |
13 |
120166.58 |