Sorting rows

Try me

If you want to try this tutorial, you can use the Classic model database. This is a sample database provided by MySQL Tutorial. You can also download it from here.

The classic model is represented below:

classic model

The classic model is a sample database that represents a classic car company. The database contains 8 tables:

  • Customers: stores customer’s data.

  • Employees: stores employee’s data.

  • Offices: stores sales office data.

  • OrderDetails: stores sales order line items for each sales order (i.e. each product sold within a sales order).

  • Orders: stores sales orders placed by customers.

  • Payments: stores payments made by customers based on their accounts.

  • ProductLines: stores product line data. A product line is like a type of product. Each product line can have various products.

  • Products: stores product data.

ORDER BY

Sometimes we want to control the order in which the results are displayed. It is often easier to explore data and analyse the results when data is sorted in a meaning way. This is when ORDER BY clause comes into play.

The syntax of the ORDER BY clause is:

SELECT column1, column2, ...
FROM table_name
ORDER BY column1 ASC|DESC, column2 ASC|DESC, ... ASC|DESC

That is, we basically specify the columns in which we want to order the results and specify if we want the results in ascending or descending order.

Let us see some examples extracted from the classic_model database. This is the first ten orders:

SELECT * FROM Orders
LIMIT 10;

Will return:

orderNumber

orderDate

requiredDate

shippedDate

status

comments

customerNumber

10100

2003-01-06

2003-01-13

2003-01-10

Shipped

null

363

10101

2003-01-09

2003-01-18

2003-01-11

Shipped

Check on availability.

128

10102

2003-01-10

2003-01-18

2003-01-14

Shipped

null

181

10103

2003-01-29

2003-02-07

2003-02-02

Shipped

null

121

10104

2003-01-31

2003-02-09

2003-02-01

Shipped

null

141

10105

2003-02-11

2003-02-21

2003-02-12

Shipped

null

145

10106

2003-02-17

2003-02-24

2003-02-21

Shipped

null

278

10107

2003-02-24

2003-03-03

2003-02-26

Shipped

Difficult to negotiate with customer. We need more marketing materials

131

10108

2003-03-03

2003-03-12

2003-03-08

Shipped

null

385

10109

2003-03-10

2003-03-19

2003-03-11

Shipped

Customer requested that FedEx Ground is used for this shipping

486

Note that we use the LIMIT 10 clause to limit to 10 results.

We can for instance order the result by customerNumber as:

SELECT * FROM Orders
ORDER BY customerNumber
LIMIT 10;

Will return the values ordered by customer:

orderNumber

orderDate

requiredDate

shippedDate

status

comments

customerNumber

10298

2004-09-27

2004-10-05

2004-10-01

Shipped

null

103

10123

2003-05-20

2003-05-29

2003-05-22

Shipped

null

103

10345

2004-11-25

2004-12-01

2004-11-26

Shipped

null

103

10346

2004-11-29

2004-12-05

2004-11-30

Shipped

null

112

10278

2004-08-06

2004-08-16

2004-08-09

Shipped

null

112

10124

2003-05-21

2003-05-29

2003-05-25

Shipped

Customer very concerned about the exact color of the models. There is high risk that he may dispute the order because there is a slight color mismatch

112

10120

2003-04-29

2003-05-08

2003-05-01

Shipped

null

114

10342

2004-11-24

2004-12-01

2004-11-29

Shipped

null

114

10347

2004-11-29

2004-12-07

2004-11-30

Shipped

Can we deliver the new Ford Mustang models by end-of-quarter?

114

10125

2003-05-21

2003-05-27

2003-05-24

Shipped

null

114

Note that by default, ORDER BY returns the results in ascending order from the lowest value to the highest value. We can reverse the order, applying a descending order with the keyword DESC:

SELECT * FROM Orders
ORDER BY customerNumber DESC;

orderNumber

orderDate

requiredDate

shippedDate

status

comments

customerNumber

10138

2003-07-07

2003-07-16

2003-07-13

Shipped

null

496

10179

2003-11-11

2003-11-17

2003-11-13

Cancelled

Customer cancelled due to urgent budgeting issues. Must be cautious when dealing with them in the future. Since order shipped already we must discuss who would cover the shipping charges.

496

10360

2004-12-16

2004-12-22

2004-12-18

Shipped

null

496

10399

2005-04-01

2005-04-12

2005-04-03

Shipped

null

496

10243

2004-04-26

2004-05-03

2004-04-28

Shipped

null

495

10207

2003-12-09

2003-12-17

2003-12-11

Shipped

Check on availability.

495

10186

2003-11-14

2003-11-20

2003-11-18

Shipped

They want to reevaluate their terms agreement with the VP of Sales

489

10213

2004-01-22

2004-01-28

2004-01-27

Shipped

Difficult to negotiate with customer. We need more marketing materials

489

10149

2003-09-12

2003-09-18

2003-09-17

Shipped

null

487

10219

2004-02-10

2004-02-17

2004-02-12

Shipped

null

487

We can apply sorting in multiple columns simultaneously. Imagine we want to sort orders first by customerNumber, and then, for each customer, we want to see the orders by order date in descending order, so that we see most recent orders first:

SELECT * FROM Orders
ORDER BY customerNumber, orderDate DESC;

orderNumber

orderDate

requiredDate

shippedDate

status

comments

customerNumber

10345

2004-11-25

2004-12-01

2004-11-26

Shipped

null

103

10298

2004-09-27

2004-10-05

2004-10-01

Shipped

null

103

10123

2003-05-20

2003-05-29

2003-05-22

Shipped

null

103

10346

2004-11-29

2004-12-05

2004-11-30

Shipped

null

112

10278

2004-08-06

2004-08-16

2004-08-09

Shipped

null

112

10124

2003-05-21

2003-05-29

2003-05-25

Shipped

Customer very concerned about the exact color of the models. There is high risk that he may dispute the order because there is a slight color mismatch

112

10347

2004-11-29

2004-12-07

2004-11-30

Shipped

Can we deliver the new Ford Mustang models by end-of-quarter?

114

10342

2004-11-24

2004-12-01

2004-11-29

Shipped

null

114

10223

2004-02-20

2004-02-29

2004-02-24

Shipped

null

114

10125

2003-05-21

2003-05-27

2003-05-24

Shipped

null

114

Aggregate functions

Aggregate functions are functions that operate on a set of values across different records. For instance, the function average returns the average of the value passed as argument. We can use this function in our query as with data type functions. Look at the example below:

SELECT AVG(amount)
FROM payments WHERE customerNumber = 103;

Will return:

|-------------+
| avg(amount) |
|-------------+
| 7438.120000 |
|-------------+

We have used a select query with the AVG function to calculate the average of the payment amount, and the result is the average amount paid among all payments registered in the table. Each database engine provides a different set of aggregate functions. The table below is extracted from the official MySQL documentation and describes some of the most relevant aggregate functions available:

Name

Description

AVG()

Return the average value of the argument

COUNT()

Return a count of the number of rows returned

GROUP_CONCAT()

Return a concatenated string

MAX()

Return the maximum value

MIN()

Return the minimum value

STD()

Return the population standard deviation

SUM()

Return the sum

VARIANCE()

Return the population standard variance

Normally, aggregate functions ignore NULL values.

Note that most of the functions provide statistical properties of the values passed as parameters, and consequently, they make sense primarily for numeric arguments.

For temporal values such as datetime or timestamp, MAX and MIN provide the minimum and maximum date values. The SUM() and AVG() aggregate functions do not work with temporal values, and you should use other data type functions to convert to a numeric value first. Examples:

# Calculate the sum of seconds in a timestamp column
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name;
# Calculate the sum of days in a datetime column
SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name;

AVG([DISTINCT] expr)

Returns the average value of expr. The DISTINCT option can be used to return the average of the distinct values of expr.

If there are no matching rows, AVG() returns NULL. Example, calculate the average payment amount across all payments:

SELECT AVG(amount)
    FROM payments

COUNT(expr)

Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.

If there are no matching rows, COUNT() returns 0.

SELECT COUNT(*)
    FROM student

COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

COUNT(DISTINCT) on the other hand returns a count of the number of rows with different non-NULL expr values. If there are no matching rows, COUNT(DISTINCT) returns 0.

GROUP_CONCAT()

This function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values. The full syntax is as follows:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
             [ORDER BY {unsigned_integer | col_name | expr}
                 [ASC | DESC] [,col_name ...]]
             [SEPARATOR str_val])

For instance,

SELECT
    GROUP_CONCAT(comments SEPARATOR ';')
    FROM Orders;

As shown in the full syntax, duplicated values can be eliminated using the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword. The default separator between values in a group is comma (,). To specify a separator explicitly, use SEPARATOR followed by the string literal value that should be inserted between group values. To eliminate the separator altogether, specify SEPARATOR ‘’.

MAX()

Returns the maximum value of expr.

If there are no matching rows, MAX() returns NULL.

SELECT MIN(amount), MAX(amount)
       FROM Payments;

MIN()

Returns the minimum value of expr.

If there are no matching rows, MIN() returns NULL.

SELECT MIN(amount), MAX(amount)
       FROM Payments;

STD()

Returns the population standard deviation of expr. STD() is a synonym for the standard SQL function STDDEV_POP(), provided as a MySQL extension.

If there are no matching rows, STD() returns NULL.

SELECT MIN(amount), MAX(amount), STD(amount)
       FROM Payments;

SUM()

Returns the sum of expr. If the return set has no rows, SUM() returns NULL. The DISTINCT keyword can be used to sum only the distinct values of expr.

If there are no matching rows, SUM() returns NULL.

VARIANCE()

Returns the population standard variance of expr. VARIANCE() is a synonym for the standard SQL function VAR_POP(), provided as a MySQL extension.

If there are no matching rows, VARIANCE() returns NULL.

GROUP BY

The GROUP BY clause groups rows according to the provided conditions to group the result of the aggregate functions to group the results into subsets. For instance, in the example below, we calculated the average amount across all payments, but imagine that we are interested in knowing the average amount per customer. Without grouping the query:

SELECT
    customerNumber,
    amount
FROM payments
ORDER BY customerNumber LIMIT 10;

Returns:

Customer Number

amount

103

6066.78

103

14571.44

103

1676.14

112

14191.12

112

32641.98

112

33347.88

114

45864.03

114

82261.22

114

7565.08

114

44894.74

With grouping, we can create groups and for each group, the SQL engine will perform the aggregate functions, showing a different record for each group. For instance, imagine we would like to get the average payment amount, the total number of payments, and the maximum paid amount for every customer. We could group the records of payments by customer, and then perform the grouping operations, as illustrated in the image below:

grouping

To implement this pipeline, we could modify the clause above using GROUP BY as:

SELECT
    customerNumber,
    avg(amount) AS average,
    count(*) AS total,
    max(amount) AS max
FROM payments
GROUP BY customerNumber
ORDER BY customerNumber LIMIT 10;

Which returns the following result:

customerNumber

average

total

max

103

7438.120000

3

14571.44

112

26726.993333

3

33347.88

114

45146.267500

4

82261.22

119

38983.226667

3

49523.67

121

26056.197500

4

50218.95

124

64909.804444

9

111654.40

128

18984.440000

4

33820.62

129

22236.853333

3

26248.78

131

35879.980000

3

50025.35

141

55056.844615

13

120166.58

With this query, we have gained insights on the statistics of payments per customer, calculating the average, the total number of payments and the maximum amount.

Note that in combination with GROUP BY we need to first introduce the GROUP BY clause and then the ORDER BY clause, otherwise, the engine will raise an error.

We can group by more than one column, for instance, imagine we would like to know how many employees of each title are there in each country. We could use the following query:

SELECT
    o.country,
    e.jobTitle,
    count(*) as total
FROM employees e
INNER JOIN offices o ON (e.officeCode = o.officeCode)
GROUP BY country, jobTitle;

Will return:

country

jobTitle

total

USA

President

1

USA

VP Sales

1

USA

VP Marketing

1

USA

Sales Manager (NA)

1

USA

Sales Rep

6

France

Sale Manager (EMEA)

1

France

Sales Rep

4

Japan

Sales Rep

2

Australia

Sales Manager (APAC)

1

Australia

Sales Rep

3

UK

Sales Rep

2

Note that we have used an INNER JOIN clause to join the employees and the offices tables and then use A GROUP BY clause with the columns country and jobTitle to obtain the expected result.

HAVING

The HAVING clause is very similar to the WHERE house, in the sense that it is used to filter out results. The main difference is that WHERE cannot work on calculated aggregated functions, that is why HAVING is introduced. For instance:

SELECT
    customerNumber,
    avg(amount) AS average,
    count(*) AS total,
    max(amount) AS max
FROM payments
GROUP BY customerNumber
HAVING total > 5
ORDER BY customerNumber LIMIT 10;

Note that we introduced the HAVING clause to filter customers that have not made at least 5 payments. Yhe query will result in the following:

customerNumber

average

total

max

124

64909.804444

9

111654.40

141

55056.844615

13

120166.58