Joins: Inner vs Outer, etc.

18 07 2010

As a PHP Developer, there is one interview question you can usually count on which is actually a database-related question. The prospective employer will ask that you explain the difference between an inner join and an outer join.

If you want to properly harness the power of a server-side language, you need to know something about data storage and typically a relational database is where you will put the data. So, as a developer you need to know how to efficiently access the stored data and hence the need to know something about SQL, the acronym for structured query language, and joins, particularly the merits of inner vs. outer joins.

If one is going to discuss types of joins, it is important that we define what a join is, namely data that is the result of temporarily joining one or more tables in a database (yes, you can join a table with itself!) The way we create a join is by issuing an SQL query. SQL has different variants and since I’m most familiar with SQL supported by MySQL that’s what the example queries will use.

Here is our schema:


CREATE TABLE IF NOT EXISTS `grp_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`rank` varchar(255) NOT NULL,
`serialno` text NOT NULL,
`score` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='for show' AUTO_INCREMENT=9 ;
CREATE TABLE IF NOT EXISTS `grp_detail_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`serialno` text NOT NULL,
`age` tinyint(3) NOT NULL,
`income` mediumint(6) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='for show' AUTO_INCREMENT=13 ;

Here’s some queries to populate each table:

INSERT INTO `grp_table` (`id`, `name`, `rank`, `serialno`, `score`) VALUES
(1, 'Sharon ', 'Captain', 'a823bc457', 60),
(2, 'George', 'Soldier', 'b823bc457', 50),
(3, 'Shannon', 'Captain', 'a12345789', 20),
(4, 'Bill', 'Soldier', 'b12345789', 30),
(5, 'Sam', 'General', 'z1234abc', 40),
(6, 'Sandra', 'General', 'z2234abc', 80),
(7, 'Annie', 'Soldier', 'b123456zyz', 94),
(8, 'Doris', 'General', 'c1334455', 77);
INSERT INTO `grp_detail_table` (`id`, `serialno`, `age`, `income`) VALUES
(9, 'a823bc457', 22, 52000),
(10, 'b823bc457', 32, 70000),
(11, 'a12345789', 45, 27000),
(12, 'z2234abc', 19, 16000),
(13, 'aabb112233', 18, 25000);

Note that the last record in grp_detail_table has a serial number that is non-existent in grp_table. An intentional typo that should produce some interesting results from which we can learn.

With respect to the notion I raise above of a table being joined to itself, here’s one such query we might use if we wanted to see all the captains who had a higher score than a general in our sample:

select a.name,a.rank,a.score,b.name,b.rank,b.score from grp_table a
inner join grp_table b
ON a.rank="Captain" and b.rank="General"
WHERE a.score > b.score

Result:

name rank score name rank score
Sharon Captain 60 Sam General 40

There are two primary types of joins, inner and outer. What does this mean? Each type involves joining tables with the difference between the two being dependent on whether the conditions of the query are met. An inner join results when the columns or fields from both tables exist and satisfy the query. For example consider the following query:

SELECT *
FROM grp_table, grp_detail_table
WHERE grp_table.serialno = grp_detail_table.serialno
LIMIT 0 , 30

We could also rewrite the query to make it more explicit as an inner join, as follows:

SELECT *
FROM grp_table
INNER JOIN grp_detail_table
ON grp_table.serialno = grp_detail_table.serialno
LIMIT 0 , 30

With either syntax above, the result is the same and is the following:

id name rank serialno score id serialno age income
1 Sharon Captain a823bc457 60 9 a823bc457 22 52000
2 George Soldier b823bc457 50 10 b823bc457 32 70000
3 Shannon Captain a12345789 20 11 a12345789 45 27000
6 Sandra General z2234abc 80 12 z2234abc 19 16000

Note that only the records in each table that matched the conditions of the query are returned.

There’s another slightly different join we could have done to obtain the same results using a STRAIGHT_JOIN, as follows:

SELECT *
FROM grp_table
STRAIGHT_JOIN grp_detail_table
ON grp_table.serialno = grp_detail_table.serialno
LIMIT 0 , 30

Since the results are the same as the above queries, you may well wonder why you would want to consider using a STRAIGHT_JOIN. It basically assures that the table on the left is read first before the table on the right. This can be useful if you are using an optimization technique where there might be an issue regarding which table is read first (see: http://dev.mysql.com/doc/refman/5.0/en/join.html.)

The general rule is to use the “ON” to determine the dataset and the “WHERE” to select the specific records from that set as follows:

SELECT *
FROM grp_table
inner JOIN grp_detail_table
ON grp_table.serialno = grp_detail_table.serialno
where grp_table.rank="Captain"
LIMIT 0 , 30

Result:
id name rank serialno score id serialno age income
1 Sharon Captain a823bc457 60 9 a823bc457 22 52000
3 Shannon Captain a12345789 20 11 a12345789 45 27000

Let’s now take a look at the outer join which comes in two flavors: left and right. How does this differ from an inner join? An outer join is potentially an everything-nothing result effectively. In an outer join, all the data from one table is made available, while the other table will return NULL values for the unmatched fields. Here’s a query that makes use of a left outer join:

SELECT *
FROM grp_table
left outer JOIN grp_detail_table
ON grp_table.serialno = grp_detail_table.serialno

And, you could also write:

SELECT *
FROM grp_table
left JOIN grp_detail_table
ON grp_table.serialno = grp_detail_table.serialno

Either way the result is:

id name rank serialno score id serialno age income
1 Sharon Captain a823bc457 60 9 a823bc457 22 52000
2 George Soldier b823bc457 50 10 b823bc457 32 70000
3 Shannon Captain a12345789 20 11 a12345789 45 27000
4 Bill Soldier b12345789 30 NULL NULL NULL NULL
5 Sam General z1234abc 40 NULL NULL NULL NULL
6 Sandra General z2234abc 80 12 z2234abc 19 16000
7 Annie Soldier b123456zyz 94 NULL NULL NULL NULL
8 Doris General c1334455 77 NULL NULL NULL NULL

You also do similarly with a RIGHT OUTER Join as follows:

SELECT *
FROM `grp_table`
RIGHT JOIN grp_detail_table ON grp_table.serialno = grp_detail_table.serialno
LIMIT 0 , 30

Result:
id name rank serialno score id serialno age income
1 Sharon Captain a823bc457 60 9 a823bc457 22 52000
2 George Soldier b823bc457 50 10 b823bc457 32 70000
3 Shannon Captain a12345789 20 11 a12345789 45 27000
6 Sandra General z2234abc 80 12 z2234abc 19 16000
NULL NULL NULL NULL NULL 13 aabb112233 18 25000

The use of RIGHT Outer Joins is discouraged owing to portability issues (see http://dev.mysql.com/doc/refman/5.0/en/join.html

There’s another type of related outer join that you may have heard of called a FULL OUTER JOIN which apparently is still unsupported by MySQL. But you can try to emulate it using UNION. Now if that term brings back memories of Algebra, there is a good reason for that. SQL is more about understanding SET Theory (see http://forum.codecall.net/tutorials/14325-using-set-theory-understand-sql-joins.html. A good tutorial that visually demonstrates using set theory in SQL is http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html. As mentioned previously, you can emulate a FULL OUTER JOIN (See http://stackoverflow.com/questions/38549/sql-difference-between-inner-and-outer-join
The query below will show the union of both tables and indicates what data the tables have in common as well as what the differences are between the two tables.

SELECT *
FROM grp_table
LEFT JOIN grp_detail_table ON grp_table.serialno = grp_detail_table.serialno
UNION
SELECT *
FROM grp_table
RIGHT JOIN grp_detail_table ON grp_table.serialno = grp_detail_table.serialno
LIMIT 0 , 30

Result:
id name rank serialno score id serialno age income
1 Sharon Captain a823bc457 60 9 a823bc457 22 52000
2 George Soldier b823bc457 50 10 b823bc457 32 70000
3 Shannon Captain a12345789 20 11 a12345789 45 27000
4 Bill Soldier b12345789 30 NULL NULL NULL NULL
5 Sam General z1234abc 40 NULL NULL NULL NULL
6 Sandra General z2234abc 80 12 z2234abc 19 16000
7 Annie Soldier b123456zyz 94 NULL NULL NULL NULL
8 Doris General c1334455 77 NULL NULL NULL NULL
NULL NULL NULL NULL NULL 13 aabb112233 18 25000

This work is licensed under a Creative Commons License
.

Advertisements

Actions

Information

2 responses

18 07 2010
bigmumet

nice info.. 😀

can i ask??
why NULL is happen in left and right join in that case, ??

20 07 2010
Sharon Lee Levy

bigmumet,

Thanks for reading my blog and glad that you like the info. NULLs will happen in a left or right join when the second table has non-matching records. So, in the left join, the second table only has some of the serialno column data of the first table. While all the records of the first table display, the information missing in the second table causes the resulting table to contain NULL values.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




%d bloggers like this: