RSS Feed

Difference between Merge Join and Hash Join

Difference between Merge Join and Hash Join


Merge Join Hash Join
Both tables must be sorted on the merge column in order for the join to work. The optimiser usually chooses a merge join when working with large data sets that are already sorted on the join columns. Hash join works well with tables that are not sorted on the join columns(s). The optimiser typically opts for a hash join when dealing with large inputs and when no index exists to join them or an index exists but is unusable.
The optimizer can perform a merge join operation for every type of relational join operation except CROSS JOIN and FULL JOIN. Merge operations can also be used to UNION tables together (since they must be sorted to eliminate duplicates). Hash join operations can service every type of relational join (including UNION and DIFFERENCE operations) except CROSS JOINs.
Merge join requires that both inputs be in memory (as opposed to a hash join, where only the smaller input is in memory).
Hash join has a substantial advantage over merge join only. In fact, a query processor using only merge join for large inputs is 40% slower than one using hash join (30 vs. 42). Thus, if there are very few indexes in the database, or if the existing indexes don’t serve a query very well, hash join has substantial value. In a way, this is not surprising, given that the hash table in a hash join is nothing but an in-memory on-demand index.
A merge join uses memory and a bit of tempdb to do its ordered comparisons. A hash join uses memory and tempdb to build out the hash tales for the join.
Usual size of joining tables is large. Usual size of joining tables can be any.
Join clause used here is equijoin. Join clause used here is equijoin.


Sources:
The Guru's Guide to SQL Server Architecture and Internals by Ken Henderso.
The Value of Merge-Join and Hash-Join in SQL Server.
SQL Server 2008 Query Performance Tuning Distilled by Sajal Dam, Grant Fritchey.