Last year I wrote about recursion. It was a concept that baffled me. How was it keeping track of all of those return values? I learned what a stack was, and how the calls hung out in the stack until they were ready to be used/popped off/returned…pick whatever verb you like. However, that post wasn’t my ultimate goal.
My ultimate goal was to figure out whas was going on behind the scenes in a merge sort algorithm. I had been introduced to this algorithm in a Coursera course I took last October and though the course didn’t require us to implement it, they discussed it in detail and I was lost.
I don’t mind getting lost, but I do mind staying lost, and unfortunately these past few months I just haven’t made the time to sit down and learn this properly. I tried half-heartedly at the first of the year, but I found something else that piqued my interest and this topic was pushed aside.
Well no more! A coworker has been working feverishly on algorithms and data structures the last couple of months and it inspired me to give this a try, so here goes….
Merging two items seems straightforward. I have a set of numbers here, I have another set there and I want to merge them into one set (I’ll use the word array from here on out, but in non-programmer/math parlance, it means the same; it is a collection of something).
When first introduced to the merge-sort algorithm, this was the second part taught, which when I thought about it later, seems backwards. This function does a great deal of work, and it’s explicit. Reading the code, you can easily identify what it’s doing. While it is a function called at the very end of the mergeSort() function, without it, nothing would happen outside of a stack overflow. So to help me explain it to myself, I prefer to look at this function first.
Here is what goes on in the merge function:
- It takes two arrays of values (we’ll use numbers for our examples)
- It will merge those two arrays into one array, ordered from smallest number to largest
- It will run in O(n) time as it has to look at every value in the two arrays
- It will return an ordered array at the end
That’s pretty much it, though step 2 is quite involved. We need to evaluate a number in each array and compare them to each other. Whichever number is smallest, we will add that number to a new sorted array. Step 2 needs to handle a case for if the number in array1 is smaller than the number in array2, a case for if the number in array2 is smaller than the number in array1, and a case for if both numbers are equal.
So how do we keep track of two numbers at the same time? How do we know the when the comparison is done? Well, if we’re not exactly sure when something will end, a while loop makes sense; we just need a terminating condition so the loop actually stops.
To keep track of which value we are at in both arrays, we set index values. This also serves as our terminating condition for the while loop. The first while loop is primarily where most of the work is happening. The two remaining while loops handle cases where the arrays are of differing lengths and there is no more comparisons needed to be made to the two arrays; they simply just append all the remaining values to the sorted array.
So all of this happens when two arrays are passed into it. However, the arrays that are passed in need to be in sorted order. By that, I mean we can have an array such as [2, 4, 9] and [3, 5, 7] and the merge function will work just fine, but we can’t have something like [2, 1, 0] and [7, 5, 8], because really, all the merge function does is just swap out values at indexes for two different arrays. The number 2 in the first array is never compared with the second number, 1, in the same array. It’s only evaluated against the numbers in the other array. So how do we fix this?
The Merge Sort
What if the leftPile array and rightPile array that is required by the merge() function only had one value each in it? The arrays wouldn’t have to worry about values within their own array, and the comparison made against each other would work fine. I mean merge() could handle  &  without in issue and add those numbers into the orderedArray pile without problem. So if we divide up an original array of say [2, 1, 5, 4, 9] where merge only had to compare two arrays with one number each, the merge() function would work without worry. So how do we divide our single array into a bunch of small arrays with only one value each? Recursion
And this is where I struggled mightily to understand what was happening. I knew what the program was designed to do. Break one large array into a bunch of single value arrays, but I didn’t understand how it was return values from our orderedArray in the merge function all the way back to original function. It was magic and my brain was stumped; which led to the study of how recursion was working on just a small level such as the Fibonacci sequence. From there and learning about the call stack, I finally discovered what was happening. With merge sort, the same thing is going on but because it’s still all so new to me, it took me awhile to understand.
In order to explain, let me show the code that is actually called to run merge sort.
- Line 38 is our terminating condition. Since this function calls itself, it needs someway to know when to stop. If the array only has one element in it, return that array and end the recursive call.
- Line 39 split the array in ½. This forms our left array and right array as seen on lines 41 and 42
- Line 44 returns the merged, sorted array
Deceptively simple right? Well, yeah, it was to me. I immediately looked at the code and thought, okay, the leftArray = [2, 1] and the rightArray = [5, 4, 9]. But I thought for merge() to work, the arrays have to already be in sorted order? Yeah, I was confused.
As it turns out, we need leftArray to be =  and rightArray to =  for our merge to work–and it does. The recursive call is doing exactly that. Allow me to explain.
leftArray continues to call the mergeSort function until it eventually receives a value back. If there are too many recursive calls, the program will crash (stack overflow), but the picture above basically represents the call stack. Red arrows denote that the leftArray (steps 1 & 2), or in one case, the rightArray (step 4) has not finished evaluating yet and makes another call on itself. The green arrows represent a value returned back to leftArray (steps 3 & 6) or rightArray (step 5). NOTE: this picture only represents finding a value for leftArray in the intial call on the left. We haven’t even started evaluating the initial rightArray variable. (I’d encourage you to do this on paper just to walk through the process : ) )
Nested call, continued call, whatever you want to call it, this is what was lost on me. I could not visualize it in my head. Only when I sat down and started sketching it out on paper, did it start to make sense. The biggest takeaway for me is: in the initial call, the rightArray never even gets evaluated until the leftArray finally has a value returned to it from the recursive calls, and that value returned is in sorted order. It may take 2 recursive calls, or it may take 15, but lines 42 and 44 in the code snippet above remain waiting to be called until line 41 finally has a value.
If none of this makes sense, I’d invite you to read my initial post on recursion here. I think the only way I’ve been able to keep track is to write it on paper and tack it up on a wall so I can follow the recursion trail. Keep in mind that I’ve been trying to make sense of this off and on for a few months, and I’m sure as soon as someone asks me a question, I will still probably mess up the answer! But, a lesson I learned years ago in college struggling through sequences and series in Calc 2 is this: there is an answer, I can find it, but it might take awhile. Don’t measure others pace against your own.
For those more advanced in this than I, I did stick with a top-down approach for merge sort. I know a bottom-up exists, but I want to make sure I have a thorough understanding of this implementation first before moving on.