Difficulty: Hard
Category: Arrays
Step 1: Grasping the Problem
This problem is an ad-hoc problem. So there are tons of solutions with similar time and space complexity. Given a stream of input integers, at any time find the median of the integers. According to wiki, the definition of median is:
The median of a finite list of numbers is the “middle” number, when those numbers are listed in order from smallest to greatest.
If there is an odd number of numbers, the middle one is picked. For example, consider the list of numbers
[1, 3, 3, 6, 7, 8, 9]
This list contains seven numbers. The median is the fourth of them, which is 6.
If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.[1][2] For example, in the data set
[1, 2, 3, 4, 5, 6, 8, 9]
The the naive approach will be to sort the integers and pick the middle values (average them if needed). But it will take O(n*logn)
for findMedian()
function. Can we optimize this more? Notice that the values are given 1 at a time. So while getting the values with addNum(int)
function, we can add data-structures specifically to get the medium as fast as possible.
Step 2: Building data structures concept with heaps
A binary heap is an abstract binary tree with the following properties:
- A child parent relation will always hold in the whole tree. Most of the time, the relation is lesser than (<) or greater than (>).
- It must be a complete binary tree. A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
Below are two representations of a min-heap (a heap where the child-parent relationship is “<”) and a max-heap (a heap where the child-parent relationship is “>”).
|
|
The above two heaps can be represented in arrays like below:
|
|
The implementation of heaps are very easy to do. But python has built-in heap
modules to leverage our coding load. Below is TL;DR of the heaps documentation which is used to solve the problem. To learn more about heapq
, please refer to the documentation.
|
|
Step 3: addNum(int)
implementation
Now that we are familiar with basic operations of heaps, In this problem, we are going to use two heaps to solve. The first one is a min-heap, and the second one is a max-heap. As we just want the middle of the numbers, the max-heap named left
will keep the first half of all the numbers so that it has the middle value (or the left value of the middle values if there are even numbers) on top of the heap. And the min-heap named right
will hold the second half of all the numbers so that it has the right value of the middle value/s. As there are multiple functions to use in the class, we have to make member variables to use these in all functions of the class.
|
|
Python does not have any implementation for max-heaps. But it can be achieved with a very simple trick. Just negate the values before pushing them to the heap!
We will push the new values to the left
first, then pop that value back and push it to the right
heap. This makes the left
and right
arrays always heapified.
|
|
To make the left
heap always have the middle value, it needs always to have more or equal values than right
heap. This way, it is guaranteed that left
will hold the median if the number of elements is odd. But the new value is currently in the right
heap (possibly making it bigger than left
). So we compare the size of both, and move one value from right
to left
.
|
|
This completes the addNum(int)
function implementation.
Step 4: findMedian()
implementation
Because we have the median at the top of the two heaps, this implementation is pretty straight forward. We will check if we have even or odd values. We can also check if size of left
is greater than size of right
and return the top of left
if not equal (total even numbers) or return the average of two tops if equal (total odd numbers). Remember that the top of the heap is already the first element of the array. And also remember that because the numbers were inserted into the max-heap after negating, you will have to negate it again to get the actual value. So when calculating the medium, we will subtract the top of right
instead of adding with top of left
.
|
|
The complexity of addNum(int)
is O(logn)
because we are inserting a value in a binary tree. The complexity of findMedian(self)
is constant.
The final solution is given below:
|
|
|
|
|
|
|
|