Find Median from Running Data Stream using Heaps

  • Similar to above balancing BST Method, we can use a max heap on the left side to represent elements that are less than effective median, and a min-heap on the right side to represent elements that are greater than effective median.
  • After processing an incoming element, the number of elements in heaps differs atmost by 1 element. When both heaps contain the same number of elements, we pick the average of heaps root data as effective median. When the heaps are not balanced, we select effective median from the root of the heap containing more elements.

Below is the implementation of the above approach:

C++14




// C++ code to implement the approach
 
#include <bits/stdc++.h>
using namespace std;
 
// Function to find the median of stream of data
void streamMed(int A[], int n)
{
    // Declared two max heap
    priority_queue<int> g, s;
   
    for (int i = 0; i < n; i++) {
        s.push(A[i]);
        int temp = s.top();
        s.pop();
       
        // Negation for treating it as min heap
        g.push(-1 * temp);
        if (g.size() > s.size()) {
            temp = g.top();
            g.pop();
            s.push(-1 * temp);
        }
        if (g.size() != s.size())
            cout << (double)s.top() << "\n";
        else
            cout << (double)((s.top() * 1.0
                              - g.top() * 1.0)
                             / 2)
                 << "\n";
    }
}
 
// Driver code
int main()
{
    int A[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int N = sizeof(A) / sizeof(A[0]);
   
    // Function call
    streamMed(A, N);
    return 0;
}


Java




// Java code to implement the approach
 
import java.io.*;
import java.util.*;
 
class GFG {
   
    // Function to find the median of stream of data
    public static void streamMed(int A[], int N)
    {
        // Declaring two min heap
        PriorityQueue<Double> g = new PriorityQueue<>();
        PriorityQueue<Double> s = new PriorityQueue<>();
        for (int i = 0; i < N; i++) {
             
            // Negation for treating it as max heap
            s.add(-1.0 * A[i]);
            g.add(-1.0 * s.poll());
            if (g.size() > s.size())
                s.add(-1.0 * g.poll());
           
            if (g.size() != s.size())
                System.out.println(-1.0 * s.peek());
            else
                System.out.println((g.peek() - s.peek())
                                   / 2);
        }
    }
 
    // Driver code
    public static void main(String[] args)
    {
        int A[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
        int N = A.length;
         
        // Function call
        streamMed(A, N);
    }
}


Python3




# Python code to implement the approach
 
from heapq import heappush, heappop, heapify
import math
 
# Function to find the median of stream of data
def streamMed(arr, N):
     
    # Declaring two min heap
    g = []
    s = []
    for i in range(len(arr)):
       
        # Negation for treating it as max heap
        heappush(s, -arr[i])
        heappush(g, -heappop(s))
        if len(g) > len(s):
            heappush(s, -heappop(g))
 
        if len(g) != len(s):
            print(-s[0])
        else:
            print((g[0] - s[0])/2)
 
 
# Driver code
if __name__ == '__main__':
    A = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
    N = len(A)
     
    # Function call
    streamMed(A, N)


C#




// C# code to implement the approach
 
using System;
using System.Collections.Generic;
 
public class GFG {
 
  // Function to find the median of stream of data
  static void StreamMed(int[] arr, int N)
  {
    // Declaring two min heap
    SortedSet<int> g = new SortedSet<int>();
    SortedSet<int> s = new SortedSet<int>();
 
    for (int i = 0; i < N; i++) {
      // Negation for treating it as max heap
      s.Add(arr[i]);
      g.Add(s.Max);
      s.Remove(s.Max);
 
      if (g.Count > s.Count) {
        s.Add(g.Min);
        g.Remove(g.Min);
      }
 
      if (s.Count < g.Count)
        Console.WriteLine(g.Min);
      else if (s.Count > g.Count)
        Console.WriteLine(s.Max);
      else
        Console.WriteLine((g.Min + s.Max) / 2.0);
    }
  }
 
  static public void Main()
  {
 
    // Code
    int[] A = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int N = A.Length;
 
    // Function call
    StreamMed(A, N);
  }
}
 
// This code is contributed by lokesh.


Javascript




//Javascript  code to implement the approach
function streamMed(arr) {
  // Declaring two min heap
  var g = [];
  var s = [];
 
  for (var i = 0; i < arr.length; i++) {
    // Negation for treating it as max heap
    s.push(-arr[i]);
    s.sort(function(a, b){ return a-b });
    g.push(-s.shift());
    g.sort(function(a, b){ return a-b });
    if (g.length > s.length) {
      s.unshift(-g.pop());
    }
 
    if (g.length != s.length) {
      console.log(-s[0]);
    } else {
      console.log((g[0] - s[0]) / 2);
    }
  }
}
 
// Driver code
var A = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4];
streamMed(A);
//This Code is Contributed By Shivam Tiwari


Output

5
10
5
4
3
4
5
6
7
6.5
7
6.5

Time Complexity: O(n * log n), All the operations within the loop (push, pop) take O(log n) time in the worst case for a heap of size N.
Auxiliary Space: O(n)



Find Median from Running Data Stream

Given that integers are read from a data stream. Find the median of elements read so far in an efficient way. 

There are two cases for median on the basis of data set size.

  • If the data set has an odd number then the middle one will be consider as median.
  • If the data set has an even number then there is no distinct middle value and the median will be the arithmetic mean of the two middle values.

Example:

Input Data Stream: 5, 15, 1, 3
Output: 5, 10,5, 4
Explanation:
After reading 1st element of stream – 5 -> median = 5
After reading 2nd element of stream – 5, 15 -> median = (5+15)/2 = 10
After reading 3rd element of stream – 5, 15, 1 -> median = 5
After reading 4th element of stream – 5, 15, 1, 3 -> median = (3+5)/2 = 4

Input Data Stream: 2, 2, 2, 2
Output: 2, 2, 2, 2
Explanation:
After reading 1st element of stream – 2 -> median = 2
After reading 2nd element of stream – 2, 2 -> median = (2+2)/2 = 2
After reading 3rd element of stream – 2, 2, 2 -> median = 2
After reading 4th element of stream – 2, 2, 2, 2 -> median = (2+2)/2 = 2

Recommended Practice

Similar Reads

Find Median from Running Data Stream using Insertion Sort:

If we can sort the data as it appears, we can easily locate the median element. Insertion Sort is one such online algorithm that sorts the data appeared so far. At any instance of sorting, say after sorting i-th element, the first i elements of the array are sorted. The insertion sort doesn’t depend on future data to sort data input till that point. In other words, insertion sort considers data sorted so far while inserting the next element. This is the key part of insertion sort that makes it an online algorithm. However, insertion sort takes O(n2) time to sort n elements. Perhaps we can use binary search on insertion sort to find the location of the next element in O(log n) time. Yet, we can’t do data movement in O(log n) time. No matter how efficient the implementation is, it takes polynomial time in case of insertion sort....

Find Median from Running Data Stream using Augmented self-balanced binary search tree (AVL, RB, etc…)

...

Find Median from Running Data Stream using Heaps

...