When it comes to programming, finding elements in large arrays quickly is crucial for efficient and speedy applications. In this blog post, we'll dive into a real-life example of how I tackled the challenge of making element search faster in an array. This is a case study that takes you step-by-step through the process, starting with traditional for loops, moving on to using built-in methods, and finally arriving at the most efficient technique called binary search.
Let's Understand the Problem . . .
Imagine you have an array that represents a series of events, where each object in the array corresponds to a specific event and includes the time at which it occurred. Now, suppose you receive a time value from an external API, and our task is to retrieve a portion of the array starting from the index where the time of the event is greater than or equal to the provided time.
This means we need to find the events that occurred after the time, that we get from the API (here we are going to refer to this time as lastSynced) from the events array which has objects with several properties and we need to find the events based on the property called logTime.
if found return the new events else return an empty array.
Approach 1: The Traditional way
Ha... whenever we need to find a value in an array, the conventional approach often we follow is iterating through the array using a for loop, During each iteration, we typically check certain conditions, and if a value satisfies the conditions, we store or process it further. In some cases, we might save the matched values in a data structure for future use.
Before getting into it first will write our helper method which will give us the difference between the given two dates.
The Helper method . . . .
Understanding the helper method.
Date.parse(dateTime) This will parse the datetime string to Date format and will give us the time in milliseconds commonly referred to as epoch time and the same for the lastSynced too, after parsing and getting the epoch for both the time string we return the difference between them (will soon get to see why we need the difference of two-time string).
lastSynced which holds the time that the previous events were get stored.
totalEvents has all the events that occurred over a period.
we declare and initialize an empty array in a variable called newEvents.
now we iterate the array and will check whether any of the events occurred after the lastSynced time or not using the helpermethod by passing the logTime of each object as the first argument and lastSynced as the second so that we will get the difference between the two datetime and store it in a variable called val,
We check if the val > 0, which indicates that the current event logTime used to check against the lastSynced is a new event. This will get cleared with the below example
e.g. Assume the array length is 300.
when index = 0;
the millisecond representation of "2023-06-07 10:27:08" is 1686113828000
the millisecond representation of "2023-07-07 13:27:08" is 1688716628000
let val = the difference returned by the helperMethod is -2602800000
the val is lesser than 0 which indicates that this event is an old one and we don't need it, so as we don't push it to the newEvents array and continue with the iteration
and the iteration goes on, once we reach the index 298 the difference between the totalEvents[298].logTime and the lastSynced is 43200000.
Similarly, the events from the index 299 will be greater only and get added to the newEvents.
now we see the val is greater than 0 which indicates that a current event is a new event and we add this to the newEvents array.
once the iteration is done the function returns the newEvents array.
Now the core logic is done it's time to observe the time taken by this for loop . . .
the execution time may vary when running the same code on different machines or in different environments.
Note: Here the array used is a length of 5000.
Approach 2: Using filter
The filter()
method creates a shallow copy of a portion of a given array, filtered down to just the elements from the given array that pass the test implemented by the provided function.
The time taken by the filter method is . . .
Upon observation, it can be noted that the execution time of the filter method is generally shorter compared to the traditional for loop. However, it is important to clarify that when the array size remains the same, both the filter method and the for loop would take a similar amount of time, as each element in the array needs to be checked.
It's time for the algorithm . . . . .
Approach 3: Binary search
What is binary search?
It is a searching algorithm used in a sorted array by repeatedly dividing the search interval in half.
If you are not familiar with it, here is the link Binary search
Now, if we observe our totalEvents array we can see the array is sorted in nature which indicates that we can perform a Binary search in it.
But how the array is sorted,
If we carefully observe the logTime property of each object in an array we can confirm that the array is sorted in nature as the events occurred in an increasing order.
The idea is to separate the logic into two parts,
First, we find the first occurrence of the object's corresponding index using binary search.
Then, we use the slice method to get the actual array which represents the newEvents.
Breaking down binary search implementation . . .
the low and high are used to determine the search space of an array by checking the mid element of the current search space if the mid element is greater than the searchable value we reduce the search space by assigning the high value as mid-1 and low value to mid+1, if the mid value is less than searchable value. to understand it better follow the below example,
the first iteration, Assume array length = 300
low = 0, high = 299 and mid = Math.round(149.5) = 150 (here we round the resultant value to have a valid index).
so here the mid object is totalEvents[150].logTime = "2023-06-07 18:27:08" is 1686142628000
the difference between the current mid and lastSynced is -2574000000 which is less than 0 so, we change the low value to mid+1, by making the searching space to the right half of the array and vice versa.
now, low = 151 and high = 299, we keep repeating it till the low is lesser than high, if low > high, it indicates that the search range has been exhausted.then we take the value of low,
why the low value,
here the aim is to find the first occurrence of the object where the logTime is greater than the lastSynced, so to exactly find the first occurrence, the upper bound of the target value is the one we search for, the low represents the upper bound of the value 0.
the if condition low < totalEvents.length is to check whether low is within the array bound as the maximum index will be 299 (assuming array size is 300), if the low is greater than are equal to the totalEvents length then, all the object in the array is lesser than the lastSynced which represents the all events are old events and the loop is terminated if the condition passed we get the part of an array using slice method and if not it defaultly returns an empty array.
it's time to see the results
It is observed that, binary search yields superior results compared to traditional looping and filtering methods when it comes to search efficiency
While the performance difference between binary search and linear search may not be significant for smaller-size arrays, it becomes increasingly pronounced as the array size grows.
refer to the below table for which the test is conducted for various array sizes
The table demonstrates that as the array size increases, the linear search algorithm's execution time grows significantly, while the binary search algorithm's execution time remains relatively stable. This highlights the inherent advantage of binary search, as its time complexity grows logarithmically rather than linearly
Conclusion
In conclusion, optimizing your code, including implementing more efficient search algorithms like binary search, can be beneficial. However, it is important to consider whether the optimization is actually necessary, especially for smaller-sized arrays.