Sitemap

How companies like Uber, Google and Airbnb Disrupt Industries with Location Intelligence?

Understanding the mechanism behind getting my nearest cafes in a matter of seconds.

10 min readJul 12, 2023
http://www.greenbot.com/article/2925062

Companies like Uber, Lyft and Google leverage location data of users to provide useful information in a lot of ways. I have noted only a few down while the possibilities are endless.

  1. Ride-hailing and Navigation:
    Uber utilizes location data to match riders with drivers, calculate the estimated time of arrival, and optimize routing for efficient transportation. Google Maps offers navigation services, suggesting the fastest routes, providing real-time traffic updates, and offering alternative transportation options.
  2. Personalized Recommendations:
    Both Uber and Google leverage location data to offer personalized recommendations based on a user’s current location. Uber suggests popular destinations or nearby restaurants, while Google provides recommendations for nearby businesses, attractions, and events.
  3. Targeted Advertising: Location data enables companies to deliver targeted advertisements. For example, Uber may display promotions for discounted rides in a user’s vicinity, while Google Ads can show relevant ads based on a user’s location and search history.
  4. Traffic Analysis and Predictions: Uber and Google analyze location data to gather insights into traffic patterns, congestion hotspots, and transportation trends. This data is used to improve routing algorithms, develop predictive models, and offer real-time traffic information to users.

Different types of Geo-spatial Indices

Geo-spatial Indices broad classification

In a Hash based geospatial index, each geospatial object (point, polygon, or region) is assigned a unique hash value. The hashing function takes into account the properties of the object, such as its coordinates or attributes, and generates a hash code that represents its location or position within the index.

Hash-based indexes are commonly used for geospatial indexing because they offer fast insertion and retrieval operations. When querying for a specific object or searching for nearby objects, the hashing function is applied to the query parameters, and the resulting hash code is used to quickly locate the relevant entries in the index. This allows for efficient spatial searches and reduces the search space.

Tree based geospatial indexes are data structures that organize geospatial data in a tree-like structure to enable efficient storage, retrieval, and querying of spatial information. These indexes partition the space into hierarchical subdivisions and use tree structures to represent the relationships between these subdivisions.

In this article, we will try to dive a little deeper into the Hash based Geo-spatial index called Geo Hash and try to figure out the problem

Hey google, show me my nearest cafes quick…

Traditional Approach

You essentially try to search the whole whole DB which matches these conditions.

SELECT * FROM business 
WHERE latitude BETWEEN 10 - (:radius) AND 10 + (:radius)
AND longitude BETWEEN 120 - (:radius) AND 120 + (:radius)

The problem with the Traditional Vanilla Approach

This query is going to be tremendously slow. You are scanning almost your whole table.

Even with indexes on latitude and longitude, you will still need to scan way more rows then you would need.

Moreover if you would have stored them in a Relational databases, then it is good to remember that they do not do well with floating point numbers and comparisons which is why they will start choking as soon as their is scale.

Geo Hashing

What is GeoHashing and why is it used?

Geohashing is a geocoding method that converts geographical coordinates into a short alphanumeric string called a geohash. It is a way to represent a location with a high level of precision using a concise and easily shareable code.

The geohash algorithm divides the Earth’s surface into a grid of rectangular cells. Each cell is assigned a unique geohash code based on its position within the grid. The geohash code is generated by interleaving bits from the latitude and longitude coordinates of a location.

The length of the geohash code determines the level of precision. A longer geohash code represents a smaller area and provides more accurate location information. Conversely, a shorter geohash code represents a larger area with lower precision.

The main advantages of geohashing are:

  1. Precision and Flexibility: Traditional search methods often rely on bounding boxes or complex geometric calculations to determine spatial relationships. Geohashing provides a simple and precise representation of location with a compact code, allowing for accurate proximity searches and spatial calculations.
  2. Proximity-based Searches: Traditional methods require computationally expensive operations to identify and retrieve nearby locations. Geohashing, with its hierarchical structure and similar prefixes for nearby locations, enables efficient proximity searches by leveraging the inherent proximity encoded in the geohash codes.
  3. Indexing Efficiency: Traditional databases may face challenges in efficiently indexing and querying spatial data. Geohashing provides a spatial index structure that allows for efficient indexing, retrieval, and range searches. It simplifies the indexing process and reduces the search space, resulting in improved query performance.
  4. Scalability: Traditional search methods might struggle to scale in distributed systems or handle large spatial datasets. Geohashing provides a distributed-friendly approach by allowing for efficient partitioning and distribution of spatial data across multiple nodes. It enables scalable processing and parallel execution of geospatial queries.
  5. Compactness and Storage Efficiency: Traditional approaches can require significant storage space to represent complex geographic regions accurately. Geohashes are compact and can represent complex regions with minimal data, making them storage-efficient and suitable for transmission and indexing purposes.
  6. Hierarchical Querying: Traditional search methods often lack an inherent hierarchical structure for spatial querying. Geohashing, with its hierarchical grid system, allows for easy aggregation, filtering, and querying at different levels of granularity. It provides flexibility in querying spatial data at varying resolutions without the need to traverse the entire dataset.

Fundamentals of GeoHashing

The below representation helps understand how the world is divided into recursive segments.

The grid cells have a consistent size determined by the length of the geohash code. A longer geohash code represents a smaller cell size, while a shorter geohash code represents a larger cell size. This means that the level of precision remains the same regardless of the density of locations in a specific area.

In dense regions, where there are many locations clustered together, each location will have its own geohash code within the smaller grid cells. The granularity of the grid cells does not change, but more cells will be populated with individual geohash codes to represent the dense concentration of locations.

In sparse regions, where there are fewer locations spread apart, each location will still have its own geohash code, but the grid cells will cover a larger area. The grid cells are larger in size, which can result in a lower level of precision for individual locations.

Notice how the dense locations have a increasingly higher and more granular grid system as compared to the sparse locations where the grid systems are much more relaxed. Credits: ByteByteGo

Proximity with GeoHash

Notice how the prefixes are almost matching. Credits: ByteByteGo

The precision of a geohash code increases with the number of subdivisions or levels.

When two locations are very close to each other, they fall within the same or adjacent grid cells at a certain level of precision. The geohash codes for these locations will share the same prefix, which represents the shared parent cell. As you move down to more precise levels, the geohash codes for the two locations will become more similar.

For example, consider two locations just a short distance apart within the same grid cell. At a lower precision level, their geohash codes might look like 9q8zn and 9q8zn. As you move to a more precise level, the codes might become 9q8znd and 9q8zn9. The longer the shared prefix, the closer the two locations are to each other.

This characteristic of geohashing is advantageous for proximity-based searches and spatial calculations. It allows for efficient identification of nearby locations by examining the similarity of their geohash prefixes. However, it’s important to note that as locations move further apart, their geohash codes will diverge significantly.

Calculation of a GeoHash

Let’s walk through a basic example of geohashing.

Let’s say we have a specific location represented by latitude and longitude coordinates: 37.7749° N (latitude) and 122.4194° W (longitude). Here’s how we can generate the geohash code for this location:

Step 1: Determine the Geohash Precision Level
Decide on the desired level of precision for the geohash. In this example, let’s use a precision level of 5, which provides a reasonable balance between accuracy and brevity.

Step 2: Convert Coordinates to Binary
Convert the latitude and longitude coordinates to binary format. For latitude, positive values represent north, and negative values represent south. For longitude, positive values represent east, and negative values represent west.

Latitude: 37.7749° N converts to binary as 100101.11001011011010000000101011010111100011100111011

Longitude: -122.4194° W converts to binary as 1111000.100101111011000011001100000010100010001110110

Step 3: Interleave Bits
Interleave the bits from the latitude and longitude binary representations. Take one bit from the latitude, followed by one bit from the longitude, and so on, until all the bits are used.

Interleaved Binary:
1011101101001001011101010101111110001111001110110100000100101100111011011000011010000111010

Step 4: Convert to Base32
Divide the interleaved binary into groups of 5 bits and convert each group to its corresponding Base32 character.

Base32 Geohash: 9q8yy

The resulting geohash code for the given location at a precision level of 5is 9q8yy.

This geohash can be used to represent the location and can be shared, stored, or used for spatial indexing and geospatial calculations.

Please note that in this example, we used a very simplified geohashing process. In practice, there are different implementations and variations of geohashing algorithms, but the fundamental idea remains the same.

Do take a look at the precision chart as well below.

But why do we employ a Base32 Encoding??

Base32 encoding is commonly chosen for geohashing due to several reasons:

  1. Compactness: Base32 encoding allows for representing data using a smaller character set compared to other encodings like Base64. It achieves this by using 32 distinct characters (0–9, and A-Z excluding I, L, O, and U), which provides a higher information density per character.
  2. Human Readability: Base32 encoding produces strings that are more human-friendly and easier to work with compared to binary or hexadecimal representations. The resulting geohash codes are alphanumeric and can be easily shared, communicated, or manually entered without the need for special characters.
  3. Error Tolerance: Base32 encoding is designed to be error-tolerant and resistant to common transcription errors or typographical mistakes. It achieves this by avoiding visually similar characters (such as I, L, O, and U) and by including error detection and correction mechanisms, such as checksums or parity bits.

Simplification through GeoHashing

SELECT * FROM business WHERE geohash LIKE "9q8%";

A gorgeous improvement over the Traditional approach.

String comparison are much quicker than floating point numbers in SQL queries. We can further add an Index on the geohash for optimization.

Usually we need only a few miles around the user.

So we can simply store the exact number of prefixes for a direct comparison rather than a LIKE comparison. Let’s just store these lengths directly as shown below in the below diagram.

Optimizations of the stored GeoHashes

Now if you want businesses around 1 km of the user:

SELECT * FROM business WHERE geohash_6 = "<first_6_characters_of_users_geohash>"

If you want businesses around 5 km of the user:

SELECT * FROM business WHERE geohash_5 = "<first_5_characters_of_users_geohash>"

If you want businesses around 10 km of the user:

SELECT * FROM business WHERE geohash_4 = "<first_4_characters_of_users_geohash>"

and the queries will be a lot faster!!

Issues with GeoHashing

While geohashing offers many advantages, it is important to consider some of the challenges and limitations associated with geohashing:

  1. Boundary Issues: When two locations are very close to each other, they may fall into different quadrants or cells within the geohashing grid. Even a slight difference in latitude or longitude can place the locations in different quadrants, resulting in different geohash codes.
    This behavior is a characteristic of geohashing and the grid system it employs. It ensures that nearby locations have similar prefixes in their geohash codes, indicating their spatial proximity. However, when locations cross quadrant boundaries, their geohashes can differ significantly due to the hierarchical nature of the grid.
  2. Overlapping Areas: Geohash grids do not align perfectly with geographical boundaries. In some cases, adjacent geohash cells may have overlapping areas, making it challenging to precisely define the boundaries of regions represented by geohash codes.
  3. Distortion at High Latitudes: Geohashing can introduce distortion at high latitudes, as the grid cells become elongated and narrower. This distortion can affect the accuracy of spatial calculations and queries, particularly near the poles.
  4. Complex Geometries: Geohashing works best for representing points, rectangles, or simple shapes. However, when dealing with complex geometries such as polygons with irregular boundaries or multiple disconnected regions, geohashing may not be the most efficient or accurate representation method.
  5. Querying Irregular Regions: Geohash grids are based on a regular subdivision of space, which can make it challenging to query irregularly shaped regions or areas with varying densities of points. Special techniques or additional data structures may be required to handle such cases effectively.
  6. Varying Cell Sizes: Geohashing uses fixed-sized cells, resulting in non-uniform cell sizes at different latitudes. This non-uniformity can lead to differences in the level of precision or coverage across different regions of the Earth’s surface.

Signing Off!!

Finishing off with the article here. I really hope, I made the article worth your while and you learned a great deal from it.

Do consider giving this a clap since this encourages me to write more such content and share it with the world.

More than happy to address any improvements and suggestions. Please feel free to drop a comment.

Connect with me via: |

Abhirup Acharya
Abhirup Acharya

Responses (1)