Geohash

Geohash

A geocoding scheme that encodes a (latitude, longitude) pair into a short alphanumeric string where nearby locations share a common prefix — enabling efficient proximity queries without full-table radius scans.

When NOT to Use

  1. When exact distance calculations are required — geohash cells are rectangular, not circular. Two points at opposite corners of the same cell can be farther apart than two points on either side of a shared cell boundary. Geohash gives proximity grouping, not metric distance. Use Haversine formula or PostGIS ST_DWithin for exact radius queries.
  2. When the query radius crosses cell boundaries at different precision levels — geohash proximity queries require checking the target cell AND its 8 neighbors (9 cells total). At the equator, cells are wider than near the poles; the same precision level covers different physical areas at different latitudes. This requires choosing precision carefully and always querying neighbors.
  3. When latitude/longitude wrapping at the antimeridian (longitude +/-180) is critical — geohash cells do not wrap smoothly across the antimeridian. The geohash prefixes "9" and "b" are geographically adjacent at the antimeridian but share no common prefix. Applications near the International Date Line require special-case handling.

Core Mechanism

How It Works

Geohash uses recursive binary space partitioning, alternating between latitude and longitude bits:

  1. Divide the world into two halves: is the latitude in the northern (0..90) or southern (-90..0) half? Encode as bit 1 or 0.
  2. Alternate to longitude: is the longitude in the eastern (0..180) or western (-180..0) half? Encode as bit 1 or 0.
  3. Recurse: each bit halves the selected dimension. After n iterations, you have a 2n-bit binary string encoding the location.
  4. Base-32 encode: group every 5 bits into one character using the geohash alphabet (0-9, b-z excluding a, i, l, o). A 5-character geohash = 25 bits; a 9-character geohash = 45 bits.

Precision Levels

CharactersLat errorLng errorApproximate cell size
1+/- 23 deg+/- 23 deg~5000 km x 5000 km
4+/- 0.18 deg+/- 0.18 deg~40 km x 20 km
6+/- 0.006 deg+/- 0.006 deg~1.2 km x 0.6 km
8+/- 0.00017 deg+/- 0.00017 deg~38 m x 19 m
9+/- 0.00002 deg+/- 0.00002 deg~4.8 m x 4.8 m

Prefix Property

All locations within a geohash cell share the same prefix up to that precision level. Example: all locations in the San Francisco neighborhood 9q8yy also belong to cell 9q8y, 9q8, 9q, and 9. This prefix hierarchy enables efficient database index lookups: a LIKE '9q8%' query returns all locations in that area without a full-table scan.

Component Diagram

flowchart TD
    A["(lat=37.8, lng=-122.4)"] --> B["Is lat in upper half?\n-90..0 or 0..90"]
    B -->|"Yes (0..90) → bit 1"| C["Narrow: 0..90\nIs lng in right half?"]
    B -->|"No → bit 0"| D["Narrow: -90..0"]
    C -->|"No (-180..0) → bit 0"| E["Narrow: -180..0\nContinue recursively"]
    E --> F["Alternating lat/lng bits\nafter n iterations"]
    F --> G["Binary string: 0110101..."]
    G --> H["Base-32 encode\nevery 5 bits → 1 character"]
    H --> I["Geohash: '9q8yy'"]
    I --> J["Prefix property:\nall '9q8yy*' locations\nare in the same ~4.8m cell"]
    J --> K["Proximity query:\nfetch cell + 8 neighbors\n'9q8yx', '9q8yz', '9q8yv'..."]

System Design Application

Proximity queries in ride-sharing and food delivery: Find all drivers within 1 km of a rider. With geohash at precision 6 (~1.2 km cells), the rider's geohash and its 8 neighbors bound the search radius. A database index on the geohash column reduces a full-table spatial scan to an index range scan. Systems like Uber, Lyft, and DoorDash use geohash or equivalent cell-based spatial indexing at their location services layer.

Geohash as a cache key in Distributed-Cache: Cache the set of active driver IDs per geohash cell. When a rider requests nearby drivers, look up the geohash cell in Redis instead of querying the database. Short TTL (~5 seconds for ride-sharing) keeps cells fresh as drivers move. GEOADD / GEORADIUS in Redis use geohash internally.

Geohash prefix as a shard key in Database-Sharding: When sharding a location-tagged dataset (e.g., restaurants, properties, delivery zones), geohash prefix is a natural shard key. All data within a geographic region lands on the same shard, which benefits locality for neighborhood-level queries. Geographic "hot spots" (dense city centers) require shard splitting at higher precision levels.

Scope Boundaries

  • NOT the same as Consistent-Hashing — Consistent Hashing distributes arbitrary keys across cluster nodes on a hash ring to minimize remapping when nodes are added or removed. Geohash encodes geographic coordinates into strings that preserve spatial proximity. The mechanisms are entirely different: Consistent Hashing solves node distribution; Geohash solves spatial indexing. The word "hash" appears in both names but refers to completely different operations.

  • NOT the same as Database-Sharding — Database Sharding is the infrastructure pattern describing how data is horizontally partitioned across multiple database nodes. Geohash is one possible shard key strategy when the data is geo-tagged. Sharding is the infrastructure decision; Geohash is an encoding choice that enables geographic locality in that sharding strategy.

Skeletal Snippet

// PRODUCTION NOTE: use ngeohash npm package for production — this snippet is for illustration only
// import ngeohash from 'ngeohash'; const hash = ngeohash.encode(37.8, -122.4, 6);
 
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';
 
function encodeGeohash(lat: number, lng: number, precision = 6): string {
  let minLat = -90, maxLat = 90;
  let minLng = -180, maxLng = 180;
  let hash = '';
  let bits = 0;
  let bitsTotal = 0;
  let hashValue = 0;
  let isLng = true; // geohash alternates: lng first, then lat
 
  while (hash.length < precision) {
    if (isLng) {
      const mid = (minLng + maxLng) / 2;
      if (lng >= mid) { hashValue = (hashValue << 1) | 1; minLng = mid; }
      else { hashValue = hashValue << 1; maxLng = mid; }
    } else {
      const mid = (minLat + maxLat) / 2;
      if (lat >= mid) { hashValue = (hashValue << 1) | 1; minLat = mid; }
      else { hashValue = hashValue << 1; maxLat = mid; }
    }
    isLng = !isLng;
    bits++;
    bitsTotal++;
 
    if (bits === 5) {
      hash += BASE32[hashValue];
      bits = 0;
      hashValue = 0;
    }
  }
  return hash;
}
 
// Usage: encodeGeohash(37.8, -122.4, 6) → '9q8yy9' (San Francisco area)
// Proximity query: fetch cell + 8 neighbors using ngeohash.neighbors(hash)

Pitfalls

Edge-of-cell problem: always query 8 neighbors

Two points at opposite corners of the same geohash cell can be farther apart than two points on opposite sides of a cell boundary. A proximity query that checks only the target cell misses nearby locations in neighboring cells. Always query the target cell PLUS its 8 neighbors (9 cells total). The ngeohash.neighbors(hash) function returns these 8 neighbors.

Antimeridian discontinuity

The geohash prefixes 9 (western Pacific) and b (eastern Pacific) are geographically adjacent at the antimeridian (longitude +/-180) but share no common prefix. An application tracking ships or aircraft that cross the International Date Line cannot rely on prefix matching alone — special handling is required to recognize that 9* and b* are neighbors at the antimeridian.

Existing Pattern Connections

NoteRelationship
Consistent-HashingDifferent hashing purposes: Consistent Hashing distributes keys to nodes; Geohash encodes geographic coordinates preserving spatial proximity
Database-ShardingGeohash prefix is one possible shard key strategy for geo-tagged data, enabling geographic data locality
Distributed-CacheGeohash strings serve as natural cache keys for location-based queries; Redis GEOADD/GEORADIUS use geohash internally
  • Algorithms-MOC — listed in System Design Application Index: O(1) proximity cache key and geographic shard key
  • Distributed-Cache — geohash cell string as cache key for location-based queries
  • Database-Sharding — geohash prefix as shard key for geographic data locality
  • Consistent-Hashing — geohash determines the data key; consistent hashing determines which node stores it