Line Through Points

Line Through Points

Category

Arrays

Difficulty

Very Hard

Problem Statement

Given a set of 2D points, find the maximum number of points that lie on the same straight line. The points are given as an array of (x, y) coordinate pairs.

Intuition

For each point, we can compute the slope it forms with every other point. Points that share the same slope relative to a fixed reference point are collinear with it. By grouping slopes and finding the largest group, we find the maximum collinear count through that reference point. Repeating this for every point as the reference gives the global answer. The critical subtlety is representing slopes in a way that avoids floating-point errors — using reduced fractions (rise/run in lowest terms) instead of decimal division.

Approach

  1. For each point p in the array, treat it as the "anchor" and compute the slope to every other point q.
  2. Represent the slope as a pair (dy, dx) reduced to lowest terms using the GCD. Normalize sign so that dx is always non-negative (and if dx == 0, ensure dy is positive). This avoids hash collisions between equivalent slopes.
  3. Use a hash map to count how many points share each slope relative to the anchor.
  4. Track duplicate points (same coordinates as the anchor) separately, since they are collinear with any line through the anchor.
  5. The best count through this anchor is maxSlopeCount + duplicates.
  6. The global answer is the maximum across all anchors.

Pseudocode

function lineThroughPoints(points):
    if len(points) <= 2:
        return len(points)

    maxPoints = 1

    for i from 0 to len(points) - 1:
        slopeMap = {}
        duplicates = 1  # count self

        for j from 0 to len(points) - 1:
            if i == j:
                continue
            dx = points[j].x - points[i].x
            dy = points[j].y - points[i].y

            if dx == 0 and dy == 0:
                duplicates += 1
                continue

            g = gcd(abs(dy), abs(dx))
            dy /= g
            dx /= g

            # Normalize sign
            if dx < 0:
                dx = -dx
                dy = -dy
            elif dx == 0:
                dy = abs(dy)

            slope = (dy, dx)
            slopeMap[slope] = slopeMap.get(slope, 0) + 1

        localMax = duplicates
        for count in slopeMap.values():
            localMax = max(localMax, count + duplicates)

        maxPoints = max(maxPoints, localMax)

    return maxPoints

Time & Space Complexity

  • Time: O(n^2) where n is the number of points. For each of the n points, we iterate over all other n-1 points. GCD computation is O(log(max_coordinate)) per pair, which is negligible.
  • Space: O(n) for the slope hash map, which is rebuilt for each anchor point.

Key Insights

  • Representing slopes as reduced integer fractions avoids floating-point precision issues entirely. This is essential for correctness.
  • Sign normalization ensures that slopes like (1, -2) and (-1, 2) are treated as the same slope.
  • Vertical lines (dx = 0) and horizontal lines (dy = 0) are handled naturally by the fraction representation.
  • Duplicate points must be counted separately since they have an undefined slope but lie on every line through that location.
  • The O(n^2) approach is optimal because in the worst case, verifying that more than 2 points are collinear requires examining all pairs.

Edge Cases

  • Single point: return 1.
  • Two points: always collinear, return 2.
  • All points are the same: all duplicates, return n.
  • All points are collinear: the answer is n.
  • Large coordinates: ensure GCD and fraction operations do not overflow (use long integers if necessary).
  • Points forming a grid: many distinct slopes exist, but the answer might still be a full row or column.