Point in Polygon in R

Posted: , Last Updated:

point_in_polygon <- function(polygon, point){
  #' Raycasting Algorithm to find out whether a point is in a given polygon.
  #' Performs the even-odd-rule Algorithm to find out whether a point is in a given polygon.
  #' This runs in O(n) where n is the number of edges of the polygon.
  #' @param polygon an array representation of the polygon where polygon[i,1] is the x Value of the i-th point and polygon[i,2] is the y Value.
  #' @param point   an array representation of the point where point[1] is its x Value and point[2] is its y Value
  #' @return whether the point is in the polygon (not on the edge, just turn < into <= and > into >= for that)
  
  # A point is in a polygon if a line from the point to infinity crosses the polygon an odd number of times
  odd = FALSE
  # For each edge (In this case for each point of the polygon and the previous one)
  i = 0
  j = nrow(polygon) - 1
  while(i < nrow(polygon) - 1){
    i = i + 1
    # If a line from the point into infinity crosses this edge
    # One point needs to be above, one below our y coordinate
    # ...and the edge doesn't cross our Y corrdinate before our x coordinate (but between our x coordinate and infinity)
    if (((polygon[i,2] > point[2]) != (polygon[j,2] > point[2])) 
        && (point[1] < ((polygon[j,1] - polygon[i,1]) * (point[2] - polygon[i,2]) / (polygon[j,2] - polygon[i,2])) + polygon[i,1])){
      # Invert odd
      odd = !odd
    }
    j = i
  }
  # If the number of crossings was odd, the point is in the polygon
  return (odd)
}

About the algorithm and language used in this code snippet:

Point in Polygon Algorithm

The Point in Polygon (PIP) problem is the problem of determining whether a point is any arbitrary polygon. This might sound trivial for a simple polygon like a square or a triangle, but gets more complex with more complex polygons like the one in the example below. In this post, the even-odd algorithm, also called crossing number algorithm or Jordan’s algorithm (since it can be proven using the Jordan curve theorem), will be introduced.

Description of the Algorithm

The basic principle behind the Even-odd aka. Jordan aka. Cross Number Algorithm is to count the number of times a line from the point in question (in any, arbitrary direction) crosses any edge of the polygon. This line will always be outside of the polygon at it’s “end” in infinity, so if it is inside the polygon at the start (the point in question), it will have to leave the polygon at some point, crossing some edge. It can re-enter the polygon (see the example below), but it always has to leave again, making the total number of crossings uneven if the point is in the polygon. The opposite is also true; if the number of crossings is even, the point is always outside of the polygon. This is the above mentioned Jordan’s curve theorem.

The algorithm checks every edge of the polygon in a loop to determine if the line from the point to infinity crosses it. In the example below, this line is drawn from the point to infinity to the right, but it can be any direction.

The steps are:

  1. For each edge in the polygon:
  2. If the edge crosses the imaginary line from the point to infinity, increase a counter.
  3. At then end, if the counter is uneven, return true. Else, return false.

A simple boolean variable that is inverted every time a crossing is found is also possible.

Example of the Algorithm

Consider the following polygon with 8 edges and two points for which we want to determine whether they are in the polygon:

Point in Polygon Jordan (Even-odd) Algorithm example polygon and points

The steps the algorithm performs on this polygon to determine whether the first (green) point is in the polygon are, starting from the first edge:

  1. Green Point crosses edge 8
  2. Green Point does not cross edge 1
  3. Green Point crosses edge 2
  4. Green Point does not cross edge 3
  5. Green Point crosses edge 4
  6. Green Point crosses edge 5
  7. Green Point does not cross edge 6
  8. Green Point does not cross edge 7
  9. Total number of crossings: 4
  10. Even number of edge crossings, therefore the point is not in the polygon

The steps the algorithm performs on this polygon to determine whether the second (red) point is in the polygon are, starting from the first edge:

  1. Red Point does not cross edge 8
  2. Red Point does not cross edge 1
  3. Red Point does not cross edge 2
  4. Red Point does not cross edge 3
  5. Red Point does not cross edge 4
  6. Red Point crosses edge 5
  7. Red Point does not cross edge 6
  8. Red Point does not cross edge 7
  9. Total number of crossings: 1
  10. Uneven number of edge crossings, therefore the point is in the polygon

Runtime Complexity of the Algorithm

The runtime complexity of the Jordan Algorithm aka. Crossing Number Algorithm aka. Even-Odd Algorithm to solve the point-in-polygon problem for a single point is linear with respect to the number of edges. This is evident by looking at the code, which contains a single loop over the edges, with no recursion or further function calls or loops.

Formally the runtime is O(|V|), |V|:=number of edges in the polygon.

Space Complexity of the Algorithm

The space complexity is also linear w.r.t. the number of edges, since only fixed-size variables need to be stored in addition to the polygon. Additionally, the algorithm can be implemented on-line, meaning there is no need to look at past edges during the loop, so they can be evicted from memory (or comparable performance improvement measures).

R

The R Logo

R is an interpreted language first released in 1993 with a significant increase in popularity in recent years. It is primarily used for data mining and -science as well as statistics, and is a popular language in non-computer science disciplines ranging from Biology to Physics. R is dynamically typed, and has one of the widest variety of libraries for statistics, machine learning, data mining etc.

Getting to “Hello World” in R

The most important things first - here’s how you can run your first line of code in R.

  1. Download and install the latest version of R from r-project.org. You can also download an earlier version if your use case requires it.
  2. Open a terminal, make sure the R command is working, and that the command your’re going to be using is referring to the version you just installed by running R --version. If you’re getting a “command not found” error (or similar), try restarting your command line, and, if that doesn’t help, your computer. If the issue persists, here are some helpful StackOverflow questions for Windows, Mac and Linux.
  3. As soon as that’s working, you can run the following snippet: print("Hello World"). You have two options to run this: 3.1 Run R in the command line, just paste the code snippet and press enter (Press CTRL + D and type n followed by enter to exit). 3.2 Save the snippet to a file, name it something ending with .R, e.g. hello_world.R, and run Rscript hello_world.R. Tip: use the ls command (dir in Windows) to figure out which files are in the folder your command line is currently in.

That’s it! Notice how printing something to the console is just a single line in R - this low entry barrier and lack of required boilerplate code is a big part of the appeal of R.

Fundamentals in R

To understand algorithms and technologies implemented in R, one first needs to understand what basic programming concepts look like in this particular language.

Variables and Arithmetic

Variables in R are really simple, no need to declare a datatype or even declare that you’re defining a variable; R knows this implicitly. R is also able to easily define objects and their property, in multiple different ways.

some_value = 10
my_object <- list(my_value = 4)
attr(my_object, 'other_value') <- 3

print((some_value + my_object$my_value + attr(my_object, 'other_value'))) # Prints 17

Arrays

Working with arrays is similarly simple in R:

# Create 2 vectors of length 3
vector1 <- c(1,2,3)
vector2 <- c(4,5,6)

# Create names for rows and columns (optional)
column.names <- c("column_1","column_2","column_3")
row.names <- c("row_1","row_2")

# Concatenate the vectors (as rows) to form an array, providing dimensions and row/column names
result <- array(c(vector1,vector2), dim = c(2,3), dimnames = list(row.names, column.names))

print(result)
# Prints:
#       column_1 column_2 column_3
# row_1        1        3        5
# row_2        2        4        6

As those of you familiar with other programming language like Java might have already noticed, those are not native arrays, but rather lists dressed like arrays. This means that arrays in R are considerably slower than in lower level programming languages. This is a trade off R makes in favor of simplicity. There are, however, packages which implement real arrays that are considerably faster.

Conditions

Just like most programming languages, R can do if-else statements:

value = 1
if(value==1){
   print("Value is 1")
} else if(value==2){
     print("Value is 2")
} else {
     print("Value is something else")
}

R can also do switch statements, although they are implemented as a function, unlike in other languages like Java:

x <- switch(
   1,
   "Value is 1",
   "Value is 2",
   "Value is 3"
)

print(x)

Note that this function is pretty useless, but there are other functions for more complex use cases.

Loops

R supports both for and while loops as well as break and next statements (comparable to continue in other languages). Additionally, R supports repeat-loops, which are comparable to while(true) loops in other languages, but simplify the code a little bit.

value <- 0
repeat {
  value <- value + 1
  if(value > 10) {
    break
  }
}
print(value)

value <- 0
while (value <= 10) {
  value = value + 1
}
print(value)

value <- c("Hello","World","!")
for ( i in value) {
  print(i)
}

for(i in 1:10){
  print(i)
}

Functions

Functions in R are easily defined and, for better or worse, do not require specifying return or arguments types. Optionally, a default for arguments can be specified:

my_func <- function (
  a = "World"
) {
  print(a)
  return("!")
}

my_func("Hello")
print(my_func())

(This will print “Hello”, “World”, and then ”!“)

Syntax

R requires the use of curly brackets ({}) to surround code blocks in conditions, loops, functions etc.; While this can lead to some annoying syntax errors, it also means the use of whitespace for preferred formatting (e.g. indentation of code pieces) does not affect the code.

Advanced Knowledge of R

For more information, R has a great Wikipedia article. The official website is r-project.org.