R Vignette for the week – Finding time & distance between two places

Churn or attrition prediction has been used for quite a while in different industries such as banking and financial services or the telecom industry. More recently, it is starting to be used within organizations to help predict the probability of employees leaving the company within a specified time.

Research has shown that one of the critical variables influencing an employee’s decision to leave a company (apart from the manager) is the distance from home to the place of work. This information is typically available to the organization and hence can certainly be used. So, given two addresses, how can we get the distance or the driving time between them?

Luckily for us, the folks at Google have created an easy-to-use API that can be accessed from within R as well. As usual, Stack Overflow came to the rescue. Here it is:

library(XML)
library(RCurl)
# Returns the driving time (in minutes) and distance (in km) between two places
distance2Points <- function(origin, destination) {
 results <- list()
 xml.url <- paste0('https://maps.googleapis.com/maps/api/distancematrix/xml?origins=', origin, '&destinations=', destination, '&mode=driving&sensor=false')
 xmlfile <- xmlParse(getURL(xml.url))
 # The <value> nodes hold the raw figures: metres for distance, seconds for duration
 dist <- xmlValue(xmlChildren(xpathApply(xmlfile, "//distance")[[1]])$value)
 time <- xmlValue(xmlChildren(xpathApply(xmlfile, "//duration")[[1]])$value)
 distance <- as.numeric(dist) / 1000  # metres to km
 time <- as.numeric(time) / 60        # seconds to minutes
 results[['time']] <- time
 results[['dist']] <- distance
 return(results)
}

This requires the XML and RCurl packages and little else.
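The function above pastes the addresses straight into the URL, so spaces and other special characters have to be encoded by hand. A minimal sketch of one way to build the URL safely with base R's `utils::URLencode` is shown below. The helper name `build_distance_url` is hypothetical and not part of the original code. The optional `key` argument is an assumption based on Google now requiring an API key, passed as a `key` query parameter, for this service.

```r
# Hypothetical helper (not from the original post): builds the request URL
# with every query value percent-encoded.
build_distance_url <- function(origin, destination, key = NULL) {
  base <- "https://maps.googleapis.com/maps/api/distancematrix/xml"
  query <- c(origins = origin, destinations = destination, mode = "driving")
  # Assumption: the API key, if you have one, goes in a 'key' parameter
  if (!is.null(key)) query <- c(query, key = key)
  # reserved = TRUE also encodes spaces, ',', '#' and '&' inside the values
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(encoded), "=", encoded, collapse = "&"))
}

# Plain addresses can now be passed without manual %20 escapes
build_distance_url("Hebbal,Bangalore", "Richmond Road,Bangalore")
```

The resulting string could be passed to `getURL()` in place of the `paste0()` call, which would let the function accept addresses exactly as they are stored.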

Running the following command:

distance2Points("Hebbal,Bangalore","Richmond%20Road,Bangalore")

which is the distance between my home and my place of work, returns:

$time
[1] 29.41667

$dist
[1] 10.783

 

with the time in minutes and the distance in kilometres.
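The units follow from the two divisions inside the function: the API reports durations in seconds and distances in metres, and the function divides them by 60 and 1000 respectively. As a small worked sketch, assuming the raw values behind the output above were 1765 seconds and 10783 metres (inferred from the result, not shown in the response itself), and using a hypothetical helper name:

```r
# Hypothetical helper: the same unit conversion that distance2Points applies
convert_units <- function(seconds, metres) {
  list(time = seconds / 60,    # seconds to minutes
       dist = metres / 1000)   # metres to kilometres
}

# Assumed raw values, inferred from the output shown above
convert_units(seconds = 1765, metres = 10783)
```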

Simple, isn’t it? You can run this for all your employees and include the results in any models that you need to build. However, keep in mind that the API is limited to 2,500 calls per day. If you need more, there is a paid option.
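Running the lookup for a whole list of addresses could be sketched as below. The helper `batch_distances`, its `lookup` and `pause` arguments, and the stand-in `fake_lookup` are all hypothetical names introduced for illustration. In real use you would pass `distance2Points` as `lookup`. The stand-in only exists so that the sketch runs without network access, and its numbers are made up.

```r
# Hypothetical helper: calls lookup(origin, destination) for every origin
# and collects the results in a data frame, one row per origin.
batch_distances <- function(origins, destination, lookup, pause = 0.1) {
  rows <- lapply(origins, function(origin) {
    # A failed lookup (bad address, network error) gives NA instead of
    # stopping the whole run
    res <- tryCatch(lookup(origin, destination),
                    error = function(e) list(time = NA_real_, dist = NA_real_))
    Sys.sleep(pause)  # short pause between calls to avoid hammering the API
    data.frame(origin = origin, time = res$time, dist = res$dist,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Stand-in for distance2Points so the sketch runs offline; values are made up
fake_lookup <- function(origin, destination) {
  if (origin == "") stop("empty address")
  list(time = nchar(origin), dist = nchar(origin) / 2)
}

batch_distances(c("Hebbal,Bangalore", ""), "Richmond%20Road,Bangalore",
                lookup = fake_lookup, pause = 0)
```

Returning one row per origin keeps the result easy to merge back into an employee table, and the `NA` rows show which addresses need a second look.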

5 thoughts on “R Vignette for the week – Finding time & distance between two places”

  1. Payaj

    Hi,

    I was trying to use the same code for a dataset (rows = 434, origin = column A, destination = "445 East 86th Street"), but it gives me an error: Error in `$<-.data.frame`(`*tmp*`, "distance", value = list()) :
    replacement has 0 rows, data has 434
    Can you help me with this?
    my code is :
    matching_factor <- function(origin){
    testData$distance<- list();
    xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',curlEscape(urls=origin),'&destinations=',curlEscape(urls=testData$Address),'&mode=walking&sensor=false')
    xmlfile <- xmlParse(getURL(xml.url))
    dist <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
    distance <- as.numeric(sub(" miles","",dist))
    distance <- distance/1609.344
    testData$distance[['dist']] <- distance
    return(testData$distance)
    }
    matching_factor(origin = "445 East 86th Street")
    )
    Thanks !

  2. Ashley

    To allow for the use of a more complete address, add the following to the above function:

    origin=gsub(" ", "+", origin)
    destination=gsub(" ", "+", destination)

    for example:

    distance2Points <- function(origin,destination){
    origin=gsub(" ", "+", origin)
    destination=gsub(" ", "+", destination)
    results <- list();
    xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',origin,'&destinations=',destination,'&mode=driving&sensor=false')
    xmlfile <- xmlParse(getURL(xml.url))
    dist <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
    time <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
    distance <- as.numeric(sub(" miles","",dist))
    time <- as.numeric(time)/60
    distance <- (distance/1000)*0.621371
    results[['time']] <- time
    results[['dist']] <- distance
    return(results)
    }

    distance2Points(origin='350 5th Ave, New York, NY 10118', '445 East 86th Street, New York, NY')

    Gives:

    $time
    [1] 18.58333

    $dist
    [1] 4.955434

    Note that I also converted the distance to miles, rather than km, as it looks like you had done so as well.

  3. Ashley

    You also want to add

    origin=gsub("#", "%23",origin)
    destination=gsub("#", "%23",destination)

    to the first couple lines of the function if your addresses may have any # in them!

    If you come across any other unusual symbols in your addresses that cause the function not to work, open Google Maps and enter the address as you have it. Then look at the URL to see how Google encodes them, and add the corresponding gsub call alongside the others.

  4. Pedro Coelho

    Hello, I’m trying to use your program to calculate the driving distance between two nodes. But when I run the code in R, it gives me the following error:
    Error in function (type, msg, asError = TRUE) :
    Failed to connect to maps.googleapis.com port 80: Timed out .
    I really don’t understand where I should put my Google API key.
    Can someone help me?
    Thanks,
    Pedro.
