R Functions to Interact with the EIA's Application Programming Interface (API)

The Energy Information Administration at the US Department of Energy has made a great deal of data available through an API. The data sets include 408,000 electricity series, 115,052 petroleum series, and 11,989 natural gas series. There are also 30,000 State Energy Data System series. The series can be browsed here: http://www.eia.gov/beta/api/qb.cfm

I have created a couple of R functions that let you browse the data sets from within R and download data directly into R as an object of class xts. The function below browses the categories and finds series IDs from within R; I'll post the functions that extract the data shortly. These functions are works in progress.

To use the functions you'll need to request a key from the EIA here: http://www.eia.gov/beta/api/register.cfm
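
One way to keep the key out of your scripts is to read it from an environment variable. This is only a minimal sketch: the variable name EIA_KEY and the helper get_eia_key are hypothetical names chosen for illustration and are not part of the EIA API.

```r
# Hypothetical helper: read the API key from an environment variable.
# The variable name "EIA_KEY" is an assumption; set it yourself, for
# example with a line such as EIA_KEY=yourkey in ~/.Renviron.
get_eia_key <- function(var = "EIA_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# Usage (uncomment once the variable is set):
# key <- get_eia_key()
```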

You will need to have the XML and plyr libraries loaded.

library(XML)
library(plyr)

The function is:

getCatEIA <- function(key, cat=999999999){

  key <- unlist(strsplit(key, ";"))

  # Build the request URL; the default value of cat means "no category", i.e. the root
  if (cat == 999999999) {
    url <- paste("http://api.eia.gov/category?api_key=", key, "&out=xml", sep="")
  } else {
    url <- paste("http://api.eia.gov/category?api_key=", key, "&category_id=", cat, "&out=xml", sep="")
  }

  doc <- xmlParse(file=url, isURL=TRUE)

  # The root category has no parent, so any warning or error here is suppressed
  print("########Parent Category########")
  tryCatch(print(xmlToDataFrame(nodes = getNodeSet(doc, "//category/parent_category_id"))), warning=function(w) FALSE, error=function(e) FALSE)

  print("########Sub-Categories########")
  print(xmlToDataFrame(nodes = getNodeSet(doc, "//childcategories/row")))

  print("########Series IDs########")
  print(xmlToDataFrame(nodes = getNodeSet(doc, "//childseries/row")))
}

To use the function, pass your key and, optionally, a category ID. If you omit the category ID, the function returns the top-level categories. For example:

key <- "[your key here]"
cat <- 40827

Then:

getCatEIA(key, cat)

returns

[1] "########Parent Category########"
   text
1 40203
[1] "########Sub-Categories########"
  category_id           name
1       40828           Real
2       40829 Current-dollar
[1] "########Series IDs########"
data frame with 0 columns and 0 rows

The result shows you the parent and sub categories. Note there are no series in the category. The series IDs are the identifiers for the actual data.
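
Because getCatEIA only prints its results, it can be handy to capture the same three pieces in a list for later use. The sketch below is one possible way to do that, not part of the original function. The helper name parseCatEIA and the sample XML are hypothetical: the sample is made up to mirror the node names used in the XPath queries above and the output just shown, and a real API response may contain other elements.

```r
library(XML)

# Made-up response text; the root element name and overall layout are
# assumptions based only on the XPath queries used in getCatEIA.
sample_xml <- '<response>
  <category>
    <category_id>40827</category_id>
    <parent_category_id>40203</parent_category_id>
    <childcategories>
      <row><category_id>40828</category_id><name>Real</name></row>
      <row><category_id>40829</category_id><name>Current-dollar</name></row>
    </childcategories>
    <childseries/>
  </category>
</response>'

# Convert the nodes matched by an XPath query to a data frame,
# returning an empty data frame when nothing matches.
nodes_to_df <- function(doc, xpath) {
  nodes <- getNodeSet(doc, xpath)
  if (length(nodes) == 0) return(data.frame())
  xmlToDataFrame(nodes = nodes, stringsAsFactors = FALSE)
}

# Hypothetical helper: return the parsed pieces instead of printing them.
parseCatEIA <- function(doc) {
  list(
    parent        = unlist(xpathSApply(doc, "//category/parent_category_id", xmlValue)),
    subcategories = nodes_to_df(doc, "//childcategories/row"),
    series        = nodes_to_df(doc, "//childseries/row")
  )
}

doc <- xmlParse(sample_xml, asText = TRUE)
res <- parseCatEIA(doc)
res$subcategories
```

With a live request, the same helper could be applied to the document returned by xmlParse(file=url, isURL=TRUE) inside getCatEIA.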

To see the root (the top-level directory), use:

getCatEIA(key)

which returns:

[1] "########Parent Category########"
[1] "########Sub-Categories########"
  category_id                            name
1       40203 State Energy Data System (SEDS)
2           0                     Electricity
3      456170                     Natural Gas
4      229973                       Petroleum
[1] "########Series IDs########"
data frame with 0 columns and 0 rows

To see series IDs, choose a category that contains series, for example:

getCatEIA(key, cat=40828)

which returns

[1] "########Parent Category########"
   text
1 40827
[1] "########Sub-Categories########"
data frame with 0 columns and 0 rows
[1] "########Series IDs########"
         series_id                                              name f
1  SEDS.GDPRX.OH.A                 Real gross domestic product, Ohio A
2  SEDS.GDPRX.MT.A              Real gross domestic product, Montana A
3  SEDS.GDPRX.NC.A       Real gross domestic product, North Carolina A

 units               updated 
 Million chained (2005) dollars 26-JUN-13 06.34.27 PM
 Million chained (2005) dollars 26-JUN-13 06.34.27 PM
 Million chained (2005) dollars 26-JUN-13 06.34.27 PM
<there are 52 series in total here; only 3 are shown to save space>

You can use these series IDs (together with the function I'll post shortly) to pull the data.
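
The series IDs shown above appear to follow a dot-separated pattern, with the last part matching the f column (A) and the third part matching the state in the series name. As a rough sketch, and only under that assumption, the IDs can be split into their parts in base R. The helper name split_series_id and the column labels dataset, code, region and freq are my own guesses for illustration, not EIA terminology, and other data sets may use a different number of parts.

```r
# Hypothetical helper: split dot-separated series IDs into four parts.
# Assumes every ID has exactly four components, as in the SEDS examples above.
split_series_id <- function(ids) {
  parts <- strsplit(ids, ".", fixed = TRUE)
  stopifnot(all(lengths(parts) == 4))
  m <- do.call(rbind, parts)  # one row per series ID
  data.frame(series_id = ids,
             dataset   = m[, 1],
             code      = m[, 2],
             region    = m[, 3],
             freq      = m[, 4],
             stringsAsFactors = FALSE)
}

ids <- c("SEDS.GDPRX.OH.A", "SEDS.GDPRX.MT.A", "SEDS.GDPRX.NC.A")
parts <- split_series_id(ids)
parts
```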

 

 
