R: Input and output: scripts, saving and loading data (including database access)
http://www.psychol.cam.ac.uk/statistics/R/enteringdata.html

Index

  1. General file-handling commands
  2. Scripts
  3. Redirecting output
  4. Text files
  5. Microsoft Excel spreadsheets
  6. SPSS data
  7. ODBC data sources (databases)
  8. R native format
  9. Other data-moving techniques

General file-handling commands

A few file-handling commands may be useful:

setwd("c:/myfiles") # use / or \\ to separate directories under Windows (\\ becomes \ once processed through the escape character mechanism)
dir() # list the contents of the current directory

Scripts

Running scripts:

source("myfile.R") # load and execute a script of R commands

For a startup script, edit ".Rprofile" in your home directory (for details see ?Startup). Here's an example:

# RNC ~/.Rprofile

# auto width adjustment
.adjustWidth <- function(...){
       options(width=Sys.getenv("COLUMNS"))
       TRUE
}
.adjustWidthCallBack <- addTaskCallback(.adjustWidth)

.First <- function() cat("\n   Script ~/.Rprofile executed.\n\n")
.Last <- function()  cat("\n   Goodbye!\n\n")

Redirecting output

Redirecting output:

sink("myfile.txt") # redirect console output to a file
sink() # restore output to the screen

pdf("mygraph.pdf") # subsequent graphical output will go to a PDF
png("mygraph.png") # subsequent graphical output will go to a PNG
jpeg("mygraph.jpeg") # subsequent graphical output will go to a JPEG
bmp("mygraph.bmp") # subsequent graphical output will go to a BMP
postscript("mygraph.ps") # subsequent graphical output will go to a PostScript file
dev.off() # back to the screen

Text files (including comma-delimited value or CSV files)

my.data = read.csv(filename)
my.data = read.csv(file.choose())
# Note: (1) = and <- are synonymous, and are the assignment operator (while == tests for equality)
#       (2) file.choose() pops up a live filename picker
#       (3) The default is to assume a header row with variable names (header=TRUE),
#           and no row names, but you can change all these defaults (e.g. row.names=1 reads
#           row names from the first column).

attach(my.data) # you might then want to attach the new data to the path, though this is optional

write.csv(my.data, filename2) # Write the data to a new file. There are several options available; see the help (use ?write.csv)
write.csv(my.data, file="d:/temp/newfile.csv", row.names=FALSE) # Here's one: turn off row names to avoid creating a spurious additional column.

read.table(...)  # } A more generic way to read/write tabular data from/to disk
write.table(...) # } (read.csv and write.csv are specialized versions of read.table and write.table)

In the R Commander, you can use Data / Import data / from text file or clipboard, and, having selected a data set, Data / Active data set / Export active data set.

Microsoft Excel spreadsheets

There are several ways to read from Excel spreadsheets. I find this easy:

library(RODBC)
channel <- odbcConnectExcel("Osteomalacia_data.xls") # specify the filename
patientdata <- sqlFetch(channel, "Vitamin_D_levels") # specify a sheet within the spreadsheet
indexcasedata <- sqlFetch(channel, "Sheet2") # by default Excel names individual sheets Sheet1, Sheet2, ..., though you may have renamed them something more informative
odbcClose(channel)

But in a Linux environment, the Windows ODBC settings etc. probably won't work out of the box, so this works well:

library(gdata)
data <- read.xls("myexceldata.xls", sheet=1) # load the first worksheet

SPSS data

Very easy:

library(foreign)
mydata <- data.frame(read.spss("filename.sav"))
# Remember you can also use file.choose() in place of the filename, as above.

For saving in SPSS format from R, see extensions.

ODBC data sources (and other database connections)

There are special mechanisms to support Excel files (see above) and Microsoft Access databases, but in general any database can be accessed via its ODBC interface. To use this, the actual database in question (e.g. "C:\MyExperiment\MonkeyCantab_database_7.mdb") needs to be set up via the Windows ODBC Manager (Data Source Administrator) tool, giving that particular database a data source name (DSN), such as "MonkeyCantabDB7". This is then used by all programs that request data via ODBC. Like this:

# 1. Connect
library(RODBC)
channel <- odbcConnect("my_DSN") # specify your DSN here
# if you need to specify a username/password, use:
#  channel <-odbcConnect("mydsn", uid="username", pwd="password")

# 2. List all tables
sqlTables(channel)

# 3. Fetch a whole table into a data frame
mydataframe <- sqlFetch(channel, "my_table_name") # fetch a table from the database in its entirety
close(channel)

# 4. Fetch the results of a query into a data frame. Example:
mydf2 <- sqlQuery(channel, "SELECT * FROM MonkeyCantab_LOOKUP_TaskTypes WHERE TaskType < 6")

If you're using MySQL, you can talk to the database directly:

library(RMySQL) # use install.packages("RMySQL") if this produces an error
# if the install.packages() command produces an error, under Ubuntu:
# use "sudo apt-get install libmysql++-dev" (in addition to MySQL itself, i.e. the
# "mysql-server mysql-client mysql-navigator mysql-admin" packages)
con <- dbConnect(MySQL(), host="localhost", port=3306, dbname="mydatabase", user="myuser", password="mypassword")
dbListTables(con)
dbListFields(con, "table_name")
d <- dbReadTable(con, "table_name")
e <- dbGetQuery(con, "SELECT COUNT(*) FROM table_name")
# and much more possible

R native format

As you'd expect, the easiest:

save(myobject1, myobject2, ..., file="D:/temp/mydata.rda")
load(file="D:/temp/mydata.rda")
# note that the load command recreates the "mydata" object without prompting
# you can also use save.image() to save a whole workspace

In the R Commander, you can use Data / Load data set, and, having selected a data set, Data / Active data set / Save active data set.

Other data-moving techniques

To export the definition of an R object (which you can then re-import using "object = THISTHING"):

dput(object, "")

To read a tabular object with a header row from the clipboard (see here):

object = read.table("clipboard", header=T)
Valid HTML 4.01 Transitional
Valid CSS