Using SQLite in R

Using SQLite in R

I received a ‘database-in-a-file’ from SQLite, a filename called uniprot_annotation.sqlite3. It contains around 6 million connection between uniprot genes and some pathways and the idea of using it, is to improve the access speed. To import it to R, first the packakge RSQLite needs to be installed:

install.packages("RSQLite")

Then we start by making the connection to the database

library("RSQLite")
con <-  dbConnect(SQLite(), dbname="uniprot_annotation.sqlite3")

If the given filename does not exist, a new database is created, otherwise the existing database is opened.
To show the tables inside this database connection, we can then type

dbListTables(con)

Assuming that there is a data table called uniprot2ko inside our database, one would copy the whole table to a R data frame by typing

uni2ko <- dbGetQuery( con,'select * from uniprot2ko' )

although this would be obviously a rather stupid thing to do, as we would loose then the speed we get in accessing the database with more defined queries. A more reasonable query would be e.g. to get all rows, where a certain field equals a certain term. Lets assume,w e want to get all uniprot IDs that are associated to a certain KEGG orthology (e.g. K00001), then we would type

K0001 <- dbGetQuery( con,'select * from uniprot2ko where ko="K00001"' )

Of course in the query command you can use now all types of different queries as known from SQL.

Once you do not need the SQLite connection anymore, it can be closed by typing

dbDisconnect(conn = con)

If the aim is to write an existing R data frame (e.g. called myData) into a SQLite table, one would do the following. First establish a connection to a new file, that does not exist on the hard drive

library("RSQLite")
con <- dbConnect(SQLite(), dbname="myData.sqlite3")

At that stage you can see that the file was created on the HDD, but is just 0 bytes large. Next, we need to populate the database by writing the data frame into it

dbWriteTable(con, "myDataDB", myData)

Here, myDataDB is the table of the new created database table and myData is the data frame that we want to write into it. From the HDD we can see then also that the file was populated and we can connect to it in the way, as described above.

Back to: Introduction to R – Overview

Leave a Reply