Relational databases in general

I have found the most useful way to store data is in a relational database, often called a relational database management system (RDBMS). A relational database stores data in tables, which are made up of fields and records:

A table:	five fields:
	Date	Rat	NumResponses	NumStimuli	NumReinforcements
one record:	17/2/00 12:29:00	M4	56	5	1
another:	17/2/00 14:37:06	M5	437	43	8
… and so on	17/2/00 12:54:00	M4	263	26	5

The driving principle behind a relational database is this: never duplicate data. Let's say our rats came from two groups, Sham and Lesion. If we wanted to record this in the database, so we could analyse data by group, we could store it like this:

Table BigData
Date	Rat	Group	NumResponses	NumStimuli	NumReinforcements
17/2/00 12:29:00	M4	sham	56	5	1
17/2/00 14:37:06	M5	lesion	437	43	8
17/2/00 12:54:00	M4	sham	263	26	5

However, this introduces two problems. Firstly, it generates very large tables. Secondly, and more importantly, it is unclear what to do if the data is inconsistent – let's say the underlined 'sham' was changed to 'lesion' by mistake. The database would then not know whether rat M4 was in the Sham or Lesion group – there would be entries for both. The solution to both problems is to create two tables, linked on the smallest possible unit of information (in this example, the rat name):

Date	Rat	NumResponses	NumStimuli	NumReinforcements
17/2/00 12:29:00	M4	56	5	1
17/2/00 14:37:06	M5	437	43	8
17/2/00 12:54:00	M4	263	26	5

Rat	Group
M4	sham
M5	lesion

By using the rat name as a key (also known as a foreign key), the database can link the two tables together whenever we want to know how many responses the two groups made on average.

When we want to find out that sort of information, we query the database, specifying how we want to see the data. We could, for example, obtain the following (ignoring a glaring scientific error!):

Group	NumberOfSubjects	MeanNumResponses	MeanNumStimuli	MeanNumReinforcements
sham	2	159.5	15.5	3
lesion	1	437	43	8

So relational databases split up the data (which should be entered in well-designed tables without any duplication of information) from queries that look at the data in an infinite variety of ways.

Microsoft Access 97 is a commonly-used relational database for PCs. It isn't perfect, by a long shot, but I've found it good enough. It supports structured query language (SQL) for designing queries; this is a powerful quasi-English language. For example, the query shown above would be written in SQL like this:

If you find all this a bit cryptic, Access also provides a graphical interface for designing queries.

Given a well-designed database, you should be able to get the data out in any conceivable way. The size of this manual doesn't permit a detailed look at relational database design or queries, but there are abundant sources. If you use Microsoft Access, there's the help system, but I also recommend Viescas JL (1997), Running Microsoft Access 97, Microsoft Press. Beyond that there is a whole field of database design.

I operate on the principle that any view of the data is achievable. If the graphical query design can't do it, you can use SQL. If SQL can't do it alone, you can use Visual Basic to augment it. If all that fails (and it hasn't failed me yet) you can always re-export the data and use a general-purpose programming language to analyse it. If the data's there, you can get at it.

One thing is worth noting: modern statistical packages (e.g. SPSS, http://www.spss.com/) are starting to support the ODBC standard for exchanging information with databases. You can set up database queries to create views of the data that your stats packages can use, then set up sequences of ODBC capture, analysis and graphical presentation in your stats package. Then whenever you import new data, you can run the entire analysis in a matter of seconds. If you handle large volumes of data, it easily repays the initial effort.