Relational databases in general

Top  Previous 

I have found the most useful way to store data is in a relational database, often called a relational database management system (RDBMS). A relational database stores data in tables, which are made up of fields and records:

 

A table:

five fields:


Date


Rat


NumResponses


NumStimuli


NumReinforcements


one record:

17/2/00 12:29:00

M4

56

5

1

another:

17/2/00 14:37:06

M5

437

43

8

… and so on

17/2/00 12:54:00

M4

263

26

5

 

The driving principle behind a relational database is this: never duplicate data. Let's say our rats came from two groups, Sham and Lesion. If we wanted to record this in the database, so we could analyse data by group, we could store it like this:

 

Table BigData

Date


Rat


Group


NumResponses


NumStimuli


NumReinforcements


17/2/00 12:29:00

M4

sham

56

5

1

17/2/00 14:37:06

M5

lesion

437

43

8

17/2/00 12:54:00

M4

sham

263

26

5

 

However, this introduces two problems. Firstly, it generates very large tables. Secondly, and more importantly, it is unclear what to do if the data is inconsistent – let's say the underlined 'sham' was changed to 'lesion' by mistake. The database would then not know whether rat M4 was in the Sham or Lesion group – there would be entries for both. The solution to both problems is to create two tables, linked on the smallest possible unit of information (in this example, the rat name):

 

Table Responses

Date


Rat


NumResponses


NumStimuli


NumReinforcements


17/2/00 12:29:00

M4

56

5

1

17/2/00 14:37:06

M5

437

43

8

17/2/00 12:54:00

M4

263

26

5

 

Table Groups

Rat


Group


M4

sham

M5

lesion

 

By using the rat name as a key (also known as a foreign key), the database can link the two tables together whenever we want to know how many responses the two groups made on average.

 

When we want to find out that sort of information, we query the database, specifying how we want to see the data. We could, for example, obtain the following (ignoring a glaring scientific error!):

 

Query AverageByGroups

Group


NumberOfSubjects


MeanNumResponses


MeanNumStimuli


MeanNumReinforcements


sham

2

159.5

15.5

3

lesion

1

437

43

8

 

Summary of database principles

 

So relational databases split up the data (which should be entered in well-designed tables without any duplication of information) from queries that look at the data in an infinite variety of ways.

 

A concrete example: Microsoft Access 97

 

Microsoft Access 97 is a commonly-used relational database for PCs. It isn't perfect, by a long shot, but I've found it good enough. It supports structured query language (SQL) for designing queries; this is a powerful quasi-English language. For example, the query shown above would be written in SQL like this:

 

SELECT group,

   count(*) as NumberOfRats,

   avg(NumResponses) as MeanNumResponses,

   avg(NumStimuli) as MeanNumStimuli,

   avg(NumReinforcements) as MeanNumReinforcements

FROM responses, groups

WHERE responses.rat = groups.rat

GROUP BY group

;

 

If you find all this a bit cryptic, Access also provides a graphical interface for designing queries.

 

Getting data out of a database

 

Given a well-designed database, you should be able to get the data out in any conceivable way. The size of this manual doesn't permit a detailed look at relational database design or queries, but there are abundant sources. If you use Microsoft Access, there's the help system, but I also recommend Viescas JL (1997), Running Microsoft Access 97, Microsoft Press. Beyond that there is a whole field of database design.

 

Tip

 

Tip_Hand

I operate on the principle that any view of the data is achievable. If the graphical query design can't do it, you can use SQL. If SQL can't do it alone, you can use Visual Basic to augment it. If all that fails (and it hasn't failed me yet) you can always re-export the data and use a general-purpose programming language to analyse it. If the data's there, you can get at it.

 

One thing is worth noting: modern statistical packages (e.g. SPSS, http://www.spss.com/) are starting to support the ODBC standard for exchanging information with databases. You can set up database queries to create views of the data that your stats packages can use, then set up sequences of ODBC capture, analysis and graphical presentation in your stats package. Then whenever you import new data, you can run the entire analysis in a matter of seconds. If you handle large volumes of data, it easily repays the initial effort.