Relational databases in general |
Top Previous |
I have found the most useful way to store data is in a relational database, often called a relational database management system (RDBMS). A relational database stores data in tables, which are made up of fields and records:
The driving principle behind a relational database is this: never duplicate data. Let's say our rats came from two groups, Sham and Lesion. If we wanted to record this in the database, so we could analyse data by group, we could store it like this:
However, this introduces two problems. Firstly, it generates very large tables. Secondly, and more importantly, it is unclear what to do if the data is inconsistent – let's say the underlined 'sham' was changed to 'lesion' by mistake. The database would then not know whether rat M4 was in the Sham or Lesion group – there would be entries for both. The solution to both problems is to create two tables, linked on the smallest possible unit of information (in this example, the rat name):
Table Responses
Table Groups
By using the rat name as a key (also known as a foreign key), the database can link the two tables together whenever we want to know how many responses the two groups made on average.
When we want to find out that sort of information, we query the database, specifying how we want to see the data. We could, for example, obtain the following (ignoring a glaring scientific error!):
Query AverageByGroups
Summary of database principles
So relational databases split up the data (which should be entered in well-designed tables without any duplication of information) from queries that look at the data in an infinite variety of ways.
A concrete example: Microsoft Access 97
Microsoft Access 97 is a commonly-used relational database for PCs. It isn't perfect, by a long shot, but I've found it good enough. It supports structured query language (SQL) for designing queries; this is a powerful quasi-English language. For example, the query shown above would be written in SQL like this:
SELECT group, count(*) as NumberOfRats, avg(NumResponses) as MeanNumResponses, avg(NumStimuli) as MeanNumStimuli, avg(NumReinforcements) as MeanNumReinforcements FROM responses, groups WHERE responses.rat = groups.rat GROUP BY group ;
If you find all this a bit cryptic, Access also provides a graphical interface for designing queries.
Getting data out of a database
Given a well-designed database, you should be able to get the data out in any conceivable way. The size of this manual doesn't permit a detailed look at relational database design or queries, but there are abundant sources. If you use Microsoft Access, there's the help system, but I also recommend Viescas JL (1997), Running Microsoft Access 97, Microsoft Press. Beyond that there is a whole field of database design.
Tip
I operate on the principle that any view of the data is achievable. If the graphical query design can't do it, you can use SQL. If SQL can't do it alone, you can use Visual Basic to augment it. If all that fails (and it hasn't failed me yet) you can always re-export the data and use a general-purpose programming language to analyse it. If the data's there, you can get at it.
One thing is worth noting: modern statistical packages (e.g. SPSS, http://www.spss.com/) are starting to support the ODBC standard for exchanging information with databases. You can set up database queries to create views of the data that your stats packages can use, then set up sequences of ODBC capture, analysis and graphical presentation in your stats package. Then whenever you import new data, you can run the entire analysis in a matter of seconds. If you handle large volumes of data, it easily repays the initial effort.
|