Techniques — Database basics
Based on Ubuntu 10.04 (a Debian-derived Linux distribution).
sudo apt-get install mysql-server mysql-client mysql-navigator mysql-admin
It'll ask for a root password during installation.
Configuration is in /etc/mysql/my.cnf (and /etc/mysql/conf.d/). Logs are in /var/log/mysql/. Data is by default in /var/lib/mysql/ (so a database called "xxx" becomes a directory of that name there, probably with data elsewhere in /var/lib/mysql/ too).
To connect from a remote machine, tunnel MySQL's port over SSH: run ssh server -L 3306:localhost:3306 (hostname=yourserver, port=22, username=unixuser, method=password, password=unixpassword) and leave the SSH client running once logged in. Then point your MySQL client at hostname=localhost, port=3306, username=databaseuser, password=databasepassword.
Logging in and saying hello:
# Log in, prompting for a password. You don't need to specify the --host or
# --port options for a local login with default settings.
# Also: -u and -p are shorthand for --user and --password.
mysql --host=127.0.0.1 --port=3306 --user=root --password
# Now, let's enter some SQL commands:
SHOW DATABASES;
USE database;
SHOW TABLES;
SHOW CREATE TABLE table;
SELECT * FROM table;
# etc. Also useful:
help
help contents
status
quit
Creating a database and changing the root password:
CREATE DATABASE newdb;
GRANT ALL ON newdb.* TO root@'myhostname' IDENTIFIED BY 'newpassword';
To dump all databases:
mysqldump -u username -ppassword --all-databases > backupfile.sql
To dump one database:
mysqldump -u username -ppassword databasename > backupfile.sql
As above, you can use just -p to be prompted for the password.
To import from a multi-database dump (which will contain CREATE DATABASE commands):
mysql -u username -p[password] < backupfile.sql
To import from a single-database dump (which will not):
mysql -u username -p[password] databasename < backupfile.sql
To copy one database to another:
mysqldump -u username -ppassword db1name > db.sql
echo "CREATE DATABASE db2name;" | mysql -u username -ppassword
mysql -u username -ppassword db2name < db.sql
See SQLZoo.net, an excellent interactive SQL tutorial.
In MySQL, you can add user-defined functions, including user-defined aggregate functions (aggregate UDFs), such as MEDIAN(). See http://mysql-udf.sourceforge.net/; http://sourceforge.net/projects/mysql-udf/forums/forum/390115/topic/3698697; http://www.mooreds.com/wordpress/archives/376.
Specifically, for that MEDIAN() one: download udf_median.cc, then do this:
# This needs the MySQL header files; on Ubuntu, probably from package "libmysqlclient-dev".
gcc -I /usr/include/mysql -I /usr/include -o udf_median.o -c udf_median.cc
ld -shared -o udf_median.so udf_median.o
sudo cp udf_median.so /usr/lib/mysql/plugin/
mysql [options]
then, in MySQL:
CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';
If you need to drop it:
DROP FUNCTION median;
Now you can use it:
SELECT MEDIAN(field) FROM table ... ;
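MySQL itself has no built-in MEDIAN(), so if compiling a UDF is more trouble than it's worth, the median can also be computed client-side after fetching the values. A minimal sketch in Python (the values here are invented for illustration):

```python
from statistics import median  # Python 3.4+

# Values as they might arrive from e.g. SELECT field FROM table,
# fetched into a Python list (invented example data).
values = [1, 3, 3, 6, 7, 8, 9]

print(median(values))         # odd count: the middle value -> 6
print(median([1, 2, 3, 4]))   # even count: mean of the two middle values -> 2.5
```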
unixODBC seems to be the best-supported ODBC implementation on Linux.
# Install unixODBC:
sudo apt-get install unixodbc
# Print config information:
odbcinst -j
# Query the drivers installed:
odbcinst -q -d
# Query the data sources available:
odbcinst -q -s
# http://www.unixodbc.org/unixODBCsetup.html
sudo apt-get install libmyodbc
sudo ODBCConfig &
Then, in the GUI, Drivers > Add:
Name = myodbc
Description = MySQL ODBC (myodbc) driver
Driver = /usr/lib/odbc/libmyodbc.so
Driver64 =
Setup = /usr/lib/odbc/libodbcmyS.so
Setup64 =
UsageCount = [leave as is; 1]
CPTimeout =
CPReuse =
... which writes this stuff to /etc/odbcinst.ini
Now a specific data source:
ODBCConfig &
User DSN > Add > pick the "myodbc" driver; specify server (e.g. 127.0.0.1), port (e.g. 3306), and database, plus a short name and description.
ODBCConfig writes to ~/.odbc.ini — or sudo it for system-wide DSNs, in /etc/odbc.ini.
Specifying a source manually in these files:
[tim1]
Description = Tim Composite Exp1
Driver = myodbc
Server = 127.0.0.1
Port = 3306
Database = junk1
User = XXX_MYSQLUSER
Password = XXX_MYSQLPASSWORD
Microsoft Access (.MDB) files can also be reached via ODBC, but it is not a terribly capable connection. Don't use it for anything serious; convert the .MDB to another database instead (see below).
First, add the driver:
# http://mdbtools.sourceforge.net/
sudo apt-get install mdbtools libmdbtools libmdbodbc
sudo ODBCConfig &
Then, in the GUI, Drivers > Add:
Name = mdb
Description = MDB Tools (Microsoft Access format) ODBC driver
Driver = /usr/lib/libmdbodbc.so.0
Driver64 =
Setup =
Setup64 =
UsageCount = [leave as is; 1]
CPTimeout =
CPReuse =
Now a specific data source. ODBCConfig doesn't like it (as there's no Setup library; it says "cannot construct a property list"). So edit ~/.odbc.ini (or /etc/odbc.ini for a system-wide data source), e.g.
[tim1]
Description = Tim Composite Exp1
Driver = mdb
Database = /home/rudolf/tmp/Tim_Composite_Exp1.mdb
Perl: to install database modules from CPAN, either (1) run sudo perl -MCPAN -e shell; press Enter for all the defaults and pick a nearby CPAN mirror; the first time round (only), use o conf commit to write the config file to disk, as suggested; then e.g. install DBI to install the DBI module; then q to quit; or (2) run sudo perl -MCPAN -e "install DBI" to do it all in one step.
Python:
sudo apt-get install python-mysqldb
# How would we find the name of the package that supplies PyODBC?
aptitude search pyodbc
# Now install it:
sudo apt-get install python-pyodbc
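Both python-mysqldb (imported as MySQLdb) and python-pyodbc expose the same Python DB-API 2.0 pattern: connect, get a cursor, execute, fetch. A sketch of that pattern, shown here with the standard-library sqlite3 module so it runs without a server; with MySQLdb you would connect with MySQLdb.connect(host=..., user=..., passwd=..., db=...) instead, and use %s rather than ? as the parameter placeholder. The table and data are invented for illustration:

```python
import sqlite3  # stand-in for MySQLdb/pyodbc; all follow DB-API 2.0

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create and populate a toy table (invented example data).
cur.execute("CREATE TABLE tasks (name TEXT, tasktype INTEGER)")
cur.executemany("INSERT INTO tasks VALUES (?, ?)",
                [("alpha", 2), ("beta", 7), ("gamma", 4)])
conn.commit()

# Parameterized query; rows come back as tuples.
cur.execute("SELECT name FROM tasks WHERE tasktype < ?", (6,))
print(cur.fetchall())  # -> [('alpha',), ('gamma',)]

conn.close()
```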
# 1. Connect
library(RODBC)
channel <- odbcConnect("my_DSN") # specify your DSN here
# if you need to specify a username/password, use:
# channel <- odbcConnect("mydsn", uid="username", pwd="password")
# 2. List all tables
sqlTables(channel)
# 3. Fetch a whole table into a data frame
mydataframe <- sqlFetch(channel, "my_table_name") # fetch a table from the database in its entirety
close(channel)
# 4. Fetch the results of a query into a data frame. Example:
mydf2 <- sqlQuery(channel, "SELECT * FROM MonkeyCantab_LOOKUP_TaskTypes WHERE TaskType < 6")
Or talk to MySQL directly: http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf. The native connection is likely to be faster: http://mailman.unixodbc.org/pipermail/unixodbc-support/2005-March/000507.html.
#!/usr/bin/python2.7
# Converts a .MDB file (Microsoft Access) database to MySQL, copying structure and data.
# Uses the mdb-tools package.
# Under ubuntu, the following packages are required:
# sudo apt-get install python2.7 mdbtools mysql-server mysql-client mysql-admin
# though you may also want these, if you're planning to use Python/ODBC with MySQL:
# sudo apt-get install mysql-navigator libmdbtools libmdbodbc unixodbc python-mysqldb python-pyodbc
# What works:
# schema copied
# data copied
# What doesn't work:
# indexes are not described by mdb-schema, so these must be recreated manually
# relationships are not supported by mdb-schema, so these must be recreated manually
# We'll do this with calls to other command-line tools.
# See http://nialldonegan.me/2007/03/10/converting-microsoft-access-mdb-into-csv-or-mysql-in-linux/
# REVISED 1 Jan 2013: mdb-schema syntax has changed (-S option gone)
# SEE https://github.com/brianb/mdbtools
import sys, getpass, shlex, subprocess, re, os
def raw_default(prompt, dflt=None):
    prompt = "%s [%s]: " % (prompt, dflt)
    res = raw_input(prompt)
    if not res and dflt:
        return dflt
    return res
def get_external_command_output(command):
    args = shlex.split(command)
    ret = subprocess.check_output(args)  # this needs Python 2.7 or higher
    return ret
def get_pipe_series_output(commands, stdinput=None):
    # Python array indexes are zero-based, i.e. an array is indexed from 0 to len(array)-1.
    # The range/xrange commands, by default, start at 0 and go to one less than the maximum specified.
    # print commands
    processes = []
    for i in xrange(len(commands)):
        if i == 0:  # first process
            processes.append(subprocess.Popen(shlex.split(commands[i]),
                                              stdin=subprocess.PIPE,
                                              stdout=subprocess.PIPE))
        else:  # subsequent ones
            processes.append(subprocess.Popen(shlex.split(commands[i]),
                                              stdin=processes[i - 1].stdout,
                                              stdout=subprocess.PIPE))
    # communicate() returns a tuple: 0=stdout, 1=stderr; so this returns stdout.
    return processes[len(processes) - 1].communicate(stdinput)[0]
def replace_type_in_sql(sql, fromstr, tostr):
    whitespaceregroup = "([\ \t\n]+)"
    whitespaceorcommaregroup = "([\ \t\),\n]+)"
    rg1 = "\g<1>"
    rg2 = "\g<2>"
    return re.sub(whitespaceregroup + fromstr + whitespaceorcommaregroup,
                  rg1 + tostr + rg2, sql, 0, re.MULTILINE | re.IGNORECASE)
if len(sys.argv) != 2:  # the program name is one of these
    sys.exit("Syntax: convert_mdb_to_mysql.py mdbfile")
mdbfile = sys.argv[1]
tempfile = raw_default("Temporary filename", "TEMP.txt")
host = raw_default("MySQL hostname", "127.0.0.1") # not "localhost"
port = raw_default("MySQL port number", 3306)
user = raw_default("MySQL username", "root")
password = getpass.getpass("MySQL password: ")
mysqldb = raw_input("MySQL database to create: ")
print "Getting list of tables"
tablecmd = "mdb-tables -1 "+mdbfile
# -1: one per line (or table names with spaces will cause confusion)
tables = get_external_command_output(tablecmd).splitlines()
print tables
print "Creating new database"
createmysqldbcmd = "mysqladmin create %s --host=%s --port=%s --user=%s --password=%s" % (mysqldb, host, port, user, password)
# we could omit the actual password and the user would be prompted, but we need to send it this way later (see below), so this is not a huge additional security weakness!
# Linux/MySQL helpfully obscures the password in the "ps" list.
print get_external_command_output(createmysqldbcmd)
print "Shipping table definitions (sanitized), converted to MySQL types, through some syntax filters, to MySQL"
schemacmd="mdb-schema "+mdbfile+" mysql"
# JAN 2013: Since my previous script, mdb-schema's mysql dialect has got much better.
# Now convert the oddities that emerge:
schemasyntax = get_external_command_output(schemacmd)
# The following presupposes that no fields actually have these names (which are reserved, so they shouldn't!).
# Access data types: http://www.databasedev.co.uk/fields_datatypes.html
# An Access "Long Integer" is 4 bytes.
# There's no Access 8-byte integer (which is a BIGINT under MySQL: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html ).
#schemasyntax = replace_type_in_sql(schemasyntax, "Text", "VARCHAR")
#schemasyntax = replace_type_in_sql(schemasyntax, "Byte", "INT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Long Integer", "INT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Integer", "INT") # put this after "Long Integer"
#schemasyntax = replace_type_in_sql(schemasyntax, "Single", "FLOAT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Double", "FLOAT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Replication ID", "NUMERIC (16)")
#schemasyntax = replace_type_in_sql(schemasyntax, "DateTime \(Short\)", "DATETIME")
#schemasyntax = replace_type_in_sql(schemasyntax, "Currency", "FLOAT")
#schemasyntax = replace_type_in_sql(schemasyntax, "Boolean", "BOOLEAN") # MySQL: BOOLEAN is a synonym for TINYINT(1)
#schemasyntax = replace_type_in_sql(schemasyntax, "OLE", "VARCHAR")
#schemasyntax = replace_type_in_sql(schemasyntax, "Memo/Hyperlink", "TEXT")
#schemasyntax = re.sub("^--", "#", schemasyntax, 0, re.MULTILINE)
# mdb-schema uses "---------" for some of its comment lines;
# MySQL only permits "-- " (with a space) as the start of a comment, or "#": http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-comments.html .
# one of many alternatives as a pipe filter would be:
# "perl -pe 's/^--/#/'"
#schemasyntax = re.sub("^DROP.*$", "", schemasyntax, 0, re.MULTILINE)
# we're creating a new database, so we don't need DROP statements, and they add to danger if the user specifies an existing database
# an alternative as a pipe filter would be:
# "grep -v '^DROP'"
#schemasyntax = re.sub("([\ \t]+)group([\ \t\),]+)", "\g<1>_group\g<2>", schemasyntax, 0, re.MULTILINE | re.IGNORECASE)
# Access allows "Group" as a table/field name; MySQL doesn't. See
# sed regular expressions are mostly documented in "man grep"
# perl regular expressions: see http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
# ... or "sudo apt-get install perl-doc" then "man perl", "perldoc perlretut", and "perldoc perlrequick"
# ... obviously, doing these replacements in python would also be an option!
# This filter replaces "group" with "_group" in all relevant output from mdb-schema and mdb-export
# Use \1, \2... within the SAME regexp; in a replace expression, use $1, $2...
# In Python, best syntax for backreferences in replacement text is \g<1>, \g<2> and so on.
# (In Perl, \1 and \2 are backreferences within the same regexp, while $1, $2 are used in replace operations.)
# Alternative as a pipe filter would be:
# "perl -pe 's/([\ \t]+)group([\ \t\),]+)/$1_group$2/gi'"
# ... no, correctly quoted as `group`, it's fine
# "COMMENT ON COLUMN" produced by mdb-schema and rejected by MySQL:
schemasyntax = re.sub("^COMMENT ON COLUMN.*$", "", schemasyntax, 0, re.MULTILINE)
print "-----------------"
print schemasyntax
print "-----------------"
mysqlcmd = "mysql --host=%s --port=%s --database=%s --user=%s --password=%s" % (host, port, mysqldb, user, password) # regrettably we need the password here, as stdin will be coming from a pipe
# print schemasyntax
print get_pipe_series_output( [mysqlcmd], schemasyntax )
# For the data, we won't store the intermediate stuff in Python's memory, 'cos it's vast; I had one odd single-character mutation
# from "TimeInSession_ms" to "TimeInSession_mc" at row 326444 (perhaps therefore 37Mb or so into a long string).
# And I was trying to export ~1m records in that table alone.
# We'll use pipes instead and let the OS deal with the memory management.
# ... BUT (Jan 2013): now mdb-tools is better, text-processing not necessary - can use temporary disk file
# Turns out the bottleneck is the import to MySQL, not the export from MDB. So see http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-bulk-data-loading.html
# The massive improvement is by disabling autocommit. (Example source database is 208M; largest table here is 554M as a textfile; it has 1,686,075 rows.)
# This improvement was from 20 Hz to the whole database in a couple of minutes (~13 kHz).
# Subsequent export from MySQL: takes a second or two to write whole DB (177M textfile).
print "Copying data to MySQL"
#semicolonfilter = "sed -e 's/)$/)\;/'"
#groupfilter = "perl -pe 's/([\ \t]+)group([\ \t\),]+)/$1_group$2/gi'"
for t in tables:
    print "Processing table", t
    # exportcmd = "mdb-export -I mysql -D \"%Y-%m-%d %H:%M:%S\" " + mdbfile + " " + t
    # -I backend: INSERT statements, not CSV
    # -D: date format
    # MySQL's DATETIME field has this format: "YYYY-MM-DD HH:mm:SS",
    # so we want this from the export.
    # print get_pipe_series_output( [exportcmd, semicolonfilter, groupfilter, mysqlcmd] )
    # print get_pipe_series_output( [exportcmd, mysqlcmd] )
    os.system('echo "SET autocommit=0;" > ' + tempfile)
    exportcmd = 'mdb-export -I mysql -D "%Y-%m-%d %H:%M:%S" ' + mdbfile + ' "' + t + '" >> ' + tempfile
    os.system(exportcmd)
    os.system('echo "COMMIT;" >> ' + tempfile)
    importcmd = mysqlcmd + " < " + tempfile
    os.system(importcmd)
print "Finished."
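The replace_type_in_sql helper in the script above can be sanity-checked in isolation. A standalone sketch (restating the helper so it runs on its own; the CREATE TABLE text is an invented example):

```python
import re

def replace_type_in_sql(sql, fromstr, tostr):
    # Replace an Access type name with a MySQL one, preserving the
    # surrounding whitespace/punctuation via backreferences (as in the script above).
    whitespaceregroup = r"([\ \t\n]+)"
    whitespaceorcommaregroup = r"([\ \t\),\n]+)"
    return re.sub(whitespaceregroup + fromstr + whitespaceorcommaregroup,
                  r"\g<1>" + tostr + r"\g<2>", sql,
                  flags=re.MULTILINE | re.IGNORECASE)

sql = "CREATE TABLE t ( x Long Integer, y Text );"
sql = replace_type_in_sql(sql, "Long Integer", "INT")
sql = replace_type_in_sql(sql, "Text", "VARCHAR")
print(sql)  # -> CREATE TABLE t ( x INT, y VARCHAR );
```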