I had a lot of data in a table in an Excel spreadsheet file and I was planning to use Haskell to analyse it. The data would be better structured as a database. Microsoft Office 365 provides Access, a database management system (DBMS). Could Haskell access data in an Access .accdb
file? The short answer was ‘Yes’, but it was not easy.
OBDC
Open Database Connectivity (ODBC) is an application programming interface (API) for database access. Windows includes OBDC drivers for .accdb
data sources.
The Microsoft application ODBC Data Source Administrator manages database drivers and data sources. On 64-bit Windows there is a 32-bit version and a 64-bit version. I used the 32-bit version to create a ‘User’ data source named DbExample
with driver Microsoft Access Driver (*.mdb, *.accdb)
and linked to a test database file DbExample.accdb
.
Haskell package HDBC-odbc
provides an ODBC backend for package HDBC
(Haskell Database Connectivity).
I compiled a small test executable as ODBCTest.exe
, based on the ‘getting started’ advice for HDBC-odbc
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
module Main where -- package HDBC import Database.HDBC (disconnect, hdbcClientVer, hdbcDriverName, rollback) -- package HDBC-odbc import Database.HDBC.ODBC (connectODBC) main :: IO () main = do conn <- connectODBC "DSN=DbExample" putStrLn $ "Driver name: " ++ hdbcDriverName conn putStrLn $ "Client version: " ++ hdbcClientVer conn rollback conn disconnect conn |
The executable did not work, failing with the error message:
ODBCTest.EXE: SqlError {seState = "[\"IM014\"]", seNativeError = -1, seErrorMsg = "connectODBC/sqlDriverConnect: [\"0: [Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application\"]"}
32-bit and 64-bit
On my 64-bit Windows machines, Office 365 (including Access) comprised 32-bit applications and only the 32-bit version of the ODBC driver was available. GHC for 64-bit Windows produces 64-bit applications. Consequently, the architecture of the driver (32-bit) and the application (64-bit) did not match.
I uninstalled all of the 32-bit Office 365 applications (present for a variety of languages) and installed 64-bit Office 365.
Installing 64-bit Access (Click-to-Run) did not, however, yield a 64-bit version of the OBDC driver in the 64-bit ODBC Data Source Administrator. This appeared to be a known issue. Installing the 64-bit version of the Microsoft Access Database Engine 2016 Redistributable caused the driver to be listed.
(On a machine with 32-bit Access, installing the 64-bit version of the Microsoft Access Database Engine 2016 Redistributable caused the driver to be listed but then 32-bit Access would no longer work.)
SQLGetInfo
The ODBC API provides a function SQLGetInfo
. Microsoft’s documentation for the function states that the BufferLength
argument is ignored if the InfoType
argument implies that the InfoValuePtr
will point to a buffer containing a SQLUSMALLINT
value. That appears to be what the authors of HDBC-odbc
, version 2.6.0.0, assumed.
However, the test executable failed with another error message:
ODBCTest.EXE: SqlError {seState = "[]", seNativeError = -1, seErrorMsg = "sqlGetInfo SQL_TXN_CAPABLE: []"}
Derrick Turk has identified that the Microsoft Access Driver fails if BufferLength
is not set to the size of a SQLUSMALLINT
value.
Following Mr Turk’s example, I created a fork of package HDBC-odbc
that set BufferLength
and edited my stack.yaml
file to include the local corrected version under packages:
.
The test executable worked as expected:
1 2 3 |
>stack exec ODBCTest Driver name: odbc Client version: 03.80.0000 |
SQLTables
The ODBC API provides a function SQLTables
, which returns (among others things) the list of table names and table types stored in a data source. Table types may include TABLE, VIEW, SYSTEM TABLE, GLOBAL TEMPORARY, LOCAL TEMPORARY, SYNONYM and ALIAS.
Module Database.HDBC.Types
exports class IConnection
which provides method getTables :: conn -> IO [String]
. This method promises to provide the names of all tables accessible by the connection, excluding system tables.
Module Database.HDBC.ODBC
exports data type Connection
as an instance of IConnection
. getTables :: Connection -> IO [String]
is implemented through:
1 |
SQLTables (stmt, NULL, 0, NULL, 0, (SQLCHAR *)"%", 1, (SQLCHAR *)"TABLE", 5) |
I extended the constructor of Connection
to add getTableTypes :: Connection -> IO [String]
and getTablesOfType :: Connection -> String -> IO [String]
. The former is implemented through a call to the C function:
1 2 |
SQLTables (stmt, (SQLCHAR *)"", 0, (SQLCHAR *)"", 0, (SQLCHAR *)"", 0, (SQLCHAR *)"%", 1) |
The latter is implemented through the C function, where tabletype
points to a comma-delimited list of valid table types:
1 2 3 4 |
SQLRETURN simpleSqlTables(SQLHSTMT stmt, SQLCHAR *tabletype, SQLSMALLINT ttlen) { return SQLTables(stmt, NULL, 0, NULL, 0, (SQLCHAR *)"%", 1, tabletype, ttlen); } |
I compiled another small test executable:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
module Main where import Database.HDBC import Database.HDBC.ODBC import Data.List (intercalate) main :: IO () main = do conn <- connectODBC "DSN=DbExample" tts <- getTableTypes conn putStrLn $ "All table type: " ++ show tts ts <- getTablesOfType conn (intercalate "," tts) putStrLn $ "All types of tables: " ++ show ts rollback conn disconnect conn |