Install Impala ODBC for Cloudera on Arch Linux

For a project, I had to automate requests to an Impala database hosted on a Cloudera VM, and I had to install an ODBC driver for this, which was not as simple as it sounds. I had trouble configuring unixODBC correctly and then installing the driver, which is not available on Arch. In this article I describe the problems I ran into and the solutions I found. If you are in a hurry, there is a TL;DR at the end.

Process

The first issue I encountered was that the drivers provided by Cloudera are not packaged for Arch Linux: they are only available for Red Hat, Debian and SUSE. So the first thing to do was to install debtap, a small program that converts .deb packages to Arch Linux packages. With it, I could convert the Cloudera driver for Debian into an Arch package. This is not very difficult. However, the conversion does not seem to be perfect, because I had issues afterwards when trying to connect to the database with this driver.

Running ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so showed that libsasl2.so.2 could not be found. This library should be provided by the cyrus-sasl package, so I should have added that package as a dependency of the Cloudera package I had just converted. Instead, I simply installed it manually. But that was not the end of it: even after installing cyrus-sasl, I still didn't have libsasl2.so.2. So I did something bad: since I had libsasl2.so.3, I created libsasl2.so.2 by copying it. And it worked.

sudo cp /usr/lib/libsasl2.so.3 /usr/lib/libsasl2.so.2 
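Checking the `ldd` output by eye works, but if you have to repeat it after every reinstall, a short Python 3 helper can pull out only the unresolved libraries. This is a minimal sketch. It assumes `ldd` is installed and reports missing dependencies in its usual `name => not found` form. `missing_libraries` and `check_driver` are hypothetical names, not part of any package.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A missing dependency looks like: "\tlibsasl2.so.2 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_driver(path):
    """Run ldd on a shared library and list its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

For example, `check_driver("/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so")` returns the missing libraries. Before the fix above, that list would have contained libsasl2.so.2.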

Okay, but so far I had only installed the driver. I also needed unixODBC, which is required to actually use an ODBC driver and should be installed first. unixODBC has to be configured according to your ODBC drivers and to the databases you want to connect to; in my case, an Impala database. Thanks to @manuel_lemaire, who spent a day and a half configuring it correctly, I could reuse his configuration and get it working in about an hour. Below are the configuration files we used (you need to create the cloudera.impalaodbc.ini file manually):

/etc/odbcinst.ini

[ODBC Drivers]
Impala=Installed

[Impala]
Description=Cloudera Impala ODBC Driver (64-bit)
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
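Because odbcinst.ini is a plain INI file, you can sanity-check it with Python 3's standard configparser module before going further. The sketch below uses hypothetical function names. It lists the drivers registered under [ODBC Drivers] and reports those whose Driver path does not point to an existing file. It assumes each registered name has a section of the same name, as in the file above.

```python
import configparser
import os


def registered_drivers(odbcinst_text):
    """Map each driver listed under [ODBC Drivers] to its Driver path."""
    # interpolation=None treats "%" literally; optionxform=str keeps key case,
    # so the name "Impala" matches the [Impala] section header.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(odbcinst_text)
    drivers = {}
    for name in parser["ODBC Drivers"]:
        if parser.has_section(name):
            drivers[name] = parser[name].get("Driver")
    return drivers


def missing_driver_files(drivers):
    """Return the names of drivers whose library file does not exist."""
    return [name for name, path in drivers.items() if not path or not os.path.isfile(path)]
```

For example, `missing_driver_files(registered_drivers(open("/etc/odbcinst.ini").read()))` should return an empty list once the driver package is installed.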

Don't forget to put your credentials into the configuration file below:

/etc/odbc.ini

[ODBC Data Sources]
Impala=Cloudera Impala ODBC Driver 64-bit

[Impala]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=localhost
PORT=21050
UID=_YOUR-USERNAME_
PWD=_YOUR-PASSWORD_
DATABASE=_YOUR-DATABASE-NAME_
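You can also pass the odbc.ini entries directly in an ODBC connection string instead of going through a DSN. This is handy for testing credentials without editing /etc/odbc.ini. The sketch rests on two assumptions: that the Cloudera driver accepts the same keys in a connection string as in odbc.ini, and that the driver is referenced by the name `Impala` registered in odbcinst.ini. `odbc_connection_string` is a hypothetical helper. Following the usual ODBC connection-string quoting rule, it wraps values containing `;`, `{` or `}` in braces and doubles any `}`.

```python
SPECIAL_CHARS = ";{}"


def odbc_connection_string(attrs):
    """Join keyword/value pairs into an ODBC connection string."""
    parts = []
    for key, value in attrs.items():
        value = str(value)
        if any(ch in value for ch in SPECIAL_CHARS) or value != value.strip():
            # Brace-quote values that contain separators or edge whitespace;
            # a literal "}" inside braces is escaped by doubling it.
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(key + "=" + value)
    return ";".join(parts) + ";"
```

You could then connect with something like `pyodbc.connect(odbc_connection_string({"DRIVER": "Impala", "HOST": "localhost", "PORT": 21050, "UID": "your-username", "PWD": "your-password", "DATABASE": "your-database"}), autocommit=True)`.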

/etc/cloudera.impalaodbc.ini

[Driver]
## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
## - unixODBC uses UTF-16 by default.
## - If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
##   Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
## - SimbaDM can be used with UTF-8 or UTF-16.
##   The DriverUnicodeEncoding setting will cause SimbaDM to run in UTF-8 when set to 2 or UTF-16 when set to 1.
DriverManagerEncoding=UTF-32
ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp

## - Uncomment the ODBCInstLib corresponding to the Driver Manager being used.
## - Note that the path to your ODBC Driver Manager must be specified in LD_LIBRARY_PATH (LIBPATH for AIX).
## - Note that AIX has a different format for specifying its shared libraries.

# Generic ODBCInstLib
#   iODBC
ODBCInstLib=libiodbcinst.so
#   SimbaDM / unixODBC
#ODBCInstLib=libodbcinst.so

# AIX specific ODBCInstLib
#   iODBC
#ODBCInstLib=libiodbcinst.a(libiodbcinst.so.2)
#   SimbaDM
#ODBCInstLib=libodbcinst.a(odbcinst.so)
#   unixODBC
#ODBCInstLib=libodbcinst.a(libodbcinst.so.1)
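According to the comments at the top of this file, the right DriverManagerEncoding for unixODBC depends on whether it was compiled with -DSQL_WCHART_CONVERT. The output of `odbc_config --cflags` tells you which. Below is a small Python 3 sketch of that rule. The function names are hypothetical, and it assumes `odbc_config` is on your PATH.

```python
import subprocess

WCHART_FLAG = "-DSQL_WCHART_CONVERT"


def driver_manager_encoding(cflags):
    """Return the DriverManagerEncoding value for unixODBC given its cflags."""
    for flag in cflags.split():
        # The macro may appear bare or with a value, e.g. -DSQL_WCHART_CONVERT=1.
        if flag == WCHART_FLAG or flag.startswith(WCHART_FLAG + "="):
            return "UTF-32"
    # Default for unixODBC according to the comments in cloudera.impalaodbc.ini.
    return "UTF-16"


def detect_encoding():
    """Ask odbc_config for unixODBC's compile flags and pick the encoding."""
    result = subprocess.run(
        ["odbc_config", "--cflags"], capture_output=True, text=True, check=True
    )
    return driver_manager_encoding(result.stdout)
```

Calling `detect_encoding()` on the machine where the driver runs gives the value for the DriverManagerEncoding line.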

Alright, almost done. We just needed to set some environment variables by adding the following lines to our ~/.bashrc or ~/.zshrc, depending on the shell we use:

~/.bashrc

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/odbc"
export ODBCINI="/etc/odbc.ini"
export ODBCSYSINI="/etc"
export CLOUDERAIMPALAINI="/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini"
export LD_PRELOAD="/usr/lib/libodbcinst.so"

Caution: on Debian, the LD_PRELOAD line is different:

export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libodbcinst.so"
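The dynamic loader reads LD_PRELOAD and LD_LIBRARY_PATH when a process starts, so setting them from inside an already running Python script has no effect on that script. If you would rather not change your shell configuration, a launcher can pass the same variables to a child process instead. The Python 3 sketch below mirrors the exports above, using the paths from this article; `impala_odbc_env` is a hypothetical name. It also avoids one pitfall of the shell version: when LD_LIBRARY_PATH is initially empty, `"$LD_LIBRARY_PATH:..."` starts with a colon, and the loader treats that empty entry as the current directory.

```python
import os

DEBIAN_LIB_DIR = "/usr/lib/x86_64-linux-gnu"


def impala_odbc_env(base=None, debian=False):
    """Return a copy of the environment with the Impala ODBC variables set."""
    env = dict(os.environ if base is None else base)

    # Append the extra directories, dropping empty entries so the loader
    # does not end up searching the current directory.
    paths = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    paths += [DEBIAN_LIB_DIR, DEBIAN_LIB_DIR + "/odbc"]
    env["LD_LIBRARY_PATH"] = ":".join(paths)

    env["ODBCINI"] = "/etc/odbc.ini"
    env["ODBCSYSINI"] = "/etc"
    env["CLOUDERAIMPALAINI"] = "/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini"

    # libodbcinst.so lives in a different directory on Debian than on Arch.
    if debian:
        env["LD_PRELOAD"] = DEBIAN_LIB_DIR + "/libodbcinst.so"
    else:
        env["LD_PRELOAD"] = "/usr/lib/libodbcinst.so"
    return env
```

For instance, `subprocess.run(["python", "query.py"], env=impala_odbc_env(), check=True)` would run a hypothetical query.py script with these variables set.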

Summary / TL;DR

Well, I hope my process was clear. Here are the steps in the right order:

  1. Install unixODBC
  2. Add the 3 configuration files and the environment variables above
  3. Download the Cloudera Impala ODBC Driver for Debian
  4. Convert it to an Arch Linux package with debtap
  5. Add the cyrus-sasl package to the dependencies of the Arch package
  6. Install it
  7. If you have an issue when trying to use the ODBC driver, it may be because libsasl2.so.2 is missing; you can fix it as I did above
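Before running anything, it can help to confirm that step 2 is really in place. Here is a minimal Python 3 preflight sketch, assuming the file paths and variable names used in this article; `preflight` is a hypothetical name. It lists missing configuration files and unset environment variables, and checks that CLOUDERAIMPALAINI points to an existing file.

```python
import os

CONFIG_FILES = ["/etc/odbcinst.ini", "/etc/odbc.ini"]
REQUIRED_VARS = ["ODBCINI", "ODBCSYSINI", "CLOUDERAIMPALAINI", "LD_PRELOAD"]


def preflight(environ=None, files=None):
    """Return a list of problems; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    files = CONFIG_FILES if files is None else files
    problems = []
    for path in files:
        if not os.path.isfile(path):
            problems.append("missing config file: " + path)
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append("environment variable not set: " + name)
    ini = environ.get("CLOUDERAIMPALAINI")
    if ini and not os.path.isfile(ini):
        problems.append("CLOUDERAIMPALAINI points to a missing file: " + ini)
    return problems
```

Running `print(preflight())` in the shell where you will connect should print an empty list.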

Bonus:

If you wish to use Python, the following script made by @manuel_lemaire should make it easier for you to start:

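import pyodbc

# Connect through the "Impala" DSN defined in /etc/odbc.ini, with autocommit enabled.
conn = pyodbc.connect('DSN=Impala;', autocommit=True)
cursor = conn.cursor()

# Replace your_table with a table from your database ("table" itself is a reserved word).
cursor.execute('SELECT * FROM your_table')
results = cursor.fetchall()
print(results)

conn.close()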
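`fetchall()` loads the whole result set into memory at once, which can be a problem on large Impala tables. Python's DB-API, which pyodbc implements, also provides `fetchmany(size)`. A small generator built on it can stream rows in batches instead. The name `iter_rows` and the default batch size are arbitrary choices for this sketch, which works on both Python 2.7 and 3.

```python
def iter_rows(cursor, batch_size=1000):
    """Yield rows from an executed DB-API cursor, fetching batch_size at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            # An empty batch means the result set is exhausted.
            break
        for row in rows:
            yield row
```

With the script above, `for row in iter_rows(cursor): print(row)` would replace the `fetchall()` call.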

Note:

Don't forget to install the pyodbc module for Python. Since Arch ships the latest version of Python, which often causes compatibility problems, I recommend creating a virtualenv with Python 2.7 before doing anything else:

virtualenv -p /usr/bin/python2.7 folder-name
source folder-name/bin/activate
pip install pyodbc

Conclusion

I'm a complete newbie with this kind of software, but I'm quite happy I managed to make it work correctly. Many thanks to @manuel_lemaire, who helped me set up the ODBC configuration. Why haven't I submitted the Cloudera Impala ODBC driver to the AUR to make it easier for you? First, I don't know whether it can legally be redistributed; I need to investigate the license. Second, the way I dealt with the libsasl2.so.2 library wasn't very good, for instance if you don't have libsasl2.so.3.

Don't hesitate to let me know if you find mistakes in this article!

A comment?

Found an error in this article? Have some advice? You can send a comment by email to "blog at killiankemps.fr" with "[Comment][en][Install Impala ODBC for Cloudera on Arch Linux]" as the subject.
(The "@" has been replaced by "at" to keep bots from harvesting the email address.)
