Monday, September 24, 2012

Comparing multiple columns in MySQL

Few days ago, I needed to write an SQL query on combined unique key. The situation is that I have a table with one unique constraint on two columns. Let's call those columns A, and B. The table had these values:

+---+---+
| A | B |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
+---+---+

At first, I tried this query below:

SELECT * FROM tbl WHERE (A, B) IN ((1, 1))

It worked beautifully. MySQL EXPLAIN command said it only needed to look at one row.

So I ventured further to put in one more value to search for:

SELECT * FROM tbl WHERE (A, B) IN ((1, 1), (1, 2))

This time, EXPLAIN told me it needed to scan the whole table (4 rows). I thought it was odd and tried to make a comparison with the regular query:

SELECT * FROM tbl WHERE (A=1 AND B=1) OR (A=1 AND B=2)

The third query turned out to scan only two rows!

Perhaps the IN operator has not been optimized in this case yet so for now I've got to do with a clumsier query. I could, of course, shorten the third query to A=1 AND B IN (1, 2) to make it less wieldy.

On a related note, SQLite does not even support tuple comparison.

Tuesday, September 11, 2012

Database connection pooling in Django made easy

Django does not pool its SQL connections. Every single web request is served with one new connection to the DBMS. If the web server and DB server are far apart, the cost of establishing new connection could be expensive.

Fortunately, it is easy to take advantage of SQL Alchemy's excellent pooling support in Django. What one needs to do is simply to call sqlalchemy.pool.manage on the Database name in the Django DB backend base module.

For example, the MySQL backend could be made to use connection pooling with these three lines:

from django.db.backends.mysql import base
from sqlalchemy import pool
base.Database = pool.manage(base.Database)

Extra parameters can be passed to manage to tune the pool.

This monkey patching should (or rather must) be done before any call to Django ORM. If you, for instance, use gunicorn_django to serve your application, you can place those lines in its configuration file. Alternatively, you can also stash them in your application's settings.py file.

Tuesday, September 4, 2012

Protecting Android application

Few days ago, Tim "diff" Strazzere gave a presentation about different ways that developers can protect their Android applications.

I am noting down here a few key points that he nailed down succintly. These are also the same points that I have always advocated for. I'm happy to find them resonating so well in the community.

  • Do not provoke the crackers. Instead, try to make peace with them. Most of the respected (the elite tier) crackers are actually pretty decent guys and can be reasoned with.
  • Limit the information crackers can get. This includes function names, debug logs, crash report, web service request parameters, and so on. Any descriptive name, hint, or keyword could give away the much needed information to the crackers.
  • Delay feedback to the crackers. Accept registration information easily early on. And fail when the feature is actually invoked.
  • Phone home. Or use server provided data as parameters in the client. A valid user would get sensible data; an invalid one would get crippled data.

These practices could be generalized into just a few simple, commonsensical principles. Basically, the compilation process is a lossy process. A compiler produces a "less documented" artifact from a "better documented" source code. And we should do our best to maintain that leverage against crackers. Then, the classic laws of locality come into effect. Disrupting locality of time and locality of space is the goal to aim for. And lastly, as Bruce Schneier put it, "amateurs attack machines; professionals target people." Battling against crackers needs to be a battle against the way people do things, against the crackers' way of thinking. Beside technological measures, other factors such as behavioral, and psychological aspects must be also considered.

Sunday, September 2, 2012

virtualenv and pip

Update: virtualenv 1.8.1 released on September 3 would fix this problem with --never-download.

pip 1.2 was released about 10 hours ago. It is so new and shiny that it breaks my current deployment process that has worked fine with pip 1.1.

Before today, I have a requirements.txt file that specifies all dependencies for my Python web app. My deployment script basically creates a virtual environment with virtualenv and then does a 2-pass pip'ing like this:

pip install --download=pipcache -r requirements.txt
pip install --find-links file://absolute/path/to/pipcache -r requirements2.txt

requirements2.txt is derived from requirements.txt to eliminate all URLs. For example, requirements.txt has this line,

http://code.google.com/p/pymysql/source/browse/?r=66#egg=PyMySQL-0.3

while requirements2.txt has this:

PyMySQL-0.3

The reason for two passes is to cache downloaded packages and ignore PyPI. The option --download-cache does help with the first goal but always need PyPI.

That deployment process was working fine, until 10 hours ago. Suddenly pip errors out in the second pass, saying PyMySQL cannot be found. Looking at the log file, I suspect that pip does a case sensitive string comparison between pymysql and PyMySQL. Where pip gets the string pymysql from, I do not know.

Then I immediately looked for a way to force virtualenv to pin pip 1.1. Of course, it's not possible. Which is kind of ironic because virtualenv encourages pip over setuptools, yet latest pip fails. A case of oversight, I guess.

So, the solution is to let virtualenv pick the latest pip as usual, and then use pip to degrade itself to 1.1.