Dynamic table prefix with SQLAlchemy and Alembic
- Internals' deep dive
Our post series Internals' deep dive guide you through the most interesting and challenging aspects of the Fief codebase ๐ฐ
When developing a web service like Fief, it's almost always necessary to read and write from and to a database. It's also highly probable this database will be an SQL database, like PostgreSQL.
In Python, the go-to library to work with SQL databases is SQLAlchemy. This library is a pure jewel, allowing you to either work at low-level with SQL queries (the Core part) or at higher-level with the ORM capabilities.
Of course, when designing a SQL schema for a software, it's very rare we can define it once and for all for all the lifetime of the app. When we want to implement (or remove!) features, we'll likely need to change or add tables to our database. Hence, we need to migrate the existing schema โ with all the data it contains โ to the new one, without breaking anything. It would be quite dangerous and error-prone to do it by hand with SQL queries. That's why SQLAlchemy team also maintains Alembic, a library dedicated to create and run migrations scripts.
Standard setup ๐งญ
In a standard setup, here is how you would define a data model with SQLAlchemy ORM:
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
email: Mapped[str] = mapped_column(String(320), nullable=False)
where Base
is our declarative mapping class. We'll come back to it later! Notice also how we can tell SQLAlchemy the actual name of our table in the database, using __tablename__
.
Now, with Alembic, we can create a new migration script in order to create this table in our database. Thanks to the autogeneration feature, it's even able to explore our mappings and generate the corresponding SQL commands, so we don't have to write everything by hand (even though we should always have a look at what it did ๐).
This script will look like this:
revision = "27c6a30d7c24"
down_revision = None
import sqlalchemy as sa
from alembic import op
def upgrade():
### commands auto generated by Alembic - please adjust! ###
op.create_table(
"users",
sa.Column("id", sa.Integer()),
sa.Column("email", sa.String(length=320), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
### end Alembic commands ###
def downgrade():
### commands auto generated by Alembic - please adjust! ###
op.drop_table("users")
### end Alembic commands ###
We see that Alembic hard-coded the table name, users
, in each command. It means that we won't be able to easily modify it at runtime. Besides, what if we want to install several applications on the same database? There'll probably be table names conflicts.
Static table prefix
To solve this, we can tell SQLAlchemy to automatically add a prefix before all our table names. Since every models have to inherit from Base
, it's very easy to overload the __tablename__
class attribute:
class Base(DeclarativeBase):
def __init_subclass__(cls) -> None:
cls.__tablename__ = f"myapp_{cls.__tablename__}"
super().__init_subclass__()
What we do here is to overload the __init_subclass__
method to modify the given __tablename__
attribute with a prefix.
Now, SQLAlchemy will look for the table named myapp_users
when performing a query. Of course, it reflects in Alembic migration:
revision = "27c6a30d7c24"
down_revision = None
import sqlalchemy as sa
from alembic import op
def upgrade():
### commands auto generated by Alembic - please adjust! ###
op.create_table(
"myapp_users",
sa.Column("id", sa.Integer()),
sa.Column("email", sa.String(length=320), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
### end Alembic commands ###
def downgrade():
### commands auto generated by Alembic - please adjust! ###
op.drop_table("myapp_users")
### end Alembic commands ###
However, the table name is still hard-coded. For most applications, this approach works quite well: you'll likely install your app in only one place on a single database where you control all the table names present.
For an app like Fief, which purpose is to be installed by anyone on any server, this is not good enough: we need a way to let users choose their table prefix during setup.
The challenge ๐
To solve this, we'll need to tackle several problems:
- Set a dynamic table prefix on our declarative mapper.
- Pass the dynamic prefix value as a variable during migration
- Alembic should not hard-code the table prefix in the migration scripts (obviously, we are lazy and we don't want to edit the scripts by hand)
Setting a dynamic prefix on Base
The first step is to set the prefix with a dynamic value in the Base
declarative mapper class. Here is how it's implemented in Fief:
import os
from sqlalchemy.orm import DeclarativeBase
from fief.settings import settings
TABLE_PREFIX_PLACEHOLDER = "__FIEF__"
GENERATE_MIGRATION = os.environ.get("GENERATE_MIGRATION") == "1"
TABLE_PREFIX = (
TABLE_PREFIX_PLACEHOLDER if GENERATE_MIGRATION else settings.database_table_prefix
)
def get_prefixed_tablename(name: str) -> str:
return f"{TABLE_PREFIX}{name}"
class Base(DeclarativeBase):
def __init_subclass__(cls) -> None:
cls.__tablename__ = get_prefixed_tablename(cls.__tablename__)
super().__init_subclass__()
The approach on Base
is the same we reviewed for a static prefix. What changes however is how we determine this prefix.
The TABLE_PREFIX
constant is the interesting part. You see that we assign its value following two situations:
- If the environment variable
GENERATE_MIGRATION
is set to1
, we set a placeholder,__FIEF__
. - If not, we take the value from our settings class. This will be the dynamic prefix set by the user.
So, why we need this placeholder trick? It's important to understand that our models mapping will be explored in two different situations:
- When the application is running; in this case we want our "real" table prefix set by the user.
- When Alembic generates migrations scripts; in this case, having a placeholder will help us to modify the script with a dynamic variable afterwards.
Isn't
get_prefixed_tablename
function useless? We could directly assign the formatted string inside__init_subclass__
.Indeed but, actually, in Fief, we sometimes define imperative tables which don't inherit from
Base
. Hence, we need a common logic to determine the table name.
Generating migrations with dynamic prefix
Now that we have this placeholder prefix in place, we can generate Alembic migrations. The first challenge is to have a way to retrieve the table prefix value inside the migration script.
Fortunately, Alembic provides a very flexible way to pass custom options and values inside MigrationContext
. This object stores basically everything Alembic needs to know to perform a migration. And it's easily accessible inside a migration script through op.get_context()
. Then, we can use the opts
dictionary to lookup for custom values.
So, we'll consider we have a table_prefix
key set on this dictionary holding our dynamic prefix value. We'll see later how we can actually set it when Alembic runs.
Since we'll need this value in each migration script, it's a good idea to put it in the migration template, script.py.mako
. It looks like this:
def upgrade():
table_prefix = op.get_context().opts["table_prefix"]
${upgrades if upgrades else "pass"}
def downgrade():
table_prefix = op.get_context().opts["table_prefix"]
${downgrades if downgrades else "pass"}
Quite simply, we assign the variable table_prefix
right before the operations auto-generated by Alembic.
So, let's see the output of such migration:
env GENERATE_MIGRATION=1 alembic -c fief/alembic.ini revision --autogenerate -m "My first migration"
Of course, we don't forget to set our custom GENERATE_MIGRATION
environment variable ๐
The result will look like this:
revision = "27c6a30d7c24"
down_revision = None
import sqlalchemy as sa
from alembic import op
def upgrade():
table_prefix = op.get_context().opts["table_prefix"]
### commands auto generated by Alembic - please adjust! ###
op.create_table(
"__FIEF__users",
sa.Column("id", sa.Integer()),
sa.Column("email", sa.String(length=320), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
### end Alembic commands ###
def downgrade():
table_prefix = op.get_context().opts["table_prefix"]
### commands auto generated by Alembic - please adjust! ###
op.drop_table("__FIEF__users")
### end Alembic commands ###
So, we do have our table_prefix
variable (which is unused). Besides, the table names are now prefixed with the placeholder prefix. Indeed, from Alembic point-of-view, all we did was to set the prefix to the static string __FIEF__
.
We could stop here! With a text editor and a bunch of RegEx, we could manually transform every occurence of __FIEF__table
by f"{table_prefix}_table"
to inject the dynamic value. And actually, that's what we did for a while ๐
Until we mastered a whole new power: codemods.
Transform Alembic migrations using a codemod
Codemods are automated scripts able to parse Python source code, apply operations on the syntax tree and output the updated source code. That's exactly what we want to do: for each string literal that contains __FIEF__
, we want to change it to a formatted string with the table_prefix
variable.
The codemod framework is part of the LibCST library, which purpose is to parse Python source code into Concrete Syntax Tree. Basically, we're able to work with objects and structures representing a Python source code... Inside another Python script.
This framework aims at building dedicated CLI tools to refactor entire codebase, like the excellent bump-pydantic
project by Kludex. Of course, it also works for simple use cases like ours. And it's actually quite straightforward to do! Here is the full code to perform our transformation:
import libcst as cst
from libcst.codemod import VisitorBasedCodemodCommand
from fief.models.base import TABLE_PREFIX_PLACEHOLDER
class ConvertTablePrefixStrings(VisitorBasedCodemodCommand):
DESCRIPTION: str = (
"Converts strings containing table prefix placeholder "
"to a format-string with dynamic table prefix."
)
def leave_SimpleString(
self, original_node: cst.SimpleString, updated_node: cst.SimpleString
) -> cst.SimpleString | cst.FormattedString:
value = updated_node.evaluated_value
if not isinstance(value, str):
return updated_node
if TABLE_PREFIX_PLACEHOLDER in value:
before, after = value.split(TABLE_PREFIX_PLACEHOLDER)
before = before.replace('"', '\\"')
after = after.replace('"', '\\"')
return cst.FormattedString(
[
cst.FormattedStringText(before),
cst.FormattedStringExpression(cst.Name("table_prefix")),
cst.FormattedStringText(after),
]
)
return updated_node
All we need is to inherit from the VisitorBasedCodemodCommand
base class. Then, we just have to implement a method for each node we're interested in. Here, we want to look at SimpleString
, which corresponds to simple, non-formatted string, like "__FIEF__table"
.
Then, we check if this string contains our placeholder. If it does, we replace it by a FormattedString
node containing the text before, the variable table_prefix
(using a Name
node) and the text after.
And that's it! LibCST comes with a CLI tool we can use to directly summon codemods on one or several files. Let's try it:
python -m libcst.tool codemod -x fief.alembic.table_prefix_codemod.ConvertTablePrefixStrings my_migration.py
Now, our migration script looks like this:
revision = "27c6a30d7c24"
down_revision = None
import sqlalchemy as sa
from alembic import op
def upgrade():
table_prefix = op.get_context().opts["table_prefix"]
### commands auto generated by Alembic - please adjust! ###
op.create_table(
f"{table_prefix}_users",
sa.Column("id", sa.Integer()),
sa.Column("email", sa.String(length=320), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
### end Alembic commands ###
def downgrade():
table_prefix = op.get_context().opts["table_prefix"]
### commands auto generated by Alembic - please adjust! ###
op.drop_table(f"{table_prefix}_users")
### end Alembic commands ###
Magic isn't it? ๐ช We transformed Python code using Python code!
But wait, there's more. As you probably understood by now, we're super lazy. We don't want to manually run that command each time we generate a new migration. Could it be done automatically? Of course!
Alembic has a post_write_hooks
mechanism allowing you to trigger commands after a migration has been generated. We just need to tell it what to do in alembic.ini
:
[post_write_hooks]
hooks = codemod,format,lint
codemod.type = exec
codemod.executable = python
codemod.options = -m libcst.tool codemod -x fief.alembic.table_prefix_codemod.ConvertTablePrefixStrings REVISION_SCRIPT_FILENAME
format.type = exec
format.executable = ruff
format.options = format REVISION_SCRIPT_FILENAME
lint.type = exec
lint.executable = ruff
lint.options = check --fix REVISION_SCRIPT_FILENAME
The REVISION_SCRIPT_FILENAME
variable automatically points to the current file we're working on. Notice how we also take this chance to run our linter, Ruff, on it.
Great! Now, in a single command, we can automatically generate migrations with commands having a dynamic table prefix.
Set the table_prefix
configuration option
Finally, we need to make this table_prefix
option available to Alembic migration scripts. Fortunately, Alembic is very flexible. The env.py
script contains all the logic it executes when a migration is run. The key is to customize the context.configure
call. Here is what it looks like in Fief:
context.configure(
connection=connection,
target_metadata=target_metadata,
version_table=version_table,
table_prefix=TABLE_PREFIX,
)
We simply import our TABLE_PREFIX
constant and assign it to the table_prefix
key. This way, it'll be available when calling op.get_context().opts["table_prefix"]
.
The special case of version_table
You might have noticed the version_table
argument we also added to .configure()
. In Alembic, the version table is a very simple table storing the latest migration ID we applied; so it knows where to start from on the next run.
By default, it's called alembic_version
. But of course, we want it also to have our dynamic prefix. That's why we customize it like this in env.py
:
version_table = f"{TABLE_PREFIX}{config.get_main_option('version_table_name')}"
We pull the actual name from a config option we set in alembic.ini
, but we could have simply done version_table = f"{TABLE_PREFIX}alembic_version"
We did it, lads ๐
Great! With this setup, we have a fully automated way of generating and applying database migrations with a dynamic table prefix.
As we've seen, the basics are quite simple: it's just a matter of setting the right variable at the right place. The real challenge comes from the transformation of the generated migration, so the table_prefix
variable is injected. Fortunately, LibCST makes it very easy for us to define powerful transformation scripts. It really feels magical to write code that changes code!
The downside of this approach is that, during development, we actually need to apply the database migration with the placeholder. It means that we have tables named __FIEF__users
in the database. It's unfortunately necessary because Alembic first checks for the actual schema to determine the migration commands it has to generate. But that's not too bad and, who knows, maybe we'll find a way to circumvent that too ๐ (if you have suggestions about this, please tell me!)