# Set up dev environment (always use uv, never pip directly)
uv sync --dev --all-extras
# Run tests as non-root user (required - ArchiveBox always refuses to run as root)
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/ -v'
testuser)uv sync --dev --all-extras # Always use uv, never pip directly
source .venv/bin/activate
ArchiveBox has a root check that prevents running as root user. All ArchiveBox commands (including tests) must run as non-root user inside a data directory:
# Run all migration tests
sudo -u testuser bash -c 'source /path/to/.venv/bin/activate && python -m pytest archivebox/tests/test_migrations_*.py -v'
# Run specific test file
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_08_to_09.py -v'
# Run single test
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_fresh.py::TestFreshInstall::test_init_creates_database -xvs'
archivebox/tests/
├── test_migrations_helpers.py # Schemas, seeding functions, verification helpers
├── test_migrations_fresh.py # Fresh install tests
├── test_migrations_04_to_09.py # 0.4.x → 0.9.x migration tests
├── test_migrations_07_to_09.py # 0.7.x → 0.9.x migration tests
└── test_migrations_08_to_09.py # 0.8.x → 0.9.x migration tests
Tests must exercise real code paths:
python -m archivebox commands via subprocessIf something is hard to test: Modify the implementation to make it easier to test, or fix the underlying issue. Never mock, skip, simulate, or exit early from a test because you can't get something working inside the test.
Never use @skip, skipTest, or pytest.mark.skip. Every test must run. If a test is difficult, fix the code or test environment - don't disable the test.
init command must return exit code 0 (not [0, 1])==) not loose bounds (>=)def test_migration_preserves_snapshots(self):
"""Migration should preserve all snapshots."""
result = run_archivebox(self.work_dir, ['init'], timeout=45)
self.assertEqual(result.returncode, 0, f"Init failed: {result.stderr}")
ok, msg = verify_snapshot_count(self.db_path, expected_count)
self.assertTrue(ok, msg)
test_migrations_helpers.py)seed_0_X_data()archivebox init to trigger migrationsverify_* functionsstatus, list, add, etc.)When testing 0.8.x (dev branch), you must record ALL replaced migrations:
# The squashed migration replaces these - all must be recorded
('core', '0023_alter_archiveresult_options_archiveresult_abid_and_more'),
('core', '0024_auto_20240513_1143'),
# ... all 52 migrations from 0023-0074 ...
('core', '0023_new_schema'), # Also record the squashed migration itself
New files created by root need permissions fixed for testuser:
chmod 644 archivebox/tests/test_*.py
ArchiveBox commands must run inside a data directory. Tests use temp directories - the run_archivebox() helper sets DATA_DIR automatically.
Tests disable all extractors via environment variables for faster execution:
env['SAVE_TITLE'] = 'False'
env['SAVE_FAVICON'] = 'False'
# ... etc
Use appropriate timeouts for migration tests (45s for init, 60s default).
SQLite handles circular references with IF NOT EXISTS. Order matters less than in other DBs.
add commandadd creates one Crawl with one or more Snapshotsreplaces attribute in squashed migrations lists what they replaceUse consistent naming for everything to enable easy grep-ability and logical grouping:
Principle: Fewest unique names. If you must create a new unique name, make it grep and group well.
Examples:
# Filesystem migration methods - all start with fs_
def fs_migration_needed() -> bool: ...
def fs_migrate() -> None: ...
def _fs_migrate_from_0_7_0_to_0_8_0() -> None: ...
def _fs_migrate_from_0_8_0_to_0_9_0() -> None: ...
def _fs_next_version(current: str) -> str: ...
# Logging methods - ALL must start with log_ or _log
def log_migration_start(snapshot_id: str) -> None: ...
def _log_error(message: str) -> None: ...
def log_validation_result(ok: bool, msg: str) -> None: ...
Rules:
_ prefix for internal/private helpers within the same familylog_ or _loggrep -r "def.*fs_.*(" archivebox/grep -r "def.*log_.*(" archivebox/Do not invent new data structures, variable names, or keys if possible. Try to use existing field names and data structures exactly to keep the total unique data structures and names in the codebase to an absolute minimum.
Example - GOOD:
# Binary has overrides field
binary = Binary(overrides={'TIMEOUT': '60s'})
# Binary reuses the same field name and structure
class Binary(models.Model):
overrides = models.JSONField(default=dict) # Same name, same structure
Example - BAD:
# Don't invent new names like custom_bin_cmds, binary_overrides, etc.
class Binary(models.Model):
custom_bin_cmds = models.JSONField(default=dict) # ❌ New unique name
Principle: If you're storing the same conceptual data (e.g., overrides), use the same field name across all models and keep the internal structure identical. This makes the codebase predictable and reduces cognitive load.
sqlite3 /path/to/index.sqlite3 "SELECT app, name FROM django_migrations WHERE app='core' ORDER BY id;"
sqlite3 /path/to/index.sqlite3 "PRAGMA table_info(core_snapshot);"
sudo -u testuser bash -c 'source .venv/bin/activate && python -m pytest archivebox/tests/test_migrations_08_to_09.py -xvs 2>&1 | head -200'
./bin/kill_chrome.sh