Mirror of the ArchiveBox software source repository. Useful for storing snapshots of web pages and their resources for offline viewing or preservation.
#archival #utilities #python #webpage #web-archival #web-snapshot #storage

Nick Sweeting 5ea8a1ad9c Update README.md 7 years ago
.github 9cb2eafefc fix coc email 7 years ago
archivebox e8808b0a1f add WARC downloading 7 years ago
bin 329b06ec5a make output dir in script 7 years ago
docs @ fa24236b0c e8e16f392a add docs 7 years ago
etc 2903fead90 fix nginx example config to use new name 7 years ago
.dockerignore 771cdfc49b Update .dockerignore 7 years ago
.gitignore ec0e3209fb Update .gitignore 7 years ago
.gitmodules e8e16f392a add docs 7 years ago
CNAME 6b48d881fa Create CNAME 7 years ago
Dockerfile 57d42339a4 rename pip dir archive to archivebox 7 years ago
LICENSE 61f6f02b59 Initial commit 8 years ago
README.md 5ea8a1ad9c Update README.md 7 years ago
_config.yml c0f46b2728 Set theme jekyll-theme-merlot 7 years ago
archive 2d73d8884b fix the setup/archive symlinks 7 years ago
setup 3085a6a758 clean up binaries in PATH 7 years ago

README.md

ArchiveBox
The open source self-hosted web archive

(Recently renamed from Bookmark Archiver)

"Your own personal Way-Back Machine"

💻 Demo | Website | Source | Changelog | Roadmap

▶️ Quickstart | Details | Configuration | Troubleshooting


Save an archived copy of the websites you visit (the actual content of each site, not just the list of links). ArchiveBox can archive your entire browsing history, or just the links matching a filter or a bookmarks list.

ArchiveBox can import links from:

  • Browser history or bookmarks (Chrome, Firefox, Safari, IE, Opera)
  • Pocket
  • Pinboard
  • RSS or plain text lists
  • Shaarli, Delicious, Instapaper, Reddit Saved Posts, Wallabag, Unmark.it, and more!
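
As an illustration of the import step, here is a minimal sketch of pulling links out of a browser bookmarks export (the HTML file that browsers produce when exporting bookmarks). It is not ArchiveBox's own parser; the names BookmarkLinkParser and extract_links are hypothetical, and it assumes each bookmark appears as an <A HREF="..."> tag.

```python
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collect (url, title) pairs from <a href="..."> tags.

    Hypothetical helper for illustration; not part of ArchiveBox.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip browser-internal entries such as "place:" or "javascript:".
            if href and href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return a list of (url, title) tuples found in the export."""
    parser = BookmarkLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling extract_links on the contents of an exported bookmarks file yields the URLs that would then be handed to the archiver; other sources (RSS, Pocket, plain text) would each need their own parser.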

For each site, it outputs (configurable):

  • Browsable static HTML archive (wget)
  • PDF (Chrome headless)
  • Screenshot (Chrome headless)
  • HTML after 2s of JS running (Chrome headless)
  • Favicon
  • Submission of the URL to archive.org
  • Index summary pages: index.html & index.json
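
The outputs listed above map naturally onto a handful of command-line invocations. The sketch below only builds the argument lists and does not run anything. It uses standard wget and headless Chrome flags, but these are not necessarily the exact invocations ArchiveBox uses. The function names, the chromium-browser binary name, and the output.pdf and screenshot.png file names are assumptions made for illustration.

```python
import os


def build_commands(url, out_dir, chrome_binary="chromium-browser"):
    """Return one argument list per output type for a single URL.

    Illustrative only: the binary name and output file names are assumed.
    """
    pdf_path = os.path.join(out_dir, "output.pdf")
    png_path = os.path.join(out_dir, "screenshot.png")
    return {
        # Static HTML copy plus the page's images, CSS, and scripts.
        "wget": [
            "wget",
            "--page-requisites",
            "--convert-links",
            "--adjust-extension",
            "--no-parent",
            "--directory-prefix=" + out_dir,
            url,
        ],
        # PDF rendering via headless Chrome.
        "pdf": [chrome_binary, "--headless", "--print-to-pdf=" + pdf_path, url],
        # PNG screenshot via headless Chrome.
        "screenshot": [chrome_binary, "--headless", "--screenshot=" + png_path, url],
        # DOM after scripts have run; Chrome prints it to stdout.
        "dom": [chrome_binary, "--headless", "--dump-dom", url],
    }


def archive_org_submit_url(url):
    """Build the Wayback Machine 'save page now' address for a URL."""
    return "https://web.archive.org/save/" + url
```

Each list can be passed to subprocess.run once the binaries are installed; keeping the commands in a dictionary keyed by output type makes it easy to switch individual outputs on or off.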

The archiving is additive, so you can schedule ./archive to run regularly and pull new links into the index. All the saved content is static and indexed with JSON files, so it lives forever and is easily parseable; it requires no always-running backend.
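
To make "additive" concrete, here is a minimal sketch of merging newly imported links into an existing JSON index without dropping or duplicating earlier entries. The index layout (a top-level "links" list of objects with a "url" key) and the function names are assumptions for illustration, not ArchiveBox's actual index.json schema.

```python
import json
import os


def merge_links(existing, new):
    """Add links from `new` whose URL is not already present in `existing`.

    Existing entries win, so re-importing a link never overwrites old data.
    """
    by_url = {link["url"]: link for link in existing}
    for link in new:
        by_url.setdefault(link["url"], link)
    return list(by_url.values())


def update_index(index_path, new_links):
    """Load the index if it exists, merge in new links, and write it back."""
    existing = []
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            existing = json.load(f)["links"]  # assumed layout: {"links": [...]}
    merged = merge_links(existing, new_links)
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump({"links": merged}, f, indent=2)
    return merged
```

Because the index is a plain file, running the update repeatedly (for example from a scheduled job) only ever grows it, and any tool that reads JSON can consume the result.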

DEMO: archive.sweeting.me

git clone https://github.com/pirate/ArchiveBox.git
cd ArchiveBox
./setup

# Export your bookmarks, then run the archive command to start archiving!
./archive ~/Downloads/firefox_bookmarks.html

Documentation

We use the GitHub wiki system for documentation.

You can also access the docs locally by looking in the ArchiveBox/docs/ folder.

Getting Started

Reference

More Info

Screenshots

Desktop Screenshot
Mobile Screenshot
CLI Screenshot