
Continuously downloads ZIM files for consumption (e.g., Kiwix archives).
pip install -r requirements.txt
Create a .tallest.config.json file in your project directory:
{
  "sources": [
    {
      "name": "Wikipedia ZIM Archive",
      "url": "https://dumps.wikimedia.org/kiwix/zim/wikipedia/",
      "type": "zim",
      "targetPattern": ".*wikipedia_en_all_maxi_.*",
      "downloadDir": "./downloads"
    }
  ]
}

| Option | Type | Description |
|---|---|---|
| name | string | Display name for the downloaded file |
| url | string | Base URL of the ZIM index to parse |
| type | string | Source type (currently only zim is supported) |
| targetPattern | string | Regex pattern to match files (optional) |
| downloadDir | string | Directory to store downloaded files (default: ./) |
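
To illustrate how the options above might fit together, here is a minimal sketch of parsing one source entry and filtering file names with targetPattern. It is not taken from main.py: the `Source` class and the `parse_source` and `select_files` helpers are hypothetical names, treating name, url, and type as required is an assumption, and so is matching the pattern from the start of the file name with `re.match`.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Source:
    """One entry of the "sources" array (hypothetical representation)."""
    name: str
    url: str
    type: str
    target_pattern: Optional[str] = None  # targetPattern is optional
    download_dir: str = "./"              # documented default for downloadDir


def parse_source(raw: dict) -> Source:
    """Validate a raw source dict and apply the documented defaults."""
    # Assumption: options without a documented default are required.
    for key in ("name", "url", "type"):
        if key not in raw:
            raise ValueError(f"source is missing required option: {key}")
    if raw["type"] != "zim":
        raise ValueError(f"unsupported source type: {raw['type']}")
    return Source(
        name=raw["name"],
        url=raw["url"],
        type=raw["type"],
        target_pattern=raw.get("targetPattern"),
        download_dir=raw.get("downloadDir", "./"),
    )


def select_files(source: Source, filenames: Iterable[str]) -> List[str]:
    """Keep the file names that match targetPattern, or all if it is unset."""
    if source.target_pattern is None:
        return list(filenames)
    # Assumption: the pattern is anchored at the start of the name (re.match).
    pattern = re.compile(source.target_pattern)
    return [name for name in filenames if pattern.match(name)]
```

With the example configuration, a listing that contains both an English "maxi" archive and a French "mini" archive would be narrowed to the English one only.
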

| Variable | Description | Default |
|---|---|---|
| TALLEST_CONFIG_PATH | Path to the configuration JSON file | ./.tallest.config.json |
| TALLEST_MAX_DOWNLOADS | Maximum number of concurrent downloads | 4 |
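
As a rough sketch of how these two variables could be consumed, the snippet below reads them with the documented defaults and caps concurrency with a thread pool. The `read_settings` and `run_downloads` helpers are hypothetical, and using `ThreadPoolExecutor` is an assumption for illustration; the actual tool may limit concurrency differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Mapping, Optional, Tuple

# Defaults as documented in the table above.
DEFAULT_CONFIG_PATH = "./.tallest.config.json"
DEFAULT_MAX_DOWNLOADS = 4


def read_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Return (config_path, max_downloads), falling back to the defaults."""
    env = os.environ if env is None else env
    config_path = env.get("TALLEST_CONFIG_PATH", DEFAULT_CONFIG_PATH)
    raw_max = env.get("TALLEST_MAX_DOWNLOADS", str(DEFAULT_MAX_DOWNLOADS))
    try:
        max_downloads = int(raw_max)
    except ValueError:
        raise ValueError(
            f"TALLEST_MAX_DOWNLOADS must be an integer, got {raw_max!r}"
        ) from None
    if max_downloads < 1:
        raise ValueError("TALLEST_MAX_DOWNLOADS must be at least 1")
    return config_path, max_downloads


def run_downloads(jobs: List[Callable[[], object]], max_downloads: int) -> list:
    """Run the jobs with at most max_downloads of them active at once."""
    with ThreadPoolExecutor(max_workers=max_downloads) as pool:
        # pool.map preserves the order of the submitted jobs.
        return list(pool.map(lambda job: job(), jobs))
```

Passing a plain dict instead of `os.environ` keeps the settings logic easy to test without touching the real environment.
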
TALLEST_CONFIG_PATH=./.tallest.config.json python main.py
Or simply:
python main.py
The configuration file path can be overridden via the TALLEST_CONFIG_PATH environment variable.
See LICENSE