I recently found a good open-source project that helps me digitize all my paper invoices, receipts and bank statements. With its tags and OCR, saving and searching documents from my phone becomes a breeze.
The application is paperless-ngx, the latest fork of paperless-ng (which is itself based on paperless). You should choose paperless-ngx because its predecessors are no longer actively maintained, so a new community was formed to keep this great project alive. Huge thanks to all community members!
I’m going to show how to set it up within 5 minutes using Docker containers. The whole stack can also run on a Synology NAS if you don’t have a Linux server at home; follow this article for details.
Step 1: Install Docker (a couple of prerequisite steps are needed but are not documented here, for simplicity; read this for details).
sudo apt-get install docker-ce docker-ce-cli containerd.io
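To confirm the install worked, you can ask the Docker CLI for its version. This is only a small sketch; check_docker is a helper name made up for this example, not part of Docker.

```shell
# Report whether the docker CLI is on PATH and, if so, print its version.
# check_docker is a hypothetical helper name used only in this sketch.
check_docker() {
  if command -v docker >/dev/null 2>&1; then
    docker --version
  else
    echo "docker not found on PATH" >&2
    return 1
  fi
}

check_docker || echo "Install Docker first (Step 1)."
```

If a version line is printed, Docker is installed and you can move on to Step 2.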
Step 2: Install Portainer (this is not mandatory, but it makes managing Docker containers and networking much easier)
In the command below, replace /YOUR_LOCATION with your own location and create a folder named portainer there first (you can call it whatever you like, but I feel portainer is easy enough)
sudo docker run -d --name=portainer -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock -v /YOUR_LOCATION/portainer:/data --restart=always portainer/portainer-ce
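Creating the data folder before running the command above can look like the sketch below. PORTAINER_DIR is just a variable invented for this example, and the default under your home directory is an assumption; use whatever path you put in place of /YOUR_LOCATION/portainer.

```shell
# Hypothetical location; point this at the folder you mount as /data for Portainer.
PORTAINER_DIR="${PORTAINER_DIR:-$HOME/portainer}"

# -p creates missing parent folders and does not fail if the folder already exists.
mkdir -p "$PORTAINER_DIR"
echo "Portainer data folder: $PORTAINER_DIR"
```

Once the container is started, sudo docker ps --filter name=portainer should list it as running.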
Step 3: Configure Portainer
Open http://YOUR_SERVER_IP:9000/ in your browser
Create a user
Configure a local environment (again, refer to this article for details with screenshots)
Make sure to update the Public IP in Portainer; this IP will be used to access Paperless-ngx’s exposed port later
Step 4: Create a stack in Portainer (a stack is a combination of several docker containers with network connectivity).
- Type in a name for this stack, e.g. paperlessngx
- Paste the following Docker Compose file into the “Web editor”. It is based on the Paperless sample code on GitHub
- A few notes here:
- I exposed the PostgreSQL port so that you can connect to this instance directly via port 9432
- Change USERMAP_UID & USERMAP_GID. The Docker daemon runs as root, but Paperless inside the container should run as your normal user so that you keep write access to the mounted folders. Set both to the user ID and group ID of your current user; run the id command to find these two values
- My mount point is /mnt/seagate8t1; change this to whatever is suitable on your own system
- The file below mounts four Paperless folders from the host (data, media, export, consume); the Redis and Postgres data folders are mounted the same way, which helps with backups
- I only intend to use paperless-ngx within my local network, so no HTTPS or other encryption is configured
- The latest version of Gotenberg replaced a number of other timeout-related parameters with api-timeout. I set it to 2 minutes to avoid 503 Service Unavailable errors when a large file needs more time to be processed
version: "3.4"
services:
  broker:
    image: redis:latest
    restart: unless-stopped
    volumes:
      - /mnt/seagate8t1/paperlessngx/redis:/data
  db:
    image: postgres:14.2
    restart: unless-stopped
    volumes:
      - /mnt/seagate8t1/paperlessngx/postgres:/var/lib/postgresql/data
    ports:
      - "9432:5432"
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8777:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - /mnt/seagate8t1/paperlessngx/paperless/data:/usr/src/paperless/data
      - /mnt/seagate8t1/paperlessngx/paperless/media:/usr/src/paperless/media
      - /mnt/seagate8t1/paperlessngx/paperless/export:/usr/src/paperless/export
      - /mnt/seagate8t1/paperlessngx/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      # The UID and GID of the user used to run paperless in the container. Set this
      # to your UID and GID on the host so that you have write access to the
      # consumption directory.
      USERMAP_UID: 1000
      USERMAP_GID: 1000
      # Additional languages to install for text recognition, separated by a
      # whitespace. Note that this is
      # different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
      # language used for OCR.
      # The container installs English, German, Italian, Spanish and French by
      # default.
      # See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
      # for available languages.
      #PAPERLESS_OCR_LANGUAGES: tur ces
      # Adjust this key if you plan to make paperless available publicly. It should
      # be a very long sequence of random characters. You don't need to remember it.
      #PAPERLESS_SECRET_KEY: change-me
      # Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
      #PAPERLESS_TIME_ZONE: Australia/Sydney
      # The default language to use for OCR. Set this to the language most of your
      # documents are written in.
      #PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: YOUR_PASSWORD
  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--api-timeout=2m"
    restart: unless-stopped
    environment:
      CHROMIUM_DISABLE_ROUTES: 1
  tika:
    image: apache/tika
    restart: unless-stopped
volumes:
  data:
  media:
  export:
  consume:
- Deploy your stack
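The host-side preparation mentioned in the notes (finding your user and group IDs, and having the mount folders owned by your own user) can be scripted before you deploy. The sketch below is one way to do it. PAPERLESS_BASE is a variable name invented for this example, and its default under your home directory is an assumption; my own value would be /mnt/seagate8t1/paperlessngx. Docker typically creates missing bind-mount folders by itself, but then they end up owned by root.

```shell
# Base folder for the bind mounts (hypothetical variable, used only in this sketch).
PAPERLESS_BASE="${PAPERLESS_BASE:-$HOME/paperlessngx}"

# Values to copy into USERMAP_UID and USERMAP_GID in the compose file.
echo "USERMAP_UID: $(id -u)"
echo "USERMAP_GID: $(id -g)"

# Create the host folders that match the volume paths in the compose file.
for dir in redis postgres paperless/data paperless/media paperless/export paperless/consume; do
  mkdir -p "$PAPERLESS_BASE/$dir"
done
```

Run it as the same user whose IDs you put in the compose file, so that the folders and the IDs match.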
Step 5: Log in to Paperless-ngx at http://YOUR_IP:8777, using the admin user and password configured above
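The first start can take a while, so the page may not answer immediately. A small polling loop like the sketch below can tell you when the web server is up. wait_for_url is a helper name made up for this example, and the attempt count and delay are arbitrary defaults.

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL with curl until it answers without an HTTP error, or gives up.
# Hypothetical helper, used only in this sketch.
wait_for_url() {
  url=$1
  attempts=${2:-10}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f fails on HTTP errors (4xx/5xx), -sS is quiet but still shows errors.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_url http://YOUR_IP:8777 20 5 tries for roughly two minutes before giving up.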
One last thing if you have reached here 🙂
If you deploy the above on a Synology NAS, make sure your firewall allows traffic within the newly created network (Portainer -> Networks -> paperlessngx_default). Otherwise your Paperless-ngx container will be stuck at /sbin/docker-prepare.sh
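To know which address range to allow in the firewall, you can ask Docker for the subnet of that network. This is a sketch only; network_subnet is a helper name made up for this example, and it assumes the stack was named paperlessngx so that the network is called paperlessngx_default.

```shell
# network_subnet NETWORK_NAME
# Prints the subnet(s) Docker assigned to the given network.
# Hypothetical helper, used only in this sketch.
network_subnet() {
  docker network inspect "$1" --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
}
```

Calling network_subnet paperlessngx_default prints the subnet in CIDR notation, which is the range to allow in the firewall rule. Run it from a root shell if your user cannot talk to Docker.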