I recently found a good open-source project that helps me digitize all my paper invoices, receipts and bank statements. Thanks to its tagging and OCR, saving and searching documents from my phone becomes a breeze.

The application is paperless-ngx, the latest fork of paperless-ng (which is itself based on paperless). You should choose paperless-ngx because its predecessors are no longer actively maintained, so a new community formed to keep this great project going. Huge thanks to all community members!

I’m going to show how to set it up within 5 minutes using Docker containers. If you don’t have a Linux server at home, the whole stack can also run on a Synology NAS; follow this article for details.

Step 1: Install Docker (a couple of prerequisite steps are needed, but they are not documented here for simplicity’s sake; read this for details).

sudo apt-get install docker-ce docker-ce-cli containerd.io
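To confirm the installation worked, a quick check like the one below can help. It is a minimal sketch that only tests whether the docker CLI is on your PATH and prints its version; docker_status is just an illustrative variable name.

```shell
#!/bin/sh
# Check whether the docker CLI is available after Step 1.
# docker_status is an illustrative variable name, not part of Docker.
if command -v docker >/dev/null 2>&1; then
  docker_status="installed: $(docker --version)"
else
  docker_status="not installed"
fi
echo "Docker $docker_status"
```

To also check that the daemon can actually run containers, Docker’s own documentation suggests running sudo docker run hello-world.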

Step 2: Install Portainer (this is not mandatory, but it makes managing Docker containers and networking much easier)

First create a folder named portainer at a location of your choice (you can call it whatever you like, but portainer is easy to remember), then replace /YOUR_LOCATION in the command below with that location.

sudo docker run -d --name=portainer -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock -v /YOUR_LOCATION/portainer:/data --restart=always portainer/portainer-ce
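The folder has to exist before Portainer is started with the command above. A minimal sketch, assuming a hypothetical PORTAINER_BASE variable that you point at your own location (it defaults to $HOME/docker here purely for illustration):

```shell
#!/bin/sh
# PORTAINER_BASE is a hypothetical variable: set it to the location that
# replaces /YOUR_LOCATION in the docker run command above.
PORTAINER_BASE="${PORTAINER_BASE:-$HOME/docker}"

# Create the folder Portainer will mount as /data (no error if it exists).
mkdir -p "$PORTAINER_BASE/portainer"
echo "Use -v $PORTAINER_BASE/portainer:/data in the docker run command"
```

Depending on where your location is, mkdir may need sudo.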

Step 3: Configure Portainer

Open http://YOUR_SERVER_IP:9000/ in your browser

Create a user

Configure a local environment (again, refer to this article for details with screenshots)

Make sure to update the Public IP of the environment in Portainer; this IP is used to access Paperless-ngx’s exposed port later

Step 4: Create a stack in Portainer (a stack is a combination of several Docker containers connected through a shared network).
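The compose file in this step bind-mounts folders under /mnt/seagate8t1/paperlessngx and needs your user’s UID and GID. If a bind-mount folder is missing, Docker creates it as root, which can leave the consume folder unwritable for your normal user. A minimal sketch for preparing both, assuming a hypothetical PAPERLESS_BASE variable (it defaults to $HOME/paperlessngx only for illustration; set it to the path you use in the compose file):

```shell
#!/bin/sh
# Pre-create the paperless-ngx bind-mount folders as your normal user and
# print the IDs for USERMAP_UID / USERMAP_GID.
# PAPERLESS_BASE is a hypothetical variable, e.g. /mnt/seagate8t1/paperlessngx.
set -eu
PAPERLESS_BASE="${PAPERLESS_BASE:-$HOME/paperlessngx}"

# Only the paperless folders are created here; the redis and postgres
# folders are left for their containers to set up.
for sub in data media export consume; do
  mkdir -p "$PAPERLESS_BASE/paperless/$sub"
done

# Numeric user and group IDs of the user running this script.
uid=$(id -u)
gid=$(id -g)
echo "USERMAP_UID: $uid"
echo "USERMAP_GID: $gid"
```

Run it as your normal user (not with sudo), then copy the two printed values into the compose file.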

  • Type in a name for this stack, e.g. paperlessngx
  • Paste the following Docker Compose file into the “Web editor”. It is based on the Paperless sample code on GitHub
  • A few notes here:
    1. I exposed the PostgreSQL port (5432 inside the container) as port 9432 on the host, so you can connect to this database instance directly
    2. Change USERMAP_UID & USERMAP_GID. Docker starts the containers as root, but paperless-ngx runs inside its container as the user ID and group ID given here, so set them to those of your normal (non-root) user account to keep write access to the consume folder. Run id to find these two values
    3. My mount point is /mnt/seagate8t1; change this to whatever is suitable on your own system
    4. The file declares 4 named volumes at the bottom, but the services actually use bind mounts under the mount point, including the Redis and PostgreSQL data folders, so everything you may want to back up lives on the host
    5. I only intend to use paperless-ngx within my local network, so no HTTPS or other encryption is configured
    6. In recent versions of Gotenberg, --api-timeout replaced a number of other timeout-related parameters. I set it to 2 minutes to avoid 503 (Service Unavailable) errors when a large file needs more time to be processed
version: "3.4"
services:
  broker:
    image: redis:latest
    restart: unless-stopped
    volumes:
      - /mnt/seagate8t1/paperlessngx/redis:/data

  db:
    image: postgres:14.2
    restart: unless-stopped
    volumes:
      - /mnt/seagate8t1/paperlessngx/postgres:/var/lib/postgresql/data
    ports:
      - 9432:5432
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - 8777:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - /mnt/seagate8t1/paperlessngx/paperless/data:/usr/src/paperless/data
      - /mnt/seagate8t1/paperlessngx/paperless/media:/usr/src/paperless/media
      - /mnt/seagate8t1/paperlessngx/paperless/export:/usr/src/paperless/export
      - /mnt/seagate8t1/paperlessngx/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000      
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
     
      # The UID and GID of the user used to run paperless in the container. Set
      # this to your UID and GID on the host so that you have write access to
      # the consumption directory.
      USERMAP_UID: 1000
      USERMAP_GID: 1000
      # Additional languages to install for text recognition, separated by
      # whitespace. Note that this is different from PAPERLESS_OCR_LANGUAGE
      # (default=eng), which defines the language used for OCR.
      # The container installs English, German, Italian, Spanish and French by
      # default.
      # See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
      # for available languages.
      #PAPERLESS_OCR_LANGUAGES: tur ces
      # Adjust this key if you plan to make paperless available publicly. It
      # should be a very long sequence of random characters. You don't need to
      # remember it.
      #PAPERLESS_SECRET_KEY: change-me
      # Use this variable to set a timezone for the Paperless Docker containers.
      # If not specified, defaults to UTC.
      #PAPERLESS_TIME_ZONE: Australia/Sydney
      # The default language to use for OCR. Set this to the language most of
      # your documents are written in.
      #PAPERLESS_OCR_LANGUAGE: eng

      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: YOUR_PASSWORD


  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
     - "gotenberg"
     - "--api-timeout=2m"
    restart: unless-stopped
    environment:
      CHROMIUM_DISABLE_ROUTES: 1      

  tika:
    image: apache/tika
    restart: unless-stopped
    
volumes:  
  data:
  media:  
  export:
  consume:
  • Deploy your stack
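The comments in the compose file ask for PAPERLESS_SECRET_KEY to be a very long sequence of random characters if you ever make paperless available publicly. One way to generate such a value, sketched here with od reading 48 bytes from /dev/urandom (openssl rand -hex 48 does the same job if OpenSSL is installed):

```shell
#!/bin/sh
# Read 48 random bytes and print them as 96 lowercase hex characters,
# suitable as a value for PAPERLESS_SECRET_KEY.
secret_key=$(od -An -N48 -tx1 /dev/urandom | tr -d ' \n')
echo "PAPERLESS_SECRET_KEY: $secret_key"
```

Uncomment the PAPERLESS_SECRET_KEY line, paste the generated value and update the stack in Portainer.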

Step 5: Log into Paperless-ngx at http://YOUR_SERVER_IP:8777 with the admin user and password configured above (PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD)
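On the first start, paperless-ngx needs a while (it runs its database migrations) before the login page answers. A minimal polling sketch that mirrors the curl -f check used by the compose healthcheck; wait_for_http is a hypothetical helper name:

```shell
#!/bin/sh
# Poll a URL until curl -f succeeds (any HTTP status below 400), or give up.
# wait_for_http is a hypothetical helper name.
wait_for_http() {
  url=$1
  attempts=${2:-30}   # how many times to try
  delay=${3:-10}      # seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --max-time stops a single try from hanging forever.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    # Only sleep if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

For example, wait_for_http http://YOUR_SERVER_IP:8777/ 30 10 keeps trying for about five minutes before giving up.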

One last thing, if you have made it this far 🙂

If you deploy the above on a Synology NAS, make sure your firewall allows traffic within the newly created network (Portainer -> Networks -> paperlessngx_default); otherwise your Paperless-ngx container will get stuck at /sbin/docker-prepare.sh while it waits for the database and Redis containers.
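To know which addresses to allow, you can ask Docker for the subnet of that network. A minimal sketch; network_subnet is a hypothetical helper wrapping docker network inspect, and the exact firewall rule (for example, allowing that subnet as the source IP) is an assumption to adapt to your DSM version:

```shell
#!/bin/sh
# Print the subnet(s) Docker assigned to a network, such as the stack's
# paperlessngx_default network, in CIDR notation.
# network_subnet is a hypothetical helper name.
network_subnet() {
  sudo docker network inspect \
    --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}' "$1"
}
```

Run network_subnet paperlessngx_default on the NAS over SSH and add an allow rule for the printed subnet.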
