sbt-idp/cope2n-ai-fi/modules/sdsvkie
Latest commit for every entry: Merge branch 'main' of https://code.sdsdev.co.kr/dx-tan/SBT-IDP into optimize_performance (2023-12-14 13:40:12 +07:00)

demos
notebooks
scripts
sdsvkie
.gitignore
arial.ttf
eval_with_api.py
index.html
README.md
requirements.txt
setup.cfg
setup.py

SDSVKIE

Features

  • Extract information from documents: VAT Invoice, Receipt
  • Languages: Vietnamese (VI) and English (EN)

What's new

- Ver 1.0.1:

  • Improve postprocessing for receipts
  • Support handling multiple pages for PDF files
  • Latest weights: /mnt/ssd1T/hoanglv/Projects/KIE/sdsvkie/workdirs/sdsap_receipt/exp_9_lr5e_6_no_scheduler/best
  • Latest config: /mnt/ssd1T/hoanglv/Projects/KIE/sdsvkie/workdirs/sdsap_receipt/exp_9_lr5e_6_no_scheduler/config.yaml

I. Setup

Dependencies

  • Python: 3.8
  • Torch: 1.10.2
  • CUDA: 11.6
  • transformers: 4.28.1

Install the package in editable mode from the directory that contains setup.py:

pip install -v -e .
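
To confirm that an environment matches the pinned versions listed above, a small check along these lines may help. This is a minimal sketch, not part of the package: the helper name `find_mismatches` is hypothetical, and it assumes the pins correspond to the PyPI distribution names `torch` and `transformers`. Python and CUDA versions are not checked here.

```python
from importlib import metadata

# Pins copied from the Dependencies list (assumed PyPI distribution names).
EXPECTED = {"torch": "1.10.2", "transformers": "4.28.1"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore a local build tag, so "1.10.2+cu116" still matches "1.10.2".
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means both pinned packages are installed at the expected versions.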

II. Inference

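import cv2

from sdsvkie import Predictor

# Build the predictor from a training config and its best checkpoint.
predictor = Predictor(
    cfg="./workdirs/training/sdsap_receipt/exp_3/config.yaml",
    weights="./workdirs/training/sdsap_receipt/exp_3/best",
    device="cpu",
)

# cv2.imread returns None instead of raising when the file cannot be read.
img_path = "./demos/4 Sep OPC to Home.jpg"
img = cv2.imread(img_path)
if img is None:
    raise FileNotFoundError(f"Cannot read image: {img_path}")

out = predictor(img)
output = out["end2end_results"]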
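
When several images need to be processed, the single-image call can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `predict_many` is a hypothetical helper, not part of sdsvkie. It assumes only what the snippet above shows, namely that the predictor is callable on a loaded image and returns a dict with an `end2end_results` key. The image reader is passed in so that `cv2.imread` (which returns None on failure) can be used directly.

```python
def predict_many(predictor, image_paths, read_image):
    """Run `predictor` on each readable image.

    Returns (results, unreadable), where `results` maps each path to the
    predictor's "end2end_results" entry and `unreadable` lists the paths
    for which `read_image` returned None.
    """
    results, unreadable = {}, []
    for path in image_paths:
        img = read_image(str(path))
        if img is None:  # the reader could not load this file
            unreadable.append(path)
            continue
        out = predictor(img)
        results[path] = out["end2end_results"]
    return results, unreadable
```

With the predictor built as above, a call such as `predict_many(predictor, sorted(Path("./demos").glob("*.jpg")), cv2.imread)` (with `from pathlib import Path`) would process every JPEG in the demos directory and report files that could not be read instead of failing on the first one.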

III. Training

  • Prepare the dataset. Organize the dataset directory as follows, with each image stored next to a .txt file of the same name:

base_dataset
├── train
│   ├── sub_dir_1
│   │   ├── img1.jpg
│   │   ├── img1.txt
│   │   └── ...
│   └── sub_dir_2
│       ├── img2.jpg
│       └── img2.txt
└── test
    ├── imgn.jpg
    └── imgn.txt

  • Edit ./scripts/train.sh as needed, then run it:
sh ./scripts/train.sh
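
Before starting a training run, it can be useful to verify that the dataset directory follows the layout described above. The sketch below is an illustration, not part of the package: `find_unpaired` is a hypothetical helper, and it assumes that every image is a `.jpg` file paired with a `.txt` file of the same name in the same directory. It does not inspect the contents of the `.txt` files.

```python
from pathlib import Path


def find_unpaired(split_dir, image_suffix=".jpg", label_suffix=".txt"):
    """Find files under `split_dir` (searched recursively) that lack a partner.

    Returns (images_without_txt, txt_without_image) as sorted lists of paths.
    """
    split_dir = Path(split_dir)
    # Strip the suffix so "dir/img1.jpg" and "dir/img1.txt" compare equal.
    images = {p.with_suffix("") for p in split_dir.rglob(f"*{image_suffix}")}
    labels = {p.with_suffix("") for p in split_dir.rglob(f"*{label_suffix}")}
    images_without_txt = sorted(str(p) + image_suffix for p in images - labels)
    txt_without_image = sorted(str(p) + label_suffix for p in labels - images)
    return images_without_txt, txt_without_image
```

Calling `find_unpaired("base_dataset/train")` and `find_unpaired("base_dataset/test")` should return two empty lists each when every image has its matching `.txt` file.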

TODO

  • Add more fields: sub_total, tips, seller_address, item list
  • Review KIE results for invoices (vnpt_exp_4_model)
  • Fix the box unnormalization error that occurs in some cases
  • Support multiple pages
  • Create 200 multi-page invoices
  • Finalize the multi-page test set
  • Evaluate results