<p align="center">
    <h1 align="center">SDSVKVU</h1>
  </p>

  ***Features***
  - Extract key-value pairs from documents: invoices/receipts, forms, and government documents (ID cards, driver's licenses, birth certificates)
  - Languages: VI + EN

  ***What's new***
  ### - Ver 0.0.1:
  - Supported inputs: images and PDF files (single- or multi-page)
  - Extract all key-value pairs and return them as raw_outputs
    + Weights: weights/key_value_understanding-20230716-085549_final
  - For VAT invoices: Extract 14 specific fields
    + Weights: weights/key_value_understanding-20230627-164536_fi
  - For SBT invoices ("sbt" option): Extract the table in an SBT invoice
    + Weights: weights/key_value_understanding-20230812-170826_sbt_2
    + Weights: weights/key_value_understanding-20230812-170826_sbt_2
  ### - Ver 0.0.2: Add option: "vtb" - Vietin Bank
  - For Vietin Bank documents ("vtb" option): Extract 6 specific fields
    + Weights: weights/key_value_understanding-20230824-164236_vietin
  ### - Ver 0.0.3: Add default option
  - Return all potential key-value pairs, titles, key-only entries, triplets, and tables with raw keys
  ### - Ver 0.0.4: Add option: "manulife" - Manulife Insurance
  - For Manulife Insurance documents ("manulife" option): Extract all potential key-value pairs, titles, key-only entries, triplets, and tables with raw keys, plus the type of medical document
    + Weights: weights/key_value_understanding-20231024-125646_manulife2
  ### - Ver 0.1.0: Modify KVU model for SBT
  ### - Ver 0.1.0: Add option: "sbt_v2" - SBT project
  - For SBT imei/invoice ("sbt_v2" option): Extract 4 specific fields
    + Weights: weights/key_value_understanding_for_sbt-20231108-143935
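
The option-to-weights pairs listed in the changelog above can be summarized as a small lookup table. This is only a sketch: `WEIGHTS_BY_OPTION`, `weights_for`, and the `"raw"` key are hypothetical names, not part of the package. The fallback to the raw-output weights for unknown options is an assumption, and how the package actually resolves weights may differ.

```python
# Hypothetical summary of the changelog; not the package's real configuration.
WEIGHTS_BY_OPTION = {
    "raw": "weights/key_value_understanding-20230716-085549_final",
    "vat": "weights/key_value_understanding-20230627-164536_fi",
    "sbt": "weights/key_value_understanding-20230812-170826_sbt_2",
    "vtb": "weights/key_value_understanding-20230824-164236_vietin",
    "manulife": "weights/key_value_understanding-20231024-125646_manulife2",
    "sbt_v2": "weights/key_value_understanding_for_sbt-20231108-143935",
}


def weights_for(option: str) -> str:
    """Return the weights path for an option.

    Assumption: unknown options fall back to the raw-output weights.
    """
    return WEIGHTS_BY_OPTION.get(option, WEIGHTS_BY_OPTION["raw"])
```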

  ## I. Setup 
  ***Dependencies***
  - Python: 3.10
  - Torch: 1.11.3
  - CUDA: 11.6
  - transformers: 4.30.0
  ```
  pip install -v -e .
  ```
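
After installing, it can help to compare the installed package versions with the ones listed above. The following is a minimal sketch using the standard-library `importlib.metadata`; `EXPECTED` and `check_versions` are illustrative names, and the distribution names `torch` and `transformers` are assumed to match what pip installed.

```python
from importlib import metadata

# Versions copied from the dependency list above.
EXPECTED = {"torch": "1.11.3", "transformers": "4.30.0"}


def check_versions(expected):
    """Map each package name to (expected_version, installed_version_or_None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            # Drop a local build suffix such as "+cu116" before comparing.
            found = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


def mismatches(report):
    """Return the names whose installed version differs from the expected one."""
    return [name for name, (wanted, found) in report.items() if wanted != found]
```

Calling `mismatches(check_versions(EXPECTED))` returns the packages that are missing or installed at a different version.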


  ## II. Inference
  Run `python test.py`. Example usage:
  ```
  import os

  # Select the physical GPU before the engine initializes CUDA.
  os.environ["CUDA_VISIBLE_DEVICES"] = "1"

  from sdsvkvu import load_engine, process_img

  if __name__ == "__main__":
      # "cuda:0" is the first *visible* device, i.e. physical GPU 1 here.
      kwargs = {"device": "cuda:0"}
      img_path = "/mnt/ssd1T/tuanlv/02-KVU/sdsvkvu/visualize/test_img/RedInvoice_WaterPurfier_Feb_PVI_829_0.jpg"
      save_dir = "/mnt/ssd1T/tuanlv/02-KVU/sdsvkvu/visualize/test2/"
      engine = load_engine(kwargs)
      # option: "vat", "sbt", "vtb", "manulife", or "sbt_v2" for document-specific outputs; otherwise raw outputs
      outputs = process_img(img_path, save_dir, engine, export_all=False, option="vat")
  ```
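
To process a whole folder instead of a single file, the call above can be wrapped in a small loop. This is a sketch under stated assumptions: `collect_inputs`, `run_batch`, and `SUPPORTED_SUFFIXES` are hypothetical helpers, and the extension list is a guess based on "image, PDF file". The processing function is passed in as a callable, so the helper itself does not depend on `sdsvkvu`.

```python
from pathlib import Path

# Assumed extension list; the README only states "image, PDF file".
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}


def collect_inputs(folder):
    """Return the supported files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def run_batch(folder, process_one):
    """Call process_one(path_string) for each input; return {filename: output}."""
    return {p.name: process_one(str(p)) for p in collect_inputs(folder)}
```

With an engine loaded as in the example above, a call could look like `run_batch(folder, lambda p: process_img(p, save_dir, engine, export_all=False, option="vat"))`.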

  ## III. Project structure
    .
    ├── sdsvkvu
    │   ├── main.py
    │   ├── externals
    │   │   ├── __init__.py
    │   │   ├── basic_ocr
    │   │   │   └── ...
    │   │   ├── ocr_engine
    │   │   │   └── ...
    │   │   └── ocr_engine_deskew
    │   │       └── ...
    │   ├── model
    │   │   ├── combined_model.py
    │   │   ├── document_kvu_model.py
    │   │   ├── __init__.py
    │   │   ├── kvu_model.py
    │   │   └── relation_extractor.py
    │   ├── modules
    │   │   ├── __init__.py
    │   │   ├── predictor.py
    │   │   ├── preprocess.py
    │   │   └── run_ocr.py
    │   ├── requirements.txt
    │   ├── settings.yml
    │   ├── sources
    │   │   ├── __init__.py
    │   │   ├── kvu.py
    │   │   └── utils.py
    │   └── utils
    │       ├── dictionary
    │       │   ├── __init__.py
    │       │   ├── sbt.py
    │       │   ├── vat.py
    │       │   ├── vtb.py
    │       │   ├── manulife.py
    │       │   └── sbt_v2.py
    │       ├── __init__.py
    │       ├── post_processing.py
    │       ├── query
    │       │   ├── __init__.py
    │       │   ├── sbt.py
    │       │   ├── vat.py
    │       │   ├── vtb.py
    │       │   ├── all.py
    │       │   ├── manulife.py
    │       │   └── sbt_v2.py
    │       └── utils.py
    ├── weights
    │   ├── key_value_understanding-20230627-164536_fi
    │   ├── key_value_understanding-20230812-170826_sbt_2
    │   ├── key_value_understanding-20230716-085549_final
    │   ├── key_value_understanding-20230824-164236_vietin
    │   ├── key_value_understanding-20231024-125646_manulife2
    │   └── key_value_understanding_for_sbt-20231108-143935
    ├── LICENSE
    ├── MANIFEST.in
    ├── pyproject.toml
    ├── README.md
    ├── scripts
    │   └── run.sh
    ├── setup.cfg
    ├── setup.py
    ├── test.py
    └── visualize