Public Notes on
Improved file parsing for LLMs. - Filimoa/open-parse

#llm #parser #text #convert #markdown #split #extraction #content #python #library #pdf #ocr

A Unified Toolkit for Deep Learning Based Document Image Analysis - Layout-Parser/layout-parser

#pdf #layout #parser #llm #python #image #ocr

Amazon Textract is a machine learning (ML) service that uses optical character recognition (OCR) to automatically extract text, handwriting, and data from scanned PDF documents, forms, and tables.

#content #extraction #ocr #pdf #parser #api

Community maintained fork of pdfminer - we fathom PDF - pdfminer/pdfminer.six

#python #pdf #content #extraction #parser #library

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. - pymupdf/PyMuPDF

#python #pdf #content #extraction #parser #library

A library and language for building parsers, interpreters, compilers, etc. - ohmjs/ohm

#dsl #lexical #language #parser #javascript

CLI tool and Python library that converts the output of popular command-line tools, file types, and common strings to JSON, YAML, or dictionaries. This allows piping of output to tools like jq and simplifies automation scripts. - kellyjonbrazil/jc

tatatap-com/sowhat

#javascript #command #parser #plain-text #pub

A fast, extensible and spec-compliant Markdown parser in pure Python. - miyuchina/mistletoe

#python #markdown #parser #library #pub