Public Notes by reorx Tagged #parser

cloudflare/lol-html: Low output latency streaming HTML parser/rewriter with CSS selector-based API github.com

html •
library •
parser •
rust •
scraper •
crawler

Filimoa/open-parse: Improved file parsing for LLMs github.com

content •
convert •
extraction •
library •
llm •
markdown •
ocr •
parser •
pdf •
python •
split •
text

Layout-Parser/layout-parser: A Unified Toolkit for Deep Learning Based Document Image Analysis github.com

image •
layout •
llm •
ocr •
parser •
pdf •
python

OCR Software, Data Extraction Tool - Amazon Textract - AWS aws.amazon.com

Amazon Textract is a machine learning (ML) service that uses optical character recognition (OCR) to automatically extract text, handwriting, and data from scanned PDF documents, forms, and tables.

api •
content •
extraction •
ocr •
parser •
pdf

Document AI | Google Cloud cloud.google.com

api •
content •
extraction •
ocr •
parser •
pdf

pdfminer/pdfminer.six: Community maintained fork of pdfminer - we fathom PDF github.com

content •
extraction •
library •
parser •
pdf •
python

pymupdf/PyMuPDF: PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. github.com

content •
extraction •
library •
parser •
pdf •
python

ohmjs/ohm: A library and language for building parsers, interpreters, compilers, etc. github.com

dsl •
javascript •
language •
lexical •
parser

kellyjonbrazil/jc: CLI tool and python library that converts the output of popular command-line tools, file-types, and common strings to JSON, YAML, or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts. github.com

cli •
json •
parser •
python

unifiedjs/unified: ☔️ interface for parsing, inspecting, transforming, and serializing content through syntax trees github.com

javascript •
parser •
pub •
syntax

tatatap-com/sowhat github.com

command •
javascript •
parser •
plain-text •
pub

miyuchina/mistletoe: A fast, extensible and spec-compliant Markdown parser in pure Python. github.com

library •
markdown •
parser •
pub •
python