Public Notes on histre
Improved file parsing for LLMs. Contribute to Filimoa/open-parse development by creating an account on GitHub.
#llm #parser #text #convert #markdown #split #extraction #content #python #library #pdf #ocr
OCR Software, Data Extraction Tool - Amazon Textract - AWS
aws.amazon.com
Amazon Textract is a machine learning (ML) service that uses optical character recognition (OCR) to automatically extract text, handwriting, and data from scanned PDF documents, forms, and tables.
#content #extraction #ocr #pdf #parser #api
Document AI | Google Cloud
cloud.google.com
Community maintained fork of pdfminer - we fathom PDF - pdfminer/pdfminer.six
#python #pdf #content #extraction #parser #library
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. - pymupdf/PyMuPDF
#python #pdf #content #extraction #parser #library
microsoft/table-transformer-detection · Hugging Face
huggingface.co
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
#llm #model #table #pdf #content #extraction
UniTable: Towards a Unified Table Foundation Model - poloclub/unitable
#pdf #table #content #extraction #llm #machine-learning
DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo: - cure53/DOMPurify
#javascript #library #dom #content #extraction #purify
Transform your content into type-safe data collections - sdorra/content-collections
#static-site-generator #content #typing #typescript #library #cms #api
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments - adbar/trafilatura
#python #content #extraction #html #library #readability
Article extraction benchmark: dataset and evaluation scripts - scrapinghub/article-extraction-benchmark
#article #extraction #content #readability #benchmark #library
Jasper - AI Copywriter | AI Content Generator for Teams
www.jasper.ai
📜 Extract meaningful content from the chaos of a web page - parser/README.md at main · postlight/parser
#readability #content #extraction #javascript #library
Create personalized summaries with Upword
www.upword.ai
Quickly capture key ideas using Upword’s AI-powered notes and create personalized & slick summaries. Upword transforms any content into knowledge. Read, listen and share your summaries.
#capture #content #extraction #ai #annotation