Public Notes on
Improved file parsing for LLMs. Contribute to Filimoa/open-parse development by creating an account on GitHub.

#llm #parser #text #convert #markdown #split #extraction #content #python #library #pdf #ocr

Amazon Textract is a machine learning (ML) service that uses optical character recognition (OCR) to automatically extract text, handwriting, and data from scanned PDF documents, forms, and tables.

#content #extraction #ocr #pdf #parser #api

Community maintained fork of pdfminer - we fathom PDF - pdfminer/pdfminer.six

#python #pdf #content #extraction #parser #library

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. - pymupdf/PyMuPDF

#python #pdf #content #extraction #parser #library

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

#llm #model #table #pdf #content #extraction

UniTable: Towards a Unified Table Foundation Model - poloclub/unitable

#pdf #table #content #extraction #llm #machine-learning

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo: - cure53/DOMPurify

#javascript #library #dom #content #extraction #purify

Transform your content into type-safe data collections - sdorra/content-collections

#static-site-generator #content #typing #typescript #library #cms #api

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments - adbar/trafilatura

#python #content #extraction #html #library #readability

Article extraction benchmark: dataset and evaluation scripts - scrapinghub/article-extraction-benchmark

#article #extraction #content #readability #benchmark #library

📜 Extract meaningful content from the chaos of a web page - parser/README.md at main · postlight/parser

#readability #content #extraction #javascript #library

Quickly capture key ideas using Upword’s AI-powered notes and create personalized & slick summaries. Upword transforms any content into knowledge. Read, listen and share your summaries.

#capture #content #extraction #ai #annotation