Public Notes on
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more. - microlinkhq/metascraper

#metadata #html #extraction #opengraph

Improved file parsing for LLMs. Contribute to Filimoa/open-parse development by creating an account on GitHub.

#llm #parser #text #convert #markdown #split #extraction #content #python #library #pdf #ocr

Amazon Textract is a machine learning (ML) service that uses optical character recognition (OCR) to automatically extract text, handwriting, and data from scanned PDF documents, forms, and tables.

#content #extraction #ocr #pdf #parser #api

Community maintained fork of pdfminer - we fathom PDF - pdfminer/pdfminer.six

#python #pdf #content #extraction #parser #library

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. - pymupdf/PyMuPDF

#python #pdf #content #extraction #parser #library

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

#llm #model #table #pdf #content #extraction

UniTable: Towards a Unified Table Foundation Model - poloclub/unitable

#pdf #table #content #extraction #llm #machine-learning

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo: - cure53/DOMPurify

#javascript #library #dom #content #extraction #purify

fast python port of arc90's readability tool, updated to match latest readability.js! - buriy/python-readability

#python #readability #library #extraction

Minimal keyword extraction with BERT. Contribute to MaartenGr/KeyBERT development by creating an account on GitHub.

#keyword #extraction #embedding #transformer

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments - adbar/trafilatura

#python #content #extraction #html #library #readability

Article extraction benchmark: dataset and evaluation scripts - scrapinghub/article-extraction-benchmark

#article #extraction #content #readability #benchmark #library

📜 Extract meaningful content from the chaos of a web page - parser/README.md at main · postlight/parser

#readability #content #extraction #javascript #library

Quickly capture key ideas using Upword’s AI-powered notes and create personalized & slick summaries. Upword transforms any content into knowledge. Read, listen and share your summaries.

#capture #content #extraction #ai #annotation

Smort.io www.smort.io
Smort lets you easily edit, annotate and share articles. Read better. Become Smort.

#highlight #readability #article #extraction #pub

📜 Extract meaningful content from the chaos of a web page - postlight/mercury-parser

#readability #nodejs #extraction #pub

A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file. - deathau/markdownload

#markdown #clipping #extraction #chrome #extension #pub

Plask is a browser-based AI motion capture tool and animation editor. With any camera, creators can digitize their movements, automate animation work, collaborate with colleagues, and export them all on one platform.

#animation #ai #motion #capture #extraction #pub