Public Notes on histre
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more. - microlinkhq/metascraper
#metadata #html #extraction #opengraph
Improved file parsing for LLMs. - Filimoa/open-parse
#llm #parser #text #convert #markdown #split #extraction #content #python #library #pdf #ocr
OCR Software, Data Extraction Tool - Amazon Textract - AWS
aws.amazon.com
Amazon Textract is a machine learning (ML) service that uses optical character recognition (OCR) to automatically extract text, handwriting, and data from scanned PDF documents, forms, and tables.
#content #extraction #ocr #pdf #parser #api
Document AI | Google Cloud
cloud.google.com
Community maintained fork of pdfminer - we fathom PDF - pdfminer/pdfminer.six
#python #pdf #content #extraction #parser #library
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. - pymupdf/PyMuPDF
#python #pdf #content #extraction #parser #library
microsoft/table-transformer-detection · Hugging Face
huggingface.co
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
#llm #model #table #pdf #content #extraction
UniTable: Towards a Unified Table Foundation Model - poloclub/unitable
#pdf #table #content #extraction #llm #machine-learning
DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo: - cure53/DOMPurify
#javascript #library #dom #content #extraction #purify
Fast Python port of arc90's readability tool, updated to match the latest readability.js! - buriy/python-readability
#python #readability #library #extraction
Minimal keyword extraction with BERT. - MaartenGr/KeyBERT
#keyword #extraction #embedding #transformer
Easy Scraper
easyscraper.com
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments - adbar/trafilatura
#python #content #extraction #html #library #readability
Article extraction benchmark: dataset and evaluation scripts - scrapinghub/article-extraction-benchmark
#article #extraction #content #readability #benchmark #library
extractGPT - Chrome Web Store
chrome.google.com
Scrape and Monitor Data from Any Website with No Code
www.browse.ai
📜 Extract meaningful content from the chaos of a web page - parser/README.md at main · postlight/parser
#readability #content #extraction #javascript #library
Create personalized summaries with Upword
www.upword.ai
Quickly capture key ideas using Upword’s AI-powered notes and create personalized & slick summaries. Upword transforms any content into knowledge. Read, listen and share your summaries.
#capture #content #extraction #ai #annotation
Smort.io
www.smort.io
Smort lets you easily edit, annotate and share articles. Read better. Become Smort.
#highlight #readability #article #extraction #pub
📜 Extract meaningful content from the chaos of a web page - postlight/mercury-parser
#readability #nodejs #extraction #pub
A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file. - deathau/markdownload
#markdown #clipping #extraction #chrome #extension #pub
Plask is a browser-based AI motion capture tool and animation editor. With any camera, creators can digitize their movements, automate animation work, collaborate with colleagues, and export them all on one platform.
#animation #ai #motion #capture #extraction #pub