HTMLQ
Stumbled upon something cool: htmlq! It’s like jq, but for HTML.
Install Rust
htmlq needs Rust, so let’s install Rust first.
doas pkg_add rust
Add Cargo's bin Directory to PATH
cat << 'EOF' |doas tee -a /etc/profile
# Rust/Cargo
export PATH=$PATH:/root/.cargo/bin
EOF
. /etc/profile
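Running the snippet above a second time appends the same export line to /etc/profile again. A minimal sketch of a check that could guard against that, assuming a POSIX shell; `path_contains` is a hypothetical helper name, not part of any tool mentioned here:

```shell
# Hypothetical helper: succeeds if directory $1 is already an entry
# in the colon-separated PATH-style string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Same directory as in the export line above.
cargo_bin=/root/.cargo/bin

if path_contains "$cargo_bin" "$PATH"; then
    echo "$cargo_bin is already on PATH"
else
    echo "$cargo_bin is missing from PATH"
fi
```

Wrapping both sides in colons makes the match exact, so /root/.cargo does not count as a hit for /root/.cargo/bin.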
Install HTMLQ
doas cargo install htmlq
Some Examples
Extract Links
curl -s https://www.openbsd.org | htmlq --attribute href a |head
Example
user@nixbox$ curl -s https://www.openbsd.org | htmlq --attribute href a |head
goals.html
plat.html
security.html
crypto.html
events.html
innovations.html
faq/faq4.html#Download
anoncvs.html
https://cvsweb.openbsd.org/
https://github.com/openbsd
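The output mixes relative links (goals.html) with absolute ones (https://github.com/openbsd). A small sketch of how the two kinds could be separated with grep; the sample list is copied from the output above so the snippet runs without network access, and in practice the input would come from the curl | htmlq pipeline:

```shell
# Sample input taken from the output above.
links='goals.html
plat.html
faq/faq4.html#Download
https://cvsweb.openbsd.org/
https://github.com/openbsd'

# Absolute links start with http:// or https://
absolute=$(printf '%s\n' "$links" | grep -E '^https?://')

# Everything else is relative to the page it was found on
relative=$(printf '%s\n' "$links" | grep -Ev '^https?://')

printf 'absolute:\n%s\n\nrelative:\n%s\n' "$absolute" "$relative"
```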
Extract Complete Link URLs
curl --silent https://www.nytimes.com | htmlq a --attribute href -b https://www.nytimes.com
Example
user@nixhost$ curl --silent https://www.nytimes.com | htmlq a --attribute href -b https://www.nytimes.com |head
https://www.nytimes.com/#site-content
https://www.nytimes.com/#site-index
https://www.nytimes.com/
https://www.nytimes.com/
https://www.nytimes.com/international/?action=click&region=Editions&pgtype=Homepage
https://www.nytimes.com/ca/?action=click&region=Editions&pgtype=Homepage
https://www.nytimes.com/es/
https://cn.nytimes.com/
https://myaccount.nytimes.com/auth/login?response_type=cookie&client_id=vi&redirect_uri=https%3A%2F%2Fwww.nytimes.com%2Fsubscription%2Fonboarding-offer%3FcampaignId%3D7JFJX&asset=masthead
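The list above contains https://www.nytimes.com/ twice. A minimal sketch of removing such duplicates while keeping the original page order, assuming any POSIX awk; the sample lines are copied from the output above, and on a live page the filter would simply be appended to the curl | htmlq pipeline:

```shell
# Sample input taken from the output above.
links='https://www.nytimes.com/#site-content
https://www.nytimes.com/#site-index
https://www.nytimes.com/
https://www.nytimes.com/
https://www.nytimes.com/es/
https://cn.nytimes.com/'

# Print a line only the first time it is seen; order is preserved,
# unlike with sort -u.
unique=$(printf '%s\n' "$links" | awk '!seen[$0]++')

printf '%s\n' "$unique"
```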
Dump HTML Body, Highlight with BAT
curl --silent https://blog.stoege.net | htmlq 'body' | bat --language html
Example
───────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ STDIN
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ <body class="body">
2 │ <div class="container container--outer">
3 │ <header class="header">
4 │ <div class="container header__container">
5 │
6 │ <div class="logo">
7 │ <a class="logo__link" href="/" rel="home" title="blog-stöge-net">
8 │ <div class="logo__item logo__text">
9 │ <div class="logo__title">blog-stöge-net</div>
10 │ <div class="logo__tagline">BSD is for people who love Unix, Linux is for people who hate Micr
│ osoft</div>
11 │ </div>
12 │ </a>
13 │ </div>