Python - Little Wordcloud
Page content
Do you like Word Clouds ?
I do …!
following a litte Script which Parse a Website and build a appropriate Word Cloud
Script
mkdir ~/mywordcloud; cd ~/mywordcloud
cat <<'EOF' > main.py
import fire
import matplotlib.pyplot as plt
import pandas as pd
import re
import requests
from bs4 import BeautifulSoup
from wordcloud import STOPWORDS, WordCloud
def gen_cloud_tag(url: str = "https://blog.stoege.net"):
# add https
if not url.startswith("https://"):
url = "https://" + url
# get Webpage
response = requests.get(url, timeout=5, allow_redirects=True)
soup = BeautifulSoup(response.text, "html.parser")
words = soup.get_text()
# split with multiple delimiters
words = re.split(r"[\n\r]", words)
# build Dataframe
df = pd.DataFrame(words)
# Stop Words
comment_words = ""
stopwords = set(STOPWORDS)
# iterate
for val in df.values:
# typecaste each val to string
val = str(val)
# split the value
tokens = val.split()
# Converts each token into lowercase
for i in range(len(tokens)):
tokens[i] = tokens[i].lower()
comment_words += " ".join(tokens) + " "
# Build Wordcloud
wordcloud = WordCloud(
width=800,
height=800,
background_color="white",
stopwords=stopwords,
min_font_size=10,
).generate(comment_words)
# Build Image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
# show Image
plt.show()
if __name__ == "__main__":
fire.Fire(gen_cloud_tag)
EOF
Init Project
you need a few python libraries. use some virtual env like venv, poetry or whatever your want
poetry init --python=^3.10 -q
poetry env use $(which python3.10)
poetry lock
poetry shell
Install Packages in Venv
poetry add wordcloud matplotlib pandas requests bs4 fire
poetry install
Run Script
poetry run python main.py blog.stoege.net
https://www.compass-security.com
https://www.ost.ch
Any Comments ?
sha256: f1f525c3332fe272c5de2f7456519ed0f227d8d5e24a814b4c89d426dffca72a