I have generally been very happy with moving my blog to Hugo, except that I have struggled to get it properly indexed by Google, Bing, etc. One of the benefits of using standard WordPress is that it comes with a lot of automatic functions for such things. In theory, that should also happen with Hugo. However, for some reason, it did not work for my blog.
After some trial and error, I think I am on the right track. However, one thing was missing: content in the meta description of each page. I have added a generic meta description for the whole blog, but all the SEO (Search Engine Optimization) guides say that you need a hand-crafted description of about 155 characters for each page. I can do that moving forward, but not for the 1500+ previous blog posts.
So I turned to ChatGPT for help, and after some back and forth, I came up with this Python script that would run through all my blog posts and add a description in the header based on the first 155 characters of my post:
import re
from pathlib import Path

# Define the directory where your blog posts are stored
content_dir = Path('content')


# Function to extract text for the description
def extract_description(text, max_length=155):
    # Remove markdown images/links, HTML tags, and markdown heading markers
    clean_text = re.sub(r'!\[.*?\]\(.*?\)|<[^>]+>|\[.*?\]\(.*?\)|#+', '', text)
    # Keep only ASCII letters, digits, and whitespace (this also drops quotes that would break the YAML string)
    clean_text = re.sub(r'[^A-Za-z0-9\s]+', '', clean_text)
    # Normalize whitespace (including newlines) and strip leading/trailing whitespace
    clean_text = re.sub(r'\s+', ' ', clean_text).strip()
    if len(clean_text) > max_length:
        # Cut at a word boundary and leave room for the ellipsis
        return clean_text[:max_length - 3].rsplit(' ', 1)[0] + '...'
    return clean_text


# Function to update the markdown file by appending the description to the front matter
def update_file_with_description(md_file_path, front_matter, main_content, description):
    # Append the generated description within the YAML front matter (delimited by ---)
    updated_front_matter = f'---\n{front_matter}\ndescription: "{description}"\n---\n'
    updated_content = updated_front_matter + main_content.lstrip()
    # Write the updated content back to the file
    with open(md_file_path, 'w', encoding='utf-8') as file:
        file.write(updated_content)
    print(f'Updated {md_file_path}')


# Iterate through the markdown files in the content directory
for md_file in content_dir.rglob('*.md'):
    print(f'Processing file: {md_file}')
    with open(md_file, 'r', encoding='utf-8') as file:
        content = file.read()
    front_matter_match = re.search(r'^---\s*(.*?)\s*---', content, re.DOTALL)
    if front_matter_match and 'description:' not in front_matter_match.group(1):
        front_matter = front_matter_match.group(1)
        main_content = content[front_matter_match.end():]
        description = extract_description(main_content)
        update_file_with_description(md_file, front_matter, main_content, description)
    else:
        print(f'No update required or YAML front matter not found in {md_file}')

print('Finished processing all files.')
It is a quick and dirty solution, but at least it is better than nothing.
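One rough edge is that the script strips all punctuation and non-ASCII letters, so the descriptions read a bit flat. A gentler variant could keep the punctuation and escape the result instead. The following is only a sketch: the function names `extract_description_keep_punctuation` and `description_line` are made up for illustration, and it assumes that Hugo's YAML parser accepts a JSON-style double-quoted string (as produced by `json.dumps`) as the value of `description`.

```python
import json
import re


def extract_description_keep_punctuation(text, max_length=155):
    """Hypothetical variant: strip markup but keep punctuation and non-ASCII letters."""
    # Drop images entirely, but keep the visible text of links
    text = re.sub(r'!\[[^\]]*\]\([^)]*\)', '', text)
    text = re.sub(r'\[([^\]]*)\]\([^)]*\)', r'\1', text)
    # Drop HTML tags and heading markers at the start of a line
    text = re.sub(r'<[^>]+>', '', text)
    text = re.sub(r'^#+\s*', '', text, flags=re.MULTILINE)
    # Drop asterisks (emphasis) and backticks (inline code)
    text = re.sub(r'[*`]', '', text)
    # Collapse all whitespace, including newlines, to single spaces
    text = re.sub(r'\s+', ' ', text).strip()
    if len(text) > max_length:
        # Cut at a word boundary, leaving room for the ellipsis
        text = text[:max_length - 3].rsplit(' ', 1)[0] + '...'
    return text


def description_line(description):
    """Format the front matter line; json.dumps escapes quotes and backslashes."""
    return 'description: ' + json.dumps(description, ensure_ascii=False)


sample = '# Hello\n\nMy "new" blog uses [Hugo](https://gohugo.io), which is fast.'
print(description_line(extract_description_keep_punctuation(sample)))
```

In the script above, `description_line(...)` would take the place of the hand-built `description: "..."` part of the front matter string. I have not run this variant on the whole blog, so treat it as a starting point and try it on a copy of the content folder first.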