I have generally been very happy with moving my blog to Hugo, except that I have struggled to get it properly indexed by Google, Bing, etc. One of the benefits of standard WordPress is that it handles many such things automatically. In theory, that should also happen with Hugo. However, for some reason, it did not work for my blog.

After some trial and error, I think I am on the right track. However, one thing was missing: content in the meta description of each page. I have added a generic meta description for the whole blog, but all SEO (Search Engine Optimization) guides say that you need a hand-crafted description of about 155 characters for each page. I can do that moving forward, but not for the 1500+ previous blog posts.

So I turned to ChatGPT for help, and after some back and forth, I came up with this Python script that would run through all my blog posts and add a description in the header based on the first 155 characters of my post:

import re
from pathlib import Path

# Define the directory where your blog posts are stored
content_dir = Path('content')

# Function to extract text for the description
def extract_description(text, max_length=155):
    # Remove markdown links/images, HTML tags, markdown headings, and special characters
    clean_text = re.sub(r'!\[.*?\]\(.*?\)|<[^>]+>|\[.*?\]\(.*?\)|#+', '', text)
    clean_text = re.sub(r'[^A-Za-z0-9 ]+', '', clean_text)  # Keeps only alphanumeric characters and spaces
    clean_text = re.sub(r'\s+', ' ', clean_text).strip()  # Normalize whitespace and strip leading/trailing whitespace
    if len(clean_text) > max_length:
        # Reserve three characters for the ellipsis so the result stays within max_length
        return clean_text[:max_length - 3].rsplit(' ', 1)[0] + '...'
    return clean_text

# Function to update the markdown file by appending the description to the front matter
def update_file_with_description(md_file_path, front_matter, main_content, description):
    # Append the generated description to the front matter (delimited by '---', i.e. YAML)
    updated_front_matter = f'---\n{front_matter}\ndescription: "{description}"\n---\n'
    updated_content = updated_front_matter + main_content.lstrip()

    # Write the updated content back to the file
    with open(md_file_path, 'w', encoding='utf-8') as file:
        file.write(updated_content)
    print(f'Updated {md_file_path}')

# Iterate through the markdown files in the content directory
for md_file in content_dir.rglob('*.md'):
    print(f'Processing file: {md_file}')

    with open(md_file, 'r', encoding='utf-8') as file:
        content = file.read()

    # Look for a front matter block at the very start of the file
    front_matter_match = re.search(r'^---\s*(.*?)\s*---', content, re.DOTALL)
    if front_matter_match and 'description:' not in front_matter_match.group(1):
        front_matter = front_matter_match.group(1)
        # Everything after the closing '---' is the body of the post
        main_content = content[front_matter_match.end():]
        description = extract_description(main_content)
        update_file_with_description(md_file, front_matter, main_content, description)
    else:
        print(f'No update required or front matter not found in {md_file}')

print('Finished processing all files.')

It is a quick and dirty solution, but at least it is better than nothing.
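
The dirtiest part is probably the cleanup step: keeping only `A-Za-z0-9` and spaces also throws away punctuation and any non-English letters, and link text disappears together with the link. Below is a minimal sketch of a gentler variant. The names `extract_description_unicode` and `yaml_quote` and the sample text are made up for illustration and are not part of the script above. It assumes the front matter is YAML, so double quotes and backslashes in the description need escaping.

```python
import re

def extract_description_unicode(text, max_length=155):
    """Hypothetical variant that keeps punctuation and non-ASCII letters."""
    # Remove markdown images entirely, but keep the visible text of links
    text = re.sub(r'!\[[^\]]*\]\([^)]*\)', '', text)
    text = re.sub(r'\[([^\]]*)\]\([^)]*\)', r'\1', text)
    # Drop HTML tags and heading markers at the start of a line
    text = re.sub(r'<[^>]+>', '', text)
    text = re.sub(r'^#+\s*', '', text, flags=re.MULTILINE)
    # Drop asterisk and backtick markers, then normalize whitespace
    text = re.sub(r'[*`]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    if len(text) > max_length:
        # Cut at the last word boundary, leaving room for the ellipsis
        text = text[:max_length - 3].rsplit(' ', 1)[0] + '...'
    return text

def yaml_quote(value):
    """Escape backslashes and double quotes for a double-quoted YAML string."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'

# Made-up sample post body with a heading, a link, emphasis and Norwegian letters
sample = '# Blåbær\n\nSee [my "old" post](https://example.com) about *syltetøy*.'
description = extract_description_unicode(sample)
print(f'description: {yaml_quote(description)}')
# prints: description: "Blåbær See my \"old\" post about syltetøy."
```

To try it, `extract_description` could be swapped for this function and the `description: "{description}"` part of the f-string replaced with `description: {yaml_quote(description)}`. I would test that on a copy of the `content` folder first, since the script overwrites files in place.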