Preface or Naive Market Overview

Video and audio are the dominant content types nowadays. Not because we attend more video calls during the pandemic. Not because the leading mobile apps are becoming video-centric. But because video and audio content engages users for longer, which means more money from ads. All hail the free market. Since its emergence at the beginning of this century, video blogging has evolved into a new industry with huge potential. Just ask Google how much money social media influencers make.

Call to Arms

How can you take advantage of the vlogging trend without spending time on recording and editing videos? Python is the answer. This is where software development comes in and helps automate your routine.

Python is not only for AI, machine learning, data science, data engineering, and web apps. It can also be used to create videos with animations and audio. And this is not about face swaps or deepfake videos.

Facing the Challenge

Creating automated videos might sound easy, but you still have to tell the computer how to do it. Don’t count on generating a video about your trip to Burning Man without feeding in tons of photos, although there are many solutions for generating random terrain and landscapes.

Start simple. Think about the patterns, template, and structure of your upcoming video content. In our case study, it's a video dictionary, or wordbook, with translation and visualization.

Project Skeleton

Several things are needed to create a video wordbook with translation and visualization:

  1. Words
  2. Phonetic transcription
  3. Translation
  4. Audio Dubbing
  5. Visualization
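The five ingredients above can be pictured as one record per word that gets filled in step by step. The sketch below only illustrates that idea: the `WordCard` class, its field names, and the `build_card` helper are hypothetical names made up here, not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WordCard:
    """One dictionary entry shown in the video (hypothetical structure)."""
    word: str
    ipa: Optional[str] = None          # phonetic transcription
    translation: Optional[str] = None  # translated word
    audio_path: Optional[str] = None   # path to the dubbed audio file
    image_path: Optional[str] = None   # path to the illustration


def build_card(
    word: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
    dub: Callable[[str], str],
    illustrate: Callable[[str], str],
) -> WordCard:
    """Run every step for a single word and collect the results."""
    return WordCard(
        word=word,
        ipa=transcribe(word),
        translation=translate(word),
        audio_path=dub(word),
        image_path=illustrate(word),
    )
```

Passing each step in as a plain function keeps the steps swappable, so one translation or image source can be replaced with another without touching the rest.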

Words

Traditional paper dictionaries are becoming obsolete. Even the top nouns obtained from online dictionaries might look plain. The idea was instead to understand the English corpus actually used on the Internet.

Here comes Reddit, a social media platform and news aggregator with a rich dataset and many thematic groups (subreddits). The finance domain was selected to get text data about investment, stocks, and markets. The wallstreetbets subreddit was best suited for this goal. You might have heard how it fueled GameStop's price surge recently.

All data was collected with PRAW (the Python Reddit API Wrapper, a Python client for Reddit's official API) and the Selenium framework.
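To give a rough idea of the word-collection step, here is a minimal sketch that pulls posts with PRAW and counts the most frequent words. It is not the project's actual code: the credentials are placeholders, the stop list is a tiny illustrative one, the function names are made up, and it counts all words rather than nouns only. The Selenium part is not shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
    "for", "on", "this", "that", "with", "you", "are",
}


def fetch_texts(limit: int = 100) -> List[str]:
    """Download post titles and bodies from r/wallstreetbets with PRAW."""
    import praw  # third-party package, imported lazily

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder credentials
        client_secret="YOUR_CLIENT_SECRET",  # placeholder credentials
        user_agent="wordbook-sketch/0.1",    # hypothetical user agent
    )
    texts = []
    for post in reddit.subreddit("wallstreetbets").hot(limit=limit):
        texts.append(post.title)
        texts.append(post.selftext)
    return texts


def top_words(texts: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase word tokens, skipping stop words and very short tokens."""
    counts: Counter = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            if token not in STOP_WORDS and len(token) > 2:
                counts[token] += 1
    return counts.most_common(n)
```

With real credentials in place, `top_words(fetch_texts())` would return the most frequent words of the current hot posts.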

Transcription

IPA is not only a hoppy beer style but also an alphabet. The International Phonetic Alphabet (IPA) represents all the sounds humans produce. The Python package eng_to_ipa can easily convert English text into IPA. Cheers!
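A minimal sketch of the transcription step could look like this. The `transcribe_words` helper is a made-up name. It relies on the `convert` function of eng_to_ipa and assumes, as the package's documentation describes, that words missing from its dictionary come back with a trailing asterisk. Slang-heavy Reddit vocabulary is likely to hit that case, so such words are skipped here.

```python
from typing import Callable, Dict, Iterable, Optional


def default_converter() -> Callable[[str], str]:
    """Return eng_to_ipa's convert function (third-party package)."""
    import eng_to_ipa  # imported lazily so the rest works without it

    return eng_to_ipa.convert


def transcribe_words(
    words: Iterable[str],
    convert: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Map each word to its IPA transcription, dropping unknown words."""
    if convert is None:
        convert = default_converter()
    result = {}
    for word in words:
        ipa = convert(word)
        # Assumption: unknown words are echoed back with a trailing "*".
        if ipa.endswith("*"):
            continue
        result[word] = ipa
    return result
```

Passing `convert` explicitly makes it easy to plug in another transcription source or a stub for testing.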

Translation

There are many free and open source packages for translating text, with bulk translation and automatic language detection. Googletrans was chosen for this project.
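Here is a hedged sketch of the translation step for the English-Russian pair. The helper names are hypothetical. It assumes the synchronous `Translator().translate(...)` interface of older Googletrans releases; newer releases turn `translate` into a coroutine that has to be awaited, so check the version you install.

```python
from typing import Callable, Dict, Iterable


def googletrans_translator(src: str = "en", dest: str = "ru") -> Callable[[str], str]:
    """Build a translate function backed by Googletrans (third-party package)."""
    from googletrans import Translator  # imported lazily

    translator = Translator()

    def translate_one(text: str) -> str:
        # Assumption: synchronous API, where the result object has a .text field.
        return translator.translate(text, src=src, dest=dest).text

    return translate_one


def translate_words(
    words: Iterable[str],
    translate_one: Callable[[str], str],
) -> Dict[str, str]:
    """Translate each distinct word once and return a word-to-translation map."""
    translations: Dict[str, str] = {}
    for word in words:
        if word not in translations:  # skip duplicates to save requests
            translations[word] = translate_one(word)
    return translations
```

A typical call would be `translate_words(words, googletrans_translator())`. Since the service behind it is unofficial, expect occasional request failures.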

Audio Dubbing

Same for audio. The gTTS package, a Python library and CLI tool that interfaces with Google Translate's text-to-speech API, was used to generate audio from text. The pydub package helps to edit and manage audio files.
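The dubbing step could be sketched like this: gTTS writes the spoken word to an MP3 file, and pydub appends a short pause so that words do not run into each other in the final video. The function names, the output folder, and the pause length are assumptions made for illustration. pydub also needs ffmpeg installed to read and write MP3.

```python
import re
from pathlib import Path


def audio_filename(word: str) -> str:
    """Turn a word or phrase into a safe MP3 file name."""
    slug = re.sub(r"[^a-z0-9]+", "_", word.lower()).strip("_")
    return f"{slug}.mp3"


def dub_word(word: str, out_dir: str = "audio", lang: str = "en",
             pause_ms: int = 500) -> Path:
    """Generate speech for a word and pad it with trailing silence."""
    from gtts import gTTS            # third-party, imported lazily
    from pydub import AudioSegment   # third-party, requires ffmpeg for MP3

    out_path = Path(out_dir) / audio_filename(word)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    raw_path = out_path.with_name("raw_" + out_path.name)

    # Step 1: text to speech via Google Translate's TTS endpoint.
    gTTS(text=word, lang=lang).save(str(raw_path))

    # Step 2: append silence and export the final clip.
    speech = AudioSegment.from_mp3(str(raw_path))
    padded = speech + AudioSegment.silent(duration=pause_ms)
    padded.export(str(out_path), format="mp3")
    return out_path
```

Calling `dub_word("stock")` would leave the padded clip in `audio/stock.mp3`, next to the unpadded `raw_stock.mp3`.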

Visualization

As for visualization, it can be done in several ways. We tried scraping stock photo sites and the Flickr API, but ended up with Google Photos because its output was more relevant.

The final word

I’m pleased to present Random Bullet, an automated video generator for learning foreign languages.

The videos have been uploaded to YouTube. For now, only English-Russian translation is available. Check them out:

I’d appreciate any feedback or thoughts on this project. Write a comment below or contact me on LinkedIn.