Animal Crossing style speech synthesizer

I’ve been putting together some code that takes any sentence and splits it (crudely) into syllables based on the number of vowels and some rules: animal crossing -> an i mal ro si, for example. Here’s an example of how Animal Crossing did it, but with individual letters instead of syllables: https://youtu.be/BCI_J4WBtzk?t=16s

Aside from some small scripts in Python, this is my first project, and I was wondering if anyone had any suggestions on how to manage the sound files. The plan so far is to build a dictionary so that when a speech request is made, it puts together the individual sound clips for the syllables in the sentence and does some magic based on character mood, punctuation, etc.
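A minimal sketch of that dictionary idea, in Python rather than C# and with made-up names (`load_clip`, `SYLLABLES`, `speak`) that are purely illustrative, might look like:

```python
# Hypothetical sketch: map syllables to pre-loaded sound clips and
# turn a speech request into an ordered list of clips to play.
SYLLABLES = ["an", "i", "mal", "ro", "si"]  # toy syllable inventory

def load_clip(syllable):
    # Stand-in for actually loading a sound file, e.g. "sounds/an.wav".
    return f"<clip:{syllable}>"

# Build the lookup table once, up front.
sound_bank = {s: load_clip(s) for s in SYLLABLES}

def speak(syllables):
    """Return the clips to play, in order, skipping unknown syllables."""
    return [sound_bank[s] for s in syllables if s in sound_bank]

print(speak(["an", "i", "mal"]))  # ['<clip:an>', '<clip:i>', '<clip:mal>']
```

The mood/punctuation "magic" would then be a post-processing step over that clip list (pitch, tempo, pauses) rather than part of the lookup itself.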

My question is about the media itself: should I load all the sound files in LoadContent, or would it be best to load and unload specific syllables when a request is made to the ‘speech engine’?


@arlyon

You could try just loading all the content at the beginning of the game and only think about optimization if there is a problem. Create a hypothesis (e.g. “I think it’s going to be slow”), then test your code and observe the performance results. You might be surprised by what you find 🙂

Speech doesn’t need a high sample rate, so using a lower sample rate will save some memory. If you’re using the DirectX build of MonoGame, you could also specify a lower compression quality in the SoundEffectProcessor (ADPCM support is still to be finalised in the platforms that use OpenAL).

How many syllables do you support? With a lower sample rate and possibly using ADPCM, you should be able to have all syllables in memory at once without any issues.

I plan to have a large number of two-letter syllables, plus the most common one-syllable words (this, than, where, why).

It currently searches lists for 4-letter matches -> 3-letter matches -> 2 -> a single vowel if it can’t find anything else. I can’t imagine there would be more than 50 or 100. Would that be low enough to just load into memory (at a reduced sample rate) and forget about?
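That longest-match-first search could be sketched roughly like this in Python; the `KNOWN` syllable set here is invented for illustration, not the real inventory:

```python
# Rough sketch of the search described above: try 4-letter chunks,
# then 3, then 2, and fall back to a single character (ideally a
# vowel) so the loop always advances.
KNOWN = {"ssin", "cro", "an", "al", "ma", "i"}  # hypothetical inventory
VOWELS = set("aeiou")

def split_word(word):
    out, i = [], 0
    while i < len(word):
        for size in (4, 3, 2):
            chunk = word[i:i + size]
            if chunk in KNOWN:
                out.append(chunk)
                i += size
                break
        else:
            # No list match: emit one character and move on.
            out.append(word[i])
            i += 1
    return out

print(split_word("crossing"))  # ['cro', 'ssin', 'g']
```

With greedy matching like this, the order and overlap of entries in the syllable lists will shape the output, so a few dozen well-chosen entries can go a long way.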

Thanks for your reply.

You’re probably right. I’ll try different ways and see which is better. I like to get caught up theorising about ‘maximising efficiency’ and waste time instead of just jumping in and trying. Thanks.

@arlyon

I think it’s totally okay to come up with a plan of attack for solving a problem using prior knowledge, but even that needs supporting evidence grounded in the real world. For example, consider just the time complexity of finding an element by key in a List&lt;T&gt; (a resizing array) versus a Dictionary&lt;TKey, TValue&gt;. It’s well known that for this type of problem the List&lt;T&gt; is O(n), where n is the size of the List&lt;T&gt;, and the Dictionary&lt;TKey, TValue&gt; is O(1). But remember this is just theory; the List&lt;T&gt; could well be faster than a Dictionary&lt;TKey, TValue&gt; for small n on a specific machine. The machine can also have other factors, such as the CPU cache, that wreck the model computational complexity provides in certain scenarios.

At the end of the day, for solving problems I find it better to just come up with a crude, “brute-force” if you will, approach. You then have something to show for your work. If there is a problem with that implementation, e.g. it’s too slow on mobile, you can always iterate on it to create a more high-performance solution if you have the time.
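To put some numbers behind the “measure, don’t assume” point, here is a throwaway Python benchmark comparing a linear list search against a hash lookup. The sizes and key names are arbitrary, and the absolute timings will differ on your machine, which is exactly the point:

```python
# Compare membership testing in a list (linear scan, O(n)) versus
# a dict (hash lookup, ~O(1)) for a small and a larger collection.
import timeit

for n in (50, 50_000):
    keys = [f"syl{i}" for i in range(n)]
    as_list = keys
    as_dict = {k: k for k in keys}
    target = keys[-1]  # worst case for the linear scan

    list_time = timeit.timeit(lambda: target in as_list, number=1000)
    dict_time = timeit.timeit(lambda: target in as_dict, number=1000)
    print(f"n={n}: list {list_time:.6f}s, dict {dict_time:.6f}s")
```

At n = 50 the two are usually close enough not to matter; at n = 50,000 the gap is hard to miss, which mirrors the advice above: for a syllable bank of 50–100 entries, either container is fine.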

I’m currently making my way through some basic software engineering books that go into the more complex topics at length. Thanks for your reply. You’re right about being able to go back and iterate as well.