Imran's personal blog

May 26, 2016

Genetic Algorithms vs Gradient Descent

Filed under: Uncategorized — ipeerbhai @ 8:28 pm

I’ve been working on a BitArray pattern recognition system for sound processing.  I implemented a genetic algorithm with single-point mutation and tested it against a data set of sounds ( me talking and a recording of violins playing ).  The idea is to detect me talking over the violin noise, with the hope of eventually being able to tell speech apart from noise.

It didn’t work at all.  I could create semi-optimal ( aka local-minima ) solutions that could mostly distinguish me talking from the violin, but not always.  There was a global solution; by pure chance I hit it a few times, and when I did, the system worked correctly ( about 1/10 of the time I hit the correct global optimum, the other 9/10 I hit a local optimum ).

I wanted to see if I could evolve the local-optimum detectors toward the global one.  With a SNIP mutation, it didn’t work ( though I hypothesized it should work some of the time.  The global optimum is a single bit, bit 47, being false in the encoded samples. )
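For anyone curious, the mutation itself is nothing exotic.  Here’s a minimal sketch of what I mean by a single-point ( SNIP ) mutation, assuming the hypothesis is stored as a BitArray and the mutation is a single bit flip; the names and RNG handling are illustrative, not lifted from my actual code:

```csharp
using System;
using System.Collections;

static class GeneticOps
{
    static readonly Random Rng = new Random();

    // Single-point ("SNIP") mutation: copy the parent and flip exactly one randomly chosen bit.
    public static BitArray Snip(BitArray parent)
    {
        var child = new BitArray(parent);      // copy so the parent survives in the population
        int point = Rng.Next(child.Length);    // pick one bit position
        child[point] = !child[point];          // flip it
        return child;
    }
}
```

Add/delete and transpose mutations are variations on the same pattern, just touching more than one position ( and, for add/delete, changing the length ).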

From the bit-47 observation, I calculated the number of mutations needed to get from each suboptimal solution to the global optimum: at least 4 to around 7 serial snips, with add/delete mutations being far more valuable than transpose.

Cost tracking indicates that the global optimum takes about 318,000 if/then tests to reach in a good case.  ( 500 sample points in the space, so small data… )

I have no idea what gradient descent would take here.  But I now know an appropriate DNN topology to test against: 25 samples in the input layer, 25 hidden neurons plus bias, and 1 output neuron should simulate my genetic selector.  Then I can tell which is more efficient.  I suspect a poly-snip genetic algorithm would work, and so would the 25-neuron DNN.  I’ll have to implement both and see which is more efficient, DNN or genetic.
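For the DNN side, here’s a rough sketch of that 25-25-1 topology in Encog, the C# framework I’ve been playing with.  The sigmoid activations, the resilient propagation trainer, and the placeholder training data are all my assumptions for illustration, not a settled design:

```csharp
using Encog.Engine.Network.Activation;
using Encog.ML.Data.Basic;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Networks.Training.Propagation.Resilient;

class DnnSketch
{
    static void Main()
    {
        // 25 inputs -> 25 hidden neurons (+ bias) -> 1 output (me talking vs. violin).
        var network = new BasicNetwork();
        network.AddLayer(new BasicLayer(null, true, 25));                     // input layer with bias
        network.AddLayer(new BasicLayer(new ActivationSigmoid(), true, 25));  // hidden layer with bias
        network.AddLayer(new BasicLayer(new ActivationSigmoid(), false, 1));  // single output neuron
        network.Structure.FinalizeStructure();
        network.Reset();

        // Placeholder data: each row is one encoded 25-sample window; 1.0 = speech, 0.0 = violin.
        double[][] input = { new double[25], new double[25] };
        input[0][0] = 1.0;  // crude stand-in for a "speech" window
        double[][] ideal = { new[] { 1.0 }, new[] { 0.0 } };
        var trainingSet = new BasicMLDataSet(input, ideal);

        var train = new ResilientPropagation(network, trainingSet);
        int epoch = 0;
        do
        {
            train.Iteration();
            epoch++;
        } while (train.Error > 0.01 && epoch < 1000);
    }
}
```

Counting the work a run like this does against the 318,000 if/then tests from the genetic version should give a rough first-order efficiency comparison.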

May 9, 2016

What am I up to in 2016?

Filed under: Uncategorized — ipeerbhai @ 3:01 pm

For the past month, I’ve been working on a machine learning program, accidentally.

A year or so ago, I wrote a little app that uses cloud AI to do language translation.  It worked!  Only for me!  See, I grew up in the American Midwest.  I actually went to the University of Nebraska for a while.  I speak broadcast-perfect English; I could be a news anchorperson.  I also understand AI.  In machine translation, I understand that it’s just “transcoding” based on word frequency, Kenneth.  This means I can have this kind of conversation with myself:

“How many dogs do you have?”/ “I have two dogs”.

So, because of these factors, I can use a translation AI without a problem.  But I often interact with people who are older, have strong accents, and don’t really understand the processing time and optimal speech patterns for cloud machine translation.  They speak differently:

“How many dogs do you have?” / “two”.

Fragmented, fast, impatient, and ambiguous.  A machine system won’t handle this conversation well.  The accented, older human is now just frustrated with the thing.  They didn’t get enough of a clue from the system about what was going on, and it took too long for it to work.  They want “Effortless” translation, or they don’t believe/trust it at all.

So, I wanted to solve the problem of conversational translation, along with a slew of other problems like contact search.  Thus, I stepped through the looking glass and decided it was time I learned AI development.  I went looking for frameworks, discovered Encog, a C# neural network/ML framework, and played around with it.  I found that the amount of featurization and pre-processing needed for a sound NN was higher/harder than I liked.  It could be done, but only with a metric tonne of labeled data that I don’t have.

So, I looked at “small data” ideas.  One that interested me was the two-dimensional vector field learner that Numenta has.  I began a pure C# implementation ( I normally don’t code in C# because I hate UWP, but this kind of project uses old .NET APIs and no UWP ).  And along the way, it hit me: this two-dimensional learner was a neural network, and machine learning is really just pattern recognition.  The sparse maps are like labels, another way of saying, “Like these, not those”.  The two-dimensional field could be represented by a vector of A elements, where A = M x N of the original field.
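Concretely, the flattening is nothing fancy; something like this is all I mean ( row-major order is my arbitrary choice here ):

```csharp
// Flatten an M x N two-dimensional field into a vector of A = M * N elements (row-major).
static bool[] Flatten(bool[,] field)
{
    int m = field.GetLength(0);
    int n = field.GetLength(1);
    var vector = new bool[m * n];
    for (int row = 0; row < m; row++)
    {
        for (int col = 0; col < n; col++)
        {
            vector[row * n + col] = field[row, col];
        }
    }
    return vector;
}
```

After that, masks and comparisons can work on plain one-dimensional bit vectors.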

But there’s power in the representation that I hadn’t expected.  Turns out that viewing a NN as a two-dimensional vector and using masking leads to an easier human understanding of what the heck is actually going on in the system.  And this leads to new ideas ( which I’m not ready to share yet, because they’re possibly insane ).

Nowadays, I’m developing out the system because it’s intellectually engaging.  I’ve started from ideas, seen how they work in existing frameworks, then moved and maybe improved those ideas into my own framework, because I believe “if you don’t build it, you don’t understand it”.  My framework is woefully incomplete.  It will always create a pattern based on the least significant bits.  It’s easy to fool, and doesn’t use enough horizontal data when building masks.  But it can do something amazing: it can tell apart two sounds with exactly one sample of each sound, and it does so without a label.
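To give a flavor of what I mean by masks and “like these, not those” ( this is an illustrative sketch, not my actual detector, which does more than this ): take the one sample of each sound, XOR them to find the bits where they disagree, and then score any new sample by how many of those disagreeing bits match the first reference.

```csharp
using System.Collections;

static class MaskSketch
{
    // Build a mask of the bit positions where the two reference samples differ.
    public static BitArray BuildMask(BitArray soundA, BitArray soundB)
    {
        var mask = new BitArray(soundA);
        mask.Xor(soundB);   // true wherever the two encoded samples disagree
        return mask;
    }

    // Score a candidate by how many masked bits agree with reference A.
    // High score reads as "more like A", low score as "more like B"; no labels required.
    public static int AgreementWithA(BitArray candidate, BitArray soundA, BitArray mask)
    {
        int score = 0;
        for (int i = 0; i < mask.Length; i++)
        {
            if (mask[i] && candidate[i] == soundA[i])
            {
                score++;
            }
        }
        return score;
    }
}
```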

And that’s not the most exciting part!  As I’ve been playing with these ideas, a new one has emerged about how to stack and  parallelize the detectors and make an atemporal representation of sound streams.  This seems to match what Noam Chomsky says about how human “Universal Grammar” must work. If this idea pans out ( and it’s maybe months of implementation time to find out ), then there’s a small chance that I’ll figure out some part of the language translation problem.

All that excitement is tempered by the fact that I have limited time.  Eventually, I’ll run out of money, and thus time, to do this research.  So the problems I must solve are:

  1. Can I build a framework that’s able to solve the problems I’m interested in?
  2. If not, can the pattern detectors solve problems others are interested in?
  3. Can I sell something from this system to fund my own time?

Anyways, that’s what I’ve been up to recently.

Echo needs a competitor

Filed under: Uncategorized — ipeerbhai @ 2:04 pm

One thing I learned working in big tech — there’s always someone watching.

Take the Amazon Alexa.  You can bet the big 4 tech firms are watching Amazon and trying to decide if they’ll make technology to compete.  And I really wish they would.  I have an Echo and love it, but programming for the Echo is crap.

Why is programming for the echo crap?  So many issues:

  1. Provisioning services is a nightmare.  You don’t even know what services you need to provision, much less have access to a configuration file.  Lots of AWS console pages — lots — to get to hello world.
  2. No audio stream.  If you want to make a phone app, forget it.  Amazon won’t give you the voice data.  There’s AVS that you can use to send voice you capture to Alexa — but there’s no access to the voice in the Echo.
  3. 90-second, fixed-format playback from the API.  You literally chunk everything into 90-second-long MP3s.
  4. NodeJS.  Voice is not web, and the stateless nature of web design makes no sense in voice apps.  The biggest issue is that your app will respond to any of the registered commands in any sequence.  Conversations, however, are always sequential.  It’s just the wrong language for the job.
    1. NodeJS, outside the web, is sort of a problem.  There’s real harm in imposing the async paradigm on problems that are much simpler to read in a stateful manner.
    2. And not just any NodeJS — you can’t write the code in your own editor.  Amazon wants to make sure they own the coding platform, and you have to write Alexa code in their web editor.
  5. Can’t really sell what you make.  Amazon won’t let you monetize the actual ASK — instead, you have to sell something else, like an unlock code on Android.
  6. AVS platform lock — AVS is essentially only available for Linux/Android.  If you want to use AVS on PC/Mac, well, you’re SOL.
  7. Overly cloudy.  I’m not a fan of the cloud, because it adds complexity that doesn’t need to be there.  But Alexa takes the cake on too much cloud for no reason.  Can’t write the code on a local system — must be in the browser.  Can’t run any part on your own hardware — must run in AWS.  Every instance requires a lambda spin-up.  Can’t sell what you make.  Developers give too much control away when using Alexa.

My team won the Echo prize at the recent Seattle VR Hack-a-thon.  The team at Amazon is amazing, and the Echo is an amazing product.  Again, I own and love my Echo.  But without a competitor, the developer experience is really sub-par.  I also don’t like these cloud companies forcing devs to lock in to them — can’t even use your own editor?  Come on!

So that’s my argument: the Echo needs competition from the big tech companies.  Sure, some start-up could make a great Echo-like product with a better developer experience.  I run across small shops making similar products on Kickstarter/Indiegogo.  But those companies are vertically focused, with no developer experience at all, whereas the big 4 make APIs…
