Imran's personal blog

September 22, 2016

Voice Matters

Filed under: Uncategorized — ipeerbhai @ 6:56 pm

Experian just released a survey of 180 Echo owners about their Echo experience.  You can read the report here.  It showed some great findings: the Echo's NPS was 19, which is very high, though not extreme (Google Chrome's, for example, is around 35; NPS, or Net Promoter Score, is the percentage of promoters minus the percentage of detractors).  Most impressive is that 35% of Echo owners are shopping online, right now, with voice!  This means people like the Echo and spend money with it.

Experian believes that voice is now entering the "early adopter" phase of the hype cycle.  I'm surprised that it's taken voice so long to get to this phase, but then, I'm an early adopter, having used the Echo since its late betas.  I also have a VR setup, and I code for a living.

When voice dialers ("Siri, call my wife!") became mainstream, they changed the world.  I use one every time I make a phone call.  It gives me hands-free capability when I drive, and voice dialing is now a mainstream use case.  I fully expect voice computing to go mainstream too, and the market here to grow in leaps and bounds.


June 23, 2016

Learn OpenSCAD

Filed under: Uncategorized — ipeerbhai @ 10:21 pm

[Image: 3D tiara preview from Adafruit]

Want to learn to make 3D objects like this cool tiara from Adafruit?  Come to the OpenSCAD class at Metrix Create Space on Aug 4, 2016.  We'll go over the basics of drawing 3D objects by describing them in a free, open-source, C-like language.

Register here:

http://www.metrixcreatespace.com/store/openscad-84

June 7, 2016

results of Genetic Algo vs NN

Filed under: Uncategorized — ipeerbhai @ 7:29 pm

For my voice AI project, I've been looking at genetic algorithms and neural nets.  I wrote a gate-array learner and created a truth table with 4 input points and 2 output points.  I knew, ahead of time, that 2 XOR gates, wired to inputs 1,2 and 3,4 respectively, would perfectly fit the space.

I then wrote a genetic algorithm to solve the space problem, and counted how many times the algorithm tried to solve the problem before succeeding.
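
To make that concrete, here's a minimal sketch of the kind of loop I mean.  It's a reconstruction from the description above, not my actual code: a mutate-and-keep-if-no-worse selection rule over two gate choices (a (1+1)-style evolutionary loop rather than a full population GA), counting attempts until the truth table fits perfectly.  The gate set and all names are hypothetical.

using System;

// Hypothetical reconstruction: evolve two two-input gates (gate 1 reads
// inputs 1,2; gate 2 reads inputs 3,4) toward the XOR/XOR truth table,
// counting attempts until a perfect fit.
class GateEvolver
{
    enum Gate { And, Or, Xor, Nand }

    static bool Eval(Gate g, bool a, bool b) => g switch
    {
        Gate.And => a && b,
        Gate.Or  => a || b,
        Gate.Xor => a ^ b,
        _        => !(a && b), // Nand
    };

    // Fitness = number of the 16 truth-table rows where both outputs match.
    static int Fitness(Gate[] genome)
    {
        int score = 0;
        for (int row = 0; row < 16; row++)
        {
            bool i1 = (row & 1) != 0, i2 = (row & 2) != 0;
            bool i3 = (row & 4) != 0, i4 = (row & 8) != 0;
            if (Eval(genome[0], i1, i2) == (i1 ^ i2) &&
                Eval(genome[1], i3, i4) == (i3 ^ i4)) score++;
        }
        return score;
    }

    static void Main()
    {
        var rng = new Random();
        var genome = new[] { Gate.And, Gate.And };
        int attempts = 0;
        while (Fitness(genome) < 16)
        {
            attempts++;
            // Single-point mutation: flip one randomly chosen gate type.
            var child = (Gate[])genome.Clone();
            child[rng.Next(child.Length)] = (Gate)rng.Next(4);
            if (Fitness(child) >= Fitness(genome)) genome = child; // keep if no worse
        }
        Console.WriteLine($"Solved after {attempts} attempts: {genome[0]}, {genome[1]}");
    }
}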

The genetic algorithm tried between 811 and about 28,000 times before solving the space.  A neural net solved the same problem in anywhere from 43 attempts to never (sometimes failing to converge at all), even when given the same number of nodes.  A massively overfitted neural net with 2,000 nodes in the hidden layer converged far faster than a barely overfitted network with only 5 nodes in the hidden layer.

So, I'd probably call the NN the winner, but only when it's massively over-fitted.

May 26, 2016

Genetic Algorithms vs Gradient Descent

Filed under: Uncategorized — ipeerbhai @ 8:28 pm

I've been working with a BitArray pattern-recognition system for sound processing.  I implemented a genetic algorithm with a single-point mutation ability and tested it against a data set of sounds: me talking, and a recording of violins playing.  The idea is to detect me talking over the violin noise, with the hope of eventually being able to tell speech and noise apart.

It didn't work at all.  I could create semi-optimal (i.e., local-minima) solutions that could mostly distinguish me talking from the violin, but not always.  There was a global solution; by pure chance I hit it a few times, and the system worked correctly (about 1/10 of the time I hit the global optimum, and 9/10 of the time a local optimum).

I wanted to see if I could evolve the local-optima detectors into the global one.  With a SNIP mutation, it didn't work, though I had hypothesized it should work some of the time.  (The global optimum is a single bit, bit 47, being false in the encoded samples.)

From this, I calculated the number of mutations needed to reach the global optimum from each suboptimal solution: at least 4 to around 7 serial snips, with add/delete mutations being far more valuable than transpose.
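
For reference, a single-point mutation over a BitArray detector is about this simple.  This is a hypothetical sketch with names of my choosing, not the detector code itself:

using System;
using System.Collections;

static class Mutator
{
    static readonly Random Rng = new Random();

    // Single-point (SNIP-style) flip: copy the parent mask and invert
    // one randomly chosen bit.
    public static BitArray SinglePointFlip(BitArray parent)
    {
        var child = new BitArray(parent);   // clone; don't mutate the parent
        int point = Rng.Next(child.Length); // pick one mutation point
        child[point] = !child[point];       // flip it
        return child;
    }
}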

Cost tracking indicates that the global optimum takes 318,000 if/then tests to reach in a good case (500 sample points in the space; small data…).

I have no idea what gradient descent would take here.  But I now know an appropriate DNN topology to guess correctly: 25 samples in the input layer, 25 neurons plus a bias in the hidden layer, and 1 output neuron should simulate my genetic selector.  Then I can tell what is more efficient.  I suspect a poly-snip genetic algorithm would work, as would the 25-neuron DNN.  I'll have to implement both and see which is more efficient, DNN or genetic.  A sketch of the topology follows.
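
In Encog (the C# framework I've played with), wiring up that 25-25-1 topology might look something like this.  It's a minimal sketch assuming Encog's stock BasicNetwork/BasicLayer API, a wiring illustration rather than code I've run for this experiment:

using Encog.Engine.Network.Activation;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;

// 25 inputs (+ bias), 25 sigmoid hidden neurons (+ bias), 1 sigmoid output.
var network = new BasicNetwork();
network.AddLayer(new BasicLayer(null, true, 25));                    // input layer, bias enabled
network.AddLayer(new BasicLayer(new ActivationSigmoid(), true, 25)); // hidden layer, bias enabled
network.AddLayer(new BasicLayer(new ActivationSigmoid(), false, 1)); // single output neuron
network.Structure.FinalizeStructure();
network.Reset(); // randomize the weights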

May 9, 2016

What am I up to in 2016?

Filed under: Uncategorized — ipeerbhai @ 3:01 pm

For the past month, I’ve been working on a machine learning program, accidentally.

A year or so ago, I wrote a little app that uses cloud AI to do language translation.  It worked!  Only for me!  See, I grew up in the American Midwest.  I actually went to the University of Nebraska for a while.  I speak broadcast-perfect English; I could be a news anchorperson.  I also understand AI.  In machine translation, I understand that it's just "transcoding" based on word frequency, Kenneth.  This means I can have this kind of conversation with myself:

“How many dogs do you have?”/ “I have two dogs”.

So, because of these factors, I can use a translation AI without a problem.  But I often interact with people who are older, have strong accents, and don't really understand the processing time and optimal speech patterns for cloud machine translation.  They speak differently:

“How many dogs do you have?” / “two”.

Fragmented, fast, impatient, and ambiguous.  A machine system won't handle this conversation well.  The accented, older human ends up just frustrated with the thing.  They didn't get enough cues from the system about what was going on, and it took too long to work.  They want "effortless" translation, or they don't believe in or trust it at all.

So, I wanted to solve the problem of conversational translation, along with a slew of other problems like contact search.  Thus, I stepped through the looking glass and decided it was time I learned AI development.  I went looking for frameworks, discovered Encog, a C# neural network/ML framework, and played around with it.  I discovered that the amount of featurization and pre-processing needed for sound NNs was higher/harder than I liked.  It could be done, but only with a metric tonne of labeled data, data I don't have.

So, I looked at "small data" ideas.  One that interested me was the two-dimensional vector-field learner that Numenta has.  I began a pure C# implementation (I normally don't code in C# because I hate UWP, but this kind of project uses old .NET APIs and no UWP).  And along the way, it hit me: this two-dimensional learner was a neural network, and machine learning is really just pattern recognition.  The sparse maps are like labels, another way of saying, "Like these, not those."  The two-dimensional field can be represented by a vector of A elements, where A = M × N of the original field.
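
To make the representation concrete, here is a tiny hypothetical sketch (names are mine, not from my framework) of flattening an M × N binary field into a vector of A = M * N elements and testing a concept mask against it:

static class FieldOps
{
    // Flatten an M x N binary field into a vector of A = M * N elements (row-major).
    public static bool[] Flatten(bool[,] field)
    {
        int m = field.GetLength(0), n = field.GetLength(1);
        var v = new bool[m * n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                v[i * n + j] = field[i, j]; // cell (i, j) -> index i*n + j
        return v;
    }

    // A concept mask is just another vector: the concept is "present" when
    // every bit the mask cares about is also set in the sample.
    public static bool MatchesMask(bool[] sample, bool[] mask)
    {
        for (int k = 0; k < mask.Length; k++)
            if (mask[k] && !sample[k]) return false;
        return true;
    }
}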

But there's power in the representation that I hadn't expected.  It turns out that viewing a NN as a two-dimensional vector field and using masking makes it much easier for a human to understand what the heck is actually going on in the system.  And this leads to new ideas (which I'm not ready to share yet, because they're possibly insane).

Nowadays, I'm developing out the system because it's intellectually engaging.  I've started from ideas, seen how they work in existing frameworks, then moved (and maybe improved) those ideas into my own framework, because I believe that if you don't build it, you don't understand it.  My framework is woefully incomplete.  It will always create a pattern based on the least significant bits.  It's easy to fool, and it doesn't use enough horizontal data when building masks.  But it can do something amazing: it can tell apart two sounds given exactly one sample of each sound, and it does so without a label.

And that's not the most exciting part!  As I've been playing with these ideas, a new one has emerged about how to stack and parallelize the detectors and make an atemporal representation of sound streams.  This seems to match what Noam Chomsky says about how a human "universal grammar" must work.  If this idea pans out (and it's maybe months of implementation time to find out), then there's a small chance I'll figure out some part of the language translation problem.

All that excitement is tempered by the fact that I have limited time.  Eventually, I’ll run out of money, and thus time, to do this research.  So the problems I must solve are:

  1. Can I build a framework that’s able to solve the problems I’m interested in?
  2. If not, can the pattern detectors solve problems others are interested in?
  3. Can I sell something from this system to fund my own time?

Anyways, that’s what I’ve been up to recently.

Echo needs a competitor

Filed under: Uncategorized — ipeerbhai @ 2:04 pm

One thing I learned working in big tech: there's always someone watching.

Take the Amazon Alexa.  You can bet the big 4 tech firms are watching Amazon and trying to decide whether they'll make technology to compete.  And I really wish they would.  I have an Echo and love it, but programming for the Echo is crap.

Why is programming for the Echo crap?  So many issues:

  1. Provisioning services is a nightmare.  You don't even know which services you need to provision, much less have access to a configuration file.  It takes lots of AWS console pages (lots!) just to get to hello world.
  2. No audio stream.  If you want to make a phone app, forget it.  Amazon won't give you the voice data.  There's AVS, which you can use to send voice you capture to Alexa, but there's no access to the voice data inside the Echo.
  3. 90 second, fixed format playback from the API.  You literally chunk everything as 90 second long mp3s.
  4. NodeJS.  Voice is not web, and the stateless nature of web design makes no sense in voice apps.  The biggest issue is that your app will respond to any of the registered commands in any sequence.  Conversations, however, are always sequential.  It’s just the wrong language for the job.
    1. NodeJS, outside the web, is sort of a problem.  There’s real harm in imposing the async paradigm on problems that are much simpler to read in a stateful manner.
    2. And not just any NodeJS — you can’t write the code in your own editor.  Amazon wants to make sure they own the coding platform, and you have to write Alexa code in their web editor.
  5. Can't really sell what you make.  Amazon won't let you monetize the actual ASK skill; instead, you have to sell something else, like an unlock code, on Android.
  6. AVS platform lock — AVS is essentially only available for Linux/Android.  If you want to use AVS on PC/Mac, well, you’re SOL.
  7. Overly cloudy.  I'm not a fan of the cloud, because it adds complexity that doesn't need to be there, but Alexa takes the cake on too much cloud for no reason.  You can't write the code on a local system; it must be in the browser.  You can't run any part on your own hardware; it must run in AWS.  Every instance requires a Lambda spin-up.  You can't sell what you make.  Developers give too much control away when using Alexa.

My team won the Echo prize at the recent Seattle VR Hackathon.  The team at Amazon is amazing, and the Echo is an amazing product.  Again, I own and love my Echo.  But without a competitor, the developer experience is really sub-par.  I also don't like these cloud companies forcing devs into lock-in.  You can't even use your own editor?  Come on!

So that's my argument: the Echo needs competition from the big tech companies.  Sure, some start-up could make a great Echo-like product with a better developer experience, and I run across small shops making similar products on Kickstarter/Indiegogo.  But those companies are vertically focused, with no developer experience at all, whereas the big 4 make APIs…

April 21, 2016

codec2 sparse map

Filed under: Uncategorized — ipeerbhai @ 12:33 am

I’ve been playing with Codec2 and sparse maps.

Sparse maps are an idea I saw from an AI firm (Numenta?) about how to visualize and filter vector arrays.  The basic idea is that you take a vector (it can be a binary vector, but it could be a vector of ints), assign some color value to some numbers, and spread the vector over a 2D map.  From this map, you can find some number of clusters, and those clusters are essentially concepts.  You build masks of these concepts to see if an output contains the concept.  They use it in natural-language search.

I took an open-source codec called codec2 and built a sparse map of 71 frames of me saying "ah" and "sh", putting the codec's 51-bit frames into a 640×480 picture.

So, a frame is a vertical line in the picture: bit 0 is at the top, bit 50 at the bottom of the line.  Each 9×9 pixel block represents either a 1 or a 0: red blocks are 1, green blocks are 0.  Frame 0 is the leftmost vertical line, and frame 70 is the rightmost.  There are no spaces between the colored blocks, so the picture looks continuous, but it is really discrete blocks.  I did this in C#, so there are byte-order issues I haven't corrected for yet.  Essentially, BitArray.CopyTo(byte[]) copies in little-endian byte order, which I then bit-shift back into order, even though the bit array is in concatenated bit order.  That's something I'll fix later, but the error is consistent, so the generated color map is also consistent.
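
The rendering loop is roughly this simple.  This is a simplified sketch, not my exact code; the names are mine, and the byte-order correction mentioned above is omitted:

using System;
using System.Collections;
using System.Drawing; // System.Drawing.Common on modern .NET

static class SparseMapRenderer
{
    // Each codec2 frame becomes one vertical column of 9x9 blocks:
    // red for a 1 bit, green for a 0 bit, bit 0 at the top.
    public static Bitmap Render(BitArray[] frames, int bitsPerFrame, int blockSize = 9)
    {
        var bmp = new Bitmap(640, 480); // 71 frames x 9 px and 51 bits x 9 px both fit
        using var g = Graphics.FromImage(bmp);
        for (int f = 0; f < frames.Length; f++)    // frame = column, leftmost first
            for (int b = 0; b < bitsPerFrame; b++) // bit = row within the column
            {
                var brush = frames[f][b] ? Brushes.Red : Brushes.Green;
                g.FillRectangle(brush, f * blockSize, b * blockSize, blockSize, blockSize);
            }
        return bmp;
    }
}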

The results are staggering.  Here’s the picture of “ah”

[Image: sparse map of "ah"]

Here’s the picture of “sh”

[Image: sparse map of "sh"]

These maps look interesting.  I think filter masks might be able to detect either:

  1. my voice.
  2. the phones being spoken.

Of course, this could be a dead end.  I haven't seen if I can generate masks from this yet, but it looks super interesting, so I thought I'd share.

 

March 8, 2016

Why UWP must die

Filed under: Uncategorized — ipeerbhai @ 1:46 am

There's a brouhaha going on about UWP.  I hate UWP.  Here's why:

  1.  UWP is a dangerous fork of .NET.
    1. MS has not been keeping non-UWP .NET up to date.  For example, the desktop Cortana APIs are UWP-locked.  You can use "Cortana" via Azure in a convoluted way, or you can use straightforward APIs within UWP, but you can't use Cortana in a straightforward way in .NET.
    2. Even when an API is in both UWP and .NET, the documentation is not updated for .NET.  I've run into cases where the docs are UWP-only and the .NET version of the API has a different calling convention.
  2. UWP detracts from .NET improvement.  MS is spending too much developer time maintaining two forked APIs that do the same thing.  Nothing is stopping MS from updating .NET and bringing it to all platforms.  Nothing is stopping MS from making the store APIs part of .NET.  .NET already supports strong cryptography, including strong-name-signed DLLs; everything UWP is supposed to solve, .NET already provides.
  3. I hate developing for UWP.  So much that I've abandoned .NET development; all new dev work I do is in NodeJS.  This is because UWP keeps creeping into things.  Starting VS?  You get pestered with "Where's this month's license?", even in VS Community.

I love C# and .NET; really, I do.  I *want* to develop on the MS stack, but UWP has driven me away.  I can't trust that the APIs I want are present.  I can't trust the API docs.  I can't get away from the hassles.  And don't get me started on the annoyance of things like NuGet (how do you debug a NuGet package deployment failure?  That's a nightmare…  npm is so easy: rd /s /q node_modules and npm install…) or the shrinking number of devs.

To get me to reconsider the MS platform as a serious developer platform, UWP must die.

February 28, 2016

Tech’s diversity problem

Filed under: Uncategorized — ipeerbhai @ 8:32 pm

The New York Times recently ran a story addressing Tech’s diversity problem:

http://www.nytimes.com/2016/02/28/magazine/is-blind-hiring-the-best-hiring.html

In the story, they wrote about similar problems the Boston Symphony Orchestra had with diversity back in the 60s.  Here's the bit I find fascinating: the Boston Symphony of that time rarely got female applicants, but when it switched to anonymous auditions and started hiring more women as a result, it started getting more female applicants, too!  It seems people are rational; they won't apply to something if they believe they won't get in.

In the tech community, there's a lack of diversity, with many women and some racial minorities not applying to positions.  I've always thought this makes sense: why would they apply if they believe they're (1) less likely to get in and (2) less likely to advance?

This feedback effect (lack of diversity causes a lack of applicants, which causes more lack of diversity) is a loop that must be broken.  Many people I've talked to in tech offer "But X group never applies to our positions!  We'd hire them!" as a "true excuse" for not hiring diversely.  The statement is true (many open positions don't get diverse applicants), yet the underlying cause is the existing lack of diversity.  Big tech would actually have to practice reverse discrimination to counteract the existing structural problems, but big tech believes in the myth of meritocracy (which I do not believe in; as Adam Smith pointed out hundreds of years ago, individual people are more similar in ability than different) and simply cannot see the forest for the trees.

This structural problem explains a lot.  Why are girls good at math only until the 3rd grade?  For the same reason that pre-school kids' achievement normalizes when they reach 3rd grade: that's when a human has enough cognitive ability to see structural bias.  The girls see the structural bias against them in society and redirect their efforts to wherever their payoff likelihood is maximized relative to others making the same choices.

This is a weird concept, so let's pretend you're going to be a "code janitor", the worst job you can have as a developer, whatever that may be, at your company.  It is likely still well paid relative to a receptionist, so the purely rational choice would be to strive toward the code-janitor position instead of reception.  Why, then, are women and minorities more likely to strive toward being the receptionist?  Because they have a fairer chance at getting the entry position in reception and can advance to the pinnacle of that field unfettered, whereas as developers they'll face higher hurdles to entry and advancement.  Because humans judge themselves relatively (a high-level receptionist will judge himself against low-level receptionists), it is rational to strive toward reception instead of technology.

The same applies to those pre-school kids (the ones educationally advanced beyond other 3rd graders), who see the structural bias caused by normalized grading and adjust their efforts.  These effects show up universally in 3rd grade because humans gain cognitive abilities at very similar rates until they succumb to the incentives in their environment.

Thus, in tech, lots of subsidies will be thrown at ineffective solutions to the diversity problem.  Because the core problem is structural, and humans are intelligent, the amount of money thrown at education and diversity efforts is too little compared to the lifetime earnings differential a woman or minority expects to see.

February 18, 2016

Feed Forward NN in Matlab

Filed under: Uncategorized — ipeerbhai @ 8:21 am

Matlab is interesting because of its emphasis on vector math.  I've been looking at a simple feed-forward pass for neural networks in Matlab.  Here are the basic concepts of how to implement one (so I can do it again if I ever need to…).

Pretend I'm given a 3-layer network: 1 input layer, 1 hidden layer, and 1 output layer.

The function prototype is predict(t1, t2, X)

where t1 is a matrix (a, b), with a = the number of neurons in the hidden layer and b = the number of predictors + 1 (the extra column feeds the bias into each sigmoid activation function),

and where t2 is a matrix (c, d), with c = the number of neurons in the output layer and d = the number of hidden neurons + 1 (again, the extra column is for the bias).

The number of entities we need to make predictions for is size(X,1);

Here's a simple loop to run the weights, with a bias neuron in both the input and hidden layers (sigmoid is the logistic function, defined at the top):

sigmoid = @(z) 1 ./ (1 + exp(-z)); % logistic activation

for thisEntity = 1:size(X,1)
    thisInputLayerAsVector = [1; X(thisEntity, :)']; % bias neuron + inputs
    % Feed forward to the hidden layer: bias + sigmoid of the first weight matrix.
    FeedForwardToHidden = [1; sigmoid(t1 * thisInputLayerAsVector)];
    % Feed forward to the output layer: one score per final classifier.
    FeedForwardToOutput = sigmoid(t2 * FeedForwardToHidden);
end

After you run this, FeedForwardToOutput contains a vector "score" for a single entity row, with values near 1 in the "matching" positions of the vector and near 0 in the non-matching ones.  Ideally, for multi-class classification you should see only one value near 1 and the rest near 0, but that's a function of training, not of this forward-pass math.  From there, you'd convert the score vector into whatever makes sense for your use case; for example, [~, class] = max(FeedForwardToOutput) picks the highest-scoring class.
