I’ve been playing with Codec2 and sparse maps.
Sparse maps are an idea I saw from an AI firm (Numenta, I believe) for visualizing and filtering vector arrays. The basic idea is that you take a vector (it can be a binary vector, but could also be a vector of ints), assign a color value to certain numbers, and spread the vector over a 2D map. From this map you can find some number of clusters, and those clusters are essentially concepts. You build masks of these concepts to check whether an output contains the concept. They use it in natural-language search.
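To make the mask idea concrete, here's a toy sketch (my own naming, not Numenta's): build a "concept mask" by ANDing together the binary vectors in one cluster, then test a new vector by how much of the mask it covers.

```python
def build_mask(vectors):
    """AND together binary vectors (lists of 0/1) belonging to one cluster."""
    mask = vectors[0]
    for v in vectors[1:]:
        mask = [a & b for a, b in zip(mask, v)]
    return mask

def overlap(mask, vector):
    """Fraction of the mask's set bits that are also set in the vector."""
    set_bits = sum(mask)
    if set_bits == 0:
        return 0.0
    return sum(m & v for m, v in zip(mask, vector)) / set_bits

# Three vectors that (pretend) cluster together:
cluster = [[1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0]]
mask = build_mask(cluster)          # only bits 0 and 1 survive the AND
print(overlap(mask, [1, 0, 0, 1]))  # 0.5 -- half the concept is present
```

A real system would do something smarter than a plain AND (noisy bits would empty the mask), but this is the shape of "does this output contain the concept?".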
I took an open source codec called codec2 and built a sparse map of 71 frames of me saying "ah" and "sh", rendering the codec's 51-bit frames into a 640×480 picture.
So, a frame is a vertical line in the picture: bit 0 is the top of the line, bit 50 is the bottom. Each 9×9 block represents either a 1 or a 0. Red 9×9 blocks are 1, green 9×9 blocks are 0. Frame 0 is the leftmost vertical line, and frame 70 is the rightmost. There are no spaces between the colored blocks, so it looks continuous, but it's really discrete blocks. I did this in C#, so there are byte-order issues I haven't corrected for yet: essentially, BitArray.CopyTo(byte[], 0) copies into each byte in little-endian bit order, which I then bit-shift back, even though the BitArray itself is in concatenated bit order. That's something I'll fix later, but the error is consistent, so the generated color map is also consistent.
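The layout above can be sketched in a few lines (in Python rather than my original C#; the toy 4-bit frames and the `render` helper are just for illustration):

```python
BLOCK = 9
RED, GREEN = (255, 0, 0), (0, 255, 0)

def render(frames, bits_per_frame):
    """Return a 2D grid of RGB tuples: one vertical strip of 9x9 blocks per frame."""
    width = len(frames) * BLOCK
    height = bits_per_frame * BLOCK
    img = [[GREEN] * width for _ in range(height)]
    for x, frame in enumerate(frames):       # frame 0 = leftmost strip
        for y in range(bits_per_frame):      # bit 0 = topmost block
            color = RED if (frame >> y) & 1 else GREEN
            for dy in range(BLOCK):
                for dx in range(BLOCK):
                    img[y * BLOCK + dy][x * BLOCK + dx] = color
    return img

# Two toy 4-bit "frames": 0b0101 and 0b0011.
img = render([0b0101, 0b0011], 4)
assert img[0][0] == RED    # frame 0, bit 0 is set
assert img[9][0] == GREEN  # frame 0, bit 1 is clear
```

With 71 real frames of 51 bits each this gives a 639×459 grid, which fits the 640×480 canvas.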
The results are staggering. Here’s the picture of “ah”
Here’s the picture of “sh”
These maps look interesting enough that I think filter masks might be able to detect either:
- my voice.
- the phones being spoken.
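Purely as speculation, one way such a detector could work: mark a bit as part of the "ah" mask if it's set in most "ah" frames, then score an unknown frame by how many mask bits it matches. The threshold and toy frames here are made up for illustration.

```python
def frame_mask(frames, bits, threshold=0.8):
    """Bits set in at least `threshold` of the training frames."""
    counts = [sum((f >> b) & 1 for f in frames) for b in range(bits)]
    return [int(c / len(frames) >= threshold) for c in counts]

def score(mask, frame):
    """Count how many mask bits the frame also has set."""
    return sum(m and (frame >> b) & 1 for b, m in enumerate(mask))

# Four toy 4-bit "ah" frames; bits 1 and 3 are set in all of them.
ah_frames = [0b1011, 0b1010, 0b1011, 0b1111]
mask = frame_mask(ah_frames, 4)  # -> [0, 1, 0, 1]
print(score(mask, 0b1010))       # matches both mask bits -> 2
```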
Of course, this could be a dead end. I haven't tested whether I can actually generate masks from this yet, but it looks super interesting, so I thought I'd share.