Imran's personal blog

January 2, 2013

How Anti-Viruses work

Filed under: Uncategorized — ipeerbhai @ 9:59 pm

I used to work in the antivirus industry a long time ago.  I learned back then what many people are just learning today: antivirus software is a fool’s game.  The problems are twofold, and both are assumptions built into the design:

1. Assumed local infections.

2. Assumed fast scanning.

The core idea of antivirus software is to create a database of known bad software and scan each file on your computer to see if it matches.  Other techniques have been tried (behavior monitoring, “whitelists”, etc.), but none have worked.  Every AV company on earth, right now, uses a “canary in a coal mine” approach.  Here’s how antivirus software works.

You install some scanning engine.  This engine does these things:

  • Generates a cryptographic hash of each file on the computer and matches it against a local database (aka signatures) of known bad software.
  • Emulates a machine and tries to execute software in a protected environment, in hopes that the software will “unpack” itself and match the known bad list.
  • Implements a heuristic engine that looks for specific events (API call graphs: call this API, then that API).
  • Tries to statically unpack software and match it to the known bad list.
  • “Phones home” about software it heuristically detects, for human analysis (from the API call graphs, the virtual machine, and sometimes intercept drivers).
  • Has an execution unit for clean-up scripts.
  • Has a cryptographic self-integrity check of itself and all its system call suppliers.
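To make the “call this API, then that API” idea concrete, here is a minimal sketch in Python.  It is an illustration under my own assumptions, not how any particular engine does it: the rule names, the `HEURISTIC_RULES` table, and both function names are hypothetical, and a real engine works on call graphs rather than a flat list of names.

```python
# Hypothetical heuristic rules: each rule is an ordered list of API names.
# The rule names and sequences are illustrative only.
HEURISTIC_RULES = {
    "process-injection-like": [
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    ],
    "download-and-run-like": ["URLDownloadToFile", "CreateProcess"],
}


def contains_in_order(trace, pattern):
    """True if every call in `pattern` appears in `trace` in the same order.

    The calls do not have to be adjacent; other calls may sit in between.
    """
    remaining = iter(trace)
    # Each `any` consumes the shared iterator up to the next match,
    # so later pattern entries can only match later trace entries.
    return all(any(call == wanted for call in remaining) for wanted in pattern)


def match_heuristics(trace):
    """Return the sorted names of all rules that the observed trace matches."""
    return sorted(
        name for name, pattern in HEURISTIC_RULES.items()
        if contains_in_order(trace, pattern)
    )


if __name__ == "__main__":
    observed = ["CreateFile", "OpenProcess", "ReadFile", "VirtualAllocEx",
                "WriteProcessMemory", "CreateRemoteThread"]
    print(match_heuristics(observed))
```

Note that the rule table is itself a database of known patterns, so a behavior nobody has written a rule for passes straight through.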

A “signature” is actually a database of cryptographic “bad” sequences, “good” sequences (which filter out things that were detected but also contain this good thing), and heuristic sequences (look for these things).  The rest of an AV product uses this engine to do things or present information to the user.
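A rough Python sketch of that kind of database follows.  Everything here is an assumption made for illustration: the `SignatureDB` layout, the `scan_bytes` function, the use of SHA-256, and the choice to let “good” sequences suppress only sequence matches (not exact hash matches) are mine, not a description of a real product’s format.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SignatureDB:
    # SHA-256 hex digests of whole files known to be bad (assumed hash choice).
    bad_hashes: set = field(default_factory=set)
    # Detection name -> byte pattern that marks a file as bad.
    bad_sequences: dict = field(default_factory=dict)
    # Byte patterns that cancel a sequence-based detection.
    good_sequences: list = field(default_factory=list)


def scan_bytes(data, db):
    """Return a detection name, or None if nothing in the database matches."""
    # 1. Exact match on the whole file.
    if hashlib.sha256(data).hexdigest() in db.bad_hashes:
        return "known-bad-hash"
    # 2. "Bad" sequences, filtered by "good" sequences.
    for name, pattern in db.bad_sequences.items():
        if pattern in data:
            if any(good in data for good in db.good_sequences):
                return None  # detected, but it also has the good thing in it
            return name
    return None


if __name__ == "__main__":
    db = SignatureDB(
        bad_hashes={hashlib.sha256(b"evil file").hexdigest()},
        bad_sequences={"Demo.A": b"\xde\xad\xbe\xef"},
        good_sequences=[b"VENDOR-OK"],
    )
    print(scan_bytes(b"xx\xde\xad\xbe\xefyy", db))
```

Anything that is not in `bad_hashes` or `bad_sequences` comes back as `None`, which is exactly the weakness the rest of this post is about.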

But there are problems.

  • If the virus never touches your disk, it can never be detected.  This is a big problem for “sandbox” execution environments like web browsers, Flash, PDF, etc.
  • If the virus is too new, it is unlikely to be detected.
  • If the virus is too new and too few, it will never be detected.

Some explanation on why:

If the virus never touches your disk, there’s nothing to match.  Even behavior rules don’t help, because a known good program (say, Internet Explorer) is the one doing bad things.  It’s a “patsy” of a “lifeless” virus.  And what’s a bad thing?  Downloading a file?  Sending mail?  You’d have to understand the “bad thing” the patsy is doing to understand the problem, which again would be a database of things.

If the virus is too new, it’s not in the database, and unless it’s based on a known virus (aka a polymorphic virus), it has little chance of being detected until a human analyst takes a look at it.

If the virus is low in quantity (an espionage virus, say), then no human analyst will ever look at it.  It’s simple economics: you’ve got humans doing virus analysis, and you need the signatures they write to match as many people as possible.  Hence low-instance-count, non-growing software looks just like a line-of-business application (there are billions of them!) and won’t ever be looked at.
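The economics can be shown with a toy triage queue in Python.  This is only a sketch: the `triage` function, the sample names, and the report counts are made up for illustration, and real labs weigh more than raw prevalence.

```python
def triage(samples, analyst_capacity):
    """Pick which samples analysts look at, given limited capacity.

    `samples` maps a sample name to the number of machines reporting it.
    Ranking is by report count, highest first (ties broken by name).
    """
    ranked = sorted(samples, key=lambda name: (-samples[name], name))
    return ranked[:analyst_capacity]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the problem.
    reports = {
        "mass-worm": 250000,
        "banking-trojan": 40000,
        "targeted-implant": 3,
        "payroll-lob-app": 2,
    }
    print(triage(reports, analyst_capacity=2))
```

With capacity for two samples, the targeted implant sits next to the harmless payroll application at the bottom of the list, and neither is ever examined.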

Hence, no traditional AV software will work against all viruses.  The most common and oldest will be caught, but the new and few will never be caught, even if they’re on disk.  (I haven’t even gotten into semi-disk, hypervisor, or BIOS/UEFI viruses.)  A hypervisor virus would be the most insidious; detecting it would be nearly impossible.

Now, back to those assumptions.  I just explained the local infection side of things.  The other side is fast scanning.  These signature databases are huge and, after the initial scan, largely ineffective.  There’s a lot of incentive for AV companies to cheat, and they do cheat.  Pay close attention to any kernel memory blocks that aren’t counted in user mode after you install AV software.  You’ll notice a huge increase in system memory taken up, even though only a small amount of user memory is taken up.  AV software is a memory hog, and has to be: you’d be stuck decompressing the signatures all the time any other way.

For the most part, one of the complaints people have is that scanning isn’t fast, which means people don’t do it.  They hope some sort of real-time protection will help them (for example, scanning on file writes), but it’s almost never effective.  If you only scan the file when you write it, and “new and few” viruses are never caught, then you’ve “blessed” a virus and don’t know it.
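Here is a small Python simulation of how a file gets “blessed”.  It is a sketch under stated assumptions: the `OnWriteScanner` class, its method names, and the idea of identifying file content by a plain string ID are all hypothetical simplifications, not a real product’s design.

```python
class OnWriteScanner:
    """Toy real-time scanner that only checks a file at the moment it is written."""

    def __init__(self, known_bad):
        self.known_bad = set(known_bad)  # content IDs in the signature database
        self.verdicts = {}               # path -> verdict recorded at write time

    def on_write(self, path, content_id):
        """Scan once, when the file is written, and remember the verdict."""
        verdict = "infected" if content_id in self.known_bad else "clean"
        self.verdicts[path] = verdict
        return verdict

    def update_signatures(self, new_bad):
        """Add new signatures. Files already on disk are NOT revisited."""
        self.known_bad |= set(new_bad)

    def full_rescan(self, files):
        """The slow scan people skip: re-check every file against current signatures."""
        for path, content_id in files.items():
            self.on_write(path, content_id)


if __name__ == "__main__":
    disk = {"a.exe": "new-virus"}
    scanner = OnWriteScanner(known_bad={"old-virus"})
    print(scanner.on_write("a.exe", "new-virus"))  # scanned before a signature exists
    scanner.update_signatures({"new-virus"})
    print(scanner.verdicts["a.exe"])               # the write-time verdict is unchanged
    scanner.full_rescan(disk)
    print(scanner.verdicts["a.exe"])
```

The write-time verdict stays in place after the signature update; only the full rescan, the slow operation people avoid, corrects it.  And if no signature is ever written for the file, even the rescan changes nothing.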

So, how do you actually beat viruses?  I’ve examined that question for a long time, and have been thinking about human immunity and how to replicate it in a software system.  How does human immunity work?

  • Recognize self: human immunity “trains” itself to not attack certain surface antigens in a critical period before birth.  (This won’t happen in software; OEMs won’t do this, as it costs money and provides no immediate benefit.  But I’ve figured out a trick to simulate this with an AI, and done an exploratory project to prove to myself it could work.)
  • Immobilize DNA: methylation and other poisons are available to inactivate virus transcription, provided the “self” genome is intact in an infected cell.
  • T-helper cells: these don’t do anything until an infection, then they switch on/transport/kill in B cell switching (so they start up the cleaning process).
  • B cells: these are long-term storage, and work in monoclonal antibody production.
  • Memory cells: these are B cells that have multiplied and stick around to prevent re-infection.

Based on this idea, I’ve come up with a new model.  The core of the new model is a virtual scanner/cloud AI agent that functions like the variant B cells, along with local agents that work like T cells.  The beauty of this system is that it requires no analysts at all, and would be effective against “new/few” viruses.  I’ve submitted the idea to DARPA, and I hope it gets some traction there.  This idea is too large for one person to write out; it needs a team.  If any of you out there want to help out or fund this, drop me a note.  I’ll be happy to send you the paper I sent DARPA explaining how the system works and how to create/test it.

The problem: a system like this has never existed before, and though I can enumerate how it works and how to create it, I can’t tell ahead of time if it works at all.  The smallest possible test needs large amounts of data that isn’t currently centralized, since an AI agent needs a lot of data to train.  Hence, it’s a large, expensive, high-risk project, which is exactly what no agency is ever set up to do.  The potential payoff is an AV system that can detect/stop zero-day viruses and is far faster than today’s AV technology.  It’s not an evolution (ha!), but a revolution.  Like all revolutions, it may work out just dandy, or be an utter failure…

Thanks,

Imran
