[clug] project, video editing, editing out ads

steve jenkin sjenkin at canb.auug.org.au
Tue Apr 14 21:18:40 MDT 2015

> On 15 Apr 2015, at 9:57 am, Jason Nielsen <j.lee.nielsen at gmail.com> wrote:
> If the station logo / watermark is consistent for the main program and
> then disappears during the ads you might be able to check for that and
> drop every frame without it there.
> Jason Nielsen


Good point.

With digital delivery, we have a very powerful new weapon:

 we can do “forward parsing” then precise identification.
 We can record, scan, then analyse/parse, effectively having perfect knowledge of the future during the analysis phase.

In general, the Google approach of “98% automatic” gives you massive leverage.
The remaining “2%”, or less, can be very effectively handled manually if needed. It depends on what your success criteria are.

There are some perfect properties of Adverts, Station Identifiers and Promos (including for news):

 1. In Digital, the frames are identical each time the fragment is aired. Similar problem to ‘rsync’: build a fingerprint DB.
 2. Fragments are repeated, often frequently. During scan, look for repeated sequences of frames.
 3. The fragments are close to perfectly timed & of identical lengths, often 30 seconds. Helps to bound searches.
 4. Ad-breaks feature consecutive repeated fragments. Helps identify the start/end of an Ad-break & change algorithm.
 5. Specific Ads are tied to specific times and regions. Your DB increases recognition power.
 6. Ad Campaigns run for scheduled periods; Ads don’t pop up randomly, or as a ‘single-shot’. Helps declare a segment ‘content’ or not.
 7. Multiple stations on the same Network synchronously broadcast the same programme, but localise Adverts. Easy to ‘diff’ frames.
 8. Stations have reasonably rigid formulas for interspersing fragments and ad-breaks within programmes, varying with Day and time-of-day.
 9. There are ‘cheats and assists’ from the Electronic Programme Guide and broadcast sub-titling. The EPG is often incorrect or imprecise, in my experience. Ads aren’t sub-titled, giving a good leg-up.
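
A rough sketch of the fingerprint-and-scan idea, in Python. It assumes you already have decoded frames as byte strings and that a repeated fragment really is byte-identical each time it airs. In practice you may need a perceptual hash instead of SHA-1. The names fingerprint() and find_repeated_runs() are made up for illustration, not from any existing tool.

```python
import hashlib
from collections import defaultdict

def fingerprint(frame_bytes: bytes) -> str:
    """Hash one decoded frame; identical frames give identical digests."""
    return hashlib.sha1(frame_bytes).hexdigest()

def find_repeated_runs(frames, min_run=3):
    """Return (start, length) for each run of consecutive frames whose
    fingerprints all occur more than once in the recording."""
    prints = [fingerprint(f) for f in frames]
    seen = defaultdict(int)
    for p in prints:
        seen[p] += 1

    runs, start = [], None
    for i, p in enumerate(prints + [None]):  # sentinel closes a trailing run
        repeated = p is not None and seen[p] > 1
        if repeated and start is None:
            start = i
        elif not repeated and start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs

# Toy recording: the fragment a,b,c airs twice, everything else once.
frames = [b"a", b"b", b"c", b"x", b"y", b"a", b"b", b"c", b"z"]
print(find_repeated_runs(frames))
```

The min_run bound is there so that a stray repeated frame (a black frame, say) isn't mistaken for a fragment. With real video you would set it to a few seconds' worth of frames.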

Even if it’s a programme on “Best & Worst Advertisements”, you should be able to come close to automatically identifying ‘content’ and ‘advert’ portions of the broadcast.

This leads to a variation of the old AI “Alpha-Beta” pruning technique:

- you can definitely _include_ as an Ad any often repeated fragment in your fingerprint DB. The ‘times seen’ count is important. Leading & trailing frames may be missing.
- you can exclude lots of content, like News bulletins and content immediately following an Ad-break.
   - you might be able to identify & leverage repeated programmes. Stations won’t change the timing of ad-breaks, they stream from disk.
- cross-station correlation of Networks (e.g. WIN TV) is an easy cheat. Still have to identify common Promos & ID’s.
- with very few hours of content, you can start to identify the station ‘formula’ and predict Ad-breaks & length to +/- 1-minute
- you can identify a new Ad solely on new material found within an Ad-break.
   - It might take you a few times round to split a multi-Ad run into single Ads. Or just identify 5- and 10-sec fragments.
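
The include/exclude rules above could be sketched like this. It's an illustration only: it assumes the recording has already been cut into segments, each with a fingerprint id and a duration, and that times_seen is the ‘times seen’ count from your fingerprint DB. The thresholds (min_seen, max_new_ad_secs) and the function name are guesses, not tuned values.

```python
def label_segments(segments, times_seen, min_seen=3, max_new_ad_secs=60):
    """segments: list of (fingerprint_id, duration_secs) in broadcast order.
    Returns one label per segment: 'ad', 'new_ad' or 'unknown'."""
    # Include: anything seen often enough is definitely an Ad.
    labels = ["ad" if times_seen.get(fid, 0) >= min_seen else "unknown"
              for fid, _ in segments]

    i = 0
    while i < len(labels):
        if labels[i] != "unknown":
            i += 1
            continue
        j = i
        while j < len(labels) and labels[j] == "unknown":
            j += 1
        # A short unknown run with known Ads on both sides is new material
        # inside an Ad-break, so call it a new Ad.
        total = sum(secs for _, secs in segments[i:j])
        inside_break = (i > 0 and j < len(labels)
                        and labels[i - 1] == "ad" and labels[j] == "ad")
        if inside_break and total <= max_new_ad_secs:
            for k in range(i, j):
                labels[k] = "new_ad"
        i = j
    return labels

segments = [("show1", 600), ("adA", 30), ("newX", 30), ("adB", 30), ("show2", 700)]
print(label_segments(segments, {"adA": 5, "adB": 4}))
```

The duration bound matters: programme content between two Ad-breaks is also surrounded by Ads, so without it the whole show would be flagged.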

In the end, having a human scan programme content (at speed) _and_ scan Adverts at real-speed will be a necessary check for v. high accuracy.
It’s surprising what you can notice at 16x and 32x. 8x is a pretty relaxed scanning speed for normal programming.
Sharing the fingerprint DB will leverage individuals’ efforts considerably. Like PGP, you can assign a trust value to various sources (as the DB _will_ be gamed) and, like Junk Email, have a scoring algorithm, not just blindly accept the word of a single unknown stranger.
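
One way the trust-plus-scoring idea might look, as a minimal sketch. The trust values, the default for unknown sources and the threshold are all invented numbers, and ad_score() / is_ad() are hypothetical names.

```python
def ad_score(reports, trust, default_trust=0.1):
    """reports: list of (source, says_ad). Each source votes with its trust
    value: positive for 'this is an Ad', negative for 'this is content'."""
    return sum(trust.get(src, default_trust) * (1 if says_ad else -1)
               for src, says_ad in reports)

def is_ad(reports, trust, threshold=1.0):
    """Accept a fingerprint as an Ad only once enough trusted weight agrees."""
    return ad_score(reports, trust) >= threshold

trust = {"alice": 0.9, "bob": 0.8}
print(is_ad([("stranger", True)], trust))   # one unknown source alone
print(is_ad([("alice", True), ("bob", True), ("stranger", False)], trust))
```

Because the score is a sum rather than an average, a single unknown source can never reach the threshold on its own, which is the Junk Email behaviour you want.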

The MythTV folk out there already have a very active community with a lot of resources and a history of co-operation.

Starting a ‘simple’ sharing service for accurate programme overlays should be possible:
- an historical frame-accurate EPG allows people to block-record, then replay later with frame-accurate timing, not lose start/end and auto-skip Ads.
- a communal repeated-fragment DB

The caveat on all this is simple:

- this isn’t a set-and-forget Operation, it’s an on-going evolution, like guns and armour. The industry will adapt to your methods & respond to them.
- be prepared for active opposition. Big Media will take steps to shut down _any_ cheap & effective Advert Skipping. Their money ensures political influence.

Hope that gives you some ideas.

Preferably, some bright person will be able to a) refute my assertions and b) tell us that it’s already being done somewhere.
Code reuse is so much better than “Reinventing the Wheel”.

Steve Jenkin, IT Systems and Design 
0412 786 915 (+61 412 786 915)
PO Box 48, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin
