[clug] [long] open text summarizer (was Re: (Article Summary))

Alex Satrapa grail at goldweb.com.au
Thu Nov 25 00:16:57 GMT 2004

On 25 Nov 2004, at 10:38, Kim Holburn wrote:
> On 2004 Nov 25, at 10:11 AM, Alex Satrapa wrote:
>> Following summary courtesy of Apple's SummaryService (part of Mac OS 
>> X), some editing by me:
> Huh???  I didn't know about that!
> Just out of interest, what settings did you use?  Is there an OS 
> implementation of that thing?

The Short Version

SummaryService is a Utility (/Applications/Utilities) which provides a 
service through the usual Mac OS X means - Application 
menu/Services/Summarize. There's a similar tool available for Linux - 
check out libots0 (but I'm not happy with the summaries it produces).

The Long Version

Apple Computer's SummaryService produces a summary of the text selected 
in a window (or all the text, should none be selected) and displays it 
in a text box so you can copy the summary to use elsewhere. 
There is a slider under the text box which lets you control the summary 
size (anything from 1 sentence to 100% of the article), and a pair of 
radio buttons for choosing between a sentence-based and a 
paragraph-based summary.

The Open Text Summarizer (the American spelling is part of its name, 
not my laziness) is a command line tool and a library that can be 
linked into other applications (I got that from 'apt-cache show 
libots0'). Here is the command-line tool's version of the document:

> alex at rna28:~$ ots --ratio 15 long_article.txt
> It was on November 12, 1990, in a speech at the Comdex/Fall show in 
> Las Vegas, that Microsoft chairman Bill Gates first proclaimed his 
> vision for "information at your fingertips" - software to let people 
> easily find the data they wanted, wherever it was on their computer or 
> office network. It spent $US233 million ($301 million) on R&D since 
> 1998 - just 3.4 per cent of Microsoft's annual R&D budget - yet its 
> market value now tops $US3.7 billion. But when it comes to online 
> search, arguably the hottest technology of the past five years, 
> Microsoft has missed the boat. Microsoft's inability to create 
> leadership in entirely new product areas ... Over the past five years, 
> Microsoft's average patent "intellectual property quotient," or IPQ, 
> was 123, well above average. In the past decade, Microsoft was issued 
> proportionally more original patents than either Intel (71 per cent) 
> or Apple (68 per cent).
> Microsoft's cash cows, which generate 60 per cent of its revenue, 
> Windows and Office, will last forever on someone's PC unless Microsoft 
> comes out with a good reason to force people to buy new versions. 
> "Microsoft was spending over $US100 million a year on research and 
> development on that." Over the past five years, Microsoft spent an 
> average of $US9 million per patent, nearly twice its peer group. By 
> that logic, Microsoft, with its $6.8-billion annual R&D budget, must 
> consider as many as 35,000 new ideas just to find a few hundred worth 
> investing in every year. Even if Microsoft had been able to replicate 
> Google's dominance in search technology, Google's $US1.5 billion in 
> revenue would have lifted Microsoft's own top line by just 4 per cent. 
> As Microsoft forfeits future revenue growth for current income, it 
> continues to cede genuine innovations and important new markets to 
> future upstarts with bigger ideas - and far less to lose.

I think it gets the message a little mixed - the "It" in the second 
sentence is Google, not Microsoft. The rest of the summary tries to 
make sense, but the sentences it strings together don't always belong 
together. Microsoft wasn't, for example, spending over $US100 million 
a year trying to research means of making people buy new versions of 
their software. They were spending far more and disguising their 
research as "product development" ;)

But Microsoft bashing is off-topic for both list and thread, so I'll 
stop.

Hopefully people will now be aware that there is in fact a summary tool 
available for Linux - I believe it's used by Abiword, amongst others. I 
think it's a little broken, but it might be useful for generating a 
rough draft for a human who is familiar with the article to correct. 
Apple Computer's SummaryService - going by the few times I've used 
it - can be trusted to give me a summary of an article I haven't read 
yet, so I can decide whether or not I really want to read it. Can you 
see the difference?
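
For the curious, here is a minimal sketch in Python of the general idea 
behind a "keep N per cent of the sentences" summary. It is NOT how ots 
or SummaryService are implemented (I haven't checked either); it is a 
plain word-frequency scorer, and the function names, the tiny stopword 
list and the `ratio` parameter (borrowed from the --ratio option above) 
are all my own illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real tool would
# ship a much larger, per-language one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is",
             "it", "was", "for", "that", "with", "as", "by", "its"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text):
    # Lowercased words with the stopwords filtered out.
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize(text, ratio=15):
    """Keep roughly `ratio` per cent of the sentences, in original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(content_words(text))

    def score(sentence):
        # Sum of document-wide frequencies; this favours long sentences.
        return sum(freq[w] for w in content_words(sentence))

    keep = max(1, math.ceil(len(sentences) * ratio / 100))
    # Rank sentence indexes by score, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))
```

Calling summarize(open("long_article.txt").read(), ratio=15) would be 
the rough equivalent of the ots command above. A scorer like this picks 
sentences purely on how often their words occur, with no idea what a 
pronoun such as "It" refers to, which is one plausible way to end up 
with the kind of mix-up I complained about.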


PS: FWIW, both summary services decided that this email was about 
Microsoft's innovation, not the two text summary services.

