Saturday, October 22, 2011

Online Docs / Documentation

Idea: build a 'holy grail' documentation system that can vastly improve the accessibility and usability of documents in a company, educational institute, etc.

Key requirements to satisfy:
  - fast to access (most of the content on one web page, navigated with # anchor links?)
 - well organised (easy to comprehend)
 - easy to author
 - easy to edit (on the page itself, with a popup window? / Ajax call to the server)

Some good templates to consider:
 1/ OpenJDK http://openjdk.java.net (Adv: no flicker; a heavily production-tested documentation style, thanks to the big OpenJDK community)
    - compare with the SVG Open 2011 workshop page (no flicker, no links)
 2/ Tab-based levels: http://keith-wood.name/gChart.html (all content on a single page)
 3/ Sphinx (for content authoring): http://sphinx.pocoo.org/tutorial.html#documenting-objects (it has a server, Docutils)
      - used by our fav. PhD student
 4/ Wikia: http://japaneserecipes.wikia.com/wiki/Azuki_Ice_Cream
     - this kind of page, with too many edit options, may not be good for programmers

- JDOM http://jdom.org/docs/faq.html (some other simple styles)
http://planetjdk.org/
   
Features of the above doc styles:
1/ OpenJDK
    http://openjdk.java.net/guide/producingChangeset.html#merge 
      - has a TOC for this kind of doc (improvement: we may need the TOC as an open/collapse DIV that shows the contents in place instead of clicking through to a new page; see the collapsible comment DIVs on Quora.com)
    http://openjdk.java.net/projects/jdk7/builds/
      - menu at the top of the page << features  build ...>>; this is one kind of template for TOC lists
      - build calendar in Google Calendar: nice calendar integration when big teams are involved

2/ jQuery gChart:
      - improvement: if used for TOC-style documentation, show a TOC section once the user clicks a '2nd level tab', and add a 'top' icon to each section so the user can go back to the TOP of the page at any time without much mouse-wheel scrolling.

3/ Sphinx
Sphinx uses reStructuredText as its markup language, and many of its strengths come from the power and straightforwardness of reStructuredText and its parsing and translating suite, the Docutils.
  Docutils is an open-source text processing system for processing plaintext documentation into useful formats, such as HTML or LaTeX. It includes reStructuredText, the easy to read, easy to use, what-you-see-is-what-you-get plaintext markup language.

Friday, October 21, 2011

HTML5 Canvas CSS3 Audio Video


Authority
Svg/Canvas , Audio , Video
Adoption: Whole Industry is embracing HTML5
Apps:  showing HTML5 promise
Apps/Libraries in Details


major buzz words of HTML5:  Canvas, CSS3 , Audio , Video

1/ Authority

These are people in the HTML5 area considered authorities (experts) in their fields, due to their broad understanding of the subject and their 'future visibility':
a/ Damon Oehlman: Amazon EC2 vs. Rackspace; a JS expert who has done lots of JS work
b/ 8Bit Rocket:  Author of 'HTML5 Canvas' book , developed lots of Games
I have always tried to  figure out new trends early, and most of the time we are Too Early (i.e we made crappy videos 15 years before youtube.com, we created Flash games 7 years before it became a viable indie game option, etc.)  
- However, we were hoping to hit the HTML5 Canvas nail on the head and I think, this time, our timing was just about right.  Our HTML5 Canvas book has already sold so many copies, that we have been asked to write another one.


1.2 / SVG  vs. Canvas



  • SVG is a document format for scalable vector graphics.
  • Canvas is a javascript API for drawing vector graphics to a bitmap of a specific size.
SVG is a markup language for vector graphics and has a DOM. This makes it very easy to alter the content after its creation.
- Canvas is a painting surface just like MS Paint without an undo button. You cannot alter the content; you can only paint over it. It is very performant because the browser does not need to handle a complete DOM for the image.
http://caniuse.com/#cats=SVG - SVG support matrix
http://www.20thingsilearned.com/en-US/what-is-the-internet/1
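To make the retained (SVG) vs. immediate (Canvas) difference concrete, here is a small Python analogy, not browser code: the SVG side is an element tree whose shape can be recolored after creation, while the canvas-like side is just a grid of pixels, so "changing" the circle means painting over it. The 100x50 size and the `fill_circle` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Retained mode (SVG-like): the drawing stays a tree of elements you can edit later.
svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg", width="100", height="50")
circle = ET.SubElement(svg, "circle", cx="25", cy="25", r="10", fill="red")
circle.set("fill", "blue")  # modify the existing shape in place
svg_markup = ET.tostring(svg, encoding="unicode")

# Immediate mode (Canvas-like): only pixels remain after drawing.
WIDTH, HEIGHT = 100, 50
bitmap = [["white"] * WIDTH for _ in range(HEIGHT)]


def fill_circle(pixels, cx, cy, r, color):
    """Set every pixel inside the circle to `color` (hypothetical helper)."""
    for y in range(len(pixels)):
        for x in range(len(pixels[0])):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                pixels[y][x] = color


fill_circle(bitmap, 25, 25, 10, "red")
# There is no circle object to recolor: the only option is to paint over it.
fill_circle(bitmap, 25, 25, 10, "blue")
```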


2/ Adoption: Whole Industry is embracing HTML5


Browser support for HTML5
a/ Chrome and Safari are ahead with HTML5; Firefox has caught up now.
b/ IE9 beta in 3/2010, production in 3/2011; IE9 works only on Vista and Win7, NOT on XP

Internet Explorer 9 RC released: Everything you need to know - 2/2011 


a/ HTML5: A Look Behind the Technology Changing the Web - WSJ
   - Pandora released an HTML5 website. Some 34% of the 100 most popular websites used HTML5 in the quarter ended in September, according to binvisions.com
 Cadir Lee, Zynga's chief technology officer, predicts companies will keep tailoring apps for hit devices like Apple's for some time. Yet he thinks HTML5 could eventually evolve to be an even broader technology movement, 
b/ IE9 has been in production since 3/2011; MSDN released jQuery plug-ins, effectively making jQuery a standard.

c/ Adobe Drops Mobile Browser Flash Support 

3/ Apps :  showing HTML5 promise


HTML5 chess: uses an existing JavaScript chess engine (GarboChess, created by Gary Linscott). asr: this uses the good Garbo engine but not a good UI; the one below has a good UI, so combine that UI with the Garbo engine.
Chess:  - All logic in .js file  , Here is details


Editors/IDE: (these show the power of HTML5)
ACE: Ace is a standalone code editor written in JavaScript. Our goal is to create a web based code editor that matches and extends the features, usability and performance of existing native editors such as TextMate, Vim or Eclipse.
 -  It can be easily embedded in any web page and JavaScript application. 
 - check: ACE is embedded in many other applications
(asr: we can use the 'SAVE' etc. functionality of ACE in mind-map kinds of apps.)

PythonAnywhere: You can use PythonAnywhere to write Python without installing anything locally.
  - Show off your web applications from our servers
  - Collaborate and work together
  IDE LIVE: Create web apps in JavaScript right from your browser
By releasing ChemDoodle Web Components open-source, yet continuing to financially support the library, iChemLabs ensures that the next generation of scientific applications is easily achievable by academia, government and industry, and helps to make sure that the cost of education decreases while using the web to further spread science.




4/ Apps/Libraries in details

HTML 5 Canvas: Creating Gaudy Text Animations…Just Like Flash! (sort of)
There are more things that you can do with text, but most of them are manipulations of the global context. I’ll report back when I have mastered that aspect of the canvas to make something worth showing. 
- However, the point of this tutorial is to show that while some of the FX you can create with the HTML 5 Canvas are similar to those in Flash, creating them takes a lot more low level code than seems reasonable for a Flash designer. Obviously, tools will be created to build some of these things automatically (you can bet Adobe is on it right now), but those tools will still end-up writing out code much like this…all of it clearly visible in the HTML page.




- 1) Adobe Edge
For the average user, the release of Adobe Edge is mostly valuable for what it says about the future of Web surfing. For years now, Firefox, Chrome, and Safari have battled over who has the most HTML5-compliant browser--but these distinctions don't really matter until there's a stream of HTML5 content on the web. A user-friendly tool for creating HTML5 (such as Edge) may be just what the standard needs to really take off.

Unfortunately, HTML, CSS and JavaScript don’t offer any easy way to create animations. Developers comfortable writing raw code in text editors have, thus far, been the driving force behind web standards-based animation. Designers and animators accustomed to development tools like Flash, which offers visual layouts and drag-and-drop animation, have been left out of the web standards animation trend.


In its current form Edge will export your animations using div tags, some CSS animations, a fair bit of JSON and a combination of jQuery and some custom JavaScript to hold everything together.
Why go with div and CSS-based animations when there’s Canvas and SVG? Well, for one thing, this is a very early preview and Adobe claims that eventually Edge will support canvas and SVG (in fact Edge already has some support for importing SVG file).
politics here: asr: this Adobe guy says iOS 4 canvas problems are one reason why Adobe Edge does NOT use Canvas/SVG as of now. It seems Apple is avoiding a superfast Canvas/SVG in the Safari browser to protect its 'native apps' business; the same is the case with Microsoft (not wanting to lose the desktop), which is why IE9 was delayed.
-2) Sencha UI tool (similar to Adobe Edge); in the end Adobe may win, as it has tons of Flash developers whose skills transfer to Edge.

------


The future is bright

With Mozilla and Google so deeply invested in the next iteration of the Web, it's really no surprise that Internet Explorer 9 is so excellent. Microsoft knows that the Open Web platform could usurp desktop and native mobile apps
3) Firefox 4: SVG, MathML support


----

Thursday, October 20, 2011

E-learning / TTS Text To Speech / Machine Learning ML / python NLTK



http://searchstorage.techtarget.com/definition/How-many-bytes-for
http://fnoschese.wordpress.com/2011/05/10/khan-academy-my-final-remarks/
http://www.hackeducation.com/2011/07/19/the-wrath-against-khan-why-some-educators-are-questioning-khan-academy/
http://code.google.com/p/khanacademy/issues/detail?id=191
Price:
------------
Rackspace price: 10 GB = $2, so 10 MB is 200 cents / 1000 = 0.2 cents, i.e. each lesson costs 0.2 cents. Say a user takes 10 lessons (100 minutes total per day); then 10 x 0.2 = 2 cents.
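The same storage arithmetic as a quick sketch, using only the note's figures ($2 per 10 GB, about 10 MB per 10-minute lesson, decimal units with 1 GB = 1000 MB). The note does not say whether the $2 is per month or whether bandwidth is extra, so this counts storage only.

```python
PRICE_CENTS_PER_10_GB = 200  # "10 GB = $2" from the note
MB_PER_GB = 1000             # decimal units, matching the note's "/1000"
LESSON_MB = 10               # one 10-minute lesson, taken as ~10 MB


def cost_cents(megabytes: float) -> float:
    """Storage cost in cents for the given size, at the flat rate above."""
    cents_per_mb = PRICE_CENTS_PER_10_GB / (10 * MB_PER_GB)
    return megabytes * cents_per_mb


per_lesson = cost_cents(LESSON_MB)    # 0.2 cents
per_day = cost_cents(10 * LESSON_MB)  # 10 lessons a day -> 2 cents
```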

Size of a 10 Minute .WAV File for NaturallySpeaking 3

(Answer ID 3129, published 07/09/2002, updated 04/16/2010)
Question: How many megabytes will a 10 minute WAV file for use with NaturallySpeaking be?
Answer: The WAV file for use with NaturallySpeaking needs to be recorded at 11 kHz, 16-bit mono. At that sampling rate, a WAV file is approximately 1.3 MB per minute. 10 minutes x 1.3 MB would be 13 MB.
An MP3 (music) downloadable file: 2 to 5 MB
10 min MP3 = 10 MB
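Where these sizes come from: uncompressed PCM size is seconds x sample rate x bytes per sample x channels, and a constant-bitrate MP3 is seconds x bitrate / 8. The sketch assumes "11 kHz" means the standard 11,025 Hz rate, assumes a 128 kbps MP3 (the note gives no bitrate), and ignores the small WAV header.

```python
def pcm_bytes(seconds: int, sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Size of uncompressed PCM audio (WAV data chunk, header not included)."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


def cbr_bytes(seconds: int, bitrate_bps: int) -> int:
    """Size of constant-bitrate compressed audio such as MP3."""
    return seconds * bitrate_bps // 8


wav_per_min = pcm_bytes(60, 11_025, 16, 1)  # 1,323,000 bytes, ~1.3 MB
wav_10_min = pcm_bytes(600, 11_025, 16, 1)  # 13,230,000 bytes, ~13 MB
mp3_10_min = cbr_bytes(600, 128_000)        # 9,600,000 bytes, ~10 MB at 128 kbps
```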

1/ Homework Helper

What we need; here is the list:


0/ We need to keep it simple first:
  - just help with homework by looking at problem sheets from school (K-6; collect what schools use)
  - and help with concepts, showing the prerequisites


1/ We scanned the homework papers: they scanned well to PDF and the text conversion was also good (about 90%). We learned the following:
  - why not type the text in the first place, instead of a) scanning to PDF, b) converting to text, c) editing the text to remove errors?
 - Type it (in India) and store it in an XML file, so that you can format it whatever way you want. In India they put the typed file in a Dropbox sync folder. Provide simple UI tools to author/edit the XML document, see here:
 http://www.syntext.com/products/serna-free/


1.2/ You need good content (see the Amazon saved wishlist: 'word problem', 'ace calculus', etc.) to present content in the style many people liked (you know this from the reviews) and easy

How to Solve Word Problems in Algebra - we need this book; look inside the book:
   2 more than unknown    -> 2 + x
   5 less than unknown    -> x - 5
  We need to convert every math problem into such expressions by calling functions; these will be used to 'categorize' the given problem, i.e. when you get another problem like this, it is classified as this type and we draw a picture based on it (using D3.js).
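A minimal sketch of the "convert phrases by calling functions, then categorize" idea, using a small regex pattern table. The two patterns come from the book excerpt above; the table, the category names and `translate` are hypothetical, and a real system would need far more patterns (or an NLP parser).

```python
import re

# Hypothetical pattern table: (phrase pattern, expression template, category).
PATTERNS = [
    (re.compile(r"(\d+) more than (?:an? |the )?unknown"), "{n} + x", "addition"),
    (re.compile(r"(\d+) less than (?:an? |the )?unknown"), "x - {n}", "subtraction"),
]


def translate(phrase: str):
    """Return (expression, category) for the first matching pattern, else None."""
    text = phrase.lower()
    for pattern, template, category in PATTERNS:
        match = pattern.search(text)
        if match:
            return template.format(n=match.group(1)), category
    return None


print(translate("2 more than unknown"))  # ('2 + x', 'addition')
print(translate("5 Less than Unknown"))  # ('x - 5', 'subtraction')
```

The category returned is what would pick the picture template to draw for a new problem of the same type.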


Get all the Amazon wishlist books (best reviews, 20-50 etc.) and summarize what you want to follow, based on all the books combined.

2/ We need NLTK and its cookbook.
http://text-processing.com/demo/tokenize/  - use this with the problem below:
 Ed had 22 more marbles than Doug. Doug lost 8 of his marbles at the playground. How many more marbles did Ed have than Doug then?
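A rough sketch of what tokenizing buys us on this problem: split it into words, numbers and punctuation, pull out the numbers, and apply the rule for this problem type. The regex tokenizer is a stdlib stand-in assumed here in place of NLTK's tokenizers, and the "gap grows when the smaller side loses k" rule is a hand-written assumption for this one category, not something NLTK provides.

```python
import re

PROBLEM = ("Ed had 22 more marbles than Doug. Doug lost 8 of his marbles at the "
           "playground. How many more marbles did Ed have than Doug then?")


def tokenize(text: str) -> list[str]:
    """Rough stand-in for a word tokenizer: words, numbers and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize(PROBLEM)
numbers = [int(t) for t in tokens if t.isdigit()]  # [22, 8]

# Category "difference, then the smaller side loses k": Doug's count drops by k
# while Ed's stays the same, so the gap between them grows by k.
initial_gap, lost_by_doug = numbers
answer = initial_gap + lost_by_doug  # 30
```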


book review: Overall the book is easy to read, has a huge set of sample recipes and feels very useful. 

3/ We need the D3.js library; use it initially to prototype simple animations (for the demo, HTML5 browsers are fine; as of March 2012 IE9 will be in production)
3.2 Use this math rendering for SQRT, etc.

1.2 / Machine Learning ML





1/ PyML: PyML is mostly command-line based (in this respect RapidMiner is much better: everything is done in the GUI)
2/ Orange: Orange seems clean and conceptually easy; screenshots
   Review:  Shortest script for doing training, cross validation, algorithms comparison and prediction.
  • I found Orange the easiest tool to learn.


3/ RapidMiner



RapidMiner, R, and Excel were again the most popular tools: http://www.kdnuggets.com/2011/05/tools-used-analytics-data-mining.html
Review: RapidMiner is an open source statistical and data mining package written in Java. This reviewer seems a good one, with credentials.

Tutorial videos; videos and answers from the author at the bottom.
asr: wow, the RapidMiner GUI shows 'Train' and 'Test' windows and compares the predictions of 2 methods.
 - mlpy shows lots of theory; this shows in the GUI how to get results.
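What the 'Train'/'Test' comparison of 2 methods does underneath can be sketched in plain Python with k-fold cross-validation: every point is predicted once by a model trained without it, and the accuracies of the two methods are compared. The toy 1-D data set and the two models (a majority-class baseline and 1-nearest-neighbor) are hypothetical choices for illustration, not RapidMiner's operators.

```python
from collections import Counter

# Hypothetical toy data: (feature value, class label).
DATA = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (1.1, "a"), (0.9, "a"),
        (3.0, "b"), (3.2, "b"), (2.9, "b"), (1.3, "a"), (0.7, "a")]


def majority_baseline(train, x):
    """Ignore x; predict the most common label in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_neighbor(train, x):
    """1-NN: predict the label of the closest training point."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def cross_val_accuracy(model, data, k=5):
    """k-fold cross-validation: each point is tested exactly once,
    by a model trained on the other folds."""
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [item for i, item in enumerate(data) if i % k != fold]
        correct += sum(model(train, x) == y for x, y in test)
    return correct / len(data)


for name, model in [("majority", majority_baseline), ("1-NN", nearest_neighbor)]:
    print(name, cross_val_accuracy(model, DATA))  # majority 0.7, 1-NN 1.0
```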



Good decision chart, all screenshots.
asr: it seems that for a starter RapidMiner is best; it has a time series extension for CL futures data.
 b/ being Java, you can get the stochastic oscillator, MACD and many other indicators.
c/ mlpy and SciPy may be good, but they do not offer a lot of data mining compared to RapidMiner
d/ RapidMiner has commercial support, so it must be improving year by year.
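As an example of the kind of indicator mentioned above, here is a MACD sketch in Python (the notes expect to get such indicators from Java libraries; this is only to show the arithmetic). It uses the usual 12/26/9 periods and an EMA smoothing factor of 2 / (period + 1); seeding each EMA with the first value is one convention among several (many tools seed with a simple moving average), so early values will differ between tools.

```python
def ema(values: list[float], period: int) -> list[float]:
    """Exponential moving average, alpha = 2 / (period + 1), seeded with values[0]."""
    alpha = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: list[float], fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line = EMA(fast) - EMA(slow); signal line = EMA(signal) of the MACD line;
    histogram = MACD line - signal line."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# On a steadily rising series the fast EMA leads the slow one, so MACD is positive.
macd_line, signal_line, histogram = macd([float(i) for i in range(1, 61)])
```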

RapidMiner vs. WEKA

The most similar data mining packages are RapidMiner and WEKA. They have many similarities:
  • Written in Java.
  • Free / open source software with GPL license.
  • RapidMiner includes many learning algorithms from WEKA.
My first thought was that RapidMiner has everything that WEKA has, plus a lot of other functionality, and is more polished.
What kind of technology is Watson based on?
Watson is an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open-domain question answering. At its core, Watson is built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring.




Machine learning deals with designing and developing algorithms to evolve behaviors based on empirical data. One key goal of machine learning is to be able to generalize from limited sets of data (paraphrased from [1]). Russell and Norvig [2] lists machine learning as a specific capability, namely the ability to "adapt to new circumstances and to detect and extrapolate patterns".


What is UIMA? Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.


Stanford ML course, complete set of videos:

2/ NLTK  Python Natural Language Processing



asr: in summary, I did not see real 'value creation' in parsing web text and giving info.
 - yes, for Siri-like voice services you can provide input, but Google/Apple/MS need big 'repositories' to buy, not small ones.
 - Quora is doing this kind of wiki (summary), I guess auto-generated from all user posts for a given topic; look at this answer wiki on Quora.

http://ianozsvald.com/2011/01/30/review-for-python-text-processing-with-nltk-2-0-cookbook-packt-2010/
http://streamhacker.com/2010/12/15/python-text-processing-nltk-book-reviews/

books:
http://www.amazon.com/dp/0596516495/ref=rdr_ext_sb_ti_hist_1
http://www.amazon.com/Python-Text-Processing-NLTK-Cookbook/dp/1849516383/ref=pd_sim_b3
http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325/ref=pd_sim_b2





2/ Text-to-Speech vs Human Narration for eLearning



AT&T TTS: 
 Rep said $5500 Linux server license (for 10 servers, $2500 each); it can support up to 30 to 40 simultaneous users.
 - the minimum requirement is only 250 MB RAM; I guess with something like 10 GB of RAM you can support 30 simultaneous users.
 - see the control tags, so they can speak math...
The AT&T Natural Voices TTS engine does a great job of synthesizing most text without special  instructions, but there may be special circumstances where you wish to fine-tune the pronunciation of  certain words or phrases.  The AT&T Natural Voices TTS engine allows users to mark up the text to be  spoken to include special control tags that change the way the text is pronounced.  The AT&T Natural  Voices TTS engine supports a subset of the SSML control tags
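One way the control tags could help with math: generate SSML that wraps each math symbol in a `<sub alias="...">` element, so the engine reads the alias instead of guessing. `<speak>` and `<sub alias>` come from the W3C SSML 1.0 specification; whether `<sub>` is in the subset AT&T Natural Voices supports is an assumption to confirm with the rep, and the symbol-to-words table here is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical table of how each math symbol should be spoken.
SPOKEN = {"√": "square root of", "=": "equals", "+": "plus"}


def math_to_ssml(text: str) -> str:
    """Wrap text in an SSML <speak> element, replacing math symbols with <sub> aliases."""
    parts = []
    for ch in text:
        if ch in SPOKEN:
            parts.append(f"<sub alias={quoteattr(SPOKEN[ch])}>{escape(ch)}</sub>")
        else:
            parts.append(escape(ch))
    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="en-US">{"".join(parts)}</speak>')


ssml = math_to_ssml("√4 = 2")
```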
-------------
Sphinx4 - CMU voice recog. software  , Wiki
 Sphinx-4 is a state-of-the-art speech recognition system written entirely in the JavaTM programming language. It was created via a joint collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).
Sphinx-4 started out as a port of Sphinx-3 to the Java programming language
Sphinx-4 is used by these users:
Speech Recognition on Android : 

-----

product: e-learning of subjects
  - see Khan Academy: our product only covers K-8
 model basis: like Khan Academy's 10-minute concepts (otherwise users lose interest, as Khan said).
 - lower production cost, because TTS is used to create courses from 'input files'

issues: 1/ for math, 'Square root of 4 = 2' is pronounced badly even by AT&T (so we need a way to tweak it for math; check with the AT&T rep.)
 2/ see why so many companies failed in this field

asr: what is missing in e-learning:
 1/ audio and video; how do we know that adding them will make it a killer success?
   - see Khan Academy; its success is based on them
2/ So why would people come to your site rather than Khan's?
   Khan is good, but human-resource intensive; what if we make a Khan equivalent with
  a) text to speech (TTS)    b) interactive video (with visuals, esp. for WORD problems, by categorizing word problems into known algebra problems)
   - word problems: show the problem and ask the user if he knows how to solve it. If the user says NO, give equations like x + 4 = y and y - x = 2 and ask whether he can solve those; if he says yes, then show how to convert the WORD problem into equations.
   - if the answer is NO, then take him to how to solve equations like x + 4 = y and y - 2 = 3.

tech:

subjects: chess, math K-8, specifically 'word problems', probability, etc.
  physics, chemistry, etc.

--------------
 - Here is a site; they have a ton of customers...
 http://www.dessci.com/en/
 Here is MathPlayer shown in IE... (I downloaded the IE plugin; they say it works on Firefox, but not Chrome)
http://www.dessci.com/en/products/mathtype/compare/mathplayer.htm

compare: edHelper is doing simple HTML tables for math; these are meant for printing and doing math
  http://www.edhelper.com/math/math_grade2_review_1.htm
----------------------

Why didn't you use human voice-over?


Results?  Acceptance by Students?

Again, the responses are somewhat self-evident:

A. Yes. The TTS technology coupled with the software allowed us to create e-learning material in about half the time as human voice over. The maintenance of the e-learning material takes 75% less time than maintaining material with human voice over. This allows us to create and maintain material much faster with less resources and without needing specialized resources that have voices specialized for recording.
We have produced courses for 6000 people in the company and we are getting good feedback: 80% are satisfied, 10% love it and 10% feel offended. My conclusion is that the voices are "good enough" for training applications.



http://elearningtech.blogspot.com/2010/09/text-to-speech-vs-human-narration-for.html


His work in social media, e-Learning and Performance Support has won awards and has led him into engagements at many Fortune 500 companies 

Resources:
 AT&T voices:  http://www.wizzardsoftware.com/att_desktop_overview.php
-  FREE: ( may not be as high quality as AT&T voices)  http://sourceforge.net/projects/freetts/
   FreeTTS is a speech synthesis engine written entirely in the Java(tm) programming language. FreeTTS was written by the Sun Microsystems Laboratories Speech Team and is based on CMU's Flite engine. FreeTTS also includes a partial JSAPI 1.0




L10n translation


Idea: programmatically extract the hard-coded 'language text' from an existing software program and turn it into language (resource) bundles.
  - look at the example code at this URL  http://www.langbox.com/inter_e.html
 - you can use the free Google or Microsoft translation APIs to programmatically do L10n translation of the 'resource bundles'.
     For example: Google MT, Microsoft Translator, Open-Tran, MyMemory, Translate Toolkit TM, Apertium, TDA-Search, and more.  http://www.opentag.com/okapi/wiki/index.php?title=Rainbow
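A minimal sketch of the extraction step: find double-quoted string literals in Java source, replace each with a bundle lookup, and emit `key=value` lines for a `.properties` file that could then be sent to a translation API. The `msg.N` key scheme and the `messages` variable (assumed to hold a `java.util.ResourceBundle`) are hypothetical. This naive regex also catches strings that should not be translated (keys, SQL, format patterns) and does not apply `.properties` escaping, so its output would need review.

```python
import re

# Double-quoted Java string literals, allowing escaped characters inside.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_bundle(java_source: str, prefix: str = "msg") -> tuple[str, dict[str, str]]:
    """Replace each string literal with a bundle lookup; identical texts share a key."""
    bundle: dict[str, str] = {}
    keys_by_text: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        text = match.group(1)
        if text not in keys_by_text:
            key = f"{prefix}.{len(keys_by_text) + 1}"  # hypothetical key scheme
            keys_by_text[text] = key
            bundle[key] = text
        return f'messages.getString("{keys_by_text[text]}")'

    return STRING_LITERAL.sub(replace, java_source), bundle


def to_properties(bundle: dict[str, str]) -> str:
    """Render the bundle as .properties lines (no escaping applied)."""
    return "\n".join(f"{key}={value}" for key, value in bundle.items())


src = 'label.setText("Save");\nbutton.setText("Cancel");\ntitle.setText("Save");'
rewritten, bundle = extract_bundle(src)
```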


some open source tools in this area:
http://okapi.opentag.com/

Wednesday, October 19, 2011

Viable Software Product / Startups


Software development:  2 types here
  src:  http://www.slideshare.net/watchingwebsites/lean-analytics-for-startups

Dropbox founder slides:
http://www.slideshare.net/gueste94e4c/dropbox-startup-lessons-learned-3836587
http://www.slideshare.net/adamsmith1/from-zero-to-a-million-users-dropbox-and-xobni-lessons-learned
http://www.slideshare.net/missrogue/so-you-want-to-do-a-startup-eh  -- good one too

asr: see the Dropbox founder's slides: this is how great PAIN-relieving companies/products have been created in history. If you plan any product/site, keep this SLIDE as a reference and follow that FLOW: will the user experience the same sequence of steps with your product?

 examples: Skype (solved the pricey phone bill problem; got popular esp. in Asia)
   PayPal (person-to-person payment)
   Google (getting what you want from the web, instead of Yahoo earlier showing paid links first)