Tuesday, December 30, 2008
Article:Extending WSO2 Registry with Handlers
WSO2 has a registry product, WSO2 Registry, which is a document and service registry with tagging and user comments among many other features. This article, Extending WSO2 Registry with Handlers, explains the registry and its extensibility mechanisms.
Wednesday, December 24, 2008
Monday, December 22, 2008
Resolving total-order using partial-orders
Resolving a total order from partial orders can be tricky code to write. For example, plain sorting cannot be used, because not every pair of items can be compared directly, and doing it by enumerating through the list is also tricky. This is a common problem: examples include Axis2 handler order resolution, build orders in Ant/Maven, and AI-planning-based use cases.
However, there is a standard way to solve this problem. The solution is to build a graph with the dependencies as edges, which gives a directed acyclic graph, and then do a topological sort (http://en.wikipedia.org/wiki/Topological_sort). The algorithm takes O(n) time and space, where n is the number of entities plus the number of rules, which is pretty reasonable.
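As a sketch of the approach (the item names and rules below are made up for illustration, not taken from Axis2 or any build tool), here is Kahn's algorithm, one standard way to do a topological sort:

```python
from collections import defaultdict, deque

def topological_sort(items, before_rules):
    """Resolve a total order from partial-order rules.
    before_rules is a list of (a, b) pairs meaning "a must come before b"."""
    graph = defaultdict(list)
    indegree = {item: 0 for item in items}
    for a, b in before_rules:
        graph[a].append(b)
        indegree[b] += 1
    # Kahn's algorithm: repeatedly emit items with no unmet dependencies.
    queue = deque(i for i in items if indegree[i] == 0)
    order = []
    while queue:
        item = queue.popleft()
        order.append(item)
        for nxt in graph[item]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(items):
        raise ValueError("cycle in rules: no total order exists")
    return order

# e.g. hypothetical handler ordering: "security" before "addressing",
# "addressing" before "dispatch"
print(topological_sort(["dispatch", "security", "addressing"],
                       [("security", "addressing"), ("addressing", "dispatch")]))
# prints ['security', 'addressing', 'dispatch']
```

Note the cycle check: if the rules contradict each other, no total order exists, and the algorithm detects that for free.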
Sunday, December 21, 2008
Theoretical Computer Science Cheat Sheet
This is a nice cheat sheet, called the "Theoretical Computer Science Cheat Sheet": http://www.tug.org/texshowcase/cheat.pdf. I have not heard of half of the results; some we touched on in the algorithms class. One thing I learned there is that representing a simple phenomenon, like the expected number of throws before getting N heads in a coin toss, very often leads to very complex math. So once in a while, when we systems folks try to explain things we have observed, these results could come in handy.
Thursday, December 11, 2008
Multicore for e-science at the E-Science 2008
Right now an interesting panel is going on on the topic, and you can watch it via the e-Science web site. I am sure there will be archives as well; I will post them later.
Tuesday, December 9, 2008
Live sessions @ E-Science 2008
The e-Science 2008 conference is going on these days in Indianapolis. What I wanted to note is that they are streaming all sessions live (http://live.escience2008.iu.edu/calendar/Calendar.html), in real Web 2.0 fashion, with chat rooms for every session. It is great to see the state of the art in real use, in addition to the fact that people can listen to interesting sessions even when they cannot afford to be there in person.
Tuesday, December 2, 2008
Here Comes Everybody: The Power of Organizing Without Organizations
wikitravel.org is a free worldwide travel guide, and it looks pretty nice. One nice thing is that it is created using so-called crowdsourcing, just like open source software or Wikipedia. It is one of many efforts that create high-quality output by aggregating the small contributions of a large group. I read the book Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky (Penguin Group, 2008), and it discusses many interesting aspects of the power of collective wisdom and its dynamics.
One thing it says is that the reason organizations exist is the communication overhead of arriving at agreements in a group: peer-to-peer negotiation needs C(n,2) = n(n-1)/2 pairwise agreements, which grows very fast with n, and organizations typically reduce the overhead through hierarchy, a chain of command. However, the Internet has changed communication patterns (e.g. asynchronous messages, broadcasting, posting without knowing the recipients); therefore, some tasks can now be handled without formal organizations. There is a lot more interesting stuff there.
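The growth of that pairwise-agreement count is easy to check:

```python
from math import comb

# Number of pairwise agreements needed among n peers: C(n, 2) = n*(n-1)/2.
# A hierarchy, by contrast, needs only on the order of n reporting links.
for n in (5, 10, 100, 1000):
    print(n, comb(n, 2))
# prints:
# 5 10
# 10 45
# 100 4950
# 1000 499500
```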
Monday, November 17, 2008
Lincoln's letter to his son's teacher
After Obama made a reference to Lincoln in his interview, I read through "Lincoln's letter to his son's teacher" again; the first time was more than 10 years ago, for a reciting contest at high school. Both the ideas and the English of this work are almost perfect; it is a must read.
This is not the kind of topic I usually blog on; however, having read the letter, I could not resist.
Thursday, November 13, 2008
Thursday, October 23, 2008
Book on RESTful PHP Web services
Samisa has written a book on RESTful PHP web services. That is the second book from the Axis2 team and WSO2, the first being Quickstart Apache Axis2 by Deepal. Congratulations Samisa; having worked on a few papers, I can imagine what it takes to write a book!!
Monday, October 13, 2008
Eating our own dog food: Wisdom of the Crowd for Rating the Papers
Could we use the idea of the Wisdom of the Crowd (e.g. tagging, comments, ranking) with research papers? In a way, we already do so, using citations. However, citation is a slow process, and at best it takes around six months for a citation to appear. I believe it would be interesting to enable comments, tags, recommendations, etc., to allow a more involved discussion. Like it or not, there is a lack of discussion between academia and industry (research labs of companies do not qualify as industry), and one reason is that people from industry do not have the time, energy, or incentive to write a paper and go through the process of publishing it, even though they do have a comment on, or an improvement to, an idea. However, if comments are enabled, there is a better chance that more people will comment. Of course, there will be concerns about the quality of comments, but just like Wikipedia, quality will prevail in the long run.
We have seen many blog posts lead to lengthy discussions on various aspects, and more often than not, research work has more than one aspect and could benefit from more involved discussion. Therefore, features like comments and tags would help. Maybe one day we could augment peer review using similar ideas, making a paper really peer reviewed, by all the peers.
Furthermore, in my opinion we should be ashamed that CS papers, despite being the state of the art of information processing, are very hard to search and categorize. When you think of papers, they do have well-defined relationships in terms of citations. But if I picked a paper now, how hard would it be to trace the provenance of its idea? If we create a graph linked by citations and weight the nodes using something like the PageRank algorithm (Google's algorithm to rank web pages), we can easily identify hubs (authoritative papers, authors, and maybe even groups) and important paths of development (the provenance of ideas). I am sure this has already been proposed somewhere else, and maybe some tools already have it. But I think it is a shame that the ACM or IEEE sites do not support it. We should use the results of our own research before we expect other people to use them.
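As a rough sketch of the idea (the paper names below are made up, and this is a plain power-iteration PageRank, not what ACM or IEEE would necessarily implement):

```python
# Toy citation graph: edges point from a citing paper to the cited paper.
citations = {
    "paper-A": ["classic"],
    "paper-B": ["classic", "paper-A"],
    "paper-C": ["classic"],
    "classic": [],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively redistribute rank along citation edges until it settles."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in graph}
        for node, cited in graph.items():
            if cited:
                share = damping * rank[node] / len(cited)
                for target in cited:
                    new[target] += share
            else:
                # Dangling node (cites nothing): spread its rank evenly.
                for target in new:
                    new[target] += damping * rank[node] / n
        rank = new
    return rank

ranks = pagerank(citations)
# "classic" is cited by every other paper, so it should come out on top.
print(max(ranks, key=ranks.get))
# prints classic
```

On a real citation graph the same computation would surface the authoritative papers the post talks about, and following high-rank chains backwards gives the provenance of an idea.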
Saturday, October 11, 2008
Acceptance Rate, Impact Rank and CFPs for Conferences
Acceptance rate and conference impact rank are two measures of a conference, and lists of those rankings can be found at Networking Conferences Statistics and http://citeseer.ist.psu.edu/impact.html respectively. Also, a few good lists of Calls for Papers can be found on the following sites. Some are old, but one can Google for newer editions of those conferences.
- http://ft.ornl.gov/cfp/
- http://www.sigcomm.org/calls/papers/
- http://www.usenix.org/event/
- http://www.iaria.org/conferences.html
- http://i.cs.hku.hk/~scho/cfp.html
- http://www.ee.unsw.edu.au/~timm/netconf/
Case studies on Highly Scalable Systems
This site (http://highscalability.com) links to whatever data is available on large-scale systems, starting from Google but covering many others. I found it interesting, as there is nothing like "really doing it!!", and I love to hear first-hand experience on scaling up!!
Thursday, October 9, 2008
A Unix Trick to debugging With Asynchronous messaging
These days I am playing with a large-scale messaging system (a broker network). Things are great when they work, but when things start to go wrong with one of these systems, you do not want to be there.
One big downside of asynchronous messaging is that it is so hard to debug, especially when messages jump from node to node and you are not the author of the code (so you do not know the magic places to put stdouts :( ).
Following is a little trick that helped me. It is not a silver bullet, but it does give some comfort. You need to start with the following:
1. Log the message ID, or some unique ID, with every log message. With luck, the developers have already done this.
2. Set up log4j to print the timestamp as the first thing in each log statement.
Assuming you were able to get all the logs into the same directory (in my case I have NFS mounted across all nodes, so that was easy), the following command will list everything related to a given message in real time order, so you can walk through it and tell what happened (sort simply sorts the lines by their log4j timestamps):
grep message-id *.log | sed 's/.*\.log://' | sort
It is a simple command, but it can be very useful. The sed step strips the filename prefix that grep adds, so that sort sees the timestamp first. Also, if you need to merge all the logs in time order, do cat * | sort. As you can guess, there are many variations of this. Actually, maybe messaging system developers should settle on a standard log format for message sending, receiving, and routing, which would let people write log mining code/scripts that can uncover problems.
If anyone knows of a useful log mining tool, please! please! drop me a note :).
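The same trick can also be scripted. Here is a minimal sketch (the function name is mine; it assumes, like the one-liner, that each log line starts with a sortable log4j timestamp and contains the message ID somewhere in the line):

```python
import glob

def trace_message(message_id, log_glob="*.log"):
    """Collect every log line mentioning message_id across all node logs,
    then sort by the leading timestamp to reconstruct the message's path."""
    lines = []
    for path in glob.glob(log_glob):
        with open(path) as f:
            for line in f:
                if message_id in line:
                    lines.append(line.rstrip("\n"))
    # Works because the timestamp is the first field on each line.
    return sorted(lines)
```

From here it is a small step to grouping lines per message ID and flagging messages that were sent but never received, which is exactly the kind of mining a standard log format would make routine.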
Wednesday, October 8, 2008
Learning/Teaching to do large scale parallel programming
With the advent of the cloud, learning to do large-scale parallel programming is becoming a useful skill. For example, Google wants graduates to have learned these things at university. This is an old field, but it has resurfaced with the cloud. NSF ran a workshop on the topic, the 2008 NSF Data-Intensive Scalable Computing in Education Workshop. There are some materials and pointers there.
As you would guess, MapReduce is the starting point, but there are many others.
Tuesday, October 7, 2008
Monday, October 6, 2008
Computing at Scale: Challenges & Opportunities
I was watching the video Computing at Scale: Challenges & Opportunities, a panel at the Google Faculty Summit. Here are a few interesting points that were made.
They observe a few trends/problems (the ones that caught my ear; not comprehensive):
- We are drowning in data: data-intensive computing, and how to handle lots of data (e.g. a telescope could generate 200GB/sec).
- The data-driven approach is becoming popular.
- How do we program large-scale systems? Patterns, middleware, and teaching students to program using them?
- Storage and computing power are becoming cheaper, and they are going to be placed remotely.
- The need for multidisciplinary collaborations to solve problems (e.g. e-science problems).
- With the cloud, the cost of 1000 CPUs for 1 day = 1 CPU for 1000 days (Prof. Patterson's observation).
- In large-scale systems, no matter how reliable the hardware, it fails, and the software has to handle it (an observation at Google).
- Animoto (a company running on EC2) was using about 50 nodes, but due to a Facebook app they had to handle a 10X larger user base within a week, and they were able to scale their system up to 3500 nodes using EC2. See here for details.
Few High level CS talks
I found a few interesting talks/papers on the Computing Community Consortium web page, http://www.cra.org/ccc/resources.php, e.g. Computer Science: Past, Present and Future, Ed Lazowska's SIGCSE Keynote, March 15, 2008.