Saturday, May 29, 2010

E-Science: What and Why?

E-Science facilitates scientific research through applications of computer science. Although it has roots in high-performance computing and supercomputing, which focus on number crunching, it has evolved into a more general role over the last few years. In that time, E-Science has come into the spotlight, attracted a significant amount of funding, and drawn researchers of the caliber of Jim Gray.

Years ago, the role of computing in research was to crunch numbers and to help scientists keep track of their data. Now, however, computing has become an indispensable part of scientific research, and almost all research disciplines depend heavily on computer science. Let us briefly look at some of those areas and at the reasons why computers play such an integral role in the sciences.

Science is said to stand on two pillars: empirical methods and analytical methods. In the first, scientists use data collected over a sufficient period of time to find new trends and patterns in nature. In the second, starting from formal models of the world and current knowledge, they try to derive new results through formal logic. More often than not, the two methods are used in tandem, one helping the other. With the advent of computers, however, a third pillar, computation, has arrived. Since many problems arising from empirical and analytical work do not have closed-form answers, scientists often solve them through simulation. For example, partial differential equations (PDEs), which arise in many real-life calculations, often have no closed-form solution and therefore have to be solved through numerical analysis. Similarly, state-of-the-art weather models predict the weather by simulating a model rather than by solving its equations analytically. There are such examples in all fields of engineering and the physical sciences. This aspect covers most HPC use cases of E-Science.
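
To make the idea of solving a PDE numerically concrete, here is a minimal sketch in Python. It is only an illustration, not how production weather or engineering codes work: it applies a simple explicit finite-difference scheme to the one-dimensional heat equation u_t = alpha * u_xx on the interval [0, 1], with the ends held at zero and an initial profile of sin(pi x). This particular problem was chosen because it does have a known exact solution, exp(-alpha pi^2 t) sin(pi x), so the simulation can be checked against it. The function name and its parameters (solve_heat_1d, n_points, alpha, t_end, safety) are made up for this example.

```python
import math


def solve_heat_1d(n_points=51, alpha=1.0, t_end=0.1, safety=0.4):
    """Simulate u_t = alpha * u_xx on [0, 1] with u = 0 at both ends
    and u(x, 0) = sin(pi * x), using an explicit finite-difference scheme.

    Returns the list of values at t_end and the grid spacing dx.
    All names and defaults here are illustrative assumptions.
    """
    dx = 1.0 / (n_points - 1)

    # The explicit scheme is only stable when alpha * dt / dx^2 <= 0.5,
    # so pick a time step safely below that limit.
    dt = safety * dx * dx / alpha
    n_steps = int(math.ceil(t_end / dt))
    dt = t_end / n_steps  # shrink dt slightly so we land exactly on t_end
    r = alpha * dt / (dx * dx)

    # Initial condition on the grid, with the boundary values pinned to zero.
    u = [math.sin(math.pi * i * dx) for i in range(n_points)]
    u[0] = u[-1] = 0.0

    for _ in range(n_steps):
        new_u = u[:]
        # Update interior points from their neighbours; the ends stay at zero.
        for i in range(1, n_points - 1):
            new_u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new_u

    return u, dx


if __name__ == "__main__":
    u, dx = solve_heat_1d()
    mid = len(u) // 2
    exact = math.exp(-math.pi ** 2 * 0.1)  # exact value at x = 0.5, t = 0.1
    print("simulated:", u[mid], "exact:", exact)
```

Refining the grid brings the simulated value closer to the exact one, but it also forces a smaller time step and many more arithmetic operations. That trade-off between accuracy and cost is one reason realistic simulations quickly become large computing problems.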

On the other hand, most scientific calculations are easily beyond the capacity of a single computer. For example, high-resolution weather predictions can easily use 1000 CPUs, and space telescopes can easily generate terabytes of data in a relatively short time. Handling such problems requires multiple computers and knowledge of distributed systems. Also, building efficient solutions that exploit the parallel nature of these problems needs high-performance computing (HPC) and parallel computing.

Furthermore, the reliance on computer science has forced scientists and graduate students from the sciences to learn computer science. Although some of those scientists have made significant contributions to computer science, this requirement is often a roadblock that keeps many scientists from adopting IT to the fullest extent in their research. Consequently, making computing transparent, or in other words, building tools that allow scientists to do science with minimal computing knowledge, is another interesting challenge being tackled by E-Science.

Moreover, efficient scientific research requires a high level of communication and collaboration among scientists. Although IT already plays a significant role in that arena, it has the potential to play a greater one. For example, IT has greatly simplified the dissemination of scientific research and has significantly reduced the time and effort required to conduct a literature survey. However, we still lack infrastructure for ongoing collaboration that would allow scientists to perform large experiments together. More and more grand challenges require collaboration across multiple disciplines, which increases the importance of such collaboration and, consequently, of the tools that enable it.

Finally, given the reduced cost of sensors and the ubiquity of information technology, a vast amount of data from the natural world is available to a researcher. However, one of the challenges of our time is to learn how to make sense of that data, which is more or less the goal of science itself. In the world we live in, it is much easier to obtain data than to make sense of it. Therefore, computer science can play a major role in enabling and streamlining the process of getting from data to knowledge, which includes collecting raw data, generating metadata, archiving, searching, visualizing, generating information through processing, deriving knowledge from information, and preserving data for the future.

Current E-Science includes traditional computational topics like building supercomputers, high-performance computing, parallel programming, and multi-core and GPU programming, as well as more general topics like data-intensive computing, processing systems such as workflow systems, and large-scale data storage systems. In general, E-Science tries to facilitate scientific discovery through applications of computer science, and it tries to do so as transparently as possible, hiding the details of computer science from end users as much as it can.

Given the significant interest that systems researchers have shown in E-Science, it is worth asking why. The answer is twofold. On one hand, the amount of funding available to pure computer science has greatly reduced, while the funding allocated to nationwide cyberinfrastructures has greatly increased. On the other hand, E-Science brings very large scales into focus, in terms of both computation and data. The resulting problems are challenging even to computer scientists, and the tools and systems we have are often inadequate to handle them. Therefore, E-Science continues to push the boundaries of computer science.

There are multiple E-Science initiatives in the U.S. as well as in the UK and Europe, each receiving millions of dollars and attracting top scientists. Furthermore, Microsoft Research has made significant investments and has a major presence in E-Science. The IEEE E-Science Conference will hold its 6th annual conference this year. Among the venues are the annual IEEE E-Science Conference, the annual Supercomputing Conference, the annual TeraGrid conference, and the annual Microsoft E-Science workshop.

To summarize, computing has the potential to facilitate the conduct of scientific research, enabling humans to take giant leaps, and E-Science is a field of study whose goal is to make that a reality. It has attracted scientists from both the sciences and computer science, has received millions of dollars in funding, and is currently running many multi-disciplinary research projects to build next-generation research infrastructure.
