COPS and HPDC Best Paper Awards

The paper I presented at the COPS 2008 Workshop, “Social VPNs: Integrating Overlay and Social Networks for Seamless P2P Networking” won the workshop’s best paper award. This was joint work with Renato Figueiredo, Pierre St. Juste, and David Wolinsky.

The SocialVPN project uses social networking to build P2P VPNs. It makes it trivial to communicate, play games, share music, etc. with anyone in your social network with any IP-compatible software. You can find the slides here. Below, you will see a photo of me giving the talk:

Giving the SocialVPN Talk

In addition, a related work: “Improving Peer Connectivity in Wide-area Overlays of Virtual Workstations” by Arijit Ganguly, David Wolinsky, and Renato Figueiredo. and myself was presented at HPDC 2008 by Arijit Ganguly. This work also won the best paper award.

Congratulations to my co-authors!

Why Spam Filters Suck

I was asked by Brendan I. Koerner why spam filters suck. An excerpt from my answer appears in this month’s Wired magazine.

One thing they didn’t mention which I think bears repeating is why spam filtering is so hard. Here is what I said:

Spam filtering is an adversarial problem. You want your email spam free, but
spammers want you to read their messages. Most technological problems don’t
have this aspect. The spam problem is more similar to automated crime
fighting than building a better search engine, for instance. As text spam
filters have increased in accuracy we’ve seen spammers move to image (jpg and
gif) and pdf based spam.

Spam classification is hard. If you hired a human to do the job, you may not
get better results. Certainly some messages are easier for a human to
classify than a computer, but the reverse is also true. Bill Yerazunis,
author of the spam classifier CRM114, estimated his own classification
accuracy as 99.84%, which was worse than his software did:

http://crm114.sourceforge.net/wiki/doku.php?id=news

I suppose I was asked due to my research in spam filtering based on social email networks, work which has been in the news before (Slashdot, New Scientist, Nature News).

Brunet: a remarkably great P2P library

While I was a post-doc at UCLA, I started work on a library to implement P2P protocols, which we called Brunet. Unfortunately, Brunet has never been publicized as well as it should have been. This post is an attempt to bring some attention to it.

Brunet is a Free Software (GPL licensed) library for P2P networking written in C# and developed using Mono, but it also runs on Microsoft’s .Net platform. Here are the basic features of Brunet:

  1. Network transports are abstracted as Edge objects. We have TCP, UDP and TLS/SSL transports now.
  2. Completely distributed UDP NAT traversal (TCP NAT traversal is a to-do item).
  3. Implementation of Chord/Kleinberg/Symphony type ring topology for routing.
  4. DHT implementation.
  5. Both packet based and RPC based communication primitives. New protocols
    are easy to add using one or the other of these approaches.
  6. XML-RPC bridge to allow calling or serving RPC methods with XML-RPC clients or servers. We have examples of this using standard Python.
  7. Distributed tunneling system to maintain topology in the presence of some routing difficulties (untraversable NATs, BGP outages, firewalls, etc.).

We have deployed Brunet on PlanetLab for over two years. You can see information on some of the running nodes on our node statistics page.

Brunet is used as the basis for our IPOP (IP over P2P) project. IPOP builds a virtual IP network which allows nodes to communicate even if some of them are behind firewalls or NATs (as is often the case these days).

The WOW/Grid Appliance project uses IPOP and Brunet’s DHT to produce self-managing grids for large-scale, wide-area grid computing. You can download images of the grid appliance for many virtual machines/hypervisors to try them out yourself.

We encourage outside use of any of the above projects and we welcome code contributions to Brunet. If you are interested in any of the above projects, please join our ACISP2P Google Group. You can download the Brunet code using Mercurial. You can also browse the Brunet documentation online.

Publications

:

  1. P. Oscar Boykin, Jesse S. A. Bridgewater, Joseph S. Kong, Kamen M. Lozev, Behnam A. Rezaei, Vwani P. Roychowdhury, A Symphony Conducted by Brunet, arXiv:0709.4048v1
  2. A. Ganguly, A. Agrawal, P. O. Boykin, R. J. Figueiredo, Wow: Self-Organizing Wide Area Overlay Networks Of Virtual Workstations, Journal of Grid Computing, Vol. 5, No. 2, pp. 151-172 2007. [bib]
  3. Arijit Ganguly, Abhishek Agrawal, P. Oscar Boykin and Renato Figueiredo, “IP over P2P: Enabling Self-Configuring Virtual IP Networks for Grid Computing”, In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rhodes Island, Greece. [ppt talk, pdf talk]
  4. Arijit Ganguly, Abhishek Agrawal, P. Oscar Boykin, Renato Figueiredo, “WOW: Self-Organizing Wide Area Overlay Networks of Virtual Workstations”, In Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing (HPDC), pages 30-41. Paris.
  5. Arijit Ganguly, David Wolinsky, P. O. Boykin, Renato Figueiredo, “Decentralized Dynamic Host Configuration in Wide-Area Overlay Networks of Virtual Workstations”. In Workshop on Large-Scale and Volatile Desktop Grids (PCGrid), 03/2007

Heading back to the United States

I leave Israel tonight and will arrive in Jacksonville, FL tomorrow at 8am. It’s been a wonderful trip.

Thank you Yossi Weinstein for helping me get around Haifa.

Also, a very big thank you to Tal Mor for supporting my visit and for the invitation to the workshop.

I look forward to my next trip back to Israel.

Slides from Cooling Talk at Safed

Here are my slides for the talk I gave today: Cooling and Compression: Cooling as a Bridge Between Physics and Information Theory.

I hope these will be of use to others. I used Latex Beamer to produce the slides. The latex source for the slides is here.

Some of the results in this talk are new. I am preparing a paper on the subject. I welcome any comments/criticisms/suggestions.

Finding Bugs

The WOW project uses the Brunet P2P library developed in C# using Mono.

Recently, we noticed a memory leak on our nodes on planetlab. David Wolinsky identified exactly the bit of code causing the leak and reported a bug. The bug was fixed within a day and the SVN has the fix already.

Kudos to David for tracking down the bug (notice many people on the list said there was probably not really a bug). Kudos also to the Mono team for getting a fix in after the issue was clearly identified. This is an example of the power of Free Software. After finding the bug, David was able to build a version of mono that same day that had the bug fixed which we could deploy on planetlab. Try that with proprietary software.

Peer-to-peer Networking and Applications

I am on the editorial board for a new journal called “Peer-to-Peer Networking and Applications”. See the call for papers (pdf). Xuemin (Sherman) Shen is the editor-in-chief and the journal will be published by Springer.

Please make this new journal known to any researchers working in P2P networking. Manuscripts may be submitted at http://ppna.edmgr.com.

Security of Quantum Key Distribution.

Long ago, when I was a graduate student at UCLA, Tal Mor encouraged me to work on some quantum cryptography problems. Recently, the main fruit of that work has appeared in the Journal of Cryptology. This paper weighs in at 59 pages. Only my coauthors will understand what a great relief it is to finally have that paper completely finished. Congratulations to my other coauthors: Eli Biham, Michel Boyer, Tal Mor and Vwani Roychowdhury.

Percolation Search on Complex Networks

The journal version of the percolation search paper, entitled Scalable Percolation Search on Complex Networks has appeared in the Journal of Theoretical Computer Science. This is joint work with Nima Sarshar and Vwani Roychowdhury.

IPOP: IP over a P2P overlay.

A paper using P2P overlays for virtual IP networks will be presented at IPDPS 06 next month. This is joint work with Professor Renato Figueiredo and students Arijit Ganguly and Abhishek Agrawal.

The main problem we are addressing is virtual networks for grid computing. We want to enable virtual nodes (compute nodes running on Xen or VMWare to migrate across the physical network and keep all their virtual network connections active. We want nodes be able to be hosted behind NAT/Firewalls without changing any existing network policies (including reconfiguring the firewalls). We also want the network to self organize so that new nodes can be added with no administration of the virtual network. IPOP is our project to meet these goals.

IPOP uses the Brunet P2P library. Each node has a Brunet address which enables all other nodes to route packets to them. Brunet automatically handles NAT traversal (without any addiitonal servers or specially designated third parties) and makes sure all pairs of nodes can communicate, regardless of any firewalls and NATs which may be between them. Secondly, since Brunet addresses are independent of the underlying network addresses (such as TCP/IP or UDP/IP which may be transporting Brunet), the node can move to a different IP address, but keep its same Brunet address. This application layer mobility allows migration of virtual machines without assigning new virtual IP addresses.

This project views P2P primarily as paradigm of self-organization to build maintainence free information systems. We are also interested in using the IPOP/Brunet infrastructure to provide effient scheduling of grid resources and self-reconfiguration of grid resources. A second IPOP paper covering our initial work on migration and reconfiguration will appear in HPDC 06.

Intelligent Design is not Science.

According to federal judge John E. Jones III, intelligent design is not science; I agree completely.

There is a nice quote of the judge’s opinion in the New York Times article:
“To be sure, Darwin’s theory of evolution is imperfect,” Judge Jones wrote. “However, the fact that a scientific theory cannot yet render an explanation on every point should not be used as a pretext to thrust an untestable alternative hypothesis grounded in religion into the science classroom or to misrepresent well-established scientific propositions.”

Real Mutually Unbiased Bases (MUBs)

My co-authors Meera Sitharam, Mohamad Tarifi, and Pawel Wocjan and I put a new preprint on the Arxiv today: Real Mutually Unbiased Bases. This work considers MUBs over the reals. In this case, we can get interesting upper and lower bounds on the number of real MUBs in each dimension. The problem seems to be rather different than the complex case, due to the discrete nature of real MUBs.

Hopefully, this approach could give us new insight into finding bounds for complex MUBs.

Netmodeler is Free Software

I am one of the lead developers of the Netmodeler C++ library. This is the library that I and many of my former group-mates at UCLA’s Complex Network Group have developed to make calculations and simulations for various works, including: spam fighting, more spam fighting, load balancing, search in unstructured p2p networks, disaster management in complex networks, and finding modules in protein networks.

The Netmodeler library is GPL licensed Free Software. Detailed instructions for obtaining Netmodeler are given on the Netmodeler wiki page.

We welcome others to use the code (as long as they obey the GPL), and we welcome source code contributions (particularly any autoconf/automake gurus that could help us get the build system improved).

Back from DSN 2005

This is a quick note that I did go and make it back from DSN 2005. It was a nice trip to Yokohama, Japan. Please feel free to look at my slides on Reversible Fault Tolerant Logic. You can find the paper in the Arxiv:: Reversible Fault-Tolerant Logic.

Mutually Unbiased Bases (MUB) and Lie Algebra

Along with my co-authors Meera Sitharam, Pham Huu Tiep, and Pawel Wocjan, we have just posted a new paper to the Arxiv titled Mutually Unbiased Bases and Orthogonal Decompositions of Lie Algebras. This paper shows the connection between the problem of Mutually Unbiased Bases in quantum information, and the problem of orthogonal decompositions of Lie Algebras.

One new thing we learn from this connection is that any set of MUBs that are eigenvectors of a Monomial Matrix in dimension 6, then there can be at most 3 MUBs, which is equivalent to the “reduce-to-prime-power” approach. Interestingly, all known sets of MUBs are equivalent to eigenvectors of monomial matrices.

A New Kind of GNU/Linux

I would be very interested in a new kind of GNU/Linux distribution. For many people, maintaining a system is not something they would care to do. They do not want to be burdened with system administration or don’t have the skills to do the job properly. What I propose is something like the Linux Terminal Server Project for home users. I am not suggesting that the software run on a remote server and display locally, but that the systems boot from a remote NFS server. The software is run locally, but installed and maintained on a remote server.

Here is how the system works:

  • User obtains a boot-cd to start up his system (this is like an install disk).
  • Like a live-CD, the system automatically configures the hardware. But unlike a live-CD, the system makes use of the local disks.
    The first time the user starts the system, she is asked if she wants to use part of her disk to speed up performance (highly recommended), and if she would like to use part of the disk to keep her personal files. If the user says no, we are back into the normal live-CD mode, and forget everything, but if the user wants to use the disk, things can get interesting.
  • The disk is used for two things: caching a remote read-only NFS file system, and storing the users home directory. The CD contains the most recent snapshot of the remote NFS file system (or the most commonly used parth thereof).
  • The cache is initialized with the CD data.
  • From then on, the system runs all software from the server. An NFS caching mechanism makes sure not to transfer files if they are up to date in the local cache.

The user never installs any software, never does any updates, and never does any system administration. If there is some update that requires a reboot, a msg can be presented to the user to notify her to restart as soon as possible, but otherwise, such updates will take effect at the next boot.

This system is a dedicated desktop environment. As such, one can get rid of root completely.
Since hardware is configured automatically, and a huge selection of software is already installed, the user has no need or ability to do anything that requires root access (assuming we allow any local user to restart the computer).

This is apt-get without user intervention to get packages, or Click-N-Run except, just Run.

Clearly this is not for everyone, but for many home and corporate users, this system would be an easy to use, easy to maintain, cheap to deploy system. If it has all the Debian Packages, I would probably run it.

Networks and Politics

There is an interesting paper on the Arxiv: A network analysis of committees in the United States House of Representatives. This paper examines the voting records and committee structure of the House using tools from network analysis.

Interestingly, in the final figures of the paper, you can see the house turning from control from the left to control from the right as the the Democrats become more bipartisan and the Republicans become partisan.

Hopefully, this kind of mathematical analysis of government will help us to understand the mechanics better. This kind of work could lead to much better “voter sheet” information about candidates.

Spam Article at New Scientist

New Scientist is running an article on P2P Spam Filtering. This is based on the preprint Let Your CyberAlter Ego Share Information and Manage Spam (I know the title is absolutely insane, sorry…)

This is work I did with Vwani Roychowdhury and his group at UCLA: Joe Kong, Behnam Rezaei and Nima Sarshar.

Disaster Management in Scale-Free Networks

My co-authors and I have posted a new pre-print to the Arxiv:
Disaster Management in Scale-Free Networks: Recovery from and Protection Against Intentional Attacks.

This paper studies the effect of continuous attacks on scale-free networks (such as P2P networks), and how to recover from an attack. Since P2P systems are often billed as robust, it is important to consider not only random failures, but also attacks. We see that if the network is reactive, we can recover almost instantly from attacks.

Reversible Fault-Tolerant Logic

I recently put a new preprint on the Arxiv: Reversible Fault-Tolerant Logic. This is a joint work with Vwani Roychowdhury which considers fault tolerance techniques for reversible computing. We show that as long as the reversible gates have errors less than a threshold value, those errors may be held in check by using O(N \log^{4.75} N) noisy gates to simulate N noise-less gates.

This work also considers the case of locally connected architectures in 1-D and 2-D. There we show that the threshold decreases somewhat. Finally, we consider the entropy that must be removed from the system and see that the entropy saving aspect of reversible computing is lost unless the gate error rate is much lower than the threshold error rate.

I will present this work this summer at the International Conference on Dependable Systems and Networks.