Monday, February 13, 2012

How to Monetize Google's App Engine?

This is a two-fold, million-dollar question:

First of all, how does Google make money from it?
 (somehow, this kept reminding me of Sun's Java. Hopefully history has taught a lesson to the tech people :-).

On the other hand, would any outsiders profit from this offering? I believe there is a way.

Friday, February 10, 2012

How to Design/Evaluate a Product


  • Who is the user?
  • What are the customers’ goals?
  • What are the business goals?
  • What are the gaps between existing solutions and the customer’s ideal solution?
  • What are the different product alternatives?
  • This Answers my puzzle about Google Finance

    Maybe Google PM can take a look at this article.

    I tried to switch to Google Finance. But, somehow, I stayed with Yahoo Finance even though I have switched almost all other services to Google.com. It could just be force of habit. The reason discussed in the article above may also contribute to it.

    Thursday, February 9, 2012

    TCP Congestion Control Mechanism

    1, slow start -  here.

    2, congestion avoidance - here.

    3, fast retransmit -  when three or more duplicate ACKs are received, the sender does not even wait for a retransmission timer to expire before retransmitting the segment (as indicated by the position of the duplicate ACK in the byte stream). This process is called the Fast Retransmit algorithm.


    4, fast recovery -  TCP sender has implicit knowledge that there is data still flowing to the receiver. Rather than start at a window of one segment as in Slow Start mode, the sender resumes transmission with a larger window, incrementing as if in Congestion Avoidance mode. This allows for higher throughput under the condition of only moderate congestion.
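The four mechanisms above can be sketched as a toy model of the congestion window. This is a simplified, Reno-style illustration that works in whole segments per round trip; the function names, the event names, and the initial threshold of 16 segments are assumptions made for the example, and real TCP stacks do considerably more (for instance, inflating the window while duplicate ACKs keep arriving).

```python
def step(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) in segments after one event.

    Events (hypothetical names for this sketch):
      "rtt_ok"         - a full window was acknowledged
      "triple_dup_ack" - three duplicate ACKs arrived (fast retransmit)
      "timeout"        - the retransmission timer expired
    """
    if event == "rtt_ok":
        if cwnd < ssthresh:
            # Slow start: the window doubles every round trip.
            return min(cwnd * 2, ssthresh), ssthresh
        # Congestion avoidance: the window grows by one segment per round trip.
        return cwnd + 1, ssthresh
    if event == "triple_dup_ack":
        # Fast retransmit resends the segment at once; fast recovery then
        # resumes from half the window instead of from one segment.
        half = max(cwnd // 2, 2)
        return half, half
    if event == "timeout":
        # A timeout is treated as heavy congestion: back to slow start.
        return 1, max(cwnd // 2, 2)
    raise ValueError(f"unknown event: {event}")


def trace(events, cwnd=1, ssthresh=16):
    """Replay a list of events and return the window size after each one."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = step(cwnd, ssthresh, event)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    events = ["rtt_ok"] * 5 + ["triple_dup_ack"] + ["rtt_ok"] * 2 + ["timeout", "rtt_ok"]
    print(trace(events))
```

The trace shows the difference the post describes: after three duplicate ACKs the window drops to half and keeps growing linearly, while after a timeout it restarts from a single segment.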



    Write a regular expression which matches email address

    \w+@\w+\.[\w]{3}

    Or

    ^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$
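A quick way to compare the two patterns is to run them against a few sample addresses. The sketch below uses Python's re module; the short pattern is written with a lowercase \w for the domain part, the names SIMPLE, STRICT and is_email are made up for the example, and fullmatch is used so that both ends of the string are anchored regardless of whether the pattern itself carries ^ or $. Neither pattern covers every address the mail standards allow.

```python
import re

# Short pattern: word characters, "@", word characters, a dot, three word characters.
SIMPLE = re.compile(r"\w+@\w+\.[\w]{3}")

# Longer pattern: alphanumeric runs separated by _ - . or + in the local part,
# then dot- or hyphen-separated labels and a 2 to 6 letter top-level domain.
STRICT = re.compile(
    r"^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*"
    r"[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}"
)


def is_email(pattern, text):
    """True when the whole string matches the given compiled pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "john@example.com",
        "john.doe@example.com",
        "user+tag@mail.example.co.uk",
        "bad@@example.com",
    ]
    for address in samples:
        print(address, is_email(SIMPLE, address), is_email(STRICT, address))
```

The short pattern rejects dotted local parts and any top-level domain that is not exactly three characters long, which is why the longer one is usually the better answer.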

    Wednesday, February 8, 2012

    Definition of Success

    "... the meaning of success has also changed for most people. No longer do people think of success in terms only in the vertical terms (for example in terms of promotion). Increasingly, people define success in their own terms, measured against their own particular set of goals and values in life. We call this psychological success. The good thing about success from the individuals point of view is while there is only one way to achieve vertical success (that of moving up), there are an infinite variety of ways of achieving psychological success."

    - From Allan R. Cohen's book “The Portable MBA in Management”

    What Does a Datacenter Want?

    1, Accelerated Business Performance;

    2, Optimized Asset Utilization;

    3, Lower System Acquisition and Operation Costs;

    4, Reduced IT Complexity

    Factors to Consider When Choosing a ToR Switch


    - Throughput (Gbps)
    - Forwarding Rate (Mpps)
    - Latency
    - Buffer
    - Power
    - Stacking Technology vs VC
    - Pricing

    Tuesday, February 7, 2012

    Anatomy of a Journaling File System

    Figure 1. A typical journaling file system

    Note: metadata refers to the structures that manage data on a disk. Metadata records file creation and removal, directory creation and removal, growing a file, truncating a file, and so on. In Google's case, the metadata must contain attributes such as DC, rack, slot, and tablet. Of course, some of these are located in the GFS master, and some reside in the modified Linux file system, whatever it is.

    Monday, February 6, 2012

    PAE Primer plus Linux Src Code


    As RAM increasingly becomes a commodity, the prices drop and computer users are able to buy more. 32-bit architectures face certain limitations with regard to accessing these growing amounts of RAM. To better understand the problem and the various solutions, we begin with an overview of Linux memory management. Understanding how basic memory management works, we are better able to define the problem, and finally to review the various solutions.
    This article was written by examining the Linux 2.6 kernel source code for the x86 architecture types.
    ===========================================================================
    Overview of Linux memory management
    32-bit architectures can reference 4 GB of physical memory (2^32). Processors that have an MMU (Memory Management Unit) support the concept of virtual memory: page tables are set up by the kernel which map "virtual addresses" to "physical addresses"; this basically means that each process can access 4 GB of memory, thinking it's the only process running on the machine (much like multi-tasking, in which each process is made to think that it's the only process executing on a CPU).
    The virtual address to physical address mappings are done by the kernel. When a new process is "fork()"ed, the kernel creates a new set of page tables for the process. The addresses referenced within a process in user-space are virtual addresses. They do not necessarily map directly to the same physical address. The virtual address is passed to the MMU (Memory Management Unit of the processor) which converts it to the proper physical address based on the tables set up by the kernel. Hence, two processes can refer to memory address 0x08329, but they would refer to two different locations in memory.
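The point about two processes sharing a virtual address can be illustrated with a toy translation function. This is only a sketch: it assumes 4 KiB pages and a flat, single-level table per process held in a dictionary (real x86 page tables are multi-level and are walked by the MMU), and the frame numbers are invented for the example.

```python
PAGE_SIZE = 4096  # assumed page size: 4 KiB


def translate(page_table, vaddr):
    """Translate a virtual address using a per-process page table.

    page_table maps virtual page numbers to physical frame numbers.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # No mapping: on real hardware this would raise a page fault.
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


# Two processes, each with its own table set up by the "kernel".
# The frame numbers 0x120 and 0x345 are made up for illustration.
proc_a = {0x8: 0x120}
proc_b = {0x8: 0x345}

if __name__ == "__main__":
    vaddr = 0x08329
    print(hex(translate(proc_a, vaddr)))
    print(hex(translate(proc_b, vaddr)))
```

Both processes use the same virtual address, yet each lookup lands in a different physical frame because each process has its own table.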
    The Linux kernel splits the 4 GB virtual address space of a process in two parts: 3 GB and 1 GB. The lower 3 GB of the process virtual address space is accessible as the user-space virtual addresses and the upper 1 GB space is reserved for the kernel virtual addresses. This is true for all processes.
                 
          +----------+ 4 GB          
          |          |               
          |          |               
          |          |               
          | Kernel   |               
          |          |               +----------+ 
          | Virtual  |               |          |
          |          |               |          |
          | Space    |               | High     |
          |          |               |          |
          | (1 GB)   |               | Memory   |
          |          |               |          |
          |          |               | (unused) |
          +----------+ 3 GB          +----------+ 1 GB
          |          |               |          |
          |          |               |          |
          |          |               |          |
          |          |               | Kernel   |
          |          |               |          |
          |          |               | Physical |
          |          |               |          |
          |User-space|               | Space    |
          |          |               |          |
          | Virtual  |               |          |
          |          |               |          |
          | Space    |               |          |
          |          |               |          |     
          | (3 GB)   |               +----------+ 0 GB
          |          |                 
          |          |                 Physical 
          |          |                  Memory 
          |          |                 
          |          |                 
          |          |                 
          |          |                 
          |          |                 
          +----------+ 0 GB      
              
            Virtual        
            Memory         
    
    The kernel virtual area (3 - 4 GB address space) maps to the first 1 GB of physical RAM. The 3 GB addressable RAM available to each process is mapped to the available physical RAM.
    The Problem
    So, the basic problem here is, the kernel can just address 1 GB of virtual addresses, which can translate to a maximum of 1 GB of physical memory. This is because the kernel directly maps all available kernel virtual space addresses to the available physical memory.
    Solutions
    There are some solutions which address this problem:
    1. 2G / 2G, 1G / 3G split
    2. HIGHMEM solution for using up to 4 GB of memory
    3. HIGHMEM solution for using up to 64 GB of memory
    1. 2G / 2G, 1G / 3G split
    Instead of splitting the virtual address space the traditional way of 3G / 1G (3 GB for user-space, 1 GB for kernel space), third-party patches exist to split the virtual address space 2G / 2G or 1G / 3G. The 1G / 3G split is a bit extreme in that you can map up to 3 GB of physical memory, but user-space applications cannot grow beyond 1 GB. It could work for simple applications; but if one has more than 3 GB of physical RAM, he / she won't run simple applications on it, right?
    The 2G / 2G split seems to be a balanced approach to using RAM more than 1 GB without using the HIGHMEM patches. However, server applications like databases always want as much virtual addressing space as possible; so this approach may not work in those scenarios.
    Andrea Arcangeli wrote a patch for 2.4.23 that adds a config-time option for selecting the user / kernel split values. It is available at his kernel page. It's a simple patch and making it work on 2.6 should not be too difficult.
    Before looking at solutions 2 & 3, let's take a look at some more Linux Memory Management issues.
    Zones
    In Linux, the memory available from all banks is classified into "nodes". These nodes indicate how much memory each bank has. This classification is mainly useful for NUMA architectures, but it's also used for UMA architectures, where the number of nodes is just 1.
    Memory in each node is divided into "zones". The zones currently defined are ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM.
    ZONE_DMA is used by some devices for data transfer and is mapped in the lower physical memory range (up to 16 MB).
    Memory in the ZONE_NORMAL region is mapped by the kernel in the upper region of the linear address space. Most operations can only take place in ZONE_NORMAL; so this is the most performance critical zone. ZONE_NORMAL goes from 16 MB to 896 MB.
    To address memory from 1 GB onwards, the kernel has to map pages from high memory into ZONE_NORMAL.
    Some area of memory is reserved for storing several kernel data structures that store information about the memory map and page tables. This on x86 is 128 MB. Hence, of the 1 GB physical memory the kernel can access, 128MB is reserved. This means that the kernel virtual address in this 128 MB is not mapped to physical memory. This leaves a maximum of 896 MB for ZONE_NORMAL. So, even if one has 1 GB of physical RAM, just 896 MB will be actually available.
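The zone boundaries given above (16 MB and 896 MB on 32-bit x86) can be turned into a small calculator. This is a sketch built only from the figures in the text; the function names are made up, and the real kernel derives these limits from its configuration rather than from fixed constants.

```python
MB = 1 << 20

ZONE_DMA_END = 16 * MB      # ZONE_DMA covers 0 to 16 MB
ZONE_NORMAL_END = 896 * MB  # ZONE_NORMAL covers 16 MB to 896 MB


def zone_of(phys_addr):
    """Name the zone that a physical address falls into."""
    if phys_addr < 0:
        raise ValueError("physical addresses are non-negative")
    if phys_addr < ZONE_DMA_END:
        return "ZONE_DMA"
    if phys_addr < ZONE_NORMAL_END:
        return "ZONE_NORMAL"
    return "ZONE_HIGHMEM"


def zone_sizes_mb(ram_mb):
    """Split an amount of RAM (in MB) into the three zones."""
    ram = ram_mb * MB
    dma = min(ram, ZONE_DMA_END)
    normal = max(min(ram, ZONE_NORMAL_END) - ZONE_DMA_END, 0)
    highmem = max(ram - ZONE_NORMAL_END, 0)
    return {
        "ZONE_DMA": dma // MB,
        "ZONE_NORMAL": normal // MB,
        "ZONE_HIGHMEM": highmem // MB,
    }


if __name__ == "__main__":
    print(zone_of(900 * MB))
    print(zone_sizes_mb(1024))
```

For a 1 GB machine the split comes out as 16 MB of ZONE_DMA, 880 MB of ZONE_NORMAL, and 128 MB that is reachable only as high memory.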
    Back to the solutions:
    2. HIGHMEM solution for using up to 4 GB of memory
    Since Linux can't access memory which hasn't been directly mapped into its address space, to use memory > 1 GB, the physical pages have to be mapped in the kernel virtual address space first. This means that the pages in ZONE_HIGHMEM have to be mapped in ZONE_NORMAL before they can be accessed.
    The reserved space which we talked about earlier (in case of x86, 128 MB) has an area in which pages from high memory are mapped into the kernel address space.
    To create a permanent mapping, the "kmap" function is used. Since this function may sleep, it may not be used in interrupt context. Since the number of permanent mappings is limited (if not, we could've directly mapped all the high memory in the address space), pages mapped this way should be "kunmap"ped when no longer needed.
    Temporary mappings can be created via "kmap_atomic". This function doesn't block, so it can be used in interrupt context. "kunmap_atomic" un-maps the mapped high memory page. A temporary mapping is only valid until the next temporary mapping is created. However, since the mapping and un-mapping functions also disable / enable preemption, it's a bug to not kunmap_atomic a page mapped via kmap_atomic.
    3. HIGHMEM solution for using up to 64 GB of memory
    This is enabled via PAE (Physical Address Extension), introduced with the Pentium Pro processors. PAE addresses the 4 GB physical memory limitation of 32-bit x86 without requiring a 64-bit architecture such as AMD's x86-64. PAE allows processors to access physical memory up to 64 GB (36 bits of address bus). However, since the virtual address space is just 32 bits wide, each process can't grow beyond 4 GB. The mechanism used to access memory from 4 GB to 64 GB is essentially the same as that of accessing the 1 GB - 4 GB RAM via the HIGHMEM solution discussed above.
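The 4 GB and 64 GB figures follow directly from the number of address bits. A short worked calculation, with the helper name addressable_gb chosen only for this example:

```python
GB = 1 << 30


def addressable_gb(address_bits):
    """Bytes reachable with the given number of address bits, expressed in GB."""
    return (1 << address_bits) // GB


if __name__ == "__main__":
    print(addressable_gb(32))  # plain 32-bit physical addressing
    print(addressable_gb(36))  # PAE: 36-bit physical addressing
```

The four extra address bits multiply the reachable physical memory by 16, from 4 GB to 64 GB, while the 32-bit virtual address space of each process stays at 4 GB.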
    Should I enable CONFIG_HIGHMEM for my 1 GB RAM system?
    It is advised to not enable CONFIG_HIGHMEM in the kernel to utilize the extra 128 MB you get for your 1 GB RAM system. I/O Devices cannot directly address high memory from PCI space, so bounce buffers have to be used. Plus the virtual memory management and paging costs come with extra mappings. 

    View the Linux 2.6 source code.

    Google's Mentality


    Jedis build their own lightsabres 
                             (the MS Eat your own Dog Food)
    Parallelize Everything
    Distribute Everything (to atomic level if possible)
    Compress Everything (CPU cheaper than bandwidth)
    Secure Everything (you can never be too paranoid)
    Cache (almost) Everything
    Redundantize Everything (in triplicate usually)
    Latency is VERY evil

    Saturday, February 4, 2012

    Latency - 1

    Hardware latency mainly comes from,
        - instructions in the pipeline waiting to finish execution even though they have already reached the execute stage. The reason is that most microprocessors can only complete 1 or 2 instructions per clock cycle
        - memory load delays

    Microprocessor Charts

    Design styles diagram

    Friday, February 3, 2012

    Why Do We Need a Cell Number?

    In this country, everyone is assigned an SSN at birth. It could serve as the ID, plus a small extra piece of information, for the devices she owns.

    What about privacy? Some ways of permutation, etc....

    Wednesday, February 1, 2012

    Disruption-Tolerant Network (DTN)



           A disruption-tolerant network (DTN) is a network designed so that temporary or intermittent communications problems, limitations and anomalies have the least possible adverse impact. There are several aspects to the effective design of a DTN, including: 
    • The use of fault-tolerant methods and technologies.
    • The quality of graceful degradation under adverse conditions or extreme traffic loads.
    • The ability to prevent or quickly recover from electronic attacks.
    • Ability to function with minimal latency even when routes are ill-defined or unreliable.
    Fault-tolerant systems are designed so that if a component fails or a network route becomes unusable, a backup component, procedure or route can immediately take its place without loss of service. At the software level, an interface allows the administrator to continuously monitor network traffic at multiple points and locate problems immediately. In hardware, fault tolerance is achieved by component and subsystem redundancy.
    Graceful degradation has always been important in large networks. One of the original motivations for the development of the Internet by the Advanced Research Projects Agency (ARPA) of the U.S. government was the desire for a large-scale communications network that could resist massive physical as well as electronic attacks including global nuclear war. In graceful degradation, a network or system continues working to some extent even when a large portion of it has been destroyed or rendered inoperative.
    Electronic attacks on networks can take the form of viruses, worms, Trojans, spyware and other destructive programs or code. Other common schemes include denial-of-service attacks and malicious transmission of bulk e-mail or spam with the intent of overwhelming network servers. In some instances, malicious hackers commit acts of identity theft against individual subscribers or groups of subscribers in an attempt to discourage network use. In a DTN, such attacks may not be entirely preventable but their effects are minimized and problems are quickly resolved when they occur. Servers can be provided with antivirus software and individual computers in the system can be protected by programs that detect and remove spyware.
    As networks evolve and their usage levels vary, routes can change, sometimes within seconds. This can cause temporary propagation delays and unacceptable latency. In some cases, data transmission is blocked altogether. Internet users may notice this as periods during which some Web sites take a long time to download or do not appear at all. In a DTN, the frequency of events of this sort is kept to a minimum.

    Open Storage Network