Wednesday, January 7, 2009

Computer Architecture - A Quantitative Approach (Book Excerpt)

Computer Architecture - A Quantitative Approach (Book Excerpt) by John L. Hennessy and David A. Patterson

Book price (Amazon.com)

I am very lucky to have studied computer architecture under Prof. David Patterson at U.C. Berkeley more than 20 years ago. I enjoyed the courses I took from him, in the early days of RISC architecture. Since leaving Berkeley to help found Sun Microsystems, I have used the ideas from his courses and many more that are described in this important book.

The good news today is that this book covers incredibly important and contemporary material. The further good news is that much exciting and challenging work remains to be done, and that working from Computer Architecture: A Quantitative Approach is a great way to start.

The most successful architectural projects that I have been involved in have always started from simple ideas, with advantages explainable using simple numerical models derived from hunches and rules of thumb. The continuing rapid advances in computing technology and new applications ensure that we will need new similarly simple models to understand what is possible in the future, and that new classes of applications will stress systems in different and interesting ways.

The quantitative approach introduced in Chapter 1 is essential to understanding these issues. In particular, we expect to see, in the near future, much more emphasis on minimizing power to meet the demands of a given application, across all sizes of systems; much remains to be learned in this area.

I have worked with many different instruction sets in my career. I first programmed a PDP-8, whose instruction set was so simple that a friend easily learned to disassemble programs just by glancing at the hole punches in paper tape! I wrote a lot of code in PDP-11 assembler, including an interpreter for the Pascal programming language and for the VAX (which was used as an example in the first edition of this book); the success of the VAX led to the widespread use of UNIX on the early Internet.

The PDP-11 and VAX were very conventional complex instruction set computer (CISC) architectures, with relatively compact instruction sets that proved nearly impossible to pipeline. For a number of years in public talks I used the performance of the VAX 11/780 as the baseline; its speed was extremely well known because faster implementations of the architecture were so long delayed. VAX performance stalled out just as the x86 and 680x0 CISC architectures were appearing in microprocessors; the strong economic advantages of microprocessors led to their overwhelming dominance. Then the simpler reduced instruction set computer (RISC) architectures—pioneered by John Cocke at IBM; promoted and named by Patterson and Hennessy; and commercialized in PowerPC, MIPS, and SPARC—were implemented as microprocessors and permitted high-performance pipeline implementations through the use of their simple register-oriented instruction sets. A downside of RISC was the larger code size of programs and resulting greater instruction fetch bandwidth, a cost that could be seen to be acceptable using the techniques of Chapter 1 and by believing in the future CMOS technology trends promoted in the now-classic views of Carver Mead. The kind of clear-thinking approach to the present problems and to the shape of future computing advances that led to RISC architecture is the focus of this book.

Chapter 2 (and various appendices) presents interesting examples of contemporary and important historical instruction set architectures. RISC architecture—the focus of so much work in the last twenty years—is by no means the final word here. I worked on the design of the SPARC architecture and several implementations for a decade, but more recently have worked on two different styles of processor: picoJava, which implemented most of the Java Virtual Machine instructions—a compact, high-level, bytecoded instruction set—and MAJC, a very simple and multithreaded VLIW for Java and media-intensive applications. These two architectures addressed different and new market needs: for low-power chips to run embedded devices where space and power are at a premium, and for high performance for a given amount of power and cost where parallel applications are possible. While neither has achieved widespread commercial success, I expect that the future will see many opportunities for different ISAs, and an in-depth knowledge of history here often gives great guidance—the relationships between key factors, such as program size, execution speed, and power consumption, tend to return to previous balances that led to great designs in the past.

Chapters 3 and 4 describe instruction-level parallelism (ILP): the ability to execute more than one instruction at a time. This has been aided greatly, in the last 20 years, by techniques such as RISC and VLIW (very long instruction word) computing. But as later chapters here point out, both RISC and especially VLIW as practiced in the Intel Itanium architecture are very power intensive. In our attempts to extract more instruction-level parallelism, we are running up against the fact that the complexity of a design that attempts to execute N instructions simultaneously grows like N²: the number of transistors and number of watts to produce each result increases dramatically as we attempt to execute many instructions of arbitrary programs simultaneously. There is thus a clear countertrend emerging: using simpler pipelines with more realistic levels of ILP while exploiting other kinds of parallelism by running both multiple threads of execution per processor and, often, multiple processors on a single chip. The challenge for designers of high-performance systems of the future is to understand when simultaneous execution is possible, but then to use these techniques judiciously in combination with other, less granular techniques that are less power intensive and complex.

In graduate school I would often joke that cache memories were the only great idea in computer science. But truly, where you put things affects profoundly the design of computer systems. Chapter 5 describes the classical design of cache and main memory hierarchies and virtual memory. And now, new, higher-level programming languages like Java support much more reliable software because they insist on the use of garbage collection and array bounds checking, so that security breaches from "buffer overflow" and insidious bugs from false sharing of memory do not creep into large programs. It is only languages, such as Java, that insist on the use of automatic storage management that can implement true software components. But garbage collectors are notoriously hard on memory hierarchies, and the design of systems and language implementations to work well for such areas is an active area of research, where much good work has been done but much exciting work remains.

Java also strongly supports thread-level parallelism—a key to simple, power-efficient, and high-performance system implementations that avoids the N² problem discussed earlier but brings challenges of its own. A good foundational understanding of these issues can be had in Chapter 6. Traditionally, each processor was a separate chip, and keeping the various processors synchronized was expensive, both because of its impact on the memory hierarchy and because the synchronization operations themselves were very expensive. The Java language is also trying to address these issues: we tried, in the Java Language Specification, which I coauthored, to write a description of the memory model implied by the language. While this description turned out to have (fixable) technical problems, it is increasingly clear that we need to think about the memory hierarchy in the design of languages that are intended to work well on the newer system platforms. We view the Java specification as a first step in much good work to be done in the future.

As Chapter 7 describes, storage has evolved from being connected to individual computers to being a separate network resource. This is reminiscent of computer graphics, where graphics processing that was previously done in a host processor often became a separate function as the importance of graphics increased. All this is likely to change radically in the coming years—massively parallel host processors are likely to be able to do graphics better than dedicated outboard graphics units, and new breakthroughs in storage technologies, such as memories made from molecular electronics and other atomic-level nanotechnologies, should greatly reduce both the cost of storage and the access time. The resulting dramatic decreases in storage cost and access time will strongly encourage the use of multiple copies of data stored on individual computing nodes, rather than shared over a network. The "wheel of reincarnation," familiar from graphics, will appear in storage.

Chapter 8 provides a great foundational description of computer interconnects and networks. My model of these comes from Andy Bechtolsheim, another of the cofounders of Sun, who famously said, "Ethernet always wins." More modestly stated: given the need for a new networking interconnect, and despite its shortcomings, adapted versions of the Ethernet protocols seem to have met with overwhelming success in the marketplace. Why? Factors such as the simplicity and familiarity of the protocols are obvious, but quite possibly the most likely reason is that the people who are adapting Ethernet can get on with the job at hand rather than arguing about details that, in the end, aren’t dispositive. This lesson can be generalized to apply to all the areas of computer architecture discussed in this book.

One of the things I remember Dave Patterson saying many years ago is that for each new project you only get so many "cleverness beans." That is, you can be very clever in a few areas of your design, but if you try to be clever in all of them, the design will probably fail to achieve its goals—or even fail to work or to be finished at all. The overriding lesson that I have learned in 20 plus years of working on these kinds of designs is that you must choose what is important and focus on that; true wisdom is to know what to leave out. A deep knowledge of what has gone before is key to this ability.

And you must also choose your assumptions carefully. Many years ago I attended a conference in Hawaii (yes, it was a boondoggle, but read on) where Maurice Wilkes, the legendary computer architect, gave a speech. What he said, paraphrased in my memory, is that good research often consists of assuming something that seems untrue or unlikely today will become true and investigating the consequences of that assumption. And if the unlikely assumption indeed then becomes true in the world, you will have done timely and sometimes, then, even great research! So, for example, the research group at Xerox PARC assumed that everyone would have access to a personal computer with a graphics display connected to others by an internetwork and the ability to print inexpensively using Xerography. How true all this became, and how seminally important their work was!

In our time, and in the field of computer architecture, I think there are a number of assumptions that will become true. Some are not controversial, such as that Moore’s Law is likely to continue for another decade or so and that the complexity of large chip designs is reaching practical limits, often beyond the point of positive returns for additional complexity. More controversially, perhaps, molecular electronics is likely to greatly reduce the cost of storage and probably logic elements as well, optical interconnects will greatly increase the bandwidth and reduce the error rates of interconnects, software will continue to be unreliable because it is so difficult, and security will continue to be important because its absence is so debilitating.

Taking advantage of the strong positive trends detailed in this book and using them to mitigate the negative ones will challenge the next generation of computer architects, to design a range of systems of many shapes and sizes.

Computer architecture design problems are becoming more varied and interesting. Now is an exciting time to be starting out or reacquainting yourself with the latest in this field, and this book is the best place to start. See you in the chips!



Computer technology has made incredible progress in the roughly 55 years since the first general-purpose electronic computer was created. Today, less than a thousand dollars will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1980 for 1 million dollars. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design.

Although technological improvements have been fairly steady, progress arising from better computer architectures has been much less consistent. During the first 25 years of electronic computers, both forces made a major contribution; but beginning in about 1970, computer designers became largely dependent upon integrated circuit technology. During the 1970s, performance continued to improve at about 25% to 30% per year for the mainframes and minicomputers that dominated the industry.

The late 1970s saw the emergence of the microprocessor. The ability of the microprocessor to ride the improvements in integrated circuit technology more closely than the less integrated mainframes and minicomputers led to a higher rate of improvement—roughly 35% growth per year in performance.

This growth rate, combined with the cost advantages of a mass-produced microprocessor, led to an increasing fraction of the computer business being based on microprocessors. In addition, two significant changes in the computer marketplace made it easier than ever before to be commercially successful with a new architecture. First, the virtual elimination of assembly language programming reduced the need for object-code compatibility. Second, the creation of standardized, vendor-independent operating systems, such as UNIX and its clone, Linux, lowered the cost and risk of bringing out a new architecture.

These changes made it possible to successfully develop a new set of architectures, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s. The RISC-based machines focused the attention of designers on two critical performance techniques, the exploitation of instruction-level parallelism (initially through pipelining and later through multiple instruction issue) and the use of caches (initially in simple forms and later using more sophisticated organizations and optimizations). The combination of architectural and organizational enhancements has led to 20 years of sustained growth in performance at an annual rate of over 50%.

The effect of this dramatic growth rate has been twofold. First, it has significantly enhanced the capability available to computer users. For many applications, the highest-performance microprocessors of today outperform the supercomputer of less than 10 years ago.

Second, this dramatic rate of improvement has led to the dominance of microprocessor-based computers across the entire range of the computer design. Workstations and PCs have emerged as major products in the computer industry. Minicomputers, which were traditionally made from off-the-shelf logic or from gate arrays, have been replaced by servers made using microprocessors. Mainframes have been almost completely replaced with multiprocessors consisting of small numbers of off-the-shelf microprocessors. Even high-end supercomputers are being built with collections of microprocessors.

Freedom from compatibility with old designs and the use of microprocessor technology led to a renaissance in computer design, which emphasized both architectural innovation and efficient use of technology improvements. This renaissance is responsible for the higher performance growth, a rate that is unprecedented in the computer industry. This rate of growth has compounded so that by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15.

Figure 1.1 Growth in microprocessor performance since the mid-1980s has been substantially higher than in earlier years as shown by plotting SPECint performance. This chart plots relative performance as measured by the SPECint benchmarks with base of one being a VAX 11/780. Since SPEC has changed over the years, performance of newer machines is estimated by a scaling factor that relates the performance for two different versions of SPEC (e.g., SPEC92 and SPEC95). Prior to the mid-1980s, microprocessor performance growth was largely technology driven and averaged about 35% per year. The increase in growth since then is attributable to more advanced architectural and organizational ideas. By 2001 this growth led to a difference in performance of about a factor of 15. Performance for floating-point-oriented calculations has increased even faster.

In the last few years, the tremendous improvement in integrated circuit capability has allowed older, less-streamlined architectures, such as the x86 (or IA-32) architecture, to adopt many of the innovations first pioneered in the RISC designs. As we will see, modern x86 processors basically consist of a front end that fetches and decodes x86 instructions and maps them into simple ALU, memory access, or branch operations that can be executed on a RISC-style pipelined processor. Beginning in the late 1990s, as transistor counts soared, the overhead (in transistors) of interpreting the more complex x86 architecture became negligible as a percentage of the total transistor count of a modern microprocessor.

This text is about the architectural ideas and accompanying compiler improvements that have made this incredible growth rate possible. At the center of this dramatic revolution has been the development of a quantitative approach to computer design and analysis that uses empirical observations of programs, experimentation, and simulation as its tools. It is this style and approach to computer design that is reflected in this text.

Sustaining the recent improvements in cost and performance will require continuing innovations in computer design, and we believe such innovations will be founded on this quantitative approach to computer design. Hence, this book has been written not only to document this design style, but also to stimulate you to contribute to this progress.



Chapter 5 - I/O And Consistency of Cached Data

Because of caches, data can be found in memory and in the cache. As long as the CPU is the sole device changing or reading the data and the cache stands between the CPU and memory, there is little danger in the CPU seeing the old or stale copy. I/O devices give the opportunity for other devices to cause copies to be inconsistent or for other devices to read the stale copies. Figure 5.46 illustrates the problem, generally referred to as the cache-coherency problem.

The question is this: Where does the I/O occur in the computer, between the I/O device and the cache or between the I/O device and main memory? If input puts data into the cache and output reads data from the cache, both I/O and the CPU see the same data, and the problem is solved. The difficulty in this approach is that it interferes with the CPU. I/O competing with the CPU for cache access will cause the CPU to stall for I/O. Input will also interfere with the cache by displacing some information with the new data that is unlikely to be accessed by the CPU soon. For example, on a page fault the CPU may need to access a few words in a page, but a program is not likely to access every word of the page if it were loaded into the cache. Given the integration of caches onto the same integrated circuit, it is also difficult for that interface to be visible.

The goal for the I/O system in a computer with a cache is to prevent the stale-data problem while interfering with the CPU as little as possible. Many systems, therefore, prefer that I/O occur directly to main memory, with main memory acting as an I/O buffer. If a write-through cache is used, then memory has an up-to-date copy of the information, and there is no stale-data issue for output. (This is a reason many machines use write through.) Input requires some extra work. The software solution is to guarantee that no blocks of the I/O buffer designated for input are in the cache. In one approach, a buffer page is marked as noncachable; the operating system always inputs to such a page. In another approach, the operating system flushes the buffer addresses from the cache after the input occurs. A hardware solution is to check the I/O addresses on input to see if they are in the cache; to avoid slowing down the cache to check addresses, sometimes a duplicate set of tags is used to allow checking of I/O addresses in parallel with processor cache accesses. If there is a match of I/O addresses in the cache, the cache entries are invalidated to avoid stale data. All these approaches can also be used for output with write-back caches. More about this is found in Chapter 6.

FIGURE 5.46 The cache-coherency problem. A' and B' refer to the cached copies of A and B in memory. (a) shows cache and main memory in a coherent state. In (b) we assume a write-back cache when the CPU writes 550 into A. Now A' has the value, but the value in memory has the old, stale value of 100. If an output used the value of A from memory, it would get the stale data. In (c) the I/O system inputs 440 into the memory copy of B, so now B' in the cache has the old, stale data.
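
To make the software solutions above concrete, here is a minimal driver-style sketch in C of keeping a DMA input buffer coherent: cached copies of the buffer are invalidated after the device writes main memory, and dirty lines are written back before output when a write-back cache is in use. The cache-maintenance and DMA routines are hypothetical placeholders, not functions from the book or from any particular platform.

```c
#include <stddef.h>

/* Hypothetical cache-maintenance primitives; the names and signatures are
 * illustrative, not from the book or any particular platform. */
void cache_invalidate_range(void *addr, size_t len);  /* drop (possibly stale) cached copies */
void cache_writeback_range(void *addr, size_t len);   /* push dirty lines out to main memory */

/* Assumed DMA helpers for some device; also illustrative only. */
void dma_start_input(void *buf, size_t len);          /* device writes into main memory */
void dma_start_output(const void *buf, size_t len);   /* device reads from main memory  */
void dma_wait_complete(void);

/* Input: the device deposits data directly in main memory, so any lines from
 * the buffer still sitting in the cache would be stale.  Invalidate them after
 * the transfer so the CPU rereads main memory (the software solution above). */
void read_from_device(void *buf, size_t len)
{
    dma_start_input(buf, len);
    dma_wait_complete();
    cache_invalidate_range(buf, len);
}

/* Output with a write-back cache: the freshest data may be dirty in the cache,
 * so write it back before the device reads main memory. */
void write_to_device(const void *buf, size_t len)
{
    cache_writeback_range((void *)buf, len);
    dma_start_output(buf, len);
    dma_wait_complete();
}
```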

The cache-coherency problem applies to multiprocessors as well as I/O. Unlike I/O, where multiple data copies are a rare event (one to be avoided whenever possible), a program running on multiple processors will want to have copies of the same data in several caches. Performance of a multiprocessor program depends on the performance of the system when sharing data. The protocols to maintain coherency for multiple processors are called cache-coherency protocols and are described in Chapter 8.

5.10 Putting It All Together: The Alpha AXP 21064 Memory Hierarchy

Thus far we have given glimpses of the Alpha AXP 21064 memory hierarchy; this section unveils the full design and shows the performance of its components for the SPEC92 programs. Figure 5.47 gives the overall picture of this design.

Let's really start at the beginning, when the Alpha is turned on. Hardware on the chip loads the instruction cache from an external PROM. This initialization allows the 8-KB instruction cache to omit a valid bit, for there are always valid instructions in the cache; they just might not be the ones your program is interested in. The hardware does clear the valid bits in the data cache. The PC is set to the kseg segment so that the instruction addresses are not translated, thereby avoiding the TLB.

One of the first steps is to update the instruction TLB with valid page table entries (PTEs) for this process. Kernel code updates the TLB with the contents of the appropriate page table entry for each page to be mapped. The instruction TLB has eight entries for 8-KB pages and four for 4-MB pages. (The 4-MB pages are used by large programs such as the operating system or databases that will likely touch most of their code.) A miss in the TLB invokes the Privileged Architecture Library (PAL code) software that updates the TLB. PAL code is simply machine language routines with some implementation-specific extensions to allow access to low-level hardware, such as the TLB. PAL code runs with exceptions disabled, and instruction accesses are not checked for memory management violations, allowing PAL code to fill the TLB.

Once the operating system is ready to begin executing a user process, it sets the PC to the appropriate address in segment seg0. We are now ready to follow the memory hierarchy in action: Figure 5.47 is labeled with the steps of this narrative. The page frame portion of this address is sent to the TLB (step 1), while the 8-bit index from the page offset is sent to the direct-mapped 8-KB (256 32-byte blocks) instruction cache (step 2). The fully associative TLB simultaneously searches all 12 entries to find a match between the address and a valid PTE (step 3). In addition to translating the address, the TLB checks to see if the PTE demands that this access result in an exception. An exception might occur if either this access violates the protection on the page or if the page is not in main memory. If there is no exception, and if the translated physical address matches the tag in the instruction cache (step 4), then the proper 8 bytes of the 32-byte block are furnished to the CPU using the lower bits of the page offset (step 5), and the instruction stream access is done.
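
As a side note, the field widths implied by those sizes are easy to check. The sketch below is mine, not the book's; it splits an example address the way steps 1 through 5 do, assuming 8-KB pages, 32-byte blocks, and 256 cache blocks. The 8-bit index plus the 5-bit block offset exactly fill the 13-bit page offset, which is what allows the cache to be indexed in parallel with the TLB lookup.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative decomposition of an address for the 21064's 8-KB, direct-mapped,
 * 32-byte-block instruction cache.  The field widths follow from the sizes
 * stated in the text; the code itself is a sketch, not taken from the book. */
enum {
    BLOCK_OFFSET_BITS = 5,   /* 32-byte blocks            -> 5 offset bits   */
    INDEX_BITS        = 8,   /* 8 KB / 32 B = 256 blocks  -> 8 index bits    */
    PAGE_OFFSET_BITS  = 13   /* 8-KB pages                -> 13 offset bits  */
};

int main(void)
{
    uint64_t vaddr = 0x120004abcULL;   /* arbitrary example virtual address */

    uint64_t block_offset = vaddr & ((1u << BLOCK_OFFSET_BITS) - 1);
    uint64_t index = (vaddr >> BLOCK_OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint64_t page_offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);
    uint64_t vpn = vaddr >> PAGE_OFFSET_BITS;   /* virtual page number, sent to the TLB */

    /* Index (8 bits) + block offset (5 bits) = 13 bits, i.e. exactly the page
     * offset, so the cache can be indexed while the TLB translates the page
     * number; the tag compared in step 4 is the translated page frame. */
    printf("VPN=%#llx  page offset=%#llx  cache index=%llu  block offset=%llu\n",
           (unsigned long long)vpn, (unsigned long long)page_offset,
           (unsigned long long)index, (unsigned long long)block_offset);
    return 0;
}
```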

A miss, on the other hand, simultaneously starts an access to the second-level cache (step 6) and checks the prefetch instruction stream buffer (step 7). If the desired instruction is found in the stream buffer (step 8), the critical 8 bytes are sent to the CPU, the full 32-byte block of the stream buffer is written into the instruction cache (step 9), and the request to the second-level cache is canceled. Steps 6 to 9 take just a single clock cycle.

If the instruction is not in the prefetch stream buffer, the second-level cache continues trying to fetch the block. The 21064 microprocessor is designed to work with direct-mapped second-level caches from 128 KB to 8 MB with a miss penalty between 3 and 16 clock cycles. For this section we use the memory system of the DEC 3000 model 800 Alpha AXP. It has a 2-MB (65,536 32-byte blocks) second-level cache, so the 29-bit block address is divided into a 13-bit tag and a 16-bit index (step 10). The cache reads the tag from that index and if it matches (step 11), the cache returns the critical 16 bytes in the first 5 clock cycles and the other 16 bytes in the next 5 clock cycles (step 12). The path between the first- and second-level cache is 128 bits wide (16 bytes). At the same time, a request is made for the next sequential 32-byte block, which is loaded into the instruction stream buffer in the next 10 clock cycles (step 13).
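
The 13-bit tag and 16-bit index quoted for the second-level cache follow directly from its size and block size. The small check below is only an illustrative recomputation, not code from the book.

```c
#include <stdio.h>

/* Illustrative recomputation of the second-level-cache parameters quoted in
 * the text: 2-MB direct-mapped cache, 32-byte blocks, 29-bit block address. */
static int log2u(unsigned long long x) { int n = 0; while (x > 1) { x >>= 1; n++; } return n; }

int main(void)
{
    unsigned long long cache_bytes = 2ULL * 1024 * 1024;   /* 2 MB */
    unsigned long long block_bytes = 32;
    int block_addr_bits = 29;                              /* from the text */

    unsigned long long blocks = cache_bytes / block_bytes; /* 65,536 blocks */
    int index_bits = log2u(blocks);                        /* 16-bit index  */
    int tag_bits = block_addr_bits - index_bits;           /* 13-bit tag    */

    printf("blocks=%llu  index=%d bits  tag=%d bits\n", blocks, index_bits, tag_bits);
    return 0;
}
```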

The instruction stream does not rely on the TLB for address translation. It simply increments the physical address of the miss by 32 bytes, checking to make sure that the new address is within the same page. If the incremented address crosses a page boundary, then the prefetch is suppressed.

If the instruction is not found in the secondary cache, the translated physical address is sent to memory (step 14). The DEC 3000 model 800 divides memory into four memory mother boards (MMB), each of which contains two to eight SIMMs (single inline memory modules). The SIMMs come with eight DRAMs for information plus two DRAMs for error protection per side, and the options are single- or double-sided SIMMs using 1-Mbit, 4-Mbit, or 16-Mbit DRAMs. Hence the memory capacity of the model 800 is 8 MB (4 x 2 x 8 x 1 x 1/8) to 1024 MB (4 x 8 x 8 x 16 x 2/8), always organized 256 bits wide. The average time to transfer 32 bytes from memory to the secondary cache is 36 clock cycles after the processor makes the request. The second-level cache loads this data 16 bytes at a time.
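
The capacity figures in parentheses are the product of memory mother boards, SIMMs per board, data DRAMs per side, megabits per DRAM, and sides, divided by 8 bits per byte. The snippet below simply redoes that arithmetic as a sanity check; it is not code from the book.

```c
#include <stdio.h>

/* Redoing the DEC 3000 model 800 capacity arithmetic from the text:
 * MB = MMBs x SIMMs per MMB x data DRAMs per side x Mbit per DRAM x sides / 8.
 * Error-protection DRAMs are excluded, as in the figures quoted above. */
static unsigned capacity_mb(unsigned mmbs, unsigned simms_per_mmb,
                            unsigned data_drams_per_side,
                            unsigned mbit_per_dram, unsigned sides)
{
    return mmbs * simms_per_mmb * data_drams_per_side * mbit_per_dram * sides / 8;
}

int main(void)
{
    printf("minimum: %u MB\n", capacity_mb(4, 2, 8, 1, 1));    /*    8 MB */
    printf("maximum: %u MB\n", capacity_mb(4, 8, 8, 16, 2));   /* 1024 MB */
    return 0;
}
```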

Since the second-level cache is a write-back cache, any miss can lead to the old block being written back to memory. The 21064 places this "victim" block into a victim buffer to get out of the way of new data (step 15). The new data are loaded into the cache as soon as they arrive (step 16), and then the old data are written from the victim buffer (step 17). There is a single block in the victim buffer, so a second miss would need to stall until the victim buffer empties.

Suppose this initial instruction is a load. It will send the page frame of its data address to the data TLB (step 18) at the same time as the 8-bit index from the page offset is sent to the data cache (step 19). The data TLB is a fully associative cache containing 32 PTEs, each of which represents page sizes from 8 KB to 4 MB. A TLB miss will trap to PAL code to load the valid PTE for this address. In the worst case, the page is not in memory, and the operating system gets the page from disk (step 20). Since millions of instructions could execute during a page fault, the operating system will swap in another process if there is something waiting to run.

Assuming that we have a valid PTE in the data TLB (step 21), the cache tag and the physical page frame are compared (step 22), with a match sending the desired 8 bytes from the 32-byte block to the CPU (step 23). A miss goes to the second-level cache, which proceeds exactly like an instruction miss.

Suppose the instruction is a store instead of a load. The page frame portion of the data address is again sent to the data TLB and the data cache (steps 18 and 19), which checks for protection violations as well as translates the address. The physical address is then sent to the data cache (steps 21 and 22). Since the data cache uses write through, the store data are simultaneously sent to the write buffer (step 24) and the data cache (step 25). As explained on page 425, the 21064 pipelines write hits. The data address of this store is checked for a match, and at the same time the data from the previous write hit are written to the cache (step 26). If the address check was a hit, then the data from this store are placed in the write pipeline buffer. On a miss, the data are just sent to the write buffer since the data cache does not allocate on a write miss.

The write buffer takes over now. It has four entries, each containing a whole cache block. If the buffer is full, then the CPU must stall until a block is written to the second-level cache. If the buffer is not full, the CPU continues and the address of the word is presented to the write buffer (step 27). It checks to see if the word matches any block already in the buffer so that a sequence of writes can be stitched together into a full block, thereby optimizing use of the write bandwidth between the first- and second-level cache.
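
The write buffer's merging behavior can be pictured with a short model. The C sketch below is my own simplification of the scheme described above (four entries, one 32-byte block each, merge on a matching block address, stall when full); it is not the 21064's actual hardware logic, and it assumes each write is at most 8 bytes and does not cross a block boundary.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the 4-entry merging write buffer: each entry holds one
 * 32-byte block, a write whose address falls in a block already buffered is
 * merged into that entry, and a full buffer means the CPU must stall. */
#define WB_ENTRIES 4
#define BLOCK_SIZE 32

struct wb_entry {
    bool     valid;
    uint64_t block_addr;      /* address with the low 5 bits cleared */
    uint8_t  data[BLOCK_SIZE];
    uint32_t byte_mask;       /* which of the 32 bytes are present   */
};

static struct wb_entry wb[WB_ENTRIES];

/* Returns true if the write was accepted; false means the buffer is full and
 * the CPU would stall until an entry drains to the second-level cache. */
bool write_buffer_put(uint64_t addr, const uint8_t *bytes, unsigned len)
{
    uint64_t block = addr & ~(uint64_t)(BLOCK_SIZE - 1);
    unsigned offset = (unsigned)(addr & (BLOCK_SIZE - 1));
    int free_slot = -1;

    for (int i = 0; i < WB_ENTRIES; i++) {
        if (wb[i].valid && wb[i].block_addr == block) {   /* merge into existing block */
            memcpy(&wb[i].data[offset], bytes, len);
            wb[i].byte_mask |= ((1u << len) - 1) << offset;
            return true;
        }
        if (!wb[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;                                     /* buffer full: stall */

    wb[free_slot].valid = true;                           /* allocate a new entry */
    wb[free_slot].block_addr = block;
    memcpy(&wb[free_slot].data[offset], bytes, len);
    wb[free_slot].byte_mask = ((1u << len) - 1) << offset;
    return true;
}
```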

All writes are eventually passed on to the second-level cache. If a write is a hit, then the data are written to the cache (step 28). Since the second-level cache uses write back, it cannot pipeline writes: a full 32-byte block write takes 5 clock cycles to check the address and 10 clock cycles to write the data. A write of 16 bytes or less takes 5 clock cycles to check the address and 5 clock cycles to write the data. In either case the cache marks the block as dirty.

If the access to the second-level cache is a miss, the victim block is checked to see if it is dirty; if so, it is placed in the victim buffer as before (step 15). If the new data are a full block, then the data are simply written and marked dirty. A partial block write results in an access to main memory since the second-level cache policy is to allocate on a write miss...

____________________________________________

Older Post:



Tuesday, December 23, 2008

MySQL Developer's Library (Book Excerpt) by Paul DuBois


Book price (Amazon.com)

Introduction to MySQL and SQL: This chapter provides an introduction to the MySQL relational database management system (RDBMS), and to the Structured Query Language (SQL) that MySQL understands. It lays out basic terms and concepts you should understand, describes the sample database we'll be using for examples throughout the book, and provides a tutorial that shows you how to use MySQL to create a database and interact with it.

Begin here if you are new to databases and perhaps uncertain whether or not you need one or can use one. You should also read the chapter if you don't know anything about MySQL or SQL and need an introductory guide to get started. Readers who have experience with MySQL or with database systems might want to skim through the material. However, everybody should read the section "A Sample Database" because it's best if you're familiar with the purpose and contents of the database that we'll be using repeatedly throughout the book.

How MySQL Can Help You: This section describes situations in which the MySQL database system is useful. This will give you an idea of the kinds of things MySQL can do and the ways in which it can help you. If you don't need to be convinced about the usefulness of a database system (perhaps because you've already got a problem in mind and just want to find out how to put MySQL to work helping you solve it), you can proceed to "A Sample Database."

A database system is essentially just a way to manage lists of information. The information can come from a variety of sources. For example, it can represent research data, business records, customer requests, sports statistics, sales reports, personal hobby information, personnel records, bug reports, or student grades. However, although database systems can deal with a wide range of information, you don't use such a system for its own sake. If a job is easy to do already, there's no reason to drag a database into it just to use one. A grocery list is a good example: You write down the items to get, cross them off as you do your shopping, and then throw the list away. It's highly unlikely that you'd use a database for this. Even if you have a palmtop computer, you'd probably use its notepad function for a grocery list, not its database capabilities.

The power of a database system comes in when the information you want to organize and manage becomes so voluminous or complex that your records become more burdensome than you care to deal with by hand. Databases can be used by large corporations processing millions of transactions a day, of course. But even small-scale operations involving a single person maintaining information of personal interest may require a database. It's not difficult to think of situations in which the use of a database can be beneficial because you needn't have huge amounts of information before that information becomes difficult to manage. Consider the following situations:

  • Your carpentry business has several employees. You need to maintain employee and payroll records so that you know who you've paid and when, and you must summarize those records so that you can report earnings statements to the government for tax purposes. You also need to keep track of the jobs your company has been hired to do and which employees you've scheduled to work on each job.
  • You run a network of automobile parts warehouses and need to be able to tell which ones have any given part in their inventory so that you can fill customer orders.
  • As a toy seller, you're particularly subject to fad-dependent demand for items that you carry. You want to know what the current sales trajectory is for certain items so that you can estimate whether to increase inventory (for an item that's becoming more popular) or decrease it (so you're not stuck with a lot of stock for something that's no longer selling well).
  • That pile of research data you've been collecting over the course of many years needs to be analyzed for publication, lest the dictum "publish or perish" become the epitaph for your career. You want to boil down large amounts of raw data to generate summary information, and to pull out selected subsets of observations for more detailed statistical analysis.
  • You're a popular speaker who travels the country to many types of assemblies, such as graduations, business meetings, civic organizations, and political conventions. You give so many addresses that it's difficult to remember what you've spoken on at each place you've been, so you'd like to maintain records of your past talks and use them to help you plan future engagements. If you return to a place at which you've spoken before, you don't want to give a talk similar to one you've already delivered there, and a record of each place you've been would help you avoid repeats. You'd also like to note how well your talks are received. (Your address "Why I Love Cats" to the Metropolitan Kennel Club was something of a dud, and you don't want to make that mistake again the next time you're there.)
  • You're a teacher who needs to keep track of grades and attendance. Each time you give a quiz or a test, you record every student's grade. It's easy enough to write down scores in a gradebook, but using the scores later is a tedious chore. You'd rather avoid sorting the scores for each test to determine the grading curve, and you'd really rather not add up each student's scores when you determine final grades at the end of the grading period. Counting each student's absences is no fun, either.
  • The organization for which you are the secretary maintains a directory of members. (The organization could be anything: a professional society, a club, a repertory company, a symphony orchestra, or an athletic booster club.) You generate the directory in printed form each year for members, based on a word processor document that you edit as membership information changes.
You're tired of maintaining the directory that way because it limits what you can do with it. It's difficult to sort the entries in different ways, and you can't easily select just certain parts of each entry (such as a list consisting only of names and phone numbers). Nor can you easily find a subset of members, such as those who need to renew their memberships soon; if you could, it would eliminate the job of looking through the entries each month to find those members who need to be sent renewal notices.

Also, you'd really like to avoid doing all the directory editing yourself, but the society doesn't have much of a budget, and hiring someone is out of the question. You've heard about the "paperless office" that's supposed to result from electronic record-keeping, but you haven't seen any benefit from it. The membership records are electronic, but, ironically, aren't in a form that can be used easily for anything except generating paper by printing the directory!

These scenarios range from situations involving large amounts to relatively small amounts of information. They share the common characteristic of involving tasks that can be performed manually but that could be performed more efficiently by a database system.

What specific benefits should you expect to see from using a database system such as MySQL?

____________________________________________

Older Post:



Monday, December 15, 2008

The Huffington Post Complete Guide to Blogging (Book Excerpt)

The Huffington Post Complete Guide to Blogging (Book Excerpt) - The editors of the Huffington Post and Arianna Huffington

The Huffington Post Complete Guide to Blogging Book price (Amazon.com)

What Is a Blog? A blog at its most fundamental level is simply a "web log." That is, a regularly updated account of events or ideas posted on the web.

But calling blogs mere updated web diaries is a bit like calling poetry a pleasant arrangement of words on a page. There is an art to this. Those of us who work at HuffPost believe we are fortunate enough to be present at the advent of a new form of human communication — one that is more interactive, more democratic, and just more fun than what has come before.

Blogs can bring down a Senate majority leader. They can show what a presidential candidate talks about in unguarded moments. They can provide stay-at-home parents with a little space to rant about the tragedy of colic (or maybe share updates on a local environmental issue — and Brad Pitt — during naps). They cut out the gatekeepers of information and shorten the news cycle. They give companies new ways to communicate with customers and shareholders — and give customers and shareholders new ways to make their voices heard. Blogging gives you a feeling of satisfaction that writing a letter to the editor, or a letter to the "customer care" department of a corporation, cannot match. The public nature of blogs means that any of the billion people on this planet who own or have access to a computer can read what any of the rest of us is saying. That's true even if what we're saying is about a niche (for instance, issues germane to the mini off-road buggy community) that in the past would have gotten us labeled as freaks. In fact, because the potential audience is so huge, there is space for just about every topic you can imagine.

It is this mix of the high and low, the personal and the political, that makes blogs so fascinating and so important in an open society. When we launched HuffPost in 2005, we knew we liked blogs, but even we underestimated how head over heels we'd fall. "Blogging is definitely the most interesting thing I've done as a writer, and I've been writing full-time since the late seventies," Carol Felsenthal, author of Clinton in Exile: A President Out of the White House and a HuffPost blogger, tells us. "I used to walk my dog, Henry, first thing in the morning. Now I'm often at my computer writing a post while Henry looks at me and wonders what happened to the good old days when his owner was compulsive but not hyper-compulsive."

It's the informality and the immediacy that make blogging addictive for many of us. No editor stands between us and the public. This leads to a lot of rumors and other fluff going up on the web. But it's also enormously liberating. You can put all kinds of ideas out there. "My thoughts don't all have to be fully baked," says Marci Alboher, who writes the "Shifting Careers" column and blog for The New York Times. She posts an idea and sees what her readers think. "They help me solve the problem and let me know if I'm going down the right path. It helps me figure out what the issues are very quickly."

It is this multidirectional conversation — giving all of us a platform, expanding the scope of news, and making it a shared enterprise between producers and consumers — that makes blogs so revolutionary. We have a lot of fun blogging. We're writing this book because we're pretty sure you will too.

____________________________________________

Older Post:



Wednesday, December 3, 2008

The Best Ways to Secure Your Business Information


Effective data security starts with assessing what information you have and identifying who has access to it. Understanding how personal information moves into, through, and out of your business and who has—or could have—access is essential to assessing security vulnerabilities. You can determine the best ways to secure the information only after you’ve traced how it flows.

Inventory all computers, laptops, flash drives, disks, home computers, and other equipment to find out where your company stores sensitive data. Also inventory the information you have by type and location. Your file cabinets and computer systems are a start, but remember: your business receives personal information in a number of ways—through Web sites, from contractors, from call centers, and the like. What about information saved on laptops, employees’ home computers, flash drives, and cell phones? No inventory is complete until you check everywhere sensitive data might be stored.

Track personal information through your business by talking with your sales department, information technology staff, human resources office, accounting personnel, and outside service providers. Get a complete picture of:

Who sends sensitive personal information to your business. Do you get it from customers? Credit card companies? Banks or other financial institutions? Credit bureaus? Other businesses?

How your business receives personal information. Does it come to your business through a Web site? By e-mail? Through the mail? Is it transmitted through cash registers in stores?

What kind of information you collect at each entry point. Do you get credit card information online? Does your accounting department keep information about customers’ checking accounts?

Where you keep the information you collect at each entry point. Is it in a central computer database? On individual laptops? On disks or tapes? In file cabinets? In branch offices? Do employees have files at home?

Who has—or could have—access to the information. Which of your employees has permission to access the information? Could anyone else get a hold of it? What about vendors who supply and update software you use to process credit card transactions? Contractors operating your call center?

Different types of information present varying risks. Pay particular attention to how you keep personally identifying information: Social Security numbers, credit card or financial information, and other sensitive data. That’s what thieves use most often to commit fraud or identity theft.

____________________________________________

Older Post:



Monday, November 24, 2008

Commercial Web Hosting and P2P Bandwidth


In commercial web hosting, since the hosting corporation usually owns all the servers that host the content and the network links between them, the bandwidth required to duplicate the web content and the storage overhead needed to hold the web pages are usually not the primary concerns. This is also true for certain restrictive web hosting applications, such as YouServ, which is a solution for sharing the files and web pages of individual users through standard web protocols on a corporate intranet. Existing research on distributed web hosting usually focuses on improving the response time of the server, for example through server placement strategies and the direction of web requests to the proper server.

However, this is not the case in a consumer P2P network, where both network bandwidth and storage capacity are at a premium for the peers, and the P2P web hosting application competes with other applications for those resources. Therefore, it is necessary to develop technologies that improve web hosting reliability and serving bandwidth while reducing the network bandwidth used to host the web site.

____________________________________________

Older Post:



Sunday, November 16, 2008

iBrain - Gary Small and Gigi Vorgan (Book Excerpt)


iBrain Book price (Amazon)

The people who are crazy enough to think they can change the world are the ones who do. - Steve Jobs, CEO of Apple

You're on a plane packed with other business people, reading your electronic version of the Wall Street Journal on your laptop while downloading files to your BlackBerry and organizing your PowerPoint presentation for your first meeting when you reach New York. You relish the perfect symmetry of your schedule, to-do lists, and phone book as you notice a woman in the next row entering little written notes into her leather-bound daily planner book. You remember having one of those . . . What? Like a zillion years ago? Hey lady! Wake up and smell the computer age.

You're outside the airport now, waiting impatiently for a cab along with a hundred other people. It's finally your turn, and as you reach for the taxi door a large man pushes in front of you, practically knocking you over. Your briefcase goes flying, and your laptop and BlackBerry splatter into pieces on the pavement. As you frantically gather up the remnants of your once perfectly scheduled life, the woman with the daily planner book gracefully steps into a cab and glides away.

The current explosion of digital technology not only is changing the way we live and communicate but is rapidly and profoundly altering our brains. Daily exposure to high technology—computers, smart phones, video games, search engines like Google and Yahoo—stimulates brain cell alteration and neurotransmitter release, gradually strengthening new neural pathways in our brains while weakening old ones. Because of the current technological revolution, our brains are evolving right now—at a speed like never before.

Besides influencing how we think, digital technology is altering how we feel, how we behave, and the way in which our brains function. Although we are unaware of these changes in our neural circuitry or brain wiring, these alterations can become permanent with repetition. This evolutionary brain process has rapidly emerged over a single generation and may represent one of the most unexpected yet pivotal advances in human history. Perhaps not since Early Man first discovered how to use a tool has the human brain been affected so quickly and so dramatically.

Television had a fundamental impact on our lives in the past century, and today the average person's brain continues to have extensive daily exposure to TV. Scientists at the University of California, Berkeley, recently found that on average Americans spend nearly three hours each day watching television or movies, which is much more time than they spend on all leisure physical activities combined. But in the current digital environment, the Internet is replacing television as the prime source of brain stimulation. Seven out of ten American homes are wired for high-speed Internet. We rely on the Internet and digital technology for entertainment, political discussion, and even social reform as well as communication with friends and co-workers.

As the brain evolves and shifts its focus toward new technological skills, it drifts away from fundamental social skills, such as reading facial expressions during conversation or grasping the emotional context of a subtle gesture. A Stanford University study found that for every hour we spend on our computers, traditional face-to-face interaction time with other people drops by nearly thirty minutes. With the weakening of the brain's neural circuitry controlling human contact, our social interactions may become awkward, and we tend to misinterpret, and even miss, subtle nonverbal messages. Imagine how the continued slipping of social skills might affect an international summit meeting ten years from now when a misread facial cue or a misunderstood gesture could make the difference between escalating military conflict or peace.

The high-tech revolution is redefining not only how we communicate but how we reach and influence people, exert political and social change, and even glimpse into the private lives of co-workers, neighbors, celebrities, and politicians. An unknown innovator can become an overnight media magnet as news of his discovery speeds across the Internet. A cell phone video camera can capture a momentary misstep of a public figure, and in minutes it becomes the most downloaded video on YouTube. Internet social networks like MySpace and Facebook have exceeded a hundred million users, emerging as the new marketing giants of the digital age and dwarfing traditional outlets such as newspapers and magazines.

Young minds tend to be the most exposed, as well as the most sensitive, to the impact of digital technology. Today's young people in their teens and twenties, who have been dubbed Digital Natives, have never known a world without computers, twenty-four-hour TV news, Internet, and cell phones—with their video, music, cameras, and text messaging. Many of these Natives rarely enter a library, let alone look something up in a traditional encyclopedia; they use Google, Yahoo, and other online search engines. The neural networks in the brains of these Digital Natives differ dramatically from those of Digital Immigrants: people—including all baby boomers—who came to the digital/computer age as adults but whose basic brain wiring was laid down during a time when direct social interaction was the norm. The extent of their early technological communication and entertainment involved the radio, telephone, and TV.

As a consequence of this overwhelming and early high-tech stimulation of the Digital Native's brain, we are witnessing the beginning of a deeply divided brain gap between younger and older minds—in just one generation. What used to be simply a generation gap that separated young people's values, music, and habits from those of their parents has now become a huge divide resulting in two separate cultures. The brains of the younger generation are digitally hardwired from toddlerhood, often at the expense of neural circuitry that controls one-on-one people skills. Individuals of the older generation face a world in which their brains must adapt to high technology, or they'll be left behind—politically, socially, and economically.

Young people have created their own digital social networks, including a shorthand type of language for text messaging, and studies show that fewer young adults read books for pleasure now than in any generation before them. Since 1982, literary reading has declined by 28 percent in eighteen- to thirty-four-year-olds. Professor Thomas Patterson and colleagues at Harvard University reported that only 16 percent of adults age eighteen to thirty read a daily newspaper, compared with 35 percent of those thirty-six and older. Patterson predicts that the future of news will be in the electronic digital media rather than the traditional print or television forms.

____________________________________________

Older Post:



Wednesday, November 5, 2008

Things You Should Know About Shared Hosting

by Daren Albom

People who are building a website will always come to a point where they need a web hosting service to host it. Confusion strikes when they find out that there are different types of hosting, including shared hosting, dedicated hosting, and reseller hosting, and they are left wondering which type they should go with. This article will tell you more about shared hosting and what makes it distinctive, to give you a better idea for your decision making.

Shared hosting, as the name suggests, is shared among subscribers. For example, when two people sign up for a shared hosting plan, their sites are hosted on the same physical server and share the same IP address; however, their accounts are separated by software. Neither of them can access the other's account, so each account looks independent. The service provider has, in effect, partitioned the storage space into two slots for two different accounts. The same principle applies when the same server has 1,000 subscribers.

You may wonder how the server identifies the correct website when internet users want to access a site on shared hosting, since there is only one IP address. When a user requests a website by typing the web address into the browser, the request is sent to the server at that IP address, and the hostname of the website is sent along as part of the request. With the IP address and the hostname, the server can determine which website you want to access. Shared hosting definitely has its strengths and weaknesses.
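
As an illustration of that mechanism (name-based virtual hosting), here is a toy C sketch that pulls the Host header out of a raw HTTP request and maps it to a per-site document root. The host names, paths, and helper function are invented for the example; real web servers such as Apache or Nginx do this through their virtual-host configuration.

```c
#include <stdio.h>
#include <string.h>

/* Toy illustration of name-based virtual hosting: many sites share one IP
 * address, and the server picks the site from the Host header of the HTTP
 * request.  The host names and document roots below are made up. */
struct vhost { const char *hostname; const char *docroot; };

static const struct vhost vhosts[] = {
    { "www.example-site-a.com", "/home/usera/public_html" },
    { "www.example-site-b.com", "/home/userb/public_html" },
};

/* Return the document root for the Host header in a raw HTTP request,
 * or NULL if no configured site matches. */
static const char *docroot_for_request(const char *request)
{
    const char *h = strstr(request, "\r\nHost: ");
    if (!h)
        return NULL;
    h += strlen("\r\nHost: ");

    char host[256];
    size_t n = strcspn(h, "\r\n");          /* hostname runs to end of line */
    if (n >= sizeof host)
        return NULL;
    memcpy(host, h, n);
    host[n] = '\0';

    for (size_t i = 0; i < sizeof vhosts / sizeof vhosts[0]; i++)
        if (strcmp(host, vhosts[i].hostname) == 0)
            return vhosts[i].docroot;
    return NULL;
}

int main(void)
{
    const char *request =
        "GET /index.html HTTP/1.1\r\n"
        "Host: www.example-site-b.com\r\n"
        "\r\n";
    const char *root = docroot_for_request(request);
    printf("serve from: %s\n", root ? root : "(no matching site)");
    return 0;
}
```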

Strength: shared hosting has a very strong standing on pricing. Because many people share the same physical server, the cost is distributed among them. The monthly subscription fee can be as low as $4.95/month, and this price is easily available everywhere. This definitely makes it a cost-effective way to run a website. Although shared hosting is cheaper than other types of hosting, that does not mean it is low quality. Instead, it is an ideal hosting option for many webmasters, who build websites on shared hosting to run their businesses. Of course, the quality still relies on proper management by the hosting provider. As discussed in most of the articles, the reputation of a hosting provider is very important in this case. Thus I always go with big hosting companies like Hostgator, Lunarpages and Bluehost. Their organized management is powerful enough to maintain the quality of their shared hosting.

Weaknesses: users share the server among themselves, and sharing is still sharing. When there is an irresponsible user in the group, the way that user uses the hosting can jeopardize the other accounts too. For example, if a website with very heavy traffic is hosted on the server, access to the other websites on the same server might be affected because that one website has drawn most of the server's resources. Although you do not know which website or account is causing the problem, the negative effect is still visible. In this case a good hosting company will step in and warn the user about the usage. Shared hosting also gives your website no dedicated IP. Therefore, if the domain name service is down (your domain name is not functional for whatever reason), you will not be able to reach your website using just an IP address like http://127.0.0.1, because the server would not know which of its websites to show. It is also usually not possible to have a dedicated SSL certificate for your website, because SSL works on the IP address. Since your IP is shared, the same SSL setup is shared among the other subscribers too. However, SSL sharing does not degrade the security of SSL itself; it just means you have less control over the configuration.

Despite the weaknesses of shared hosting, it remains the most common hosting choice among webmasters. I believe this is mostly because of the economical price of shared hosting: it makes web hosting affordable and is able to host decent websites. In fact, shared hosting is not very different from dedicated hosting (which uses a dedicated server, with a dedicated IP, to host your website) except for the dedicated IP, the dedicated server, and a much higher price (which could be $100+/month). So, whether you want to host a blog, a personal site, or even a business website, shared hosting is recommended unless you are targeting a website as big as Wikipedia or YouTube.

____________________________________________

Older Post: