SDRAM Performance Monitors with the Élan™SC520 Microcontroller

Application Note

by Daniel Mann and James Magro

Publication# 23380  Rev: A  Amendment/0  Issue Date: April 2000
© Copyright 2000 Advanced Micro Devices, Inc. All rights reserved.

This application note describes the concept, design considerations, functions, and use of the SDRAM performance monitors with the Élan™SC520 microcontroller.

Overview

The Élan™SC520 microcontroller includes two non-intrusive Adaptive Digital Element (ADDIE) performance monitor resources for monitoring a variety of SDRAM controller parameters. Features of the performance monitors include:

- Non-intrusive (hardware)
- Probability resolution of 0.39%
- Less than 4% error
- Monitoring of the following SDRAM controller parameters:
    - Write buffer hits (implying merge or collapse)
    - Read merge hits (from the write buffer)
    - Write buffer full occurrence
    - Read buffer hits
    - SDRAM page and bank misses
- Two ADDIE resources that can be configured to concurrently monitor any two SDRAM controller parameters

Registers

The ADDIE performance monitors are controlled by the performance monitor control registers, as shown in Table 1. Refer to Appendix A, "SDRAM Performance Monitor Registers," for more information about the performance monitor registers.

Table 1. Performance Monitor Control Registers (Memory Mapped)

Register Name                  MMCR Offset Address   Function
Performance Monitor Control    44h                   Selects parameter to monitor for both ADDIE 0 and ADDIE 1.
Performance Monitor Data 0     48h                   Returns ADDIE 0 performance data.
Performance Monitor Data 1     49h                   Returns ADDIE 1 performance data.
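As a rough illustration of how software might use the registers in Table 1, the following Python sketch reads the two data registers from a snapshot of the memory-mapped configuration region (MMCR) and converts each 8-bit value to a percentage by dividing by 255, as described under Software Considerations. The offsets come from Table 1. The function names, and the idea of passing the MMCR as a bytes-like snapshot, are assumptions made for illustration and are not part of any AMD software interface.

```python
# MMCR offsets taken from Table 1.
PERF_MON_CONTROL = 0x44   # selects the parameter monitored by ADDIE 0 and ADDIE 1
PERF_MON_DATA0 = 0x48     # ADDIE 0 performance data (8-bit)
PERF_MON_DATA1 = 0x49     # ADDIE 1 performance data (8-bit)


def addie_percent(raw: int) -> float:
    """Convert an 8-bit ADDIE value to a percentage (255 represents 100%)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("ADDIE data is an 8-bit value")
    return raw / 255 * 100


def read_monitors(mmcr) -> tuple:
    """Return (ADDIE 0 %, ADDIE 1 %) from a bytes-like MMCR snapshot.

    The snapshot is a hypothetical stand-in for however the target system
    exposes the MMCR to software.
    """
    return (addie_percent(mmcr[PERF_MON_DATA0]),
            addie_percent(mmcr[PERF_MON_DATA1]))


if __name__ == "__main__":
    snapshot = bytearray(0x100)
    snapshot[PERF_MON_DATA0] = 0x40   # 64 decimal
    snapshot[PERF_MON_DATA1] = 0xFF
    print(read_monitors(snapshot))
```

In this sketch a raw value of 40h (64 decimal) corresponds to roughly 25.1%, and FFh corresponds to 100%.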
Block Diagram

The ADDIE performance monitors are integrated into the SDRAM controller's subsystem as shown in Figure 1. The performance steering logic allows the appropriate signals to be routed from the various components of the SDRAM controller's subsystem based on the user's configuration. Each ADDIE is capable of functioning independently.

The performance monitor is implemented with two ADDIE devices with multiplexer circuitry. A configuration register selects the parameter to be monitored by each of the two ADDIE devices.

Figure 1. Block Diagram of SDRAM Subsystem with ADDIE Performance Monitors

ADDIE Performance Monitoring Parameters

The performance monitors are capable of independently monitoring the following parameters:

- Write buffer hits (implying merge or collapse)
- Read merge hits (from the write buffer)
- Write buffer full occurrence
- Read buffer hits
- SDRAM page and bank misses

Each parameter is explained below in detail.

Write Buffer Hit Monitoring

The ÉlanSC520 microcontroller's SDRAM controller contains a 32-rank write buffer with each rank providing four bytes of write data storage. Combined, these ranks provide up to eight cache lines of write data storage for transfers initiated by the Am5x86 CPU, the PCI host bridge (on behalf of a PCI master writing data to SDRAM), or GP bus DMA.

A write transfer initiated by either the Am5x86 CPU or a host bridge master can be a single 32-bit DWORD or a burst of up to four DWORDs during a single tenure. Each DWORD write transaction can be four bytes or less. A DWORD write transfer of three bytes or less is referred to as a partial DWORD. The Am5x86 CPU does not burst partial DWORDs and only bursts write data during a cache copy-back or write-back. However, the PCI host bridge can burst transfers with all complete DWORDs, all partial DWORDs, or a combination of both.
GP bus DMA write transfers are never larger than two bytes.

The ÉlanSC520 microcontroller's write buffer supports standard FIFO buffering and also supports a write data merge and collapse feature. Write merging occurs when a sequence of individual writes is merged into a single DWORD.

[Figure 1 block labels: Write Buffer, Read Buffer, SDRAM Controller, Performance Steering (signals wb_hit, wb_merge, wb_full, rb_hit, bank_page_hit), ADDIE, ADDIE, Performance Monitor Configuration, SDRAM Controller Sub-System]
Merging implies that the same byte location is not written more than once. For example, four individual byte transfers to addresses 0, 1, 2, and 3 are merged in the write buffer to form a single DWORD, thus converting four independent byte transfers into a single DWORD transfer to SDRAM.

Collapsing is similar to merging, except that the same byte locations can be written more than once. For example, a PCI host bridge burst of four DWORDs to SDRAM triggers an Am5x86 CPU cache snoop, and a hit in the Am5x86 CPU's write-back cache first requires a cache line write-back. The cache line write-back data is written to the write buffer, followed by the PCI host bridge transfer to the same four DWORD addresses. Because the write buffer supports collapsing, the cache line that was written to the write buffer by the CPU due to the write-back is overwritten with the write data from the PCI host bridge transfer, thus collapsing onto the cache's write-back data. Instead of eight DWORD transfers to SDRAM, the collapse feature of the write buffer requires only four DWORD transfers to SDRAM.

The write buffer is intended to decouple the master's write traffic from the overhead associated with SDRAM refresh cycles and page/bank misses. In addition to possibly reducing the total number of write transactions to SDRAM, the merge/collapse feature helps to assemble independent partial DWORD transfers into complete DWORDs, so the overhead associated with Error Correcting Code (ECC) read-modify-write cycles can be reduced. Read-modify-write transfers are required for ECC support when a partial DWORD write occurs to SDRAM. Complete DWORD writes do not require a read-modify-write function.

To provide the merge and collapse features, the write buffer incorporates a content-addressable memory (CAM).
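The merge and collapse behavior described above can be sketched with a small software model. This is only an illustrative approximation: the class name `WriteBufferModel`, the dictionary standing in for the CAM, and the way hits are counted (every write after the first to a buffered DWORD counts as a hit) are assumptions, not a description of the actual hardware.

```python
class WriteBufferModel:
    """Toy model of a merging/collapsing write buffer keyed by DWORD address."""

    def __init__(self, ranks: int = 32) -> None:
        self.ranks = ranks    # 32 ranks of four bytes each, per the text
        self.entries = {}     # DWORD address -> (bytearray(4), byte-valid mask)
        self.writes = 0       # DWORD write transfers posted
        self.hits = 0         # transfers whose DWORD was already buffered

    def write(self, byte_addr: int, data: bytes) -> bool:
        """Post 1 to 4 bytes that lie within one DWORD; return True on a hit."""
        dword_addr = byte_addr & ~0x3
        offset = byte_addr & 0x3
        if not data or offset + len(data) > 4:
            raise ValueError("data must fit within a single DWORD")
        hit = dword_addr in self.entries          # stands in for the CAM look-up
        if not hit and len(self.entries) >= self.ranks:
            raise BufferError("write buffer full")
        self.writes += 1
        self.hits += hit
        buf, mask = self.entries.get(dword_addr, (bytearray(4), 0))
        for i, value in enumerate(data):
            buf[offset + i] = value               # newer data replaces older data
            mask |= 1 << (offset + i)             # mark the byte as valid
        self.entries[dword_addr] = (buf, mask)
        return hit

    def hit_average(self) -> float:
        """Fraction of posted DWORD writes that merged or collapsed."""
        return self.hits / self.writes if self.writes else 0.0


if __name__ == "__main__":
    merge = WriteBufferModel()
    for addr in range(4):                         # four byte writes to 0, 1, 2, 3
        merge.write(addr, bytes([0x10 + addr]))
    print(len(merge.entries), merge.hit_average())

    collapse = WriteBufferModel()
    for i in range(4):                            # cache line write-back
        collapse.write(0x100 + 4 * i, b"\xAA" * 4)
    for i in range(4):                            # PCI burst to the same DWORDs
        collapse.write(0x100 + 4 * i, b"\x55" * 4)
    print(len(collapse.entries), collapse.hit_average())
```

In the merge case the four byte writes end up in a single fully valid DWORD entry, so only one complete DWORD would need to be written to SDRAM. In the collapse case eight DWORD writes leave four entries, matching the four-instead-of-eight example in the text.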
The CAM performs a look-up of the DWORD addresses that currently exist within the write buffer against the address presented by the Am5x86 CPU, PCI host bridge, or GP bus DMA that requests the write transfer. During a CPU or PCI host bridge burst transfer, each DWORD address during the transfer is searched within the CAM.

Either of the two performance monitor resources can be configured to provide a hit average of the number of write buffer hits that occurred during an Am5x86 CPU, PCI host bridge, or GP bus DMA write transfer. A hit implies that a merge or collapse of write data occurred. Each DWORD write transfer to the write buffer, either complete or partial, is monitored independently, regardless of whether the transfer is a single DWORD or a burst of two, three, or four DWORDs. A ratio of write buffer hit/miss is provided by the performance monitors.

Write Buffer Hit Monitoring Analysis

The write buffer provides a write data merge and collapse feature where a write of data to a DWORD that currently exists in the write buffer can be merged or collapsed into the same location in the buffer. This feature is used to reduce the overall number of consecutive write transactions to the same location in SDRAM. This feature also attempts to merge partial DWORD write transfers into complete DWORDs to reduce the number of ECC read-modify-write cycles to SDRAM.

Either performance monitor can be used to provide a write buffer hit average. This information can be used to specify an optimal watermark setting for the write buffer. A higher watermark setting allows more write data to accumulate in the write buffer before write-backs are initiated to SDRAM. This provides a greater chance for data merging and collapsing to take place and lessens the occurrences of writes from the write buffer interrupting read accesses. A lower watermark setting causes the write buffer to request SDRAM accesses with fewer DWORDs in the buffer.
This can cause the write buffer to interfere with read accesses, but lessens the occurrence of overflowing the write buffer during complete DWORD transfers, when write data merging and collapsing is less likely to occur.

A lower hit average with a high watermark setting might imply that most write data is a complete DWORD, so merging and collapsing is not occurring as often. In this condition, a lower watermark setting is recommended to prevent write buffer full occurrences. A higher hit average with a low watermark setting might imply a high occurrence of write data merging or collapsing. In this condition, a higher watermark setting is recommended to allow the write buffer to absorb the write activity without interrupting read accesses.

Write Buffer Read Merge Monitoring

As mentioned in Write Buffer Hit Monitoring Analysis, the write buffer merges and collapses data. Because the write buffer's CAM makes the merge and collapse features possible during write transfers, and only a single entry exists for any given DWORD address in the write buffer, data read merging from the write buffer is possible. This implies that a read request by the Am5x86 CPU, PCI host bridge, or GP bus DMA to an address that currently exists in the write buffer does not require the write buffer to first be flushed to SDRAM to maintain data coherency before the read request is satisfied. Standard first-in first-out (FIFO) buffering techniques without a CAM function require that the buffer be flushed prior to every read, because it is not known whether the buffer holds data that pertains to the data being requested. Instead of the possibly large overhead associated with a write buffer flush prior to every read request, the write buffer performs a read merge of data from the write buffer's contents as data is returned from SDRAM.

Read merge occurs when a read cycle hits a dirty DWORD that currently exists in the write buffer and the read data returned from SDRAM is replaced, or merged, with existing bytes from the write buffer. Read merging from the write buffer occurs at byte resolution. For example, a PCI host bridge read request triggers an Am5x86 CPU cache snoop, and a hit in the Am5x86 CPU's write-back cache requires a cache line write-back. First, the cache line write-back data is written to the write buffer, followed by the PCI host bridge read request to SDRAM, possibly to more than one of the same four DWORD addresses written during the cache line write-back. The cache line write-back data is written to the write buffer, but the read request from the PCI host bridge can be satisfied by the SDRAM controller before the cache line write data is actually written to SDRAM. Because the write buffer supports read merging, the SDRAM data is replaced by the more recent data from the write buffer that was just written during the cache line write-back. This feature enables the SDRAM controller to service read requests ahead of write cycles, without the penalty of flushing the write buffer first.

Either of the two performance monitor resources can be configured to provide a read merge average of the number of DWORD read transfers during an Am5x86 CPU, PCI host bridge, or GP bus DMA read request that result in a read merge from the write buffer. A read merge implies that at least one of the four bytes within a DWORD resulted in a read merge during each DWORD of a read request.
Each DWORD of a read transfer is monitored independently, regardless of whether the read transfer is a single DWORD or a burst of two, three, or four DWORDs. A ratio of write buffer read-merge/no-read-merge is provided by the performance monitors.

Write Buffer Read Merge Monitoring Analysis

Conventional write buffer architectures do not provide the read merge function, and require the entire buffer to be flushed to SDRAM if a read access occurs to any data that currently exists in the buffer. When this occurs, the read access incurs the overhead associated with the write-back of all buffer contents to SDRAM instead of just the data that is needed to maintain data coherency. However, because the ÉlanSC520 microcontroller's write buffer provides merging and collapsing, this forced flush is not necessary when a read access occurs to data that currently exists in the write buffer. The read access continues around the associated write data, and the more current write data is merged in from the write buffer as the read data is being returned to the requesting master from SDRAM.

Either performance monitor can be configured to provide a read merge average. This average can be used to determine if the read-around-write feature of the write buffer provides performance advantages.

Write Buffer Full Monitoring

The write buffer is intended to decouple the master's write traffic from the overhead associated with SDRAM refresh cycles, page/bank misses, and ECC read-modify-write cycles that result from incomplete DWORD writes. When the write buffer is not full, write data is posted in zero wait states. However, due to the various delays mentioned above, data can be posted to the write buffer faster than data is written to SDRAM. This can result in a full write buffer, such that no more data can be posted until data is written from the write buffer to SDRAM. The write buffer's watermark setting can play a role in how often the write buffer becomes full.
A higher watermark setting causes data to sit in the write buffer longer, possibly enabling a write data merge or collapse to occur, but delays the request for SDRAM service to write back write-buffered data. A lower watermark setting requests SDRAM service when less data is stacked in the write buffer.

Either of the two performance monitor resources can be configured to provide a write buffer full average of the number of write attempts by an Am5x86 CPU, PCI host bridge, or GP bus DMA write request that found the write buffer in a full state. A write buffer full state implies that an attempt to write a DWORD into the write buffer was stalled. Each DWORD of a write transfer is monitored independently, regardless of whether the write transfer is a single DWORD or a burst of two, three, or four DWORDs. A ratio of write buffer full/not-full is provided by the performance monitors.

Write Buffer Full Monitoring Analysis

The write buffer provides 32 DWORDs of storage for master write accesses to SDRAM. It is intended to absorb the overhead associated with page and bank misses during write cycles and ECC read-modify-write cycles. The write buffer experiences a full condition when master accesses write data into the write buffer at a faster rate than data can be removed from the write buffer into SDRAM. When the buffer is full, master accesses experience the delays associated with write accesses to SDRAM.

Either performance monitor can be used to monitor the write buffer full average, which can be used to determine the best write buffer watermark setting and also provides information regarding write overhead associated with page and bank misses and ECC read-modify-write cycles. The write buffer's watermark settings are provided for the user to throttle the write request interruptions to SDRAM. Because read-around-write accesses are provided when the write buffer is enabled, the watermark setting specifies how full the write buffer can become before it requests service to write data back to SDRAM. The write buffer requests SDRAM access when the configured watermark is reached.

The write buffer provides four watermark settings. A higher watermark can be used to keep write accesses from interfering with read accesses by stalling write activity to SDRAM, which also provides a greater chance of write data merging and collapsing where a large number of partial DWORD writes is expected. A low watermark setting can be used when most writes to SDRAM are expected to be complete DWORDs, so that the chance of write buffer merging and collapsing is reduced and ECC read-modify-write cycles are not necessary. The user should configure the write buffer's watermark setting to the position that yields the lowest write buffer full average.

Read Buffer Hit Monitoring

The ÉlanSC520 microcontroller's SDRAM controller contains eight 4-byte read data buffers. Combined, these buffers comprise the read buffer and are designed to hold two cache lines of data that are returned from SDRAM for transfers initiated by the Am5x86 CPU, the PCI host bridge (on behalf of a PCI master), or GP bus DMA. During any read request to SDRAM, an entire cache line is always read into the read buffer, regardless of whether the read transfer is a single DWORD or a burst of two, three, or four DWORDs. To maintain data coherency, any write transfer to SDRAM on behalf of any master or the write buffer that matches a cache line in the read buffer results in both cache lines of the read buffer being invalidated.
When the read prefetch feature is enabled and the read request is a burst type (i.e., two or more DWORDs), the entire cache line of the request is fetched from SDRAM, and the cache line following the requested cache line is fetched as well. The cache line following a single DWORD read request is never prefetched; however, the remainder of the requested DWORD's cache line is stored in the read buffer. For example, if the Am5x86 CPU or PCI host bridge requests a single DWORD of data from SDRAM, the SDRAM controller fetches the entire cache line of data on behalf of this single DWORD read request. The requested DWORD is forwarded to the requesting master, and the requested DWORD and the remaining DWORDs of that cache line are stored in the read buffer. Any future accesses to the same cache line within the read buffer result in read buffer hits.

Either of the two performance monitor resources can be configured to provide a read buffer hit average of the number of DWORD read transfers during an Am5x86 CPU, PCI host bridge, or GP bus DMA read request that result in a hit to the read buffer. A read buffer hit implies that at least one of the four bytes within a DWORD resulted in a read buffer hit.

The performance monitor counts read buffer hits on the basis of an atomic read request during the same bus tenure, independent of burst length (i.e., a complete cycle, regardless of the amount of read data requested during the same burst tenure). Therefore, each read request is monitored, instead of each DWORD transferred during that read request tenure. This occurs because the read buffer always stores an entire cache line of read data from SDRAM, independent of the number of DWORDs requested during the read request.
A read request of two, three, or four DWORDs that hits within the read buffer is counted by the performance monitors as only one hit to the read buffer, because the read buffer always maintains an entire cache line of data, and a hit on one DWORD during a read burst request is guaranteed to hit the remaining data in that cache line during that same bus tenure. This avoids unfairly counting four read buffer hits during a burst of four DWORDs, where the entire cache line would result in a hit. Four independent read requests of one DWORD each result in four independent read buffer hits counted by the performance monitor, because each read transfer is an individual read request. A ratio of read buffer hit/miss is provided by the performance monitors.

Read Buffer Hit Monitoring Analysis

The ÉlanSC520 microcontroller's SDRAM controller maintains two cache lines of read data within the read buffers. These buffers are always enabled. For any read request, an entire cache line of data is stored in the read buffer. When the SDRAM controller's read prefetch feature is enabled and the read request is a burst type (i.e., two or more DWORDs), the entire cache line that contains the requested data is fetched from SDRAM, and the cache line following the requested cache line is fetched as well.

The read prefetch feature is intended to accelerate SDRAM read accesses when read requests are sequential, so that data is supplied to the master while the next cache line from SDRAM is concurrently prefetched. The read buffer's prefetch feature relies on a long tenure of a single requesting master, which can be affected by the enable state of the Am5x86 CPU's cache, program flow, mastership changes, etc. However, as with all buffering techniques that speculatively prefetch, the anticipated prefetched line might not be used, possibly resulting in overhead associated with the unused prefetch.

Either performance monitor can be used to provide a read buffer hit average.
This information can be used to provide feedback to determine if the read prefetch feature of the read buffer should be enabled. A high read buffer hit average when read prefetching is enabled implies that the read prefetch data is being used frequently. A low read buffer hit average with the read prefetch feature enabled can imply that the read prefetched data is not being utilized; thus, disabling the read prefetch feature can be desirable.

SDRAM Page and Bank Miss Monitoring

SDRAM devices support either two or four internal banks. The page width of the internal banks is defined by the device's symmetry. The ÉlanSC520 microcontroller's SDRAM controller supports SDRAM devices with 8-, 9-, 10-, or 11-bit column address widths, resulting in a 1-, 2-, 4-, or 8-KB page width for the ÉlanSC520 microcontroller's 32-bit data bus width.

Overhead is associated with opening an internal bank. Therefore, the more often an open internal bank is utilized, the greater the overall performance. An internal bank's page is left open after each access. A page miss occurs when a master access is made to an internal bank page that is not open within that same internal bank. The penalty incurred is the delay associated with closing the currently open page and opening the requested page of the requested internal bank. A bank miss occurs when a master accesses an internal bank where no open page currently exists; for example, after a refresh cycle. The penalty incurred is the delay associated with opening the requested page.

Either of the two performance monitor resources can be configured to provide a page and bank miss average of the number of read, write, or write buffer transfers resulting in a page/bank miss to SDRAM.

The performance monitor scores SDRAM page and bank accesses on the basis of an atomic request during the same bus tenure, independent of burst length (i.e., a complete cycle, regardless of the amount of data requested during the same burst tenure).
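The page width arithmetic and the page miss and bank miss definitions above can be sketched as follows. This is a simplified model under stated assumptions: the names `page_size_bytes` and `PageTracker` are hypothetical, each call to `access` represents one atomic request, and only one open row per internal bank is tracked.

```python
def page_size_bytes(column_bits: int, bus_bytes: int = 4) -> int:
    """Page width in bytes: 2**column_bits columns on a 32-bit (4-byte) bus."""
    return (1 << column_bits) * bus_bytes


class PageTracker:
    """Classify each atomic request as a hit, page miss, or bank miss."""

    def __init__(self, banks: int = 4) -> None:
        self.open_row = [None] * banks   # None means no page open in that bank
        self.requests = 0
        self.misses = 0

    def access(self, bank: int, row: int) -> str:
        self.requests += 1
        current = self.open_row[bank]
        if current == row:
            return "hit"
        self.misses += 1
        self.open_row[bank] = row        # the requested page is left open
        return "bank miss" if current is None else "page miss"

    def refresh(self) -> None:
        """A refresh cycle closes all pages of all banks."""
        self.open_row = [None] * len(self.open_row)

    def miss_average(self) -> float:
        return self.misses / self.requests if self.requests else 0.0


if __name__ == "__main__":
    for bits in (8, 9, 10, 11):
        print(bits, page_size_bytes(bits) // 1024, "KB")
    tracker = PageTracker(banks=2)
    print(tracker.access(0, 5), tracker.access(0, 5), tracker.access(0, 6))
    tracker.refresh()
    print(tracker.access(0, 6), tracker.miss_average())
```

With 8 to 11 column address bits the function yields 1, 2, 4, and 8 KB, matching the page widths listed in the text.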
Each request is therefore monitored, instead of each DWORD transferred during that requested cycle's tenure. This occurs because, after an access, the page within each bank of the SDRAM devices remains open, independent of the number of DWORDs requested during the cycle request. An Am5x86 CPU, PCI host bridge, or GP bus DMA request of two, three, or four DWORDs that misses within an open page is counted by the performance monitors as only one miss to the page, because the remaining DWORDs during a burst request are guaranteed to result in a page hit during that same bus tenure. Thus, the first access is counted independently of the amount of data transferred in that single cycle's bus tenure. This avoids unfairly counting one page/bank miss and three following page/bank hits during a burst of four DWORDs, because the remaining three transfers always result in a page hit. Four independent read or write requests of one DWORD each result in four independent hits or misses counted by the performance monitor, because each read or write transfer is an individual request.

The write buffer always writes single DWORDs; therefore, each write buffer write is counted independently. This occurs because the write buffer write-backs are a single DWORD and provide a read-around-write function. For example, one DWORD of an entire cache line written into the write buffer is written into SDRAM while the remaining DWORDs of the same cache line remain in the write buffer. If a higher priority read request demands access to SDRAM over the write buffer's access, the dynamics of the page/bank relationship are changed, resulting in a different page/bank miss occurrence than in the same scenario where a write burst occurs with the write buffer disabled. A ratio of page-bank-miss/page-bank-hit is provided by the performance monitors.

SDRAM Page and Bank Miss Analysis

The overall performance of SDRAM is directly impacted by the overhead associated with a page or bank miss.
The more often an access occurs to an open page (spatially local to that page), the faster data is returned to the requesting master during a read access or written to SDRAM during a master write access. Master accesses that are sequential will hit within an open SDRAM page and yield higher SDRAM access performance than master accesses that result in heavy thrashing of the pages.

Many system parameters and configurations contribute to the dynamics of SDRAM page and bank misses. Some of these are program flow, Am5x86 CPU cache enable, Am5x86 CPU cache write-through vs. write-back, number of GP bus DMA channels active, and number of PCI masters and their burst size. For example, read accesses initiated by the Am5x86 CPU's prefetcher are typically sequential until the program flow changes as a result of a program branch, and can utilize an open SDRAM page more frequently. Am5x86 CPU write accesses tend to be directly program-dependent and not predictable, and can result in SDRAM page thrashing. These dynamics change when the Am5x86 CPU's cache is enabled and depend on whether the cache is in write-through or write-back mode. PCI read transfers are linear, and those that request a large burst utilize an open page.

Even though the dynamics associated with program flow and master accesses heavily dictate the page and bank miss rates, the user has control over some SDRAM parameters that can lessen the impact associated with system dynamics. These parameters are as follows:

- Adjustable page widths by selecting devices with either 8-, 9-, 10-, or 11-bit column addresses
- Adjustable count of internal banks supported within the SDRAM devices, either two or four internal banks
- Adjustable refresh rates
- Adjustable write buffer watermarks when the write buffer is enabled

Either of the two performance monitor resources can be configured to provide a page and bank miss average. This information can be used to provide feedback on which SDRAM controller configuration or SDRAM device architecture results in the best overall SDRAM page performance.

Because the SDRAM controller supports SDRAM device architectures yielding a page width from 1 KB to 8 KB, performance monitoring can determine whether SDRAM devices with a larger page organization yield better page performance than devices with a smaller page size. The ÉlanSC520 microcontroller provides page size selection via the SDRAM controller's configuration registers. However, the SDRAM device dictates the page size, and the SDRAM controller must be configured to exactly match that device's page size.

The number of internal banks supported can also affect the overall page performance of the device. Since SDRAM devices support either two-bank or four-bank architectures, the performance monitor can be used to determine which SDRAM internal bank architecture yields better page performance.

An SDRAM refresh cycle closes all pages of all banks within the SDRAM devices. As a result, any access to a bank after a refresh occurs will result in a bank miss, thus incurring the associated overhead penalty. Therefore, the refresh rate directly impacts system SDRAM performance. A faster refresh rate closes the SDRAM pages more often than a slower rate. The refresh rate is dictated by the SDRAM device and is a function of the number of rows contained in the device. Some devices may allow for a slower refresh rate because the column width is wide.
The performance monitor can be used when selecting the SDRAM controller's refresh rate, to determine the effects of the refresh cycle on page performance. Keep in mind that a device can be over-refreshed, but never under-refreshed without the risk of losing data integrity.

When the write buffer is enabled, read accesses have priority over writes, thus providing a mechanism for satisfying read requests faster. The write buffer and the associated watermark setting can change the SDRAM page dynamics over the current condition and setting if the write buffer is enabled and strict access ordering to the SDRAM is preserved. This implies that posted writes that influence the page state of the devices have not yet occurred to the SDRAM. Thus, a subsequent read access can utilize a page that is currently open, instead of possibly experiencing a page closed by the write access, as could happen if the write buffer were not enabled. The write buffer's watermark setting has a direct impact on when data is written to SDRAM. A low watermark setting causes the write buffer to request a write-back earlier than a higher watermark setting. The performance monitor can be used to monitor the page performance based on the write buffer's enable state or watermark settings.

Software Considerations

A software handler is required to configure the performance monitors, read the monitors' performance data, and calculate the average percentage of the occurrence of the parameters being monitored. The ADDIE devices provide an 8-bit value that represents the overall average. This 8-bit value must be divided by 255 (which represents the maximum value) and multiplied by 100 to determine the percentage.

Reset

The performance monitors are reset during a power-on system reset event or programmable reset event. As a result of these system reset events, all monitoring information is lost. However, the configuration register that selects the parameter to be monitored is maintained through a programmable reset.
The performance monitors do not support a manual reset feature.

Initialization

Both performance monitors are disabled with a system reset. They are enabled by accessing a configuration register that selects the parameter that is to be monitored. When the parameter being monitored is altered by accessing a configuration register, some time is necessary for the ADDIE to begin to track the newly configured parameter. Disabling the performance monitor causes the monitor status to freeze and retain the value it held prior to being disabled.

Non-Intrusive Performance Monitoring

Most desktop processors are equipped with performance monitoring counters. These counters permit processor performance parameters to be monitored and measured, which is useful for performance tuning. Current techniques typically utilize two counters, which simultaneously record the occurrence of pre-specified events. When one of the counters overflows, counting stops and an interrupt is generated. Post-processing software is used to analyze the gathered data.

This section describes a new technique for gathering and analyzing performance data with a microprocessor or microcontroller. The technique avoids the limitations imposed by fixed-size counters, which eventually overflow. This method is less intrusive and more suitable for monitoring a wide range of performance parameters.
Performance Monitoring Counters

Typically, two large counters (i.e., 40 bits or more) are provided for event counting. These counters can be read and written from within the register address space. The counters can be configured to measure parameters such as the number of data reads that hit in the cache. In this case, the second counter can be programmed to record the number of actual data reads performed. The ratio of these two numbers gives the cache hit rate for reads.

When one of the counters reaches its limit, the overflow signal can be used to stop all counting and generate an interrupt. The software interrupt handler then records the counter values and completes post data processing and any other support work necessary.

The size of the counters is important. The larger the counter, the less frequently an interrupt is generated. Such interrupts are undesirable as they intrude into normal processor operation. A larger counter also results in greater data averaging. Any temporary fluctuation in cache hit rate is not observed, even though observing it might be required.

Before performance monitoring can be accomplished, an interrupt handler must be installed to process the counter overflow. Of course, overflow can be avoided by the use of extremely large counters. However, such a technique can be expensive to implement, unreliable, or fail to produce the desired statistical analysis.

Data Integration

The type of performance data being measured is random in nature, such as the cache-hit rate or the number of cycles to read memory. These parameters vary during program execution. Measuring random data enables a precise value to be generated for the data examined. These precise values appear as averages, such as the average cache-hit rate.

Measured performance parameters are a good estimate of future performance. Actual performance at any instant may vary widely from the measured estimate.
the typical use of two large counters does not measure this deviation.

for example, at each memory access, an on-chip cache may or may not successfully provide the required data. the sequence of hit and miss data can be represented by a simple 1 or 0 bit stream, the probability of a 1 being the probability of a hit occurring. a stochastic addie can be used to integrate the probability stream and determine the relevant probability. figure 2 shows a possible addie configuration.

a counter is compared with a random number; if the counter is greater than the random number, a 1 is generated. large counter values are more likely to produce a 1 output from the comparator than small counter values.

figure 2. block diagram of the addie
[figure labels: random number, comparator (1 if count >= random number), counter, xor, clock, input data stream; truth table columns: comparator, input data, xor output, count action (down/up)]

the counter output is compared with the 1/0 data stream type (the cache-hit information). these two stochastic data streams are compared to determine which one has the highest probability of being 1. determining this probability appears difficult; however, only an xor gate is required. when the data streams differ, a difference in probability exists. this information is fed back to the counter to increase or decrease its value. consequently, with each new comparison, the counter is adjusted to produce a probability stream (from the comparator) which matches the input data stream.

the addie device effectively integrates the probability stream. the result is that the probability stream is converted into a digital value that is held in the counter, which represents the probability of the parameter being measured.

this method of measuring probability is very different from the dual-counter method. no potential overflow exists, and an overflow interrupt handler is not required. the counter can be read at any time to give a measure of the current probability.

8-bit addie

as the number of bits used by the counter and random number generator increases, the probability resolution is improved. for example, an 8-bit counter provides for a probability resolution of 0.39% (1/255). however, increasing the resolution slows down the integration process. this results in a greater number of samples being required before a good estimate of probability can be obtained. figure 3 shows an 8-bit addie used to measure a probability of 25%.

figure 3. 8-bit addie measuring 25% (64/255)
[plot: measured value and input test data (0 to 255, where 255/255 = 100% probability) versus data samples (0 to 800)]
the addie starts with an initial value of 50%. as the input data is sampled, the addie's counter value moves toward the expected value. after about 500 samples, the addie closely tracks the input data stream. this indicates that after as few as 500 samples, the counter can be read to produce an estimate of the performance parameter being measured.

the data shown in figure 3 indicates the average of over 250 inputs and addie streams. keeping this ratio small helps one to observe temporary fluctuations in both the generated test data and addie response.

figure 4 shows the 8-bit addie used to measure a 78% probability. the 8-bit counter was initially at a value of 128, which represents a probability of 50%. after about 500 samples, the measuring instrument converges on the desired value.

figure 4. 8-bit addie measuring 78% (200/255)
[plot: measured value and input test data (0 to 255, where 255/255 = 100% probability) versus data samples (0 to 900)]
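the comparator, xor, and up/down counter behavior described above can be modeled in software. the following python sketch is one possible reading of that description, not the hardware implementation: it assumes independent random numbers in the range 1 to 255 drawn from python's random module (rather than the correlated shift-register numbers discussed later), and the function names and seed are hypothetical.

```python
import random


def addie_step(counter: int, input_bit: int, rand: int, max_count: int) -> int:
    """Apply one input sample to the ADDIE counter and return the new count."""
    comparator_bit = 1 if counter >= rand else 0  # "1 if count >= random number"
    if comparator_bit ^ input_bit:                # XOR: the two streams differ
        if input_bit:
            # Input was 1 but the comparator produced 0: count up.
            counter = min(counter + 1, max_count)
        else:
            # Input was 0 but the comparator produced 1: count down.
            counter = max(counter - 1, 0)
    return counter


def simulate(probability: float, samples: int, bits: int = 8, seed: int = 1) -> int:
    """Feed a random 1/0 stream with the given probability of 1 into an ADDIE."""
    rng = random.Random(seed)
    max_count = (1 << bits) - 1
    counter = 1 << (bits - 1)  # start at 50% (128 for an 8-bit counter)
    for _ in range(samples):
        input_bit = 1 if rng.random() < probability else 0
        rand = rng.randint(1, max_count)  # assumption: uniform on 1..max_count
        counter = addie_step(counter, input_bit, rand, max_count)
    return counter


# Estimate a 78% input probability, as in figure 4.
count = simulate(0.78, 5000)
print(f"counter = {count}, estimate = {count / 255:.1%}")
```

because the counter keeps making single-step corrections around the target value, the estimate read at any instant fluctuates by a few counts; this is the tracking behavior that the figures illustrate.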
12-bit addie

the time taken by the addie's counter to move from its current position to the desired value is dependent on the number of bits used by the counter. an 8-bit counter takes only 256 clock pulses to increment through its full range. a 12-bit counter takes 4096 clock pulses to cover its complete range. the extra 4 bits of probability resolution require 16 times more clock pulses to scan the complete counting range. this condition directly relates to the number of samples required to measure a performance parameter. greater resolution requires a greater number of samples to achieve a measurement with similar results.

figure 5 shows a 12-bit addie used to measure a 78% probability. the counter was initialized to a 50% probability. the results show that about 10,000 samples were required before the input and measured probability streams converged.

the larger addie offers greater measurement resolution. however, for measuring on-chip performance parameters, an 8-bit counter appears adequate. the smaller addie provides the benefit of converging more quickly on the required value and of better tracking local fluctuations of the input data stream.

figure 5. 12-bit addie measuring 78% (3200/4095)
[plot: measured value and input test data (0 to 4095, where 4095/4095 = 100% probability) versus data samples (0 to 14,000)]
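the resolution and counting-range figures quoted above follow directly from the counter width. the short python sketch below reproduces that arithmetic; the function names are hypothetical, and the resolution is taken as one count out of the all-ones value, matching the 1/255 convention used in the text.

```python
def probability_resolution(bits: int) -> float:
    """Smallest probability step, with 100% mapped to the all-ones count."""
    return 1 / ((1 << bits) - 1)


def full_range_clocks(bits: int) -> int:
    """Clock pulses needed to step a counter through its complete range."""
    return 1 << bits


for bits in (8, 12):
    print(bits, f"{probability_resolution(bits):.4%}", full_range_clocks(bits))

# Extra clock pulses needed by the 12-bit counter relative to the 8-bit one.
print(full_range_clocks(12) // full_range_clocks(8))
```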
parameter values as probabilities

the performance parameters described above are easy to represent by a single bit, for example, cache-hit information. however, other parameters need more than one bit, such as the number of cycles needed to access external memory or the number of clock pulses while the pipeline is stalled. these parameters must be converted into a pulse stream consisting of several bits.

measuring such multi-bit performance parameters is not a function of the élansc520 microcontroller. however, the example shown in figure 6 can be used to measure a performance parameter that ranges in value from 0 to 4 and requires four bits.

figure 6. performance parameters

for example, if a parameter is measured to be three cycles, then three bits of the 4-bit pulse stream must be set. this pulse stream is serially clocked into the addie. data does not have to be presented to the addie synchronously. new parameter data can be sampled at any time and clocked into the addie convertor.

different parameters have differing value ranges. the example in figure 6 has a value range of 0 to 4; another parameter might have a value range of 0 to 16. parameters are not required to be restricted to the same data range. adjusting the number of data bits applied to each parameter's data range efficiently utilizes the addie's restricted probability resolution.

an 8-bit addie can be used to measure a parameter with the value range of 0 to 4, then later used to measure a parameter with the data range of 0 to 16. in all cases, the addie determines the probability of the data stream being 1. for a value range of 0 to 4, a probability value of 75% represents a measured value of 3 (100% being 4).

for example, when generating a sample value of 3, any three bits can be set. the order is not important.
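the conversion between a multi-bit parameter value and a pulse stream can be sketched as follows. this python example is an illustration under stated assumptions: the function names are hypothetical, and the particular way the set bits are spread across the stream is one arbitrary choice among many, since the text states that the order is not important.

```python
def value_to_pulses(value: int, max_value: int) -> list[int]:
    """Encode value (0..max_value) as max_value bits with `value` bits set."""
    if not 0 <= value <= max_value:
        raise ValueError("value outside the parameter's range")
    pulses = []
    accumulator = 0
    for _ in range(max_value):
        # Spread the set bits out instead of grouping them together.
        accumulator += value
        if accumulator >= max_value:
            accumulator -= max_value
            pulses.append(1)
        else:
            pulses.append(0)
    return pulses


def probability_to_value(probability: float, max_value: int) -> float:
    """Scale a measured probability back to the parameter's value range."""
    return probability * max_value


print(value_to_pulses(3, 4))          # three of the four bits are set
print(probability_to_value(0.75, 4))  # 75% of a 0-to-4 range
```

the same two functions work unchanged for a 0-to-16 parameter by passing 16 as max_value, which mirrors the point that each parameter can use its own data range.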
however, dispersing the setting of the bits can have a small advantage in reducing the variance in counter operation.

random number generation

the addie operation requires a random number generator. a pseudo-random number generator is ideal for this task. a maximum-length sequence (m-sequence) can be produced by feeding back selected stages of an n-stage shift register. the required stages are modulo-2 combined and used to produce the input signal for the first stage. for an example of a random number generator, refer to figure 7.

the addie can be prompted to converge on the required value by using random numbers that correlate. this condition is achieved by selecting consecutive stages of the shift register and inverting the most significant bit (msb). hence, the msb of the current random number is inverted and becomes the bit occurring just before the msb of the next random number.

the test data shown in figure 7 uses a 31-bit shift register with feedback taken from stages 3 and 31. the top eight or 12 bits were used to form the required random number. a smaller (less than 31-bit) shift register can also be used.

[figure 6 labels: parameter values 1 to 4, each encoded in four bits]
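a shift-register generator of the kind described above can be sketched in a few lines of python. this is a minimal model under assumptions: stage 1 is taken to be the least significant bit, the register shifts toward stage 31, and the random number is formed from the top stages with its msb inverted. the exact bit ordering and inversion point of the original design are not recoverable from the text, and the function names are hypothetical.

```python
STAGES = 31
MASK = (1 << STAGES) - 1


def lfsr_step(state: int) -> int:
    """Shift once; stage 1 receives stage 3 XOR stage 31 (modulo-2 sum)."""
    feedback = ((state >> 2) ^ (state >> 30)) & 1
    return ((state << 1) | feedback) & MASK


def random_number(state: int, bits: int = 8) -> int:
    """Take the top `bits` stages and invert the most significant bit."""
    top = state >> (STAGES - bits)
    return top ^ (1 << (bits - 1))


# The all-zeros state never changes, so the register must be seeded non-zero.
state = 1
numbers = []
for _ in range(1000):
    state = lfsr_step(state)
    numbers.append(random_number(state))

print(numbers[-5:])
```

passing bits=12 to random_number gives the wider value needed by a 12-bit addie from the same register.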
figure 7. random number generator
[figure labels: shift register stages 1, 3, and 31; least significant bit; shift clock; shift direction; xor; not]

sharing an addie

an addie is not required to be allocated to each parameter being measured. a more practical system uses a single addie that is fed a bit stream via an input multiplexer. this enables a single parameter to be selected for measurement, as shown in figure 8.

figure 8. system block diagram
[figure labels: multiple performance parameters, multiplexor, select, input, addie, sample counter, on-chip software development port (sdp)]

the register controlling parameter selection and the addie counter should be accessible by the processor; they might be mapped into a register, i/o, or memory address space. additionally, these registers should be accessible from the on-chip software development port (sdp). via the sdp, a host computer can connect with a target system and unobtrusively examine performance parameters. the target system's application or operating system software does not need to be instrumented, and an interrupt handler is not required.

after setting the select register, an 8-bit addie requires that about 500 or more samples be gathered before the addie is assumed to have tracked the new input data stream. to assist with this, a sample counter is used and is reset when the register selecting the parameter input is updated. the sample counter is examined via the sdp before accessing the addie counter.
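the select register, input multiplexer, and sample counter described above can be modeled as a small python class. this is a hypothetical software model for illustration only: the class and method names are invented, the input sources are plain callables, and the random numbers come from python's random module rather than a shift register.

```python
import random


class SharedAddie:
    """One 8-bit ADDIE fed by a multiplexer that selects one input source."""

    MIN_SAMPLES = 500  # samples to gather before the counter is trusted

    def __init__(self, sources, seed: int = 1):
        self.sources = sources  # callables, each returning a 0 or 1 sample
        self.rng = random.Random(seed)
        self.counter = 128      # 50%
        self.select(0)

    def select(self, index: int) -> None:
        """Update the select register; this resets the sample counter."""
        self.selected = index
        self.sample_count = 0

    def clock(self) -> None:
        """Clock one sample from the selected source into the ADDIE."""
        bit = self.sources[self.selected]()
        comparator = 1 if self.counter >= self.rng.randint(1, 255) else 0
        if comparator != bit:
            self.counter += 1 if bit else -1
        self.sample_count += 1

    def read(self):
        """Return the counter, or None until enough samples are gathered."""
        if self.sample_count < self.MIN_SAMPLES:
            return None
        return self.counter


# Two hypothetical parameters: one always asserted, one never asserted.
monitor = SharedAddie([lambda: 1, lambda: 0])
for _ in range(2000):
    monitor.clock()
print(monitor.read())
```

checking the sample counter before reading, as read() does here, corresponds to the host examining the sample counter via the sdp before accessing the addie counter.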
appendix a
sdram performance monitor registers

the synchronous dram controller, which includes the write buffer and read buffer, contains two performance monitor resources. these performance monitors are designed using an adaptive digital element (addie). either addie monitor can be used to monitor any of the following parameters: sdram page misses, write buffer hits, write buffer full, write buffer read merge hits, and read buffer hits. table 2 shows the performance monitor registers.

table 2. performance monitor registers with mmcr offsets

register name                  mmcr offset
performance monitor control    44h
performance monitor data 0     48h
performance monitor data 1     49h
performance monitor control register
mmcr offset 44h, memory-mapped (preserved on programmable reset)

register description

this register selects the source to monitor for each of the two addie machines.

bit definitions

bit      7          6-4          3          2-0
name     reserved   addie1_sel   reserved   addie0_sel
reset    0          000          0          000
r/w      rsv        r/w          rsv        r/w

bit 7, reserved: reserved. this field should be written to 0 for normal system operation.

bits 6-4, addie1_sel: addie 1 selector. these bits select the source to monitor using addie 1.
  000 = addie 1 disabled
  001 = sdram page-bank misses
  010 = write buffer hits
  011 = write buffer read merge hits
  100 = write buffer full
  101 = read buffer hits
  110 = reserved
  111 = reserved

bit 3, reserved: reserved. this field should be written to 0 for normal system operation.

bits 2-0, addie0_sel: addie 0 selector. these bits select the source to monitor using addie 0.
  000 = addie 0 disabled
  001 = sdram page-bank misses
  010 = write buffer hits
  011 = write buffer read merge hits
  100 = write buffer full
  101 = read buffer hits
  110 = reserved
  111 = reserved
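the bit layout above determines the byte written to the performance monitor control register. the python sketch below packs and unpacks that byte; the dictionary keys and function names are hypothetical labels chosen for this example, and only the field positions and selector codes come from the bit definitions.

```python
# Selector codes from the bit definitions; the key names are illustrative.
SOURCES = {
    "disabled": 0b000,
    "page_bank_misses": 0b001,
    "write_buffer_hits": 0b010,
    "read_merge_hits": 0b011,
    "write_buffer_full": 0b100,
    "read_buffer_hits": 0b101,
}


def pm_control_value(addie0: str, addie1: str) -> int:
    """Pack both selectors; reserved bits 7 and 3 are left at 0."""
    return (SOURCES[addie1] << 4) | SOURCES[addie0]


def pm_control_fields(value: int) -> tuple[int, int]:
    """Return (addie0_sel, addie1_sel) extracted from a register value."""
    return value & 0b111, (value >> 4) & 0b111


value = pm_control_value("write_buffer_hits", "page_bank_misses")
print(f"{value:02x}h")
print(pm_control_fields(value))
```

the resulting byte would be written to mmcr offset 44h; how that write is performed depends on the target system and is outside this sketch.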
performance monitor data 0 register
mmcr offset 48h, memory-mapped (not preserved on programmable reset)

register description

this register returns the performance monitor data for addie machine 0.

bit definitions

bit      7-0
name     addie0_data[7:0]
reset    10000000
r/w      r

bits 7-0, addie0_data: addie 0 data. these bits return the addie 0 performance data. this hexadecimal value represents the monitor value. to determine percentages, divide by 255 and multiply by 100. the resulting percentage represents the hit or miss percentage of the parameter being monitored.
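the percentage conversion described above is a single scaling step. the python sketch below applies it to a raw data register value; the function name is hypothetical, and reading the register itself is not shown.

```python
def addie_percent(raw: int) -> float:
    """Convert an 8-bit addie data value to a percentage (255 = 100%)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("addie data is an 8-bit value")
    return raw / 255 * 100


# The reset value 10000000b (80h) corresponds to roughly 50%.
print(f"{addie_percent(0x80):.1f}%")
```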
performance monitor data 1 register
mmcr offset 49h, memory-mapped (not preserved on programmable reset)

register description

this register returns the performance monitor data for addie machine 1.

bit definitions

bit      7-0
name     addie1_data[7:0]
reset    10000000
r/w      r

bits 7-0, addie1_data: addie 1 data. these bits return the addie 1 performance data. this hexadecimal value represents the monitor value. to determine percentages, divide by 255 and multiply by 100. the resulting percentage represents the hit or miss percentage of the parameter being monitored.

trademarks

amd, the amd logo, and combinations thereof, and élan are trademarks of advanced micro devices, inc. product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

disclaimer

the contents of this document are provided in connection with advanced micro devices, inc. ("amd") products. amd makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. no license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. except as set forth in amd's standard terms and conditions of sale, amd assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

amd's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of amd's product could create a situation where personal injury, death, or severe property or environmental damage may occur. amd reserves the right to discontinue or make changes to its products at any time without notice.
