"An MMA Lag Correlator Design" R. Escoffier November 6, 1995 # NRAO MILLIMETER ARRAY MEMO SERIES NO. 146 # National Radio Astronomy Observatory Charlottesville, Virginia ### A MMA LAG CORRELATOR DESIGN R. Escoffier November 6, 1995 This memo describes the design of a lag-type correlator for the MMA. The lag design approach used here is not meant to select it as the MMA standard but just to present a practical design to which future designs can be compared. The final decision on a MMA correlator architecture should be made later during the initial phases of an actual design project. The correlator design is based on the MMA specification resulting from the October 1995 MMA meeting in Tucson, seen below: - 40 antennas - 8 4-GHz samplers per antenna (16-GHz bandwidth/antenna) - 30 KM maximum baseline (to allow for future outrigger stations) 1024 lags per baseline at a 2-GHz bandwidth The design below does not distinguish between continuum and spectral line observations. The samplers and correlators can be configured in factors of 2 in performance from all 8 samplers working at maximum bandwidths to a single sample per antenna working at the narrowest bandwidth. A conservative design approach using 125 MHz interconnect technology is contemplated. Higher speed correlator chips and interconnections could be considered in the future to see if such an change would be cost effective but, for now, a well understood technology is assumed (it is essentially the technology used in the VLA correlator). The design is based on a hypothetical correlator chip which is based in turn on the 1024-lag correlator chip used in the GBT spectrometer. It represents a conservative projection of what should be possible and affordable when the MMA is funded. ### I) Block Diagram The block diagram of the proposed MMA correlator is seen in Figure 1. Eight 4-GHz samplers are available for each antenna. 
Variable digital delay lines with up to 320 $\mu$sec of delay adjustment are provided for each sampler. The output of each active sampler is used to fill large RAMs in the memory system shown in Figure 1. This RAM memory serves several functions related to efficient use of the correlator chips in wide bandwidth operation and to mode versatility.

The conventional technique in correlator design, where the correlator chips run at a lower clock rate than the samplers, uses a two-dimensional array of correlator chips. The output of each high speed sampler is split into n parallel data streams, each at one nth of the sample rate and each carrying every nth sample. To ensure that every sample in the parallel output of one sampler is correlated with every sample in the parallel output of a second sampler, an n $\times$ n correlator chip array per baseline is used.

In this design, however, RAM memory is proposed to reduce the IC requirements to a one-dimensional array of correlator chips per baseline. This approach is similar to that used in the GBT spectrometer. The memory is driven by the 32-wide parallel output of a sampler (where each output carries every 32nd sample) and re-arranges the samples to produce 32 outputs, where a given output carries short (1 msec) time segments of contiguous samples.

A switching matrix is used to provide all of the mode versatility required. This includes reducing the number of active samplers from 8 to 4, 2 or 1 per antenna and/or discarding samples to reduce the effective sample rate of a given sampler. As the aggregate data rate of the samplers goes down, the switching system will re-route the samples from active samplers to the correlator chips to optimize the number of lags for the observation being conducted.

The correlators seen in Figure 1 are formed into 40 by 40 arrays to correlate the outputs of the 40 station MMA.
The total number of correlators required by this design would thus be 40 $\times$ 40 $\times$ 8 $\times$ 32, to accommodate the 8 samplers per antenna and the 32 125-MHz parallel outputs of each 4-GHz sampler. The lag length of each correlator will be determined by the spectral line specification of the array.

Not shown in the block diagram of Figure 1 is the requirement for a long-term accumulator (LTA). The correlator chips themselves will provide short-term accumulation (from 1 to 16 msec). This relatively long integration time provided by the correlator chips should allow the LTA to be made with high density (and inexpensive) dynamic RAMs. It is assumed that several integration bins will be built into the LTA structure (for signal/reference/calibration, etc., and to support 90 deg phase switching for sideband separation).

### II) Samplers

Eight 4-GHz samplers are available for each MMA antenna. By the time the MMA correlator is designed, it is assumed that there will be several approaches available for the design of this part of the correlator. Either 3-level or 4-level sampling could be contemplated, with the correlator chip and the sampler itself being the only parts of the design significantly affected by this decision.

Integral to a 4-GHz sampler would be a 1-to-32 serial-to-parallel conversion stage allowing the sampler to use a 125 MHz output rate (actually, two such stages, one for each sampler bit, are required). The output of the sampler system for one antenna would hence be 8 $\times$ 32 $\times$ 2 signals with a 125 MHz clock rate. A given signal line from a sampler would carry a bit from every 32nd sample.

A 125-MHz signal will be distributed to each sampler for system timing, and an internal 4-GHz VCO will be locked to this reference to provide the sample clock. The fractional bit part of the station delay will be provided by injecting a programmable DC offset into the phase lock loop phase detector.
Delay resolution of 1/16 of the 4 GHz sample period, or finer, can thus be generated.

### III) Delay Lines

There will be 524,288 bits of RAM associated with the output of each 4-GHz sampler, yielding a delay range of 131 $\mu$sec. Inexpensive DRAM will be used in this stage where possible. Since RAM addressing can only adjust the delay in steps of 32 samples, some additional logic will be required to obtain the final delay resolution of 1 bit (in addition to the fractional bit delay provided in the sampler).

### IV) Memory and Switching Matrix

The memory cards illustrated in the block diagram seen in Figure 1 will convert the 32-wide parallel sampler output (with each output carrying every 32nd sample) into 32 parallel outputs of a different format. The samples from the 32 sampler outputs will be written into a large memory in time order and read from the RAM as 32 parallel outputs, each carrying a short time segment of contiguous samples.

One way to think of the memory function is to consider it as a circular array of 32 super fast 131,072-bit FIFO memories. A high-speed sampler writes samples into a given FIFO at its 4-GHz sample rate until the FIFO is full and then proceeds to the next FIFO around the 32 FIFO loop. On the FIFO output, each FIFO drives a correlator chip, which it supplies with samples, originally taken at 4 GS/s, at the chip's 125 MHz clock rate. Thus, each correlator chip will see a 1-msec time segment of contiguous samples, then a time discontinuity, followed by another segment of contiguous samples. Correlation must be blanked while the discontinuity passes through the chip, leading to a small inefficiency.

For full versatility, two memories are required for each 4-GHz sampler. Each axis of the 40 antenna $\times$ 40 antenna correlator array requires one memory card per antenna.
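
The FIFO picture of the memory cards can be illustrated with a small model. The following is only a minimal sketch, not the memo's hardware design: the function names `deinterleave` and `corner_turn` are hypothetical, a toy size of 4 lanes and 8-sample FIFOs stands in for the 32 lanes and 131,072-bit FIFOs, and the timing lines assume the FIFO depth can be read as a count of samples.

```python
# Toy model of the memory-card reordering (illustrative names and sizes).

def deinterleave(samples, n_lanes):
    """Model the sampler output: lane k carries every n_lanes-th sample."""
    return [samples[k::n_lanes] for k in range(n_lanes)]

def corner_turn(lanes, fifo_depth):
    """Write lanes back in time order, then fill FIFOs round-robin.

    Returns one list per FIFO; each entry is a contiguous time segment
    of fifo_depth samples, as one correlator chip would see it.
    """
    n_lanes = len(lanes)
    total = n_lanes * len(lanes[0])
    # Recover time order: sample i sits in lane i % n at position i // n.
    time_ordered = [lanes[i % n_lanes][i // n_lanes] for i in range(total)]
    fifos = [[] for _ in range(n_lanes)]
    for start in range(0, total, fifo_depth):
        segment = time_ordered[start:start + fifo_depth]
        fifos[(start // fifo_depth) % n_lanes].append(segment)
    return fifos

# Toy run: 64 samples, 4 lanes, FIFOs of 8 samples each.
samples = list(range(64))
lanes = deinterleave(samples, 4)
fifos = corner_turn(lanes, 8)
print(lanes[1][:3])   # lane 1 carries samples 1, 5, 9, ...
print(fifos[0])       # FIFO 0 sees samples 0..7, then 32..39

# Timing with the memo's figures (depth treated as a sample count).
FIFO_DEPTH = 131_072
SAMPLE_RATE_HZ = 4e9
CHIP_CLOCK_HZ = 125e6
fill_s = FIFO_DEPTH / SAMPLE_RATE_HZ      # time to fill one FIFO
readout_s = FIFO_DEPTH / CHIP_CLOCK_HZ    # time to read one FIFO out
print(f"fill {fill_s * 1e6:.3f} usec, readout {readout_s * 1e3:.3f} msec")
```

In this toy run each FIFO receives one contiguous segment per trip around the loop, so the jump from sample 7 to sample 32 in FIFO 0 corresponds to the time discontinuity during which correlation must be blanked.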
The two axes of the correlator array are driven by the prompt and delayed memory card outputs in Figure 1 (the prompt signal represents the input that drives all correlators in a block, and the delayed signal represents the signal that goes down the shift register of the correlator block).

When a sample rate of less than 4 GHz is used, fewer than 32 inputs to the RAMs are required, and the 32 outputs of the RAMs can be used to generate additional lags. RAM addressing in the "delayed memory" of Figure 1 can be used to generate these larger lags, allowing full digital versatility of the correlator with a minimal number of switching stages.

For another example of the use of the memory card, suppose only two samplers per antenna were active in a given observation. The correlator chips normally used by the inactive samplers will be used to increase the number of lags by having each active sampler drive its own memory cards plus an inactive sampler's memory cards. This possibility means that the correlator chips need not be cascaded together to produce more lags. As stated above, the delayed memory can, by offset RAM addressing, instantaneously generate the lag offset required by the higher lag correlator chips.

### V) Correlators

This correlator is designed around a proposed correlator chip. A block diagram of this chip is seen in Figure 2. The chip is proposed to be a 4 $\times$ 8 array of 128-lag correlators that operate at a clock rate of 125 MHz. A small amount of multiplexing on the chip would probably make the switching matrix easier to design, but this aspect of the design has not been pursued much at this point. The ability to break each 128-lag correlator into two 64-lag correlators to support polarization observations will probably be necessary.

The total number of correlators required by this design is 40 $\times$ 40 $\times$ 32 $\times$ 8, or 409,600 128-lag correlators.
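
As a quick check of this arithmetic, the count can be reproduced from the figures already given, together with the chip count implied by the 4 $\times$ 8 array of correlators per chip. This is only a sketch; the constant names are illustrative and not part of the design.

```python
# Correlator and chip counts from the memo's figures (illustrative names).
N_ANTENNAS = 40
SAMPLERS_PER_ANTENNA = 8
STREAMS_PER_SAMPLER = 32        # 4-GHz sampler split into 125-MHz streams
CORRELATORS_PER_CHIP = 4 * 8    # proposed chip: 4 x 8 array of correlators

# One 40 x 40 correlator array per sampler and per 125-MHz stream.
n_correlators = (N_ANTENNAS * N_ANTENNAS
                 * SAMPLERS_PER_ANTENNA * STREAMS_PER_SAMPLER)
n_chips = n_correlators // CORRELATORS_PER_CHIP

print(f"{n_correlators:,} correlators on {n_chips:,} chips")
```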
By placing 32 such correlators on the chip in a small array, the number of chips required goes down to a more practical number of 12,800 chips. Even with this number of chips, 200 to 400 correlator cards will be required for the MMA correlator.

The correlator chip seen in Figure 2 represents a factor of 2 increase in integration level from the 1024-lag correlator chip being used in the GBT spectrometer (assuming a 3-level by 3-level correlator). The GBT chips have 1024 3-level correlators, 1024 32-bit integrators and 1024 32-bit secondary storage registers for results read-out. Cutting the short term integrator to 12 or 16 bits while increasing the total number of lags to 4096 results in an increase of the integration level of the chip by a factor of about 2.

A higher speed correlator chip might be cost effective but would require a more expensive signal interconnect technology. One compromise might be to double the speed of the correlator chip but keep the data input rate at 125 MHz by putting 2-into-1 mux stages on the chips. This would halve the number of correlator chips required and would still allow use of a relatively easy interconnect technology.

### VI) Long-Term Accumulation

A long-term accumulator design should be fairly straightforward. The one to several millisecond integration capacity of the correlator chips should allow high density and low cost DRAMs to be used here.

The LTA and the correlator switching networks can be designed for very rapid switching between modes. The fundamental memory cycle of 1 msec can be carried through to other parts of the system, such that the system should have the ability to switch from full bandwidth continuum to spectral line, for example, many times a second. Additional integration/storage space can be put into the LTA to do essentially simultaneous wide band and narrow band observations.

### VII) Performance

Straight factor of 2 trade-offs between bandwidth and frequency resolution are made easy by the use of the memory cards. Because these cards can generate large lags by RAM addressing, the correlator arrays need not be interconnected. It might be advantageous to cascade the 128-lag correlator segments on the correlator chips with switching stages, but the correlator chips or matrices themselves need not be cascaded to increase the frequency resolution.

As the bandwidth is halved, the number of lags available for a given sampler doubles and the frequency resolution improves by factors of 4 until the bandwidth (per sampler) goes below 62.5 MHz. After this point, factors of 2 improvement will occur unless recirculation is built into the correlator.

The tables below give some of the performance parameters to be expected from this correlator design:

### A) 8 active samplers per antenna (no polarization cross products):

| total bandwidth | lags/IF | frequency resolution/IF |
|-------------------------|---------|-------------------------|
| 16 GHz | 128 | 15.625 MHz |
| 8 GHz | 256 | 3.9062 MHz |
| 4 GHz | 512 | 0.9765 MHz |
| 2 GHz | 1024 | 0.244 MHz |
| 1 GHz | 2048 | 61.035 kHz |
| 500 MHz | 4096 | 15.258 kHz |
| 250 MHz (oversampling) | 4096 | 7.629 kHz |
| 125 MHz (oversampling) | 4096 | 3.814 kHz |
| 62.5 MHz (oversampling) | 4096 | 1.907 kHz |

### B) 8 active samplers per antenna (with polarization cross products):

| total bandwidth | lags/product | frequency resolution/IF |
|-------------------------|--------------|-------------------------|
| 8 GHz | 64 | 31.25 MHz |
| 4 GHz | 128 | 7.8125 MHz |
| 2 GHz | 256 | 1.953 MHz |
| 1 GHz | 512 | 0.488 MHz |
| 500 MHz | 1024 | 122.070 kHz |
| 250 MHz | 2048 | 30.517 kHz |
| 125 MHz (oversampling) | 2048 | 15.258 kHz |
| 62.5 MHz (oversampling) | 2048 | 7.629 kHz |
| 31.2 MHz (oversampling) | 2048 | 3.814 kHz |

### C) 4 active samplers per antenna (no polarization cross products):

| total bandwidth | lags/IF | frequency resolution/IF |
|-------------------------|---------|-------------------------|
| 8 GHz | 256 | 7.8125 MHz |
| 4 GHz | 512 | 1.953 MHz |
| 2 GHz | 1024 | 0.488 MHz |
| 1 GHz | 2048 | 122.070 kHz |
| 500 MHz | 4096 | 30.517 kHz |
| 250 MHz | 8192 | 7.629 kHz |
| 125 MHz (oversampling) | 8192 | 3.814 kHz |
| 62.5 MHz (oversampling) | 8192 | 1.907 kHz |

### D) 4 active samplers per antenna (with polarization cross products):

| total bandwidth | lags/product | frequency resolution/IF |
|-------------------------|--------------|-------------------------|
| 4 GHz | 128 | 15.625 MHz |
| 2 GHz | 256 | 3.906 MHz |
| 1 GHz | 512 | 0.976 MHz |
| 500 MHz | 1024 | 244.140 kHz |
| 250 MHz | 2048 | 61.035 kHz |
| 125 MHz | 4096 | 15.258 kHz |
| 62.5 MHz (oversampling) | 4096 | 7.629 kHz |
| 31.2 MHz (oversampling) | 4096 | 3.814 kHz |

### E) 2 active samplers per antenna (no polarization cross products):

| total bandwidth | lags/IF | frequency resolution/IF |
|-------------------------|---------|-------------------------|
| 4 GHz | 512 | 3.906 MHz |
| 2 GHz | 1024 | 0.976 MHz |
| 1 GHz | 2048 | 244.140 kHz |
| 500 MHz | 4096 | 61.035 kHz |
| 250 MHz | 8192 | 15.258 kHz |
| 125 MHz | 16384 | 3.814 kHz |
| 62.5 MHz (oversampling) | 16384 | 1.907 kHz |
| 31.2 MHz (oversampling) | 16384 | 0.953 kHz |

### F) 2 active samplers per antenna (with polarization cross products):

| total bandwidth | lags/product | frequency resolution/IF |
|-------------------------|--------------|-------------------------|
| 2 GHz | 256 | 7.8125 MHz |
| 1 GHz | 512 | 1.953 MHz |
| 500 MHz | 1024 | 0.488 MHz |
| 250 MHz | 2048 | 122.070 kHz |
| 125 MHz | 4096 | 30.517 kHz |
| 62.5 MHz | 4096 | 7.629 kHz |
| 31.2 MHz (oversampling) | 4096 | 3.814 kHz |
| 15.6 MHz (oversampling) | | 1.907 kHz |

### G) 1 active sampler per antenna:

| total bandwidth | lags/IF | frequency resolution/IF |
|-------------------------|---------|-------------------------|
| 2 GHz | 1024 | 1.953 MHz |
| 1 GHz | 2048 | 0.488 MHz |
| 500 MHz | 4096 | 122.070 kHz |
| 250 MHz | 8192 | 30.517 kHz |
| 125 MHz | 16384 | 7.629 kHz |
| 62.5 MHz | 32768 | 1.907 kHz |
| 31.2 MHz (oversampling) | 32768 | 0.953 kHz |
| 15.6 MHz (oversampling) | 32768 | 0.476 kHz |

In addition to the modes shown above, mixed modes (where one sampler samples a wide bandwidth and another sampler on the same antenna samples a narrow bandwidth) and subarrays will be easily accommodated by this design.

Very high resolution modes can be provided (down to a few Hz) either by generating very long lags using the narrowest filter above and performing very long FFTs, or by supplying very narrow filters. Digital filters will probably provide some of the narrowest bandwidths above, but getting down to extremely narrow bandwidths would require a large filter (many taps for an FIR filter). The final solution for providing very high resolution might be a compromise between filter bandwidth and FFT length.

### VIII) Estimated Size and Power Requirement

The 16-GHz bandwidth system described above would require 320 4-GHz samplers and around 800 PC cards of the 6-U to 9-U EURO card size. This would require around 10 racks for the samplers and 16 racks for the correlators. Power dissipation in the 200 to 400 kW range should be expected.

An 8-GHz system would cut all of the figures above in half. One construction approach would be to build an 8-GHz system first and duplicate this system for the full 16-GHz array later.

By far the most difficult design problem this correlator will present is in the signal cabling. One matrix of 40 $\times$ 40 correlators for a 4-GHz sampler will require 5120 125-MHz cables driving 51200 loads. To this total a factor of 4 or 8 must be applied for the full 8- or 16-GHz system.
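
The cable count can be reproduced under one plausible reading of the design, sketched below. The breakdown into antennas, two matrix axes (prompt and delayed), 32 streams and 2 bits per sample is an assumption made for this sketch, as are all the names; the load count is taken directly from the figure quoted above rather than derived.

```python
# Cabling estimate for one 40 x 40 matrix (assumed breakdown, see lead-in).
N_ANTENNAS = 40
MATRIX_AXES = 2             # assumed: prompt and delayed memory card outputs
STREAMS_PER_SAMPLER = 32    # 125-MHz streams per 4-GHz sampler
BITS_PER_SAMPLE = 2         # one signal line per sampler bit
LOADS_PER_MATRIX = 51_200   # quoted figure, not derived here

cables_per_matrix = (N_ANTENNAS * MATRIX_AXES
                     * STREAMS_PER_SAMPLER * BITS_PER_SAMPLE)
loads_per_cable = LOADS_PER_MATRIX // cables_per_matrix

def system_totals(samplers_per_antenna):
    """Scale one matrix to a system with the given samplers per antenna."""
    return (cables_per_matrix * samplers_per_antenna,
            LOADS_PER_MATRIX * samplers_per_antenna)

print(cables_per_matrix, loads_per_cable)
for n_samplers, label in ((4, "8-GHz"), (8, "16-GHz")):
    cables, loads = system_totals(n_samplers)
    print(f"{label} system: {cables:,} cables, {loads:,} loads")
```

Under these assumptions each cable drives an average of 10 loads, and the full 16-GHz system needs eight times the single-matrix cabling.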
If a future increase in spectral line resolution is deemed desirable for the MMA, a design philosophy that allows for higher performance correlator chips that would be plug-in replacements for the chip envisioned above should be considered.
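
The scaling behind the performance tables of Section VII, and the effect of a chip with more lags per correlator, can be sketched as follows. This is a minimal model inferred from the tabulated values for the modes without polarization cross products, not a statement of the actual switching logic; the function `mode` and its parameters are hypothetical, and the bandwidth argument is the bandwidth per sampler rather than the total bandwidth listed in the tables.

```python
# Lags and resolution per sampler, modes without polarization products.
TOTAL_SAMPLERS = 8
FULL_BANDWIDTH_HZ = 2_000_000_000     # per-sampler bandwidth at 4 GS/s
OVERSAMPLING_BELOW_HZ = 62_500_000    # lags stop doubling below this
LAGS_PER_CORRELATOR = 128

def mode(active_samplers, bandwidth_hz,
         lags_per_correlator=LAGS_PER_CORRELATOR):
    """Return (lags, frequency resolution in Hz) for one sampler."""
    # Chips freed by inactive samplers are handed to the active ones.
    chip_share = TOTAL_SAMPLERS // active_samplers
    # Halving the bandwidth doubles the lags until oversampling starts.
    effective_bw = max(bandwidth_hz, OVERSAMPLING_BELOW_HZ)
    lags = (lags_per_correlator * chip_share
            * FULL_BANDWIDTH_HZ // effective_bw)
    return lags, bandwidth_hz / lags

for n in (8, 4, 2, 1):
    for bw in (2_000_000_000, 250_000_000, 62_500_000, 31_250_000):
        lags, res = mode(n, bw)
        print(f"{n} samplers, {bw / 1e6:g} MHz each: "
              f"{lags} lags, {res / 1e3:.3f} kHz")

# A hypothetical plug-in chip with 256-lag correlators doubles the lags.
print(mode(8, 2_000_000_000, lags_per_correlator=256))
```

For example, `mode(8, 2_000_000_000)` gives 128 lags at 15.625 MHz, the first row of table A, and `mode(1, 62_500_000)` gives 32768 lags at about 1.907 kHz, as in table G.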