The project focuses on developing a demonstrator of a receiver with an optical 40 Gb/s input (InP pin photodiode and CMOS preamplifier) and 4x10 Gb/s demultiplexed electrical outputs. The electronic system will operate at quarter rate and must not require external components (except a quartz reference). Because the resulting system is intended to be replicated on a single chip for parallel data transmission, all components of the clock-data-recovery loop must be fully digital and will therefore be implemented in CMOS technology. The final goal is to start product integration of an optical/electrical conversion module with a 40 Gb/s optical input and 4x10 Gb/s electrical outputs.
Besides the zma, the following groups are involved in this project:
To allow several links to be implemented in parallel on the same die, the following issues must be considered in the design of the individual links. Each link is called a “link macro” because it should finally be implemented as an intellectual property (IP) block for use in microprocessor designs.
- Each link requires its own clock recovery function at the receiving side, since delay and jitter on the individual link lines can vary independently and are not predictable a priori. The receiver also requires a function to align and de-scramble the data of the individual lines, but this function is not part of this project.
- Only one oscillator (or a very limited number of oscillators) with a single external element (a quartz crystal) must provide the reference signal for the processing core and all the serial links. It is not permissible to give each link its own oscillator and PLL, because substrate coupling would cause unpredictable injection locking among the oscillators. In the best case, all oscillators would lock to one common frequency. In the worst case, a chaotic system with no predictable frequency would arise.
- The link circuitry must be implemented entirely in standard digital CMOS technology to be compatible with the processor cores. SiGe bipolar technology is not an option because of the cost of the additional masks.
- No external passive components and as few internal passive components as possible may be used, in order to save chip area, pins and mounting cost. Therefore, a low-frequency passive loop filter is not an option for a clock-recovery PLL. This is in contrast to the requirements of SONET-type clock-data recovery circuits, which are typically used for a single link only.
- Power and area consumption must be low. The total power budget for a single link can be estimated as
P_DC < 10 mW/(Gb/s) per link
where a “link” comprises the transmitter (Tx) and the receiver (Rx). For 40 Gb/s, this means a total DC power consumption of at most 400 mW per link. Since the receiver is more complex than the transmitter and must have better sensitivity, we use an upper limit of 250 mW as the goal for the DC power consumption of the receiver at 40 Gb/s. For lower data rates, P_DC has to be scaled down accordingly.
- The current implementation uses 90 nm bulk CMOS technology, since this is at present the most mature and stable option that is still an advanced technology. A later transfer to 65 nm technology may be considered.
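The power budget from the list above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function name `link_power_budget_mw` is hypothetical, and the linear scaling of the receiver goal with data rate is an assumption derived from the remark that the budget is scaled down for lower data rates.

```python
# Illustrative arithmetic only; names are hypothetical, not from the project.
MW_PER_GBPS = 10.0       # budget from the text: 10 mW per Gb/s per link
RX_GOAL_MW_AT_40G = 250  # receiver goal from the text at 40 Gb/s


def link_power_budget_mw(data_rate_gbps):
    """Upper limit of the DC power of one link (Tx plus Rx) in mW."""
    return MW_PER_GBPS * data_rate_gbps


def rx_power_goal_mw(data_rate_gbps):
    """Receiver goal, ASSUMING it scales linearly with the data rate."""
    return RX_GOAL_MW_AT_40G * data_rate_gbps / 40.0


total_40g = link_power_budget_mw(40)             # 400 mW for Tx plus Rx
tx_share_40g = total_40g - rx_power_goal_mw(40)  # 150 mW left for the Tx
```

Under these assumptions, a 10 Gb/s link would have a total budget of 100 mW.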
Based on these design issues, we worked out the circuit topology shown in the following block diagram.
Project Highscore - Basic Block Diagram
Since we decided on a quarter-rate architecture, eight sampling latches sample the 40 Gb/s electrical data stream. As a result, only the sampling block of the sampling latches has to operate at full speed, while the latch phase can be four times longer (50 ps) than in a full-rate system. Furthermore, the quarter-rate architecture inherently demultiplexes the data by a factor of four. Thus, the samplers deliver four data bits and four edge bits at 10 Gb/s. The samplers have to be triggered by clock signals spaced 12.5 ps apart. Hence, the eight output signals of the samplers are also skewed by 12.5 ps relative to each other.
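The timing figures quoted above follow directly from the data rate. The short Python sketch below reproduces them; the function name `quarter_rate_timing` is hypothetical, and the assignment of even clock phases to data samples and odd phases to edge samples is an assumption made for illustration, since the text does not state the ordering.

```python
def quarter_rate_timing(data_rate_gbps=40.0, n_samplers=8):
    """Derive the quarter-rate timing numbers from the line rate."""
    ui_ps = 1000.0 / data_rate_gbps            # unit interval: 25 ps
    clock_ghz = data_rate_gbps / 4.0           # quarter-rate clock: 10 GHz
    period_ps = 1000.0 / clock_ghz             # clock period: 100 ps
    spacing_ps = period_ps / n_samplers        # phase spacing: 12.5 ps
    latch_phase_ps = period_ps / 2.0           # half a clock period: 50 ps
    # ASSUMPTION: even phases sample bit centres, odd phases sample edges.
    phases = [
        (i * spacing_ps, "data" if i % 2 == 0 else "edge")
        for i in range(n_samplers)
    ]
    return {
        "ui_ps": ui_ps,
        "clock_ghz": clock_ghz,
        "spacing_ps": spacing_ps,
        "latch_phase_ps": latch_phase_ps,
        "phases": phases,
    }


timing = quarter_rate_timing()
```

With the default arguments, the eight trigger instants are 0, 12.5, 25, ... 87.5 ps within one 100 ps clock period, so each unit interval of 25 ps contains one data sample and one edge sample.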
The aligning 8:32 demultiplexer aligns the four data bits and four edge bits to a single clock signal and further reduces the data rate by a factor of four. The resulting 2.5 Gb/s signals are slow enough to be processed by synthesizable standard-cell CMOS logic.
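As a purely behavioural illustration of the 8:32 ratio, the following Python sketch splits each of eight input streams round-robin into four slower lanes. The function names are hypothetical and the lane ordering is an assumption; the sketch ignores the skew alignment that the real block performs.

```python
def demux_1_to_4(stream):
    """Split one 10 Gb/s stream into four 2.5 Gb/s lanes (round robin)."""
    return [stream[k::4] for k in range(4)]


def demux_8_to_32(streams):
    """Apply the 1:4 split to each of the eight sampler outputs."""
    if len(streams) != 8:
        raise ValueError("expected eight input streams")
    lanes = []
    for stream in streams:
        lanes.extend(demux_1_to_4(stream))
    return lanes


# Eight toy streams of eight symbols each; the values only tag the position.
toy_streams = [[10 * i + j for j in range(8)] for i in range(8)]
toy_lanes = demux_8_to_32(toy_streams)
```

Each output lane carries every fourth symbol of its input stream, which corresponds to 10 Gb/s divided by four, that is 2.5 Gb/s per lane.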
The CMOS logic block performs the edge detection, which yields the early/late information. In our first implementation, majority voting and a simple averaging loop filter generate the control signals for the phase rotator. These control signals tell the phase rotator whether to shift the phase up or down, depending on the early/late information.
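The early/late decision, the majority vote and the loop filter can be sketched in software. The Python example below is not the project's logic: it assumes a conventional bang-bang (Alexander-type) decision rule, and it models the "simple averaging loop filter" as an accumulator with a threshold. All names and the threshold value are hypothetical.

```python
def early_late_votes(data, edge):
    """Return +1 (early), -1 (late) or 0 (no transition) per bit pair.

    edge[i] is the sample taken between data[i] and data[i + 1].
    ASSUMED rule: if the edge sample still equals the older data bit,
    the clock sampled before the transition, so the clock is early.
    """
    votes = []
    for i in range(len(data) - 1):
        if data[i] == data[i + 1]:
            votes.append(0)
        elif edge[i] == data[i]:
            votes.append(+1)
        else:
            votes.append(-1)
    return votes


def majority(votes):
    """Sign of the vote sum: +1, -1, or 0 for a tie."""
    total = sum(votes)
    return (total > 0) - (total < 0)


class AveragingFilter:
    """Accumulate decisions; emit one rotator step when a threshold is hit."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.acc = 0

    def update(self, decision):
        self.acc += decision
        if abs(self.acc) >= self.threshold:
            step = 1 if self.acc > 0 else -1
            self.acc = 0
            return step
        return 0


votes = early_late_votes([0, 1, 0, 1, 1], [0, 1, 0, 1])
decision = majority(votes)
```

In this sketch, a non-zero return value of `AveragingFilter.update` would correspond to one up or down command to the phase rotator.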
The output signal of the phase rotator is generated by a weighted interpolation of a signal pair with a phase difference of 45°. Changing the weights shifts the phase of the output signal within the range bounded by the phases of the two input signals. Since the phase rotator must be able to rotate the phase through a full 360°, eight signals are required. These signals come from an additional DLL running at 10 GHz. In our implementation, the phase rotator has 104 steps, which leads to a minimum phase step of 1/26 UI at 40 Gb/s. The output signal of the phase rotator controls the DLL that generates the trigger signals for the samplers.
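The step size and the interpolation principle can be illustrated numerically. The Python sketch below assumes 13 steps per 45° segment (104 steps divided by eight reference phases) and linear weights on ideal unit phasors; both the weighting scheme and all names are assumptions, not the actual circuit.

```python
import cmath
import math

N_STEPS = 104                           # rotator steps per 360 degrees
N_REFS = 8                              # reference phases, 45 degrees apart
STEPS_PER_SEGMENT = N_STEPS // N_REFS   # 13 steps between two references

# One step as a fraction of the 40 Gb/s unit interval:
# the 10 GHz period spans 4 UI, so a step is 4 / 104 = 1 / 26 UI.
STEP_UI = 4.0 / N_STEPS


def rotator_phase_deg(step):
    """Output phase for a rotator step, ASSUMING linear phasor weights."""
    step %= N_STEPS
    segment, index = divmod(step, STEPS_PER_SEGMENT)
    weight = index / STEPS_PER_SEGMENT
    ref_a = cmath.exp(1j * math.radians(45.0 * segment))
    ref_b = cmath.exp(1j * math.radians(45.0 * ((segment + 1) % N_REFS)))
    out = (1.0 - weight) * ref_a + weight * ref_b
    return math.degrees(cmath.phase(out)) % 360.0


phases_deg = [rotator_phase_deg(s) for s in range(N_STEPS)]
```

With linear weights the individual steps are only approximately equal to the nominal 360°/104; the output phase nevertheless increases monotonically over the 104 steps and wraps around without a gap.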
Contact : Dr. Alex Huber, IME