


2012 JINST 7 C03008 (http://iopscience.iop.org/1748-0221/7/03/C03008)

PUBLISHED BY IOP PUBLISHING FOR SISSA MEDIALAB



RECEIVED: October 31, 2011 ACCEPTED: January 11, 2012 PUBLISHED: March 6, 2012

TOPICAL WORKSHOP ON ELECTRONICS FOR PARTICLE PHYSICS 2011, 26–30 September 2011, VIENNA, AUSTRIA

# A 128-channel Time-to-Digital Converter (TDC) inside a Virtex-5 FPGA on the GANDALF module

M. Büchele,<sup>1</sup> H. Fischer, M. Gorzellik, F. Herrmann, K. Königsmann, C. Schill and S. Schopferer

Albert-Ludwigs-Universität Freiburg, Physikalisches Institut, Hermann-Herder-Str. 3, 79104 Freiburg, Germany

*E-mail:* maximilian.buchele@cern.ch

ABSTRACT: The GANDALF 6U-VME64x/VXS module has been developed for the digitization and real-time analysis of detector signals. To support applications such as analog-to-digital or time-to-digital conversion, coincidence matrix formation, fast pattern recognition and trigger generation, the module comes with exchangeable analog and digital mezzanine cards. Based on this platform, we present a 128-channel TDC which is implemented in a single Xilinx Virtex-5 FPGA using a shifted clock sampling method. In contrast to common TDC concepts, the input signal is sampled by 16 equidistant phase-shifted clocks. A particular challenge of the design is the minimum-skew routing of the input signals to the sampling flip-flops. We present measurement results for the differential nonlinearity and the time resolution of the TDC readout system.

KEYWORDS: Digital signal processing (DSP); Trigger concepts and systems (hardware and software); Front-end electronics for detector readout; Digital electronic circuits

<sup>1</sup>Corresponding author.

# Contents

- 1 The GANDALF module
  - 1.1 The digital mezzanine card
- 2 The 128-channel time-to-digital converter
  - 2.1 Shifted clock sampling
  - 2.2 16-bin TDC design
  - 2.3 FPGA implementation
- 3 Measurement results
- 4 Conclusion and outlook

#### 1 The GANDALF module

GANDALF [1, 2] is a 6U-VME64x/VXS carrier board which has been designed to cope with a variety of readout tasks in high energy and nuclear physics experiments. The module can host up to two mezzanine cards (figure 1). Currently 8-channel analog (AMC) and 64-channel digital input or output mezzanine cards (DMC) are available. To allow for high speed serial data exchange to or from dedicated detector frontend modules, an optical link mezzanine card is currently under development.

GANDALF comprises two Xilinx Virtex-5 FPGAs as well as 144-Mbit QDRII+ and 4-Gbit DDR2 memory extensions. The main FPGA (Virtex-5 SX95T) [3] receives the mezzanine cards' signals and allows for various data processing purposes, whereas the second FPGA (Virtex-5 LX30T) [3] is used for memory control and data output. Both FPGAs are connected via eight differential high-speed Aurora lanes with a total bandwidth of 25 Gbit/s per direction. Backplane link cards ensure dead-time-free data readout of the GANDALF module, e.g. via the S-Link [4] or Ethernet protocol. The USB 2.0 port on the front panel or the VME64x bus in read mode can likewise be used for data output.

Several applications have been realized so far. Equipped with AMCs, GANDALF serves as a transient recorder [5, 6]; equipped with DMCs, it serves as a 64-channel mean-timer [7], a 128-channel scaler or pattern generator, or, as described in the following, a 128-channel TDC.

#### 1.1 The digital mezzanine card

The digital mezzanine card hosts two 32-channel VHDCI connectors and can handle 64 differential input signals (LVDS or LVPECL). These signals are routed to the user I/O of the SX95T FPGA using differential buffers. The overall additive jitter of the signal path including buffers and FPGA inputs is below 20 ps RMS. In addition, one NIM input and two NIM outputs per DMC are provided via LEMO connectors. With a different placement of the components at PCB assembly time, an LVDS output card can be produced using the same PCB. By combining two DMCs per GANDALF module, versatile 128-channel I/O applications can be realized within the FPGA fabric.

**Figure 1**. The picture shows the GANDALF module equipped with digital mezzanine cards. An optical receiver for a trigger and time synchronization system is provided by the center mezzanine card. Board configuration and monitoring is done using the VME64x interface. The VXS interface allows inter-board communication.

#### 2 The 128-channel time-to-digital converter

The design objectives of the GANDALF time-to-digital converter were to implement 128 TDC channels inside a single Virtex-5 SX95T FPGA for time-of-flight measurements. For this purpose a time resolution better than 100 ps is required. Dead-time-free digitization, multi-hit capability and adequate hit buffer memory are mandatory. Furthermore, a trigger matching unit has to be included in the FPGA logic to check the stored time stamps for correlations in time with the experiment trigger. This allows only hits within a programmable time window around the trigger signal to be passed to the output bus, which reduces the overall data transfer rate.

#### 2.1 Shifted clock sampling

In a trivial TDC concept, one would just sample the input signal with a single flip-flop. In this case, the TDC bin width equals the clock period and is therefore limited by the maximum clock frequency of the Virtex-5 FPGA, 500 MHz, i.e. to 2 ns. Better time resolution is achieved by subdividing the clock period. In the delayed data sampling (DDS) concept, the input signal is routed through a delay line and the delayed signals are then sampled with flip-flops using one common clock. Another way is the so-called shifted clock sampling (SCS), where the same input signal is sampled with flip-flops clocked by a set of equidistant phase-shifted clocks (figure 2). The DDS method needs just one sampling clock, but allocating logic components with uniform propagation delays in the FPGA is not a trivial task. High-resolution TDCs have been implemented in FPGAs using the dedicated carry chains, but their delay is fixed to approximately 30 ps [8]. This is the main drawback of the DDS method in this context, because the logic consumption for 128 TDC channels would by far exceed the device resources.

**Figure 2**. TDC concepts: delayed data sampling (left) and shifted clock sampling (right).
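As a rough illustration of shifted clock sampling, the following Python sketch models N flip-flops clocked by N equidistant phases of one clock and shows how the bin width follows from the clock period. It is a behavioural toy model, not the firmware: the function names and the ideal, jitter-free edge are assumptions made here, and the 388.8 MHz value is the clock frequency quoted with the measurements in this paper.

```python
def bin_width_ps(f_clk_mhz, n_phases):
    """Ideal bin width in ps: one clock period divided by the number of phases."""
    period_ps = 1e6 / f_clk_mhz  # a 1 MHz clock has a period of 1e6 ps
    return period_ps / n_phases


def sample_scs(edge_time_ps, period_ps, n_phases):
    """Sample an ideal rising edge with n_phases equidistant shifted clocks.

    Flip-flop k samples at k * period / n_phases within one clock period
    and stores 1 if the edge has already arrived, else 0.
    """
    step = period_ps / n_phases
    return [1 if k * step >= edge_time_ps else 0 for k in range(n_phases)]


def first_high_bin(samples):
    """Index of the first flip-flop that saw the signal high, or -1 if none did."""
    return samples.index(1) if 1 in samples else -1
```

With these definitions, `bin_width_ps(500.0, 1)` gives the 2000 ps bin of a single flip-flop at 500 MHz, while `bin_width_ps(388.8, 16)` gives about 160.75 ps for 16 phases.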

#### 2.2 16-bin TDC design

The TDC in this project is based on the SCS method using 16 equidistant phase-shifted clocks. To process the output of the sampling flip-flops, the different clock domains have to be synchronized first. This is done by reading the output register in four partitions (figure 3). The hit-search algorithm then checks each partition's bit pattern for transitions from '0' to '1' or vice versa, depending on whether the algorithm is configured for leading and/or trailing edges. As hits can only be detected within a partition, the sampling flip-flops located on partition borders have to be read into both adjacent partitions; otherwise hits occurring on these borders would be lost.
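The role of the shared border flip-flops can be sketched in Python. The layout below is an assumption made for illustration (four partitions of four bits, each extended by the last bit of the preceding partition, with `prev_last` standing for the last sample of the previous clock period); the actual register arrangement of figure 3 may differ, and all function names are hypothetical.

```python
def split_partitions(samples, prev_last, n_part=4):
    """Split one clock period of samples into partitions overlapping by one bit."""
    size = len(samples) // n_part
    extended = [prev_last] + list(samples)
    # Partition p holds the border bit before it plus its own `size` bits.
    return [extended[p * size:(p + 1) * size + 1] for p in range(n_part)]


def find_edges(partition, leading=True, trailing=False):
    """Return (position, kind) for every configured transition in a partition."""
    hits = []
    for i in range(1, len(partition)):
        before, after = partition[i - 1], partition[i]
        if leading and (before, after) == (0, 1):
            hits.append((i - 1, "leading"))
        if trailing and (before, after) == (1, 0):
            hits.append((i - 1, "trailing"))
    return hits


def search_hits(samples, prev_last, leading=True, trailing=False):
    """Return (partition, position, kind) for all hits in one clock period."""
    hits = []
    for p, part in enumerate(split_partitions(samples, prev_last)):
        for pos, kind in find_edges(part, leading, trailing):
            hits.append((p, pos, kind))
    return hits
```

A rising edge between samples 3 and 4 falls exactly on a partition border; it is found in partition 1 only because sample 3 is also read into that partition.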

Whenever a hit is detected on an input signal, the time information is calculated from the coarse counter value, the partition number and the bit-swap position within the partition, and is stored in a hit buffer RAM for processing and readout. Timestamps of incoming triggers are measured with sampling clock period precision as well and are transferred to a trigger FIFO. The trigger time information is then processed by the trigger matching unit, which selects the hits within a programmable time window and transfers them to the output FIFO. Hits with a time stamp older than the trigger latency are deleted from the hit buffers. All buffers are built from the dedicated 36 Kb block memory available in the Virtex-5 FPGA.
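A minimal Python sketch of the timestamp composition and the trigger matching step might look as follows. It assumes 16 bins per clock period split into four partitions of four bins, expresses all times in units of TDC bins, and uses a time-ordered `deque` as a stand-in for the hit buffer RAM; the parameter names (`window_start`, `window_width`, `latency`) are hypothetical and not taken from the firmware.

```python
from collections import deque

BINS_PER_PERIOD = 16     # 16 phase-shifted sampling clocks
BINS_PER_PARTITION = 4   # assumed: four partitions of four bins each


def hit_timestamp(coarse, partition, position):
    """Combine coarse counter, partition number and bit position into one value."""
    return coarse * BINS_PER_PERIOD + partition * BINS_PER_PARTITION + position


def match_trigger(hit_buffer, trigger_time, window_start, window_width, latency):
    """Select hits inside the trigger window and drop hits that are too old.

    hit_buffer is a deque of timestamps in increasing order. The window spans
    [trigger_time + window_start, trigger_time + window_start + window_width).
    """
    lo = trigger_time + window_start
    hi = lo + window_width
    matched = [t for t in hit_buffer if lo <= t < hi]
    # Hits older than the trigger latency can no longer match and are deleted.
    oldest_allowed = trigger_time - latency
    while hit_buffer and hit_buffer[0] < oldest_allowed:
        hit_buffer.popleft()
    return matched
```

For example, with hits at bins 3, 164, 207 and 320 and a trigger at bin 208, a window starting 64 bins before the trigger and 64 bins wide selects the hits at 164 and 207; with a latency of 100 bins, the hit at bin 3 is removed from the buffer.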

To simplify the data collection as well as the FPGA implementation process, F1-blocks are introduced, each combining eight channels as shown in figure 4. Additionally, the same data format as the existing hardware based on the TDC-F1 chip [9, 10] can be used. Finally, the data of 16 F1-blocks is sent to the data acquisition system using the S-Link interface.

#### 2.3 FPGA implementation

The accuracy of the digitization process is limited by the linearity of the TDC bins. The imperfections arise, for instance, from the phase shift error of the clocks used in the SCS algorithm. Eight clocks are generated by two PLLs and distributed via global clock nets across the FPGA. Eight more clocks are produced by locally inverting the clock signal using the clock inverter in every Virtex-5 slice. The deviations caused by the sampling clocks are very well controlled by the clock management facilities provided in the FPGA. Because the implementation tools do not allow the user to influence the routing of specific connections, the imperfections caused by the routing skew of the input signal to the sampling flip-flops are more difficult to control. Minimum-skew routing was achieved by means of proper placement of the TDC register together with adequate timing constraints. Once the optimal configuration was found, the results could be preserved and duplicated using relative placement macros (RPM).

**Figure 3**. 16-bin TDC design with four partitions. Squares indicate the 16 sampling flip-flops, one half each rising-edge or falling-edge triggered. Numbers refer to the corresponding clock.

**Figure 4**. F1-block consisting of eight TDC channels. Data selected by the trigger matching units is concentrated into a single S-Link FIFO interface. The S-Link FIFOs of 16 F1-blocks are read consecutively and data is transmitted by S-Link or Ethernet to a central data acquisition system.

**Figure 5**. Channel with minimum (left) and maximum (right) DNL.
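The clock generation scheme described in this section (eight distributed clocks plus their local inversions) can be illustrated with a short Python sketch. The assumption made here, which the text does not state explicitly, is that the eight PLL clocks are spaced evenly over half a clock period, so that inverting each of them fills in the second half; the function name is hypothetical.

```python
def scs_phases_deg(n_pll_clocks=8):
    """Phases of all sampling clocks, assuming evenly spaced PLL outputs.

    The n_pll_clocks direct clocks are assumed to cover 0..180 degrees;
    inverting a clock shifts its phase by 180 degrees.
    """
    step = 180.0 / n_pll_clocks
    direct = [k * step for k in range(n_pll_clocks)]
    inverted = [(phase + 180.0) % 360.0 for phase in direct]
    return sorted(direct + inverted)
```

Under this assumption the result is 16 equidistant phases in steps of 22.5 degrees, which matches the description of the sampling flip-flops in figure 3 as half rising-edge and half falling-edge triggered.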

In a first step, area constraints for every F1-block were defined. In order to meet the design requirements, each F1-block was implemented separately. Thanks to incremental design reuse [11], the implementation results could be saved using design partitions. All F1-block partitions were then imported into the final design together with the remaining logic.

#### 3 Measurement results

The TDC's functionality was tested using a second GANDALF module with LVDS output cards to generate test pulses for 128 channels. In addition, a DAQ with S-Link readout and a trigger control system (TCS) was installed. The accuracy of the time-to-digital conversion is given by the differential nonlinearity (DNL), which is determined using statistical code-density tests. For this purpose, the timestamps of a large number of random hits are measured and the number of hits falling into each time bin is filled into a histogram. As the expected number of events in every bin is known, the normalized histogram gives a direct measure of the TDC bin widths. Figure 5 shows the results for the channels with minimum and maximum DNL.
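The code-density evaluation can be sketched in a few lines of Python. This is a generic illustration of the method, not the analysis code used for figure 5: the bin edges passed to the simulation are made-up values in units of LSB, and both function names are hypothetical.

```python
import bisect
import random


def dnl_from_counts(counts):
    """DNL per bin in LSB: relative deviation of each count from the mean count."""
    expected = sum(counts) / len(counts)
    return [c / expected - 1.0 for c in counts]


def simulate_code_density(bin_edges, n_hits, seed=0):
    """Histogram uniformly distributed random hits into bins with given edges.

    bin_edges is an increasing list of edge positions; a hit at time t falls
    into bin b if bin_edges[b] <= t < bin_edges[b + 1].
    """
    rng = random.Random(seed)
    counts = [0] * (len(bin_edges) - 1)
    span = bin_edges[-1] - bin_edges[0]
    for _ in range(n_hits):
        t = bin_edges[0] + rng.random() * span
        counts[bisect.bisect_right(bin_edges, t) - 1] += 1
    return counts
```

For instance, `dnl_from_counts(simulate_code_density([0, 1.2, 2.0, 3.0, 4.0], 40000))` yields values close to +0.2 and -0.2 LSB for the first two bins, reflecting the widened and narrowed bin in this made-up example.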

To determine the time resolution of the TDC, the time difference between two hits with a fixed delay is measured many times. The delay length is then swept in steps of approximately 20 ps over a range of at least one sampling clock period. The standard deviations obtained show a characteristic behaviour with minima whenever the delay length is a multiple of the TDC bin width (figure 6). In the ideal case, the minima would be zero and the maxima would be equal to 0.5 LSB [12]. The time resolution is defined as the root mean square value of the standard deviation curve, taken as a function of the measured time interval.
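The ideal-case values quoted above can be reproduced with a short calculation. The sketch assumes perfectly uniform bins of width 1 LSB, no jitter, and a first hit whose arrival time is uniformly distributed within a bin; the symbols $d$, $n$, $f$, $m$ and $\sigma$ are introduced here for illustration only. Writing the true delay as $d = n + f$ (in LSB) with integer $n$ and $0 \le f < 1$, the measured difference $m$ is $n+1$ with probability $f$ and $n$ with probability $1-f$, so that

$$
\begin{aligned}
\langle m \rangle &= n + f, \\
\sigma^2(f) &= f\,(1 - f), \\
\sigma(0) &= 0, \qquad \sigma(1/2) = 1/2, \\
\sqrt{\int_0^1 f\,(1 - f)\,\mathrm{d}f} &= \sqrt{1/6} \approx 0.41 .
\end{aligned}
$$

Under these assumptions the standard deviation vanishes at multiples of the bin width, peaks at 0.5 LSB halfway between them, and the root mean square of the curve is about 0.41 LSB.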



**Figure 6**. Left: RMS of delay measurements between consecutive hits on the same channel as a function of the time interval length (exemplary channel). Right: Time resolution of the 128-channel TDC determined from measurement results such as those shown in the left panel.



**Figure 7**. Time resolution measured for hits between different channels. Each measurement point represents the mean value of 128 channels. Error bars show the standard deviation of all channels.

In many applications it is necessary to compare timestamps from different channels. For this reason, time resolution measurements using hits from different channels were carried out for time intervals over a dynamic range of around 2.5 $\mu$s (figure 7).

#### 4 Conclusion and outlook

A 128-channel TDC has been successfully implemented inside a single Virtex-5 FPGA on the GANDALF module. The TDC is based on a shifted clock sampling algorithm using 16 equidistant phase-shifted clocks. The design uses around 43% of the flip-flops and 27% of the LUTs available in the device. The device utilization is therefore moderate, so that further logic, e.g. a 128-channel scaler for rate measurements, can be added to the same design. The measurements in section 3 were performed using a clock frequency of 388.8 MHz. This results in a TDC bin width of 160 ps. As the time resolution was determined from two time-stamp measurements, the accuracy of the GANDALF 128-channel TDC is better than $0.6 \cdot 160\,\text{ps} / \sqrt{2} \approx 68\,\text{ps}$.
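The arithmetic behind these numbers can be spelled out as follows, assuming that the two timestamps entering a time difference carry independent errors of equal size. The symbols $T_\text{bin}$, $\sigma_t$ (single timestamp) and $\sigma_{\Delta t}$ (time difference) are labels introduced here for illustration.

$$
\begin{aligned}
T_\text{bin} &= \frac{1}{16 \cdot 388.8\,\text{MHz}} \approx 160.8\,\text{ps}, \\
\sigma_{\Delta t}^2 &= \sigma_t^2 + \sigma_t^2 \;\Rightarrow\; \sigma_t = \frac{\sigma_{\Delta t}}{\sqrt{2}}, \\
\sigma_t &< \frac{0.6 \cdot 160\,\text{ps}}{\sqrt{2}} \approx 67.9\,\text{ps}.
\end{aligned}
$$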

Future work concentrates on the implementation of the TDC logic in low-cost FPGAs for large scale applications in drift detector readout.

#### Acknowledgments

This research project is supported by the Bundesministerium für Bildung und Forschung (BMBF) and the European Community Research Infrastructure Integrating Activity under the FP7 Study of Strongly Interacting Matter (HadronPhysics2, Grant Agreement number 227431).

## References

- [1] S. Bartknecht et al., *Development of a 1GS/s high-resolution sampling ADC system*, *Nucl. Instr. Meth.* A 623 (2010) 507.
- [2] S. Bartknecht et al., *Development and Performance Verification of the GANDALF High-Resolution Transient Recorder System*, *IEEE Trans. Nucl. Sci.* 58 (2011) 1456.
- [3] Xilinx Inc., *Virtex-5 Family Overview*, DS100 (2009).
- [4] H.C. van der Bij et al., *S-LINK, a data link interface specification for the LHC era*, *IEEE Trans. Nucl. Sci.* 44 (1997) 398.
- [5] F. Herrmann, *Development and Verification of a High Performance Electronic Readout Framework for High Energy Physics*, Ph.D. thesis, Albert-Ludwigs-Universität, Freiburg (2011).
- [6] S. Schopferer, *Entwicklung eines hochauflösenden Transientenrekorders*, Diploma thesis, Albert-Ludwigs-Universität, Freiburg (2009).
- [7] J. Bieling et al., *Implementation of mean-timing and subsequent logic functions on an FPGA*, submitted to *Nucl. Instr. Meth.* A (2011) [arXiv:1109.4735v1].
- [8] Xilinx Inc., *DC and Switching Characteristics*, DS202 (2010).
- [9] G. Braun et al., *F1 An Eight Channel Time-to-Digital Converter Chip for High Rate Experiments*, hep-ex/9911009.
- [10] H. Fischer et al., *Implementation of the dead-time free F1 TDC in the COMPASS detector readout*, *Nucl. Instr. Meth.* A 461 (2001) 507.
- [11] Xilinx Inc., *Incremental Design Reuse with Partitions*, XAPP918 (2007).
- [12] F. Baronti et al., *On the differential nonlinearity of time-to-digital converters based on delay-locked-loop delay lines*, *IEEE Trans. Nucl. Sci.* 48 (2001) 2424.