Friday, 12 April 2013

Reducing crosstalk effects



Si (signal integrity) and crosstalk are largely wire-dominated effects. Some of the things that the designer should keep in mind while resolving crosstalk and si effects are:
1)      Reduce congestion: The more the routing congestion, the stronger the si effect. The cells also need to be either downsized or upsized based on the case. If a weak driver is driving a fanout of cells at long distances, its nets can become crosstalk victims. Hence these drivers should be upsized. Low frequency signals like scan, reset and other static signals are common victims of this behavior. At the same time, if a driver with a high drive strength is driving a fanout of cells sitting close by, then it can become a potential aggressor. So, it needs to be downsized.
2)      Spacing: The crosstalk capacitance is inversely related to spacing. The more the spacing between the sheets of metal (routes in this case), the lower the capacitance, so less voltage is coupled from the aggressor to the victim, resulting in smaller si effects.
3)      Moving the routes: Since crosstalk is a coupling effect, having several aggressors is usually a bigger problem than having one (assuming that their timing windows overlap). Hence, the nets can be moved so that the victim has just 1 aggressor instead of 2, reducing the total crosstalk capacitance.
4)      Avoiding long parallel routes: Coupling capacitance adds up along the length over which two nets run side by side, so long parallel routes will only increase the capacitance. Hence it is better to break up the parallel run by switching the net between different metal layers.
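5)      Shielding: The ground net is assumed to be an ideal, quiet net. Shielding the victim with ground nets replaces the coupling to the original aggressors with capacitance to ground, so the effects of crosstalk between the victim and those aggressors are nullified. The extra ground capacitance slows the victim down, so this fix suits cases where the transition value for the victim is not bad.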
6)      Repeater stitching/buffering: This is often the best fix if there is enough slack on the timing path. This avoids long routes and hence the coupling cap is reduced. Also, with staging there are many benefits like improved transition time, better IR and EM and less noise.
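
The effect of spacing, parallel run length and the number of aggressors can be sketched with a first-order charge-sharing model. This is only an illustration under stated assumptions: a parallel-plate coupling capacitance, a weakly held victim and aggressors that all switch together. The function names, the default metal thickness and dielectric constant, and the example numbers are made up for the sketch and do not come from any tool or library.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def coupling_cap(length_um, spacing_um, thickness_um=0.2, eps_r=3.0):
    """Parallel-plate estimate (in farads) of the sidewall capacitance
    between two routes running side by side. Defaults are assumed values."""
    area_m2 = (length_um * 1e-6) * (thickness_um * 1e-6)
    return eps_r * EPS0 * area_m2 / (spacing_um * 1e-6)


def victim_glitch(vdd, c_ground, coupling_caps):
    """Worst-case glitch (in volts) on a weakly held victim when every
    aggressor switches at once: a simple capacitive divider."""
    c_couple = sum(coupling_caps)
    return vdd * c_couple / (c_couple + c_ground)


if __name__ == "__main__":
    near = coupling_cap(length_um=200, spacing_um=0.1)
    far = coupling_cap(length_um=200, spacing_um=0.2)    # double the spacing
    short = coupling_cap(length_um=100, spacing_um=0.1)  # half the parallel run
    c_gnd = 20e-15  # assumed victim capacitance to ground
    cases = [
        ("2 aggressors, min spacing", [near, near]),
        ("1 aggressor, min spacing", [near]),
        ("1 aggressor, double spacing", [far]),
        ("1 aggressor, half parallel length", [short]),
    ]
    for label, caps in cases:
        print(label, round(victim_glitch(1.0, c_gnd, caps), 3), "V")
```

In this model, shielding corresponds to moving capacitance out of `coupling_caps` and into `c_ground`, which lowers the glitch but loads the victim.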

Solving hold time violations



In most scenarios, the preferred fix for a hold failure is to increase the datapath delay without affecting the setup time. If this is not possible, then the negative skew component is increased (or the positive skew component is reduced), keeping the setup timing in mind. But, this is not normally practiced.

Also, note that unlike the setup time resolution, the hold failures generally have little to do with the RTL and hence the resolution has to happen at the PD stage of the design.
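
The hold check behind this reasoning can be written as a small calculation. The sketch below assumes a single corner with no derates or clock uncertainty. The function name and the example numbers (all in ps) are illustrative only.

```python
def hold_slack(launch_clk, clk_to_q, data_min, capture_clk, t_hold):
    """Hold slack in ps; a negative value is a violation.

    launch_clk / capture_clk: clock arrival at the launch / capture flop
    clk_to_q:                 clock-to-Q delay of the launch flop
    data_min:                 shortest datapath delay between the flops
    t_hold:                   hold requirement of the capture flop
    """
    arrival = launch_clk + clk_to_q + data_min   # earliest new data at D
    required = capture_clk + t_hold              # data must stay stable until here
    return arrival - required


if __name__ == "__main__":
    before = hold_slack(launch_clk=500, clk_to_q=60, data_min=20,
                        capture_clk=560, t_hold=30)
    # Same path with 25 ps of extra datapath delay (for example one hold buffer).
    after = hold_slack(launch_clk=500, clk_to_q=60, data_min=45,
                       capture_clk=560, t_hold=30)
    print("before:", before, "ps, after:", after, "ps")
```

The clock period does not appear anywhere in the check, which is one reason hold is fixed on the physical side rather than by changing the logic depth.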

Increasing the datapath delay:
1)      This is achieved by adding buffers on the pins of cells that are part of timing paths failing hold but not setup. Normally, the designer would query the setup slack on each of the input pins of these cells (looking at all the paths through the pin, not just the failing hold timing path) and if enough slack is seen, then the hold buffers can be added at these points. The exact number of hold buffers added depends on the size of the failure. The library team would recommend the appropriate buffer sizes and provide buffers characterized to ensure that the delay for these cells does not cause damage at the slow corners.
2)      Marginal hold failures that don’t need a buffer/cell addition can be fixed with route detours, using a routing tool like the one supported by ICC. But, the designer needs to be cautious of the si components acting on the nets, as at times a detour can pick up a secondary crosstalk effect that makes the signal transition faster. This will reduce the net delay, hence worsening hold. So, for si hold violations, route detours should be planned with the aggressor information in hand. For ba hold violations, a route detour adds delay and fixes the hold violation.
3)      Managing the AOCV derate values: The derate values applied should have gone through a lot of experimentation. Derates that are too pessimistic can report hold violations that would not occur on silicon. The derates should ideally be PVT corner dependent.
4)      Cell swaps: If the cells in the path are not failing setup timing, then they can be swapped to hvt. The hvt cells have an inherently lower drive capability and so offer more delay.
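
The trade-off in item 1 above (enough buffers to cover the hold failure, without eating the available setup slack) can be sketched as follows. This is a rough illustration, not a tool flow: it assumes one buffer type with a known fast-corner and slow-corner delay, and the function name, margin parameter and numbers are hypothetical.

```python
import math


def hold_buffers_needed(hold_slack_ps, setup_slack_ps,
                        buf_delay_fast_ps, buf_delay_slow_ps, margin_ps=0.0):
    """Return how many buffers fix the hold failure, or None if the
    setup slack at the slow corner cannot absorb them.

    buf_delay_fast_ps: buffer delay at the fast (hold) corner
    buf_delay_slow_ps: buffer delay at the slow (setup) corner
    """
    if hold_slack_ps >= margin_ps:
        return 0  # nothing to fix
    # Size the fix with the smallest delay the buffer can show.
    count = math.ceil((margin_ps - hold_slack_ps) / buf_delay_fast_ps)
    # Check the cost with the largest delay the buffer can show.
    if count * buf_delay_slow_ps > setup_slack_ps:
        return None
    return count


if __name__ == "__main__":
    print(hold_buffers_needed(-45, 300, 20, 35))  # enough setup slack
    print(hold_buffers_needed(-45, 100, 20, 35))  # setup slack too small
```

When the function returns None, the fix has to come from somewhere else on the path, for example a pin with more setup slack or a different buffer size.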
Clock skewing:
This is not suggested as the CTS will be built keeping the setup timing in mind. But, if the path has enough setup slack, then additional CTS inverters can be added on the launch clock path, hence inducing negative skew and fixing hold.

Solving Setup time violations



The datapath delay needs to be reduced to meet the setup timing requirement. If this is not possible, then the other option that the designer can try is to skew the clock so that the capture clock edge arrives later than the launch clock edge, giving the data more time to arrive.
An elaborate listing of the possible fixes is:
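
The two options can be seen in a single-corner setup check. The sketch below ignores derates and clock uncertainty. The function name and the example numbers (all in ps) are illustrative assumptions.

```python
def setup_slack(period, launch_clk, clk_to_q, data_max, capture_clk, t_setup):
    """Setup slack in ps; a negative value is a violation.

    data_max is the longest datapath delay between the two flops.
    """
    arrival = launch_clk + clk_to_q + data_max    # latest data at D
    required = period + capture_clk - t_setup     # next capture edge minus setup
    return required - arrival


if __name__ == "__main__":
    base = dict(period=1000, launch_clk=500, clk_to_q=60,
                data_max=950, capture_clk=500, t_setup=40)
    print("failing path:", setup_slack(**base), "ps")
    # Option 1: take 60 ps out of the datapath.
    print("faster datapath:", setup_slack(**{**base, "data_max": 890}), "ps")
    # Option 2: delay the capture clock by 60 ps.
    print("skewed capture clock:", setup_slack(**{**base, "capture_clk": 560}), "ps")
```

Both options buy the same 60 ps here, but the skew option also changes the timing of every other path that starts or ends at the capture flop.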
1)      Better placement: The timing path should be as linear as possible. There shouldn’t be any zig zag placement of the cells. This kind of haphazard placement means that the routes will be long leading to more delays.
2)      Better transitions/less capacitance: The timing path must be well structured. This means that there shouldn’t be cells driving loads seated at long distances, as this leads to bad tran/increased cap on the nets and hence more delay. At appropriate stages, buffers/repeaters have to be added to ensure a better tran value. Note that in the latest process technologies (28nm and below), the net delay is almost comparable to the cell delay. Hence, adding an extra stage of buffers/repeaters can reduce the effect of bad tran on the nets by roughly half.
3)      Better optimizations: The tools are intelligent enough to optimize most timing paths well. But, in some cases the tool may not honour the timing weights and so some paths may not be optimized well. The cells may be of either very high or poor drive strength. If a cell is of high drive strength and the preceding stage is not able to drive its input load, it leads to a cap violation and so delay worsens. On the other hand, if the cell is of low drive strength, then the tran on the driven net worsens and so delay increases. Hence the choice of cells should ideally be dependent on timing path criticality and placement of cells.
4)      Logic re-structuring: Tools may perform a good physical synthesis, but at times the designer has to restructure the logic by hand. For example, an AND gate may provide a delay of 20ps while the combination of NAND and NOT gates may give a delay of just 10ps. Some of the redundant logic added/inferred by the tool can also be eliminated. Unnecessary buffers and complex gates (like AND OR combinations for a simple mux etc) can be eliminated.
5)      Logic replication: Gates/flops can be duplicated to cater to a large fanout. For example, if there is a flopped version of a reset signal going to 100 AND gates, then the flop can be replicated (assuming that the D side has enough slack) so that each copy drives only a part of the fanout. The load on each flop drops and so the slack margins will improve significantly.
6)      Proper constraints: Constraints have a direct impact on the placement of cells in a timing path, the choice of cells by the tool during physical synthesis and the timing path group optimization. Interface paths need proper constraints to ensure the IO paths are neither over-optimized nor under-optimized.
Path groups: The critical range and the weight set on each of the major timing path groups (like reg2reg, io_to_io, input_to_reg, reg_to_output) should be chosen carefully. If the internal timing closure is the priority, then a high weight can be set on the reg2reg group.
7)      Cell swaps: If there is difficulty in achieving any better optimization, then the cells can be swapped to lvt. With the lower threshold voltage, the channel in the transistors forms sooner and hence the gates switch faster, improving the delays.
8)      Addition of latches: Adding a negative level latch before the launch flop in a failing reg2reg path (both flops posedge), or a positive level latch (when both flops are negedge), allows up to half a cycle to be borrowed to meet setup.
What is also followed at some places is swapping the flop failing setup with a positive/negative level sensitive latch to borrow time. But care should be taken so that data is not missed and X is not propagated.
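
The claim in item 2 above, that an extra repeater stage roughly halves the wire-related delay, can be checked with a first-order distributed RC (Elmore) model in which wire delay grows with the square of the length. This is a simplified sketch: driver resistance and load capacitance are ignored, and the per-micron resistance and capacitance and the buffer delay are assumed values, not numbers from any technology.

```python
def wire_delay_ps(length_um, r_per_um=1.0, c_per_um=0.2e-15):
    """Elmore delay of an unbuffered wire, 0.5 * r * c * L^2, in ps.
    r_per_um is in ohms per micron, c_per_um in farads per micron (assumed)."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2 * 1e12


def buffered_delay_ps(length_um, n_segments, buf_delay_ps=15.0):
    """Delay of the same wire split into equal segments by repeaters."""
    segment = wire_delay_ps(length_um / n_segments)
    return n_segments * segment + (n_segments - 1) * buf_delay_ps


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, "segment(s):", round(buffered_delay_ps(1000, n), 1), "ps")
```

Each extra repeater shortens the segments but adds its own cell delay, so there is a point beyond which adding stages stops helping.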

Clock skewing:
Positive skew relaxes setup. So, if the paths launched from the capture flop have enough setup slack and the failing path has enough hold slack, then the capture clock path can be delayed, relaxing setup. This is normally done after the CTS is optimized and one round of hold fixes is done.
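
The side effects of delaying a capture clock can be tabulated with a small helper. The sketch assumes three flops in a row, A, B and C, where the clock of B is delayed by `delta` ps. The function names and slack numbers are hypothetical.

```python
def delay_clock_of_b(setup_ab, hold_ab, setup_bc, hold_bc, delta):
    """Slacks (ps) after delaying the clock of flop B by delta ps.

    B captures later, so setup A->B gains and hold A->B loses.
    B also launches later, so setup B->C loses and hold B->C gains.
    """
    return {
        "setup A->B": setup_ab + delta,
        "hold A->B": hold_ab - delta,
        "setup B->C": setup_bc - delta,
        "hold B->C": hold_bc + delta,
    }


def max_safe_delay(hold_ab, setup_bc):
    """Largest delay that keeps hold A->B and setup B->C non-negative."""
    return max(0, min(hold_ab, setup_bc))


if __name__ == "__main__":
    print(delay_clock_of_b(setup_ab=-30, hold_ab=80,
                           setup_bc=120, hold_bc=15, delta=40))
    print("max safe delay:", max_safe_delay(hold_ab=80, setup_bc=120), "ps")
```

This is why the skew is applied only after hold has been through one round of fixes: the hold slack on the same path is what gets spent.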

AOCV derates:
The derates applied should be PVT corner specific and should be done with care.
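
A rough sketch of why the derate values matter for setup: the launch clock and data path are scaled by a late derate, and the capture clock by an early derate. This is a flat-derate simplification (real AOCV derates vary with path depth and distance, and common clock path pessimism is removed by the tool). The function name, the derate values and the delays are assumptions for illustration.

```python
def derated_setup_slack(period, launch_clk, clk_to_q, data_max,
                        capture_clk, t_setup, late=1.0, early=1.0):
    """Setup slack in ps with flat late/early derates applied."""
    arrival = (launch_clk + clk_to_q + data_max) * late   # pessimistically slow
    required = period + capture_clk * early - t_setup     # pessimistically early
    return required - arrival


if __name__ == "__main__":
    path = dict(period=1000, launch_clk=500, clk_to_q=60,
                data_max=850, capture_clk=500, t_setup=40)
    print("no derate:", derated_setup_slack(**path), "ps")
    print("5% derate:", derated_setup_slack(**path, late=1.05, early=0.95), "ps")
```

A path that passes without derates can fail once they are applied, so an overly pessimistic value creates violations to fix that may not be real.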