Low Power CTS using Astro
Author: 61IC    Source: original to this site    Updated: 2006-9-29

Abstract
  Traditional Astro CTS focuses on insertion delay and clock skew when building clock trees, which is often adequate to meet designers' requirements. For low-power design, however, we are more interested in reducing total power consumption at an acceptable skew and delay penalty. We therefore tried Astro's advanced CTS controls and ran a set of experiments to study them. We found that total power consumption can be reduced effectively by using CTS controls other than the default options. To lessen the skew and delay penalty, we route clock nets with double spacing.

Table of contents
  1. Introduction
  2. Power calculation theory
  3. Parameters for user control
  4. User control CTS experiments
     Parameters selection method
     Buffers selection for CTS
     Test case selection
     Test results
     Conclusion for user control CTS
  5. Double-spacing clock nets routing
  6. Summary and suggestion

Introduction

  Reducing on-chip power consumption has become a critical challenge in the nanotechnology era. The traditional trade-off between performance and area is now compounded by the addition of power to the equation. Problems related to power consumption apply not only to battery-powered, handheld and mobile applications, but to everything targeting 90nm and beyond, where power influences designs not only in terms of time to market, but also cost and reliability.

  The results in Table 1-1 are a good example: the clock tree consumes more than 35% of total power. The data comes from one of our SMIC18 1P5M designs, analyzed with AstroRail. The ratio can be larger as designs move to deeper submicron processes or to higher frequencies, so there is no doubt that total clock power needs to be reduced at different stages of a design. Although the primary power reduction is often achieved by methods such as clock gating, applied by front-end designers, we can still reduce power during CTS.
  In this research, we try to reduce clock tree power consumption at two different stages:
  1) Obtain a power-friendly clock buffer structure and distribution through different kinds of control over the CTS parameters.
  2) Lower the clock net capacitance by routing clock nets with double spacing, to recover skew and delay.

Power calculation theory

To calculate whole-chip power consumption accurately, there are several ways to input net switching activity information, including Scheme, SAIF and VCD files. Because no SAIF or VCD files were available for the test cases, we chose the Scheme format for the switching activity input.
From version 2004.06, AstroRail can accept switching activity by clock via the command:
defineNetSwitchingActivityByClock (geGetEditCell) "clock name" "net name" (switching probability)
AstroRail gets the clock frequency from the SDC file and takes 2X the clock frequency as the clock toggle rate. For a signal net, AstroRail detects which clock domain it belongs to and assigns a switching activity according to the net's switching probability relative to the clock frequency.
We set the probability of all signal nets to "0", assuming that no signals toggle other than the clock nets:
defineNetSwitchingActivityByClock (geGetEditCell) ".*" ".*" 0
In this way, we get the total dynamic power on the clock nets.
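
As a rough illustration of why the 2X toggle rate matters, the sketch below uses the standard CMOS switching-power estimate P = 0.5 * C * V^2 * (toggle rate). It is not AstroRail's internal model. The function names and the example capacitance, voltage and frequency are hypothetical values chosen only for illustration.

```python
def switching_power_w(cap_f, vdd_v, toggle_rate_hz):
    """Standard switching-power estimate: each toggle dissipates 0.5 * C * V^2."""
    return 0.5 * cap_f * vdd_v ** 2 * toggle_rate_hz


def clock_net_power_w(cap_f, vdd_v, clock_freq_hz):
    """Clock nets toggle twice per cycle, so the toggle rate is 2X the clock frequency."""
    return switching_power_w(cap_f, vdd_v, 2.0 * clock_freq_hz)


if __name__ == "__main__":
    # Hypothetical numbers: 100 pF of total clock net capacitance, 1.8 V, 100 MHz.
    p = clock_net_power_w(100e-12, 1.8, 100e6)
    print(f"clock net switching power: {p * 1e3:.1f} mW")
    # A net with switching probability 0 (toggle rate 0) contributes nothing.
    print(switching_power_w(100e-12, 1.8, 0.0))
```

With every signal net set to probability 0, only terms like `clock_net_power_w` remain, which is why the reported dynamic power can be attributed to the clock nets.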

Parameters for user control

According to the Astro user guide, a set of parameters can be used to control CTS: target fanout, target transition and target load capacitance; and max fanout, max transition, max capacitance and maximum buffer levels per gate. Each parameter has a default value. The defaults usually give good skew and insertion delay, but for special cases or requirements we may need to adjust the parameters to get a particular result.

The target parameters set the goals of CTS, and CTS can approach a target from either below or above. The max parameters, by contrast, control the logical DRC of the clock tree: if a value exceeds a max constraint, Astro treats it as a violation.

Usually the target value is smaller than the max value, but when the two conflict, the target is adjusted. The relationship between the adjusted target values and the max values is:
Target fanout chosen = min(target fanout, 70% of max fanout)
Target load chosen = min(target load, 80% of max load)
Target transition chosen = min(target transition, 80% of max transition)
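
The three rules above can be sketched as a small helper. This is only an illustration of the stated min() relationships, not Astro code. The function name and dictionary keys are hypothetical, and the example numbers are illustrative.

```python
def effective_targets(target_fanout, max_fanout,
                      target_load, max_load,
                      target_transition, max_transition):
    """Apply the target-vs-max rules: a target is capped at a fraction of its max."""
    return {
        "fanout": min(target_fanout, 0.70 * max_fanout),
        "load": min(target_load, 0.80 * max_load),
        "transition": min(target_transition, 0.80 * max_transition),
    }


if __name__ == "__main__":
    # Relaxed fanout and transition targets (1000) are capped by the max values,
    # while a load target already below 80% of max load is kept as given.
    chosen = effective_targets(
        target_fanout=1000, max_fanout=64,
        target_load=0.2, max_load=0.6,
        target_transition=1000, max_transition=0.5,
    )
    for name, value in chosen.items():
        print(name, value)
```

One consequence worth noting: relaxing a target to a very large value does not remove the limit, it simply hands control of that quantity to the corresponding max constraint.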
In addition, another parameter affects the tool's selection of the final targets:
axSetIntParam "acts" "CTA" 1
The default value of CTA is 1, which means the tool automatically finds appropriate parameters by analyzing the design's properties, regardless of whether the user has changed the parameter settings. To control the CTS parameters manually, CTA must be turned off.

The script we used to control CTS is as follows:
(axSetIntParam "acts" "debug mode" 1)
; CTA must be off (0) for the user settings below to take effect
(axSetIntParam "acts" "CTA" 0)
; Target values
(axSetIntParam "acts" "target fanout" 1000)
(axSetRealParam "acts" "target: load capacitance" 0.2)
(axSetRealParam "acts" "target: transition delay rise" 1000)
(axSetRealParam "acts" "target: worst transition delay fall" 1000)
(axSetRealParam "acts" "target: worst transition delay rise" 1000)
(axSetRealParam "acts" "target: best transition delay rise" 1000)
(axSetRealParam "acts" "target: best transition delay fall" 1000)
(axSetRealParam "acts" "target: transition delay fall" 1000)
; Max constraints, clock net and buffer list
astClockOptions
formButton "Clock Common Options" "ConstraintSubForm"
setFormField "Clock Common Options" "Maximum Fanout" "64"
setFormField "Clock Common Options" "Maximum Transition Delay" "0.5"
setFormField "Clock Common Options" "Maximum Load Capacitance" "0.6"
subFormHide "Clock Common Options" 3
setFormField "Clock Common Options" "Clock Nets" "clk"
setFormField "Clock Common Options" "Buffers/Inverters" "BUFCLKHD20X,BUFCLKHD16X,BUFCLKHD12X,BUFCLKHD8X,BUFCLKHD4X,BUFCLKHD3X,BUFCLKHD2X,BUFCLKHD1X"
formOK "Clock Common Options"
           
User control CTS experiments

1. Parameters selection method

  About seven parameters affect Astro CTS results. The max values can be set according to the design's characteristics, but for the target values we must drop some from the sweep. Target fanout: by the physical design stage, fanout has been superseded by capacitance, so we dropped target fanout. For the sake of skew, however, too large a fanout is not acceptable, so we constrain it with max fanout. Target load capacitance: it interacts with target transition and can be calculated from the target transition for a given buffer, so we dropped it as well. We therefore chose target transition as the variable for this test, constrained the results by setting the max values, and relaxed the other two target values.

2. Buffers selection for CTS

  Generally, we give CTS all the clock buffers to get better skew. What happens if we restrict the buffer selection? We were not sure, so we added it to our test.

  Table 2-1 shows the buffers used in a clock tree built by Astro. The largest buffer dominates the total buffers used, so we can control buffer selection for CTS through the largest drive-strength buffer given to Astro. For example, the 4X test uses 4X and all buffers smaller than 4X, and the 12X test uses the buffers from 1X to 12X.
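
The buffer list for each such test can be generated mechanically. The sketch below is a hypothetical helper, not part of Astro. It assumes the drive strength is encoded as the number before the trailing "X" in the cell name, as in the BUFCLKHD names used in this article, and it returns the comma-separated string expected by the "Buffers/Inverters" form field.

```python
import re

# Clock buffers named in this article's CTS script, largest first.
CLOCK_BUFFERS = [
    "BUFCLKHD20X", "BUFCLKHD16X", "BUFCLKHD12X", "BUFCLKHD8X",
    "BUFCLKHD4X", "BUFCLKHD3X", "BUFCLKHD2X", "BUFCLKHD1X",
]


def drive_strength(cell_name):
    """Extract the drive strength from a name ending in '<number>X' (assumed convention)."""
    match = re.search(r"(\d+)X$", cell_name)
    if match is None:
        raise ValueError(f"no drive strength found in cell name: {cell_name}")
    return int(match.group(1))


def select_buffers(buffers, largest):
    """Keep buffers whose drive strength is <= largest, sorted largest first."""
    kept = [b for b in buffers if drive_strength(b) <= largest]
    kept.sort(key=drive_strength, reverse=True)
    return ",".join(kept)


if __name__ == "__main__":
    print(select_buffers(CLOCK_BUFFERS, 4))   # list for the 4X test
    print(select_buffers(CLOCK_BUFFERS, 12))  # list for the 12X test
```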

3. Test case selection

  To check whether the results can be repeated on other designs, I selected two designs for the test. The global information of the two designs is listed below:

[Table: global information of the two designs]

Snapshots of the two designs' layouts:

[Figure: layout snapshots of the two designs]

4. Test results

  First, I swept the target transition time from 0.15 to 1.05 with 20X and with 4X as the largest buffer, with max transition set to 2, max capacitance 2, max fanout 100, and target fanout and target capacitance relaxed to 1000. The results are shown in Figure 2-1.

  As we know, increasing the target transition introduces more power consumption, while the decline in total area means a power reduction. Which effect dominates? Let's look at the power comparison results:

[Table: power comparison results]

  Because of the sharp reduction in area and total transistor width, total power decreases as the target transition increases, but the improvement is limited despite an area reduction of more than 50%. If the increase in delay is not a concern, using a small buffer such as 4X is very useful for reducing total clock power consumption. Moreover, the skew curve stays relatively stable with the 4X buffer.
  We also checked the results with the 4X buffer on case 2 (Figure 2-2), together with the power comparison in Table 2-3.

  The area curve behaves just as in case 1, but the skew curve becomes unstable. Its behavior seems to depend on how the clock cells are designed, so we cannot draw any conclusion here. The power comparison in Table 2-3 shows limited power reduction. The one encouraging result is the reduction in total transistor width, which will be very useful for reducing leakage power in deeper nanometer designs.

  Because area does not reflect power consumption clearly, we added total transistor width to the results. Now consider how transistor width and skew vary with the buffer choice at a fixed target transition time (Figure 2-3).

  Unlike the area-transition curve, the area-buffer curve shows no regular trend when the transition target is small. Area decreases with buffer size only when the transition target is swept to relatively large values. We can also see that total transistor width decreases as the buffers change from large to small with the transition time fixed at a given value, which shows that using small buffers can reduce total clock power consumption. Another point that needs attention is that skew does not vary regularly with buffer size, which makes the effect of user-controlled CTS on timing unclear.

5. Conclusion for user control CTS

  From the results above, one thing is clear: using small buffers reduces the total transistor width of the clock tree, and clock power consumption decreases with it. How much power is saved, however, is not very encouraging for some designs. Increasing the target transition time rapidly reduces the area of the clock network, but because transition time is itself a primary cause of clock power consumption, its increase cancels out most of the power improvement gained from the area reduction, so the net power reduction is limited. In addition, the area-transition curves in Figures 2-1 and 2-2 show that the area declines more slowly as the transition moves to larger values, so we should not set it too large.

  To check the effect of user-controlled CTS on timing and its applicability to low-power CTS, we used another design of about 1M gates (case 3) and ran one complete flow. It also uses the Verisilicon SMIC18 1P5M standard cell library. Its layout is shown below:

[Figure: layout of case 3]

  In this design, we set the target transition to 0.6, used 4X as the largest of the buffers provided to Astro CTS, set max transition and max capacitance to 2 and max fanout to 100, relaxed target capacitance and target fanout to a very large value (1000), and turned off CTA. The results are compared with those from the former design in Table 3-1.

  Total power decreases by about 10% while clock power decreases by 30%, and the clock path delay of 3.935 ns is acceptable for the design. Skew looks worse with the new method, so we checked timing after CTS against the old data.

  The much worse global skew is not a real problem: timing became only a little worse than before and can be closed quickly after PPO. In addition, max transition violations decreased with the new method, which may be a contribution of the area reduction. Thus, user-controlled Astro CTS can reduce total power consumption without causing timing closure to fail.

Double-spacing clock nets routing

  Figure 4-1 shows that total wire capacitance does not vary much when we change the CTS parameters, which tells us that the total wire length of the clock tree does not change much under user-controlled CTS either. See also the comparison of the different types of capacitance in Table 4-1.

  In Table 4-1, we can see that wire capacitance accounts for a larger share of the clock tree's capacitance than gate capacitance. If we can reduce the wire capacitance of the tree, we can build the clock tree with fewer buffers, and the skew and delay of the clock also benefit. In addition, the switching component of total power consumption is reduced. We therefore used double-spacing clock net routing on case 3 to check this, with commands like:
; Define a variable routing rule named double_spacing
axgDefineVarRule
setFormField "Define Var Route Rule" "Rule Name" "double_spacing"
setFormField "Define Var Route Rule" "Spacing1" "0.56"
setFormField "Define Var Route Rule" "Spacing2" "0.56"
setFormField "Define Var Route Rule" "Spacing3" "0.56"
setFormField "Define Var Route Rule" "Spacing4" "0.56"
setFormField "Define Var Route Rule" "Spacing5" "1"
formOK "Define Var Route Rule"

; Apply the rule to all clock nets
axgSetNetConstraint
setFormField "Set Net Constraint" "VarRoute Rule" "1"
setFormField "Set Net Constraint" "Rule Name" "double_spacing"
setFormField "Set Net Constraint" "Net Name From" "All clock nets"
formOK "Set Net Constraint"

This was followed by user-controlled Astro CTS with the same parameter settings as above. We then routed the clock nets, routed all other nets, optimized the routing with "astPostRouteOpt", ran power analysis, and compared the results with the former case 3 results in Table 4-2.

  Just as expected, both skew and delay improved after double-spacing clock net routing. Because switching power is only a small part of total power consumption, total power improved only slightly.

Summary and Suggestion

Through the discussion above, we can conclude that using small buffers to build the clock tree reduces the total transistor width of the clock network. Generally this reduces total clock power consumption considerably, but for some designs the power reduction is small. Good power results can also be obtained by using inverters, but we did not use them in this test.

Increasing the target transition time to a moderate value does not increase clock power consumption. Doubling the value often gives a reduction of far more than 50% in the area and total transistor width of the clock network, which will be very useful for reducing total leakage power in 90nm or deeper submicron designs.

Double-spacing clock net routing is very effective at improving CTS results whether or not the design is low power, but we should make sure the design is not very congested before using it.
