gcc scheduler cache latency timings on Alpha.


Post by Tavis Ormand » Sat, 21 Jun 2003 09:09:15



The gcc scheduler uses memory latency timings to predict optimal
scheduling for memory references. The man page says that gcc knows
typical timings for ev4 and ev5, which is cool, but I'd like specific
timings for my machines so the scheduler can do the best job possible.

So I snarfed the source and investigated where it gets these figures
from and how it uses them. This is what I found:

* gcc has 9 cache latencies hardcoded, plus one value for main memory:
        L1 and L2 (and a rogue value for L3) for ev4.
        L1, L2 and L3 for ev5 and ev6.
        an estimate of `main` memory latency.

* if you don't specify a memory latency, it uses its hardcoded value
  for L1 on your CPU revision.

* if you specify a bogus latency, it assumes 3 cycles :)

Okay, these are the figures it has (this is gcc 3.2.3, btw):

        static int const cache_latency[][4] =
        {
                { 3, 30, -1 }, /* <-- ev4 */
                { 2, 12, 38 }, /* <-- ev5 */
                { 3, 12, 30 }, /* <-- ev6 */
        };

        /* ... */

        else if (! strcmp (alpha_mlat_string, "main"))
                lat = 150;
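
For anyone who wants to play with the lookup outside the compiler, here is a minimal standalone sketch of the behaviour described above. It is not the gcc source: the function name `resolve_latency` and the `enum cpu` indices are made up for illustration, and the accepted argument forms (a plain number, L1, L2, L3, or main) are taken from the gcc documentation for -mmemory-latency.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CPU indices for this sketch; row order follows the table above. */
enum cpu { CPU_EV4 = 0, CPU_EV5 = 1, CPU_EV6 = 2 };

/* Latencies in cycles, copied from the excerpt above: L1, L2, L3.
   -1 marks a level with no known value. */
static const int cache_latency[3][3] = {
    { 3, 30, -1 },  /* ev4 */
    { 2, 12, 38 },  /* ev5 */
    { 3, 12, 30 },  /* ev6 */
};

/* Resolve a -mmemory-latency style argument to a cycle count.
   arg == NULL means the option was not given at all. */
static int resolve_latency(const char *arg, enum cpu cpu)
{
    assert((int)cpu >= 0 && (int)cpu <= 2);

    if (arg == NULL)
        arg = "L1";                     /* default: L1 value for this CPU */

    if (isdigit((unsigned char)arg[0])) {
        char *end;
        long n = strtol(arg, &end, 10);
        if (*end == '\0')
            return (int)n;              /* plain cycle count */
        return 3;                       /* trailing junk: treat as bogus */
    }

    if (arg[0] == 'L' && isdigit((unsigned char)arg[1]) && arg[2] == '\0') {
        int level = arg[1] - '0';
        if (level < 1 || level > 3 || cache_latency[cpu][level - 1] == -1)
            return 3;                   /* unknown cache level */
        return cache_latency[cpu][level - 1];
    }

    if (strcmp(arg, "main") == 0)
        return 150;                     /* main memory estimate */

    return 3;                           /* anything else is bogus */
}
```

With this sketch, `resolve_latency("L2", CPU_EV5)` gives 12, and leaving the option out on ev4 (`resolve_latency(NULL, CPU_EV4)`) gives 3.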

The comments say that the author's machine's main memory has a latency
of 370 ns, so if I understand this correctly, his clock speed was

        (150 cycles / 370 ns) * 1000 == 405 MHz

So assuming this guy made the timings for the ev5 Dcache latency as
well, using the same machine:

        (1000 / 405 MHz) * 2 cycles == 4.938 ns latency for his Dcache
        (which sounds about right)

        (please correct me if I'm way off here, btw)
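
The arithmetic above can be written as two small helpers, which makes the unit handling explicit. This is only a sketch; the function names are made up for illustration, and it assumes the 150-cycle and 370 ns figures really describe the same machine.

```c
#include <assert.h>

/* Clock rate in MHz implied by one latency known both in cycles and in ns.
   cycles / ns is cycles per nanosecond (GHz), so multiply by 1000 for MHz. */
static double implied_clock_mhz(double cycles, double latency_ns)
{
    assert(latency_ns > 0.0);
    return cycles / latency_ns * 1000.0;
}

/* Time in nanoseconds taken by a number of cycles at clock_mhz.
   One cycle lasts 1000 / clock_mhz nanoseconds. */
static double cycles_to_ns(double cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles * 1000.0 / clock_mhz;
}
```

`implied_clock_mhz(150, 370)` comes out just over 405 MHz, and `cycles_to_ns(2, 405)` is about 4.94 ns, matching the figures worked out by hand.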

So I benchmarked mine, and my Dcache has a latency of 3.799
nanoseconds, and my ruffian has a 533 MHz clock speed, so my
-mmemory-latency time should be

        3.799 ns / (1000 / 533 MHz) == 2 cycles, which works (although I
        seem to get better benchmarks with 3 cycles latency...).

The value for `main` also works (well, 2 cycles out), so apply the same
logic to the Scache, where I have a latency of 65.1 ns:

        65.1 ns / (1000 / 533 MHz) == 34.70 cycles ...

So where is this 12 cycles figure coming from? I'd be grateful if anyone can
explain it.
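
Going the other way, from a measured latency in nanoseconds to a cycle count for -mmemory-latency, might look like the sketch below. The helper names are hypothetical, and rounding to the nearest whole cycle is an assumption of this sketch (the option takes a whole number of cycles, but nothing here says how best to round).

```c
#include <assert.h>

/* Convert a measured latency in nanoseconds to clock cycles.
   clock_mhz / 1000 is the number of cycles per nanosecond. */
static double ns_to_cycles(double latency_ns, double clock_mhz)
{
    assert(latency_ns >= 0.0 && clock_mhz > 0.0);
    return latency_ns * clock_mhz / 1000.0;
}

/* Round to the nearest whole cycle, since the option takes an integer.
   Rounding to nearest (rather than up) is a choice made for this sketch. */
static int ns_to_whole_cycles(double latency_ns, double clock_mhz)
{
    return (int)(ns_to_cycles(latency_ns, clock_mhz) + 0.5);
}
```

At 533 MHz, the 3.799 ns Dcache figure rounds to 2 cycles and the 65.1 ns Scache figure rounds to 35.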


 
 
 


Post by Falk Hueffne » Sat, 21 Jun 2003 19:50:26



> The gcc scheduler uses memory latency timings to predict optimal
> scheduling for memory references. The man page says that gcc knows
> typical timings for ev4 and ev5, which is cool, but I'd like specific
> timings for my machines so the scheduler can do the best job possible.

> [...]

> The value for `main` also works (well, 2 cycles out), so apply the same
> logic to the Scache, where I have a latency of 65.1 ns:

>    65.1 ns / (1000 / 533 MHz) == 34.70 cycles ...

> So where is this 12 cycles figure coming from? I'd be grateful if
> anyone can explain it.

The HRM says:

"When idle, Scache arbitration predicts a load miss in E0. If a load
actually does miss in E0, it is sent to the Scache immediately. If it
hits, and no other event in the Cbox affects the operation, the
requested data is available for use in eight cycles. Otherwise, the
request takes longer (possibly much longer, depending on the state of
the Scache and Cbox)."

So if you're lucky, it takes only 8 cycles. 12 is probably a
compromise.

Not that it matters a lot, anyway. There won't usually be enough
instructions that could be scheduled in between, so anything >8 is
basically equivalent.
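
A toy model shows why the exact figure stops mattering. This is an illustration only, under the assumption of a single-issue machine, and is not how gcc's scheduler is actually implemented; the function name is made up.

```c
#include <assert.h>

/* Toy single-issue model (an assumption of this sketch, not gcc's algorithm):
   the scheduler would like assumed_latency - 1 instructions between a load
   and its first use, but it can only place instructions that do not depend
   on the load. Returns how many slots actually get filled. */
static int filled_slots(int assumed_latency, int independent_insns)
{
    assert(assumed_latency >= 1 && independent_insns >= 0);
    int wanted = assumed_latency - 1;
    return wanted < independent_insns ? wanted : independent_insns;
}
```

With only five independent instructions available, an assumed latency of 8, 12 or 35 cycles fills the same five slots, so the resulting schedule is identical.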

--
        Falk

 
 
 


Post by Tavis Ormand » Sat, 21 Jun 2003 22:06:01



> "When idle, Scache arbitration predicts a load miss in E0. If a load
> actually does miss in E0, it is sent to the Scache immediately. If it
> hits, and no other event in the Cbox affects the operation, the
> requested data is available for use in eight cycles. Otherwise, the
> request takes longer (possibly much longer, depending on the state of
> the Scache and Cbox)."

> So if you're lucky, it takes only 8 cycles. 12 is probably a
> compromise.

Very interesting! Thanks!


 
 
 
