Rough investigation: Open-source solver time and memory penalties
#16
(07-09-2024, 01:17 AM)Olexandr Wrote: Thanks @Antii-L!
Interesting! As far as I remember we ended the project in summer 2018. Julia v1.0 was released in August that year, so before that the API would have been unstable.
This is my first experience with Julia, so I installed the latest version available 2 weeks ago - v1.10.4.
If we decide to proceed further with this we could e.g. decide on the versions of Julia to support and just run automated testing on GitHub to discover any incompatibilities.

Interesting progress. Nice work.

FYI, I've been in contact with IBM. Not so easy to find the right people but still trying. Goal is to see if they'd consider a special consideration for TIMES if ETSAP was to port TIMES to Julia. Hope to have something to say at the next ExCo, but not holding my breath.
g2
Reply
#17
@Olex : I tried it now, with Julia v1.10.4, and it looked like it worked alright.

At first I thought you were using the PROTO model database for the bigger model, which I used for my Julia experiment in 2018, but now I realize this must be the database for the smaller test model, the Utopia model (in GAMS it was only 614 rows, 632 columns and 2000 non-zeros), correct?

Here is my console log output from the run:

Define sets: 0.078074 seconds (90.67 k allocations: 6.188 MiB, 32.06% gc time, 91.65% compilation time)
Load demo data (sets and parameters): 5.537421 seconds (14.83 M allocations: 1.004 GiB, 8.09% gc time, 86.74% compilation time)
Compute additional parameters: 0.175782 seconds (370.70 k allocations: 25.610 MiB, 95.83% compilation time)
Compute additional sets: 1.650677 seconds (3.08 M allocations: 212.797 MiB, 2.42% gc time, 98.64% compilation time)
Create Variables: 92.028063 seconds (235.78 M allocations: 21.695 GiB, 71.83% gc time, 0.95% compilation time)
Objective Function: 0.186049 seconds (190.58 k allocations: 13.096 MiB, 95.06% compilation time)
Specify constraints: 31.989180 seconds (477.64 M allocations: 22.244 GiB, 29.73% gc time, 30.81% compilation time)
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright © 2024 HiGHS under MIT licence terms
Coefficient ranges:
  Matrix [1e-02, 1e+04]
  Cost  [1e+00, 1e+00]
  Bound  [1e-01, 1e+01]
  RHS    [9e-02, 8e+03]
Presolving model
265 rows, 288 cols, 966 nonzeros  3s
228 rows, 245 cols, 826 nonzeros  3s
Presolve : Reductions: rows 228(-376); columns 245(-46644766); elements 826(-1094)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective    Infeasibilities num(sum)
          0    1.2861637975e+04 Pr: 30(74.9799) 4s
        161    3.8706151237e+04 Pr: 0(0); Du: 0(8.52651e-14) 4s
Solving the original LP from the solution after postsolve
Model  status      : Optimal
Simplex  iterations: 161
Objective value    :  3.8706151237e+04
HiGHS run time      :        14.87
Solve model: 297.598005 seconds (279.92 M allocations: 29.121 GiB, 54.82% gc time, 0.04% compilation time: 61% of which was recompilation)


I think the model stats reported by HiGHS confirm it being the small Utopia model (but with 46.6 million variables?!). Some initial comments:
  • Most of the time appears spent into creating unnecessary variables (92 secs!); I am sure this could be easily reduced close to zero;
  • Constraints generation seems to take 32 secs; this may be affected by the inefficient handling of variables;
  • Solving with HiGHS appears to take 300 secs (including I/O overheads); again probably inefficient due to the variables;
  • At this point, it would indeed not make much sense to test with the bigger PROTO model. The times for Utopia should first be reduced close to zero, which should not be too dfficult.
Reply
#18
@Gary : Good to hear from you here as well!
> FYI, I've been in contact with IBM. Not so easy to find the right people but still trying. Goal is to see if they'd consider a special consideration for TIMES if ETSAP was to port TIMES to Julia.

Thanks for your efforts on that front. Hope to hear any news about it at the ExCo.
Reply
#19
Thanks @Gary!
@Antti-L, please try this one: https://github.com/olejandro/TIMES.jl/tr...d3d8a479a5
I've changed YEAR to MODLYR in the definition of the variables to avoid generating unnecessary variables. It is interesting to see that this wasn't necessary in the Pyomo example.
On my laptop, the Utopia model now takes about 1 minute, which is still much slower than the GAMS and Pyomo examples. But I guess, it should now be okay to try it on your Proto model?
I'll look into other common issues (e.g. type instability) that influence Julia / JuMP performance.
Reply
#20
@Olex : Yes, it was already much better, on my computer the whole cycle took now about 35 secs:

Define sets: 0.071533 seconds (90.67 k allocations: 6.188 MiB, 35.27% gc time, 97.53% compilation time)
Load demo data (sets and parameters): 4.993969 seconds (14.83 M allocations: 1.004 GiB, 8.33% gc time, 92.74% compilation time)
Compute additional parameters: 0.161982 seconds (370.70 k allocations: 25.610 MiB, 98.97% compilation time)
Compute additional sets: 1.584292 seconds (3.08 M allocations: 212.797 MiB, 2.28% gc time, 99.04% compilation time)
Create Variables: 1.606490 seconds (9.44 M allocations: 739.344 MiB, 28.36% gc time, 55.39% compilation time)
Objective Function: 0.240101 seconds (190.58 k allocations: 13.097 MiB, 26.72% gc time, 98.79% compilation time)
Specify constraints: 23.129035 seconds (477.64 M allocations: 22.244 GiB, 6.23% gc time, 22.82% compilation time)
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright © 2024 HiGHS under MIT licence terms
Presolve : Reductions: rows 228(-376); columns 245(-1393846); elements 826(-1094)
Solving the presolved LP
Model  status      : Optimal
Objective value    :  3.8706151237e+04
HiGHS run time      :          0.38
Solve model: 2.802878 seconds (8.42 M allocations: 725.451 MiB, 10.65% gc time, 3.70% compilation time: 56% of which was recompilation)


However, there were still lots of unnecessary variables (1.4 million vars in total, while there should be only about 650 variables.  In my 2018 experiment, I used the following kind of tighter specs:

@variable(m,PrcAct[r in REGION,t in MILEYR,p in [p for p in PROCESS if RTP_VARA[r,t,p]],v in RTP_VNT[r,t,p],s in PRC_TS[r,p]] >=0)
@variable(m,PrcFlo[r in REGION,t in MILEYR,p in [p for p in PROCESS if RP_FLO[r,p] && RTP_VARA[r,t,p]],v in RTP_VNT[r,t,p],c in RPC[r,p],s in RPC_TS[r,p,c]] >=0)
@variable(m,IreFlo[r in REGION,t in MILEYR,p in [p for p in PROCESS if RP_IRE[r,p] && RTP_VARA[r,t,p]],v in RTP_VNT[r,t,p],c in RPC[r,p],s in PRC_TS[r,p],ie in IMPEXP; RPC_IRE[r,p,c,ie]] >=0)


Of course, the various control attributes used there had to be created before that, but they could be used in many constraints as well. But I agree that Julia works unexpectedly poorly with respect to variables declaration (compared to e.g. GAMS), if one needs to use complex conditionals to avoid lots of spurious entries.

On my computer, the Utopia model is generated by GAMS in just 0.05 secs, and solved in 0.02 secs by Cplex, and so the performance gap is still high. Looks like there is a lot of room for improvement in the constraint formulation (e.g. the memory usage suggests so, 22.244 GiB, if I understand it correctly). I think one should get it at least under 1 second...
Reply
#21
Thanks @Antti-L! I've reduced the number of variables to about 5k by using tighter specs here: https://github.com/olejandro/TIMES.jl/tree/run-faster
On my laptop it did not cause any visible improvement in the run time. Will look into constraint formulation next.
Reply
#22
Great work guys! I'm trying to follow along. From having a play, it looks like the ~34 lines of code defining the Process Transformation constraints are the biggest bottleneck now, together with the Process Flow Shares ones.

Create model: 0.000072 seconds (86 allocations: 9.000 KiB)
System Sets: 0.000062 seconds (697 allocations: 57.195 KiB)
Get data: 0.126396 seconds (96.55 k allocations: 4.645 MiB)
Parameters: 0.000076 seconds (28 allocations: 1.305 KiB)
Compute sets: 0.003145 seconds (31.07 k allocations: 1.266 MiB)
Variables: 0.427004 seconds (5.74 M allocations: 265.189 MiB, 17.58% gc time)
Objective: 0.000031 seconds (37 allocations: 2.172 KiB)
EQ_OBJINV: 0.000362 seconds (2.81 k allocations: 130.523 KiB)
EQ_OBJFIX: 0.000370 seconds (2.68 k allocations: 121.883 KiB)
EQ_OBJVAR: 0.000568 seconds (3.71 k allocations: 261.398 KiB)
EQ_ACTFLO: 0.003794 seconds (56.12 k allocations: 2.402 MiB)
EQL_CAPACT: 0.005743 seconds (82.12 k allocations: 3.988 MiB)
EQE_CAPACT: 0.002943 seconds (49.06 k allocations: 1.976 MiB)
EQE_CPT: 0.000207 seconds (1.74 k allocations: 65.930 KiB)
EQL_CPT: 0.000146 seconds (1.48 k allocations: 49.820 KiB)
EQG_CPT: 0.000173 seconds (1.92 k allocations: 77.039 KiB)
EQL_FLOSHR: 2.689352 seconds (33.74 M allocations: 1.891 GiB, 12.86% gc time)
EQL_FLOSHR: 3.379022 seconds (33.73 M allocations: 1.890 GiB, 13.02% gc time)
EQL_FLOSHR: 2.258765 seconds (33.73 M allocations: 1.890 GiB, 13.85% gc time)
EQE_ACTEFF: 0.084329 seconds (1.31 M allocations: 54.377 MiB)
EQ_PTRANS: 32.496109 seconds (360.99 M allocations: 15.599 GiB, 9.21% gc time)
EQG_COMBAL: 0.003289 seconds (26.95 k allocations: 1.687 MiB)
EQE_COMBAL: 0.000493 seconds (3.47 k allocations: 138.609 KiB)
EQE_COMPRD: 0.000244 seconds (1.95 k allocations: 67.266 KiB)
EQ_STGTSS: 0.003442 seconds (59.42 k allocations: 2.432 MiB)
Constraints: 40.936759 seconds (463.81 M allocations: 21.337 GiB, 9.99% gc time)
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright © 2024 HiGHS under MIT licence terms
Coefficient ranges:
Matrix [1e-02, 1e+04]
Cost [1e+00, 1e+00]
Bound [1e-01, 1e+01]
RHS [9e-02, 8e+03]
Presolving model
265 rows, 285 cols, 963 nonzeros 0s
228 rows, 245 cols, 826 nonzeros 0s
Presolve : Reductions: rows 228(-376); columns 245(-4571); elements 826(-1094)
Solving the presolved LP
Using EKK dual simplex solver - serial
Iteration Objective Infeasibilities num(sum)
0 1.2861637975e+04 Pr: 30(74.9799) 0s
162 3.8706151237e+04 Pr: 0(0); Du: 0(1.35003e-13) 0s
Solving the original LP from the solution after postsolve
Model status : Optimal
Simplex iterations: 162
Objective value : 3.8706151237e+04
HiGHS run time : 0.02
Solve model: 0.031630 seconds (37.42 k allocations: 3.734 MiB)
End to end model run: 41.530639 seconds (469.71 M allocations: 21.605 GiB, 10.03% gc time)
Reply
#23
Great, it is indeed very useful to see the times used by equation block.
Did you change something about the loading of data, as it seems to go much faster for you?
Get data: 0.126396 seconds (96.55 k allocations: 4.645 MiB)
And is it so that you compiled the project before running (I guess that is recommended for performance)? For my older experiment, I worked under the REPL and so compiling functions there before running was easy, but I am not sure how to do it for running this demo. Therefore, you can see notable compile times in my logs, but I cannot see any such in yours.
Reply
#24
Thanks @willcattoneeca! Probably a lot of time is spent on filtering here:
Code:
EQ_PTRANS[
        r in REGION,
        p in PROCESS,
        cg1 in COMGRP,
        cg2 in COMGRP,
        s1 in TSLICE,
        t in MILEYR,
        v in MODLYR,
        s in TSLICE;
        (r, p, cg1, cg2, s1) in RP_PTRAN &&
        (r, s1, s) in eachindex(RS_FR) &&
        s in RP_S1[r, p] &&
        (r, t, p) in RTP_VARA &&
        v in RTP_VNT[r, t, p],
    ]

The following code takes almost 26 sec.

Code:
[(r, p, cg1, cg2, s1, t, v, s)
for r in REGION
for p in PROCESS
for cg1 in COMGRP
for cg2 in COMGRP
for s1 in TSLICE
for t in MILEYR
for v in MODLYR
for s in TSLICE
if (
    (r, p, cg1, cg2, s1) in RP_PTRAN &&
    (r, s1, s) in eachindex(RS_FR) &&
    s in RP_S1[r, p] &&
    (r, t, p) in RTP_VARA &&
    v in RTP_VNT[r, t, p]
)
]

I can experiment with relational algebra instead.
Reply
#25
Hi Antti-L - yes, I think that the way demo.jl is re-running "include" statements forces Julia to re-compile the code each time. I suspect something similar will likely be the case where eval is being used. I was just profiling the second run of a function into which I'd splatted in all of Olex's code: https://github.com/willcattoneeca/TIMES....e.jl#L1192
The typical pattern is to bundle different bits of functionality as functions (or custom types) in modules that can then be precompiled, but I think even precompiled module code requires JIT compilation the first time it executes - implying that in Julia it would be more efficient to run scenarios in batches rather than one at a time, etc.

Edit - Actually, it looks like there is a solution available to avoid that slow first execution as well:
https://github.com/JuliaLang/PackageCompiler.jl
Reply
#26
Cool! :-)
Btw, if you execute the demo code in REPL as include("demo.jl") it will behave the expected way. I.e. it will execute faster the second time.
I've done some more of 'index tightening" to reduce the constraint generation time. I'll need to switch to using dictionaries instead of JuMP containers to do more of it.
Reply
#27
Nice! You've made it a whole lot quicker!!

For me (Julia 1.9.4 but I don't think that is likely to cause any difference), running include("demo.jl") twice does still lead to the @time macro results reporting that Julia is spending a significant fraction of its time (for each of the small preliminary steps) on compilation. For instance, "54.38% compilation time" in the below output where I run it a second time. Do you see compilation time mentioned the second time you run include("demo.jl") in your REPL? Compilation is entirely eliminated the second time the "run_model" function gets executed in my fork.

The compilation time is a small part of the overall time at present, but it could seem significant e.g. if the goal is to get down to the 1 second ballpark. And since it can be eliminated it should be excluded from any estimate of the achievable performance.

(base) PS C:\Users\cattonw\git\TIMES.jl> julia --project=.
              _
  _      _ _(_)_    |  Documentation: https://docs.julialang.org
  (_)    | (_) (_)    |
  _ _  _| |_  __ _  |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.4 (2023-11-14)
_/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                  |

julia> include("demo.jl")
Define sets: 0.091661 seconds (106.66 k allocations: 7.145 MiB, 94.45% compilation time)
Load demo data (sets and parameters): 10.461156 seconds (17.31 M allocations: 1.128 GiB, 3.94% gc time, 81.63% compilation time)
Compute additional parameters: 0.321723 seconds (422.56 k allocations: 28.788 MiB, 4.00% gc time, 98.12% compilation time)
Compute additional sets: 3.165227 seconds (3.29 M allocations: 216.488 MiB, 2.12% gc time, 99.01% compilation time)
Create Variables: 4.322982 seconds (10.30 M allocations: 597.847 MiB, 4.41% gc time, 87.55% compilation time)
Objective Function: 0.423517 seconds (216.08 k allocations: 14.667 MiB, 3.00% gc time, 97.54% compilation time)
Specify constraints: 19.452738 seconds (97.60 M allocations: 5.401 GiB, 5.11% gc time, 63.17% compilation time)
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright © 2024 HiGHS under MIT licence terms
Coefficient ranges:
  Matrix [1e-02, 1e+04]
  Cost  [1e+00, 1e+00]
  Bound  [1e-01, 1e+01]
  RHS    [9e-02, 8e+03]
Presolving model
265 rows, 285 cols, 963 nonzeros  0s
228 rows, 245 cols, 826 nonzeros  0s
Presolve : Reductions: rows 228(-376); columns 245(-4571); elements 826(-1094)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective    Infeasibilities num(sum)
          0    1.2861637975e+04 Pr: 30(74.9799) 0s
        162    3.8706151237e+04 Pr: 0(0); Du: 0(1.35003e-13) 0s
Solving the original LP from the solution after postsolve
Model  status      : Optimal
Simplex  iterations: 162
Objective value    :  3.8706151237e+04
HiGHS run time      :          0.02
Solve model: 0.228086 seconds (96.55 k allocations: 7.800 MiB, 76.42% compilation time: 33% of which was recompilation)
* Solver : HiGHS

* Status
  Result count      : 1
  Termination status : OPTIMAL
  Message from the solver:
  "kHighsModelStatusOptimal"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : 3.87062e+04
  Objective bound    : 0.00000e+00
  Relative gap      : Inf
  Dual objective value : 3.87062e+04

* Work counters
  Solve time (sec)  : 2.07250e-02
  Simplex iterations : 162
  Barrier iterations : 0
  Node count        : -1


julia> include("demo.jl")
Define sets: 0.007051 seconds (1.63 k allocations: 106.188 KiB)
Load demo data (sets and parameters): 0.250902 seconds (123.93 k allocations: 6.411 MiB, 45.34% compilation time)
Compute additional parameters: 0.093254 seconds (32.46 k allocations: 2.268 MiB, 88.64% compilation time)
Compute additional sets: 2.109598 seconds (1.25 M allocations: 86.716 MiB, 1.60% gc time, 97.75% compilation time)
Create Variables: 2.623167 seconds (7.02 M allocations: 385.873 MiB, 4.27% gc time, 81.17% compilation time)
Objective Function: 0.027465 seconds (5.25 k allocations: 305.211 KiB, 64.73% compilation time)
Specify constraints: 15.146921 seconds (91.38 M allocations: 5.004 GiB, 5.57% gc time, 54.38% compilation time)
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright © 2024 HiGHS under MIT licence terms
Coefficient ranges:
  Matrix [1e-02, 1e+04]
  Cost  [1e+00, 1e+00]
  Bound  [1e-01, 1e+01]
  RHS    [9e-02, 8e+03]
Presolving model
265 rows, 285 cols, 963 nonzeros  0s
228 rows, 245 cols, 826 nonzeros  0s
Presolve : Reductions: rows 228(-376); columns 245(-4571); elements 826(-1094)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective    Infeasibilities num(sum)
          0    1.2861637975e+04 Pr: 30(74.9799) 0s
        162    3.8706151237e+04 Pr: 0(0); Du: 0(1.35003e-13) 0s
Solving the original LP from the solution after postsolve
Model  status      : Optimal
Simplex  iterations: 162
Objective value    :  3.8706151237e+04
HiGHS run time      :          0.02
Solve model: 0.028188 seconds (35.32 k allocations: 3.690 MiB)
* Solver : HiGHS

* Status
  Result count      : 1
  Termination status : OPTIMAL
  Message from the solver:
  "kHighsModelStatusOptimal"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : 3.87062e+04
  Objective bound    : 0.00000e+00
  Relative gap      : Inf
  Dual objective value : 3.87062e+04

* Work counters
  Solve time (sec)  : 1.58119e-02
  Simplex iterations : 162
  Barrier iterations : 0
  Node count        : -1
Reply
#28
> The compilation time is a small part of the overall time at present, but it could seem significant e.g. if the goal is to get down to the 1 second ballpark. And since it can be eliminated it should be excluded from any estimate of the achievable performance.

I concur with that.  And after getting there, one could start examining how the performance scales up.
Reply
#29
Sounds good!
My current best is about 13 sec total time (2nd pass); with 8 sec for constraints generation.
Reply
#30
Managed to get it down to about 7 sec total time (2nd pass) with 3 sec for constraints generation.
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)