I have extensive Airflow experience, and I generally agree that Airflow isn't a good solution. It's good when each step processes a single atomic "unit of work"; when each step processes multiple files and the job restarts, you have to write code to skip the already-processed files, for example.
But I want to point out a few things that are wrong in the article, to help others evaluate Airflow.
> Second, Airflow’s DAGs are not parameterized, which means you can’t pass parameters into your workflows. So if you want to run the same model with different learning rates, you’ll have to create different workflows.
You can pass parameters to workflows via a JSON config. When triggering from the UI, you can paste in JSON with the right arguments/parameters for your DAG, so you can train a model with different arguments, etc.
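A minimal sketch of that pattern, simulated without an Airflow install (the DAG/task/parameter names are illustrative, not from the article): in real Airflow, the JSON you paste into "Trigger DAG w/ config" shows up as `dag_run.conf` in the task context.

```python
# Sketch of reading run-time parameters from the DAG run's JSON conf.
# In real Airflow you would trigger with e.g.:
#   airflow dags trigger train_dag --conf '{"learning_rate": 0.001}'
from types import SimpleNamespace

def train_model(**context):
    # Airflow injects the triggering DagRun into the task context;
    # dag_run.conf holds the JSON pasted into the trigger form.
    conf = context["dag_run"].conf or {}
    lr = conf.get("learning_rate", 0.01)  # fall back to a default
    return f"training with lr={lr}"

# Local simulation of what Airflow would pass to the callable:
fake_run = SimpleNamespace(conf={"learning_rate": 0.001})
print(train_model(dag_run=fake_run))  # training with lr=0.001
```

The same callable works unchanged as a `PythonOperator` target, since it only reads from the context it is given.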
> Third, Airflow’s DAGs are static, which means it can’t automatically create new steps at runtime as needed.
You can absolutely create new steps at run time. The point of Airflow is that everything is just Python code that is evaluated to generate the DAG; as long as you generate the DAG and write the operators, it will happily run and log. It may have trouble rendering on the UI and cause some weird issues (when I last worked with it, tasks wouldn't advance past certain steps, but those are bugs).
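Since a DAG file is plain Python run at parse time, ordinary loops can mint tasks. A hedged sketch, with the file list standing in for whatever you discover at parse time (a glob, a DB query, ...); the names are illustrative:

```python
# In a real DAG file, each id produced here would become e.g.
#   PythonOperator(task_id=f"process_{name}", ...)
# chained with >> to set dependencies.
files = ["a.csv", "b.csv", "c.csv"]  # stand-in for a parse-time discovery step

def build_task_ids(files):
    # Strip the extension and build one task id per input file.
    return [f"process_{name.split('.')[0]}" for name in files]

print(build_task_ids(files))  # ['process_a', 'process_b', 'process_c']
```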
The one drawback I did note with Airflow was none of the ones mentioned, but this: it doesn't let you define dependencies at the data level, i.e., in terms of the individual inputs and outputs of a process or task.
You can write an operator, and that operator can in turn instantiate any other known operators and point the next steps at them. Here is an example: https://stackoverflow.com/questions/41517798/proper-way-to-...
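The shape of that pattern can be sketched without Airflow installed: a dispatcher operator whose execute() decides at run time which child operators to create and run. In real Airflow you would subclass BaseOperator (or use TriggerDagRunOperator); the class and method names below are stand-ins:

```python
# Stand-in for a built-in operator like BashOperator.
class FakeBashOperator:
    def __init__(self, task_id, command):
        self.task_id, self.command = task_id, command
    def execute(self):
        return f"{self.task_id}: ran {self.command}"

class DispatchOperator:
    """Decides at run time which downstream operators to create."""
    def __init__(self, commands):
        self.commands = commands
    def execute(self):
        # Instantiate child operators from run-time data, then run them.
        children = [FakeBashOperator(f"step_{i}", cmd)
                    for i, cmd in enumerate(self.commands)]
        return [child.execute() for child in children]

print(DispatchOperator(["echo a", "echo b"]).execute())
# ['step_0: ran echo a', 'step_1: ran echo b']
```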