Using the Google bq CLI, the following command returns the most frequent PyPI keywords from the bigquery-public-data.pypi.distribution_metadata table:
bq query --use_legacy_sql=false 'SELECT keyword, COUNT(*) as keyword_count FROM `bigquery-public-data.pypi.distribution_metadata`, UNNEST(SPLIT(keywords, ", ")) as keyword GROUP BY keyword ORDER BY keyword_count DESC LIMIT 100'
Results for the top 15 keywords:
python : 128555 appearances
DuckDB Database SQL OLAP : 70739 appearances
ai : 64997 appearances
tensorflow tensor machine learning : 51144 appearances
pulumi : 50076 appearances
api : 47986 appearances
probabilities probabilistic-graphical-models inference diagnosis : 46552 appearances
rust : 45607 appearances
cli : 39512 appearances
OpenAPI : 38814 appearances
sdk : 38060 appearances
llm : 37487 appearances
OpenAPI-Generator : 36734 appearances
database : 35578 appearances
automation : 34393 appearances
Note that this is a very basic query: it does not take into account that some packages have many more versions published on PyPI than others, so keywords from frequently released packages are counted more often.
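To make the effect of this caveat concrete, here is a minimal Python sketch that compares the two ways of counting on a handful of hypothetical rows. The package names, versions and keywords are made up for illustration and are not taken from the table; the helper names count_per_row and count_per_package are likewise only illustrative.

```python
from collections import Counter

# Hypothetical sample rows (package name, version, keywords string).
# These values are invented for illustration, not real query results.
rows = [
    ("pkg-a", "1.0", "python, cli"),
    ("pkg-a", "1.1", "python, cli"),
    ("pkg-a", "1.2", "python, cli"),
    ("pkg-b", "0.1", "python, api"),
]


def count_per_row(rows):
    """Count every keyword once per row, as COUNT(*) does in the query."""
    counts = Counter()
    for _name, _version, keywords in rows:
        counts.update(keywords.split(", "))
    return counts


def count_per_package(rows):
    """Count every keyword at most once per distinct package name."""
    seen = set()
    for name, _version, keywords in rows:
        for keyword in keywords.split(", "):
            seen.add((keyword, name))
    return Counter(keyword for keyword, _name in seen)


per_row = count_per_row(rows)
per_package = count_per_package(rows)
print("per row:    ", dict(per_row))
print("per package:", dict(per_package))
```

In this toy sample, pkg-a contributes its keywords three times to the per-row count but only once to the per-package count. A comparable change to the query itself would presumably be to count distinct package names per keyword instead of rows, assuming the table exposes the package name in a column such as name.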