Listing top PyPI keywords | BigQuery Datasets - PyPI Docs
Using the Google bq CLI, the following command retrieves the top PyPI keywords from the bigquery-public-data.pypi.distribution_metadata table:
bq query --use_legacy_sql=false 'SELECT keyword, COUNT(*) as keyword_count FROM `bigquery-public-data.pypi.distribution_metadata`, UNNEST(SPLIT(keywords, ", ")) as keyword GROUP BY keyword ORDER BY keyword_count DESC LIMIT 100'
Results for the top 15 keywords:
python: 128555 appearances
DuckDB Database SQL OLAP: 70739 appearances
ai: 64997 appearances
tensorflow tensor machine learning: 51144 appearances
pulumi: 50076 appearances
api: 47986 appearances
probabilities probabilistic-graphical-models inference diagnosis: 46552 appearances
rust: 45607 appearances
cli: 39512 appearances
OpenAPI: 38814 appearances
sdk: 38060 appearances
llm: 37487 appearances
OpenAPI-Generator: 36734 appearances
database: 35578 appearances
automation: 34393 appearances
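Note that this is a very basic query: it does not take into account that some packages have many more versions published on PyPI than others, so the keywords of frequently released packages are counted once for every row they have in the table.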
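One possible way to reduce that bias is to count each keyword once per package rather than once per row. The following is a minimal sketch, not a definitive query: it assumes the table's name column identifies the package, and the QUERY shell variable and the package_count alias are names introduced here for illustration only.

```shell
# Sketch: count each keyword once per distinct package name instead of once per row.
# Assumption: the `name` column of distribution_metadata identifies the package.
# QUERY and package_count are illustrative names, not part of the original docs.
QUERY='SELECT keyword, COUNT(DISTINCT name) AS package_count
FROM `bigquery-public-data.pypi.distribution_metadata`,
  UNNEST(SPLIT(keywords, ", ")) AS keyword
GROUP BY keyword
ORDER BY package_count DESC
LIMIT 100'

# Run the query with standard SQL, as in the command above.
bq query --use_legacy_sql=false "$QUERY"
```

With this variant, a package that has published hundreds of releases contributes the same weight to a keyword as a package with a single release, so the ranking may differ noticeably from the one shown above.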
