Generative-AI-Voting

This repository contains code and data for emulating elections using Large Language Models.

A comprehensive framework to emulate elections using AI across three datasets: (a) and (b) the mock and actual participatory budgeting campaigns in Aarau, Switzerland; and (c) US elections. These election campaigns often include feedback surveys of voters who also cast ballots in the voting scenarios. In these surveys, voters self-report personal traits such as socio-demographics, which we use together with the voting scenarios to emulate AI representatives via generative and predictive AI models. Inconsistencies between the AI choices and human choices, and across the various voting input methods, are analyzed using explainable AI methods.

Datasets

We use the ANES data source (https://electionstudies.org/data-center/) to extract a subset of the votes for the American elections of 2012, 2016, and 2020, along with the following self-reported traits: (i) racial/ethnic self-identification, (ii) gender, (iii) age, (iv) conservative-liberal ideological self-placement, (v) party identification (political belief / political orientation), (vi) political interest, (vii) church attendance, (viii) discussion of politics with family/friends, (ix) feelings of patriotism associated with the American flag, and (x) state of residence.

Figshare Link: https://figshare.com/collections/Generative_AI_Voting_-_ANES/7261288

How to navigate the repository?

Required Python version: 3.11. Clone the repository and navigate the directories using the pointers below. This repository has five major components.

A) Large language models to generate votes using the prompts.

We use the Llama 2 and GPT-3.5 models. Folder: Gen_AI_Codes: the extraction code (calling the GPT-3.5 API with API keys) can be found here, along with the parsing code that extracts the responses. The complete set of prompts to emulate a voter is available in the following link.
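As an illustration of how a voter-emulation prompt might be assembled from the self-reported traits, here is a hypothetical sketch; the actual prompt wording lives in the linked prompt set, and the trait field names and candidate labels below are assumptions:

```python
# Hypothetical sketch: build a persona prompt from self-reported ANES traits.
# The real prompts used in this repository are in the linked prompt set.
def build_voter_prompt(traits: dict, candidates: list) -> str:
    persona = (
        f"You are a {traits['age']}-year-old {traits['gender']} voter "
        f"from {traits['state']}, self-identifying as {traits['ethnicity']}, "
        f"with a {traits['ideology']} ideology and "
        f"{traits['party']} party identification."
    )
    ballot = "Candidates: " + ", ".join(candidates)
    question = "Which single candidate would you vote for? Answer with one name."
    return "\n".join([persona, ballot, question])

prompt = build_voter_prompt(
    {"age": 46, "gender": "female", "state": "Ohio",
     "ethnicity": "White", "ideology": "moderate", "party": "Independent"},
    ["Candidate A", "Candidate B"],
)
print(prompt)
```

The same prompt template can then be sent to either model's API; only the transport code differs between GPT-3.5 and Llama 2.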

Folder: Gen_AI_Outputs: the parsed output from each prompt for Llama and GPT-3.5 at the different temperature settings, along with the original votes. The terminology A21 in these sheets stands for the single choice. Available for ANES 2012, 2016, and 2020:

A21_GPT_LLAMA_PREDICTION_2012.xlsx
A21_GPT_LLAMA_PREDICTION_2016.xlsx
A21_GPT_LLAMA_PREDICTION_2020.xlsx

Package dependencies: openai

B) Logs

Contains the token and request usage for the period from 16 June 2023 to 8 November 2023, covering more than 50,000 AI representatives.

C) Consistency Calculations

The Jaccard similarity code (jaccard_overall.py) computes the overlap between the original votes and those generated by Llama 2 and GPT-3.5.
Input: Gen_AI_Outputs/A21_GPT_LLAMA_PREDICTION_2012.xlsx
Command: python jaccard_overall.py
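For single-choice ballots, the overlap can be sketched as the Jaccard similarity between the sets of (voter, choice) pairs in the original and AI-generated votes. This is a minimal sketch assuming one choice per voter; jaccard_overall.py may compute it differently:

```python
def jaccard(original: dict, generated: dict) -> float:
    """Jaccard similarity between two {voter_id: choice} mappings,
    treated as sets of (voter_id, choice) pairs."""
    a, b = set(original.items()), set(generated.items())
    return len(a & b) / len(a | b) if (a | b) else 1.0

human = {1: "A", 2: "B", 3: "A", 4: "B"}
gpt   = {1: "A", 2: "A", 3: "A", 4: "B"}
print(jaccard(human, gpt))  # 3 shared pairs out of 5 distinct -> 0.6
```

A value of 1.0 means the AI representative reproduced every human vote; lower values quantify the inconsistency analyzed in the paper.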

D) ML

All code is found in Common Codes. Training code using a recurrent neural network: ML_prediction_code_A21.py

Input: Gen_AI_Outputs/A21_GPT_LLAMA_PREDICTION_2012.xlsx — this sheet contains the independent variables (10 personal traits) and the dependent variables: (i) the predicted choices — Single_Choice ML Original (A21_original), Single_Choice GPT-3.5 (A21_GPT3.5), and Single_Choice Llama (A21_llama2); and (ii) the human-AI overlaps — llama-original, gpt3.5-original, and ML-original.

Command: run the .ipynb notebook.
Package dependencies: tensorflow==2.0.0, scikit-learn==0.21, numpy, pandas

MODELWISE folders (2012, 2016, 2020): contain the feature importance code and generate feature importance scores and accuracy metrics.

Models: the trained models take the human-AI overlaps as dependent variables for all three years, for GPT-3.5, Llama 2, and original, stored as rnn_model.keras. The value 1 signifies overlap and 2 signifies no overlap.
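Assuming the 1/2 coding described above (1 = the AI choice matches the human choice, 2 = it does not), the dependent variable can be derived per voter as in this sketch:

```python
def overlap_labels(human_choices, ai_choices):
    """Encode per-voter agreement: 1 if the AI choice matches the
    human choice, 2 otherwise (coding assumed from the README)."""
    return [1 if h == a else 2 for h, a in zip(human_choices, ai_choices)]

labels = overlap_labels(["A", "B", "A"], ["A", "A", "A"])
print(labels)  # [1, 2, 1]
```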

MODELWISE
  2012
    feature_importance_score.py
    Original (Human) -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
    GPT 3.5 -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
    llama2 -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
  2016
    feature_importance_score.py
    Original (Human) -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
    GPT 3.5 -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
    llama2 -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
  2020
    feature_importance_score.py
    Original (Human) -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
    GPT 3.5 -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
    llama2 -> rnn_model.keras (trained model), feature_analysis.xlsx (feature importance scores), accuracy (metrics obtained)
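One common way to obtain such feature importance scores is permutation importance: shuffle one trait column at a time and measure the drop in model accuracy. Below is a minimal stdlib-only sketch with a toy predictor; the repository's feature_importance_score.py operates on the trained Keras model and may use a different method:

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, seed=0):
    """Importance of feature j = baseline accuracy minus accuracy
    after shuffling column j across all rows."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, col)]
        scores.append(base - accuracy(model, X_perm, y))
    return scores

# Toy model: predicts the label directly from feature 0, ignores feature 1.
model = lambda row: row[0]
X = [(0, 1), (1, 0), (1, 1), (0, 0)] * 25
y = [row[0] for row in X]
scores = permutation_importance(model, X, y, n_features=2)
# Shuffling feature 0 should hurt accuracy; shuffling feature 1 cannot.
```

The same idea carries over to the ten personal traits: the larger the accuracy drop when a trait is permuted, the more the model relies on it.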

E) Visualization

Code to plot the feature importances for original, GPT-3.5, and Llama 2: scatter_plot.py
Input: gpt_gain.xlsx, llama_gain.xlsx, and original_gain.xlsx
