vmtmix_fy23 package

Submodules

vmtmix_fy23.figures module

vmtmix_fy23.i_raw_dt_prc module

Pre-process the data received from TxDOT. This data contains MVC and PERM counter data. Check layers in STAR II [1] for more information.

[1] https://txdot.public.ms2soft.com/tcds/tsearch.asp?loc=Txdot&mod=TCDS

Created by: Apoorb
Created/Modified on: 02/14/2023

vmtmix_fy23.i_raw_dt_prc.add_mvs_rdtype_to_mvc_new(mvc_countr_new_): Add MOVES road type to the MVC data.

vmtmix_fy23.i_raw_dt_prc.add_mvs_rdtype_to_perm(perm_countr_): Add MOVES road type to the ATR data.

vmtmix_fy23.i_raw_dt_prc.clean_mvc_countr(mvc_file)

The function clean_mvc_countr takes a file path mvc_file as input and returns a pandas dataframe after cleaning and processing. mvc_file contains the Manual Vehicle Count data from TxDOT. The function performs the following operations on the dataset:

The function expects the input file to be in Excel format.
The ‘latitude’ and ‘longitude’ columns are converted to float type.
The ‘end_date’ column is converted to datetime format with format="%m/%d/%Y".
The ‘start_datetime’ column is created by combining the ‘start_date’ and ‘start_time’ columns and converted to datetime format with format="%m/%d/%Y %I:%M:%S %p".
The ‘start_date’ and ‘start_time’ columns are then dropped.
The ‘location_id’ column is created by combining values from the ‘station_id’, ‘pre_dir’, ‘street’, ‘suf_dir’, and ‘cmb_dir’ columns using the function ‘get_sta_pre_id_suf_cmb’.

Parameters:: mvc_file (str) – The name of the file to read and clean.
Returns:: mvc_countr_fil – A cleaned pandas dataframe containing information from the input file. The dataframe has the following columns: latitude, longitude, end_date, start_datetime, and location_id.
Return type:: pd.DataFrame
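
The date handling described above can be sketched with the standard library alone. This is a minimal illustration, not the package's code: the helper name combine_start_datetime and the sample values are hypothetical, and only the two format strings come from the description.

```python
from datetime import datetime

# Format strings quoted in the description above.
END_DATE_FMT = "%m/%d/%Y"
START_DATETIME_FMT = "%m/%d/%Y %I:%M:%S %p"


def combine_start_datetime(start_date: str, start_time: str) -> datetime:
    """Join the date and time strings and parse them as one timestamp.

    Hypothetical helper, for illustration only.
    """
    return datetime.strptime(f"{start_date} {start_time}", START_DATETIME_FMT)


# Made-up sample values.
start = combine_start_datetime("02/14/2019", "01:00:00 PM")
end = datetime.strptime("02/15/2019", END_DATE_FMT)
```

Parsing the combined string in one step keeps the AM/PM marker attached to the time, which matters for the 12-hour %I directive.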

vmtmix_fy23.i_raw_dt_prc.clean_perm_countr() → DataFrame

Clean and preprocess the PERM_CLASS_BY_HR_2013_2021 dataset, which contains hourly count data from the Texas Department of Transportation. The function performs the following operations on the dataset:

Reads the CSV file into a pandas DataFrame
Renames the columns of the DataFrame to snake_case
Asserts that the start_time column has a uniform format of 1900-01-01 HH:MM and that all hours from 0 to 23 are present in the column
Converts the start_date and start_time columns to a single datetime column named start_datetime
Drops the start_date and start_time columns
Converts the local_id and master_local_id columns to string type
Uses the get_sta_pre_id_suf_cmb() function to concatenate the station, precinct, ID, and suffix columns of the local_id column
Uses the add_mvs_rdtype_to_perm() function to add a MOVES road type column to the DataFrame

Returns:: perm_countr_1 – The cleaned and preprocessed DataFrame containing hourly classification count data from the Texas Department of Transportation permanent counters.
Return type:: pd.DataFrame
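
The start_time assertion in the steps above could look roughly like the following. It is a sketch under stated assumptions: the helper name check_start_times and the regular expression are hypothetical, and the real function works on a DataFrame column rather than a plain list.

```python
import re

# Pattern for the uniform "1900-01-01 HH:MM" format named in the text.
START_TIME_RE = re.compile(r"^1900-01-01 (\d{2}):(\d{2})$")


def check_start_times(start_times):
    """Assert a uniform format and that hours 0 through 23 all occur.

    Hypothetical helper; returns the set of hours found.
    """
    hours = set()
    for value in start_times:
        match = START_TIME_RE.match(value)
        assert match is not None, f"unexpected start_time format: {value!r}"
        hours.add(int(match.group(1)))
    assert hours == set(range(24)), "some hours of the day are missing"
    return hours


# Made-up sample: one record for every hour of the day.
sample = [f"1900-01-01 {hour:02d}:00" for hour in range(24)]
hours_found = check_start_times(sample)
```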

vmtmix_fy23.i_raw_dt_prc.get_unq_mvc_stas(data_, sub_col): Only keep unique MVC stations. If the data has “ALL”, “EB”, and “WB” … Directional counts have “EB”, “WB”, … suffixes, while total counts (the sum of directional counts) have no suffix. The code labels the “dir” for these total counts as “ALL”. E.g., stations BR1702_NB and BR1702_SB have directional counts and station BR1702 has the total counts.
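
The direction labelling can be illustrated as follows. This sketch covers only the suffix convention, not how the function decides which stations to keep; the helper name split_station_direction and the suffix set are assumptions.

```python
# Direction suffixes assumed for illustration; the text mentions "EB", "WB",
# "NB", and "SB" only as examples.
DIRECTION_SUFFIXES = {"NB", "SB", "EB", "WB"}


def split_station_direction(station_id: str):
    """Return (base station, direction), using "ALL" when no suffix exists."""
    base, sep, suffix = station_id.rpartition("_")
    if sep and suffix in DIRECTION_SUFFIXES:
        return base, suffix
    return station_id, "ALL"


labels = [split_station_direction(s) for s in ["BR1702_NB", "BR1702_SB", "BR1702"]]
```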

vmtmix_fy23.i_raw_dt_prc.raw_dt_prc(MVC_file='MVC_2013_21_received_on_030922', PERM_file='PERM_CLASS_BY_HR_2013_2021'): Process the raw MVC and permanent counter data to fix date time format, station id, map road types to MOVES, and save data to parquet for faster loading.

vmtmix_fy23.i_raw_dt_prc.save_raw_data_as_parquet(mvc_countr_, perm_countr_, mvc_out_fi, perm_out_fi): Clean the raw permanent and MVC counter data and save them as parquet for quick loading.

vmtmix_fy23.ii_dow_by_cls_fact_calc module

Use TxDOT Perm counter data to get DOW factors by vehicle category. Use the formula Tao used for the f_d,m ATR factor to compute this factor for each vehicle category.

Created by: Apoorb
Created/Modified on: 02/14/2023

vmtmix_fy23.ii_dow_by_cls_fact_calc.conv_aadt_adt_mnth_dow_by_vehcat(out_fi, min_yr=2013, max_yr=2019): Convert AADT to monthly DOW ADT.

vmtmix_fy23.ii_dow_by_cls_fact_calc.dow_by_cls_fac(out_fi, min_yr, max_yr): Create DOW by veh class factors that will be applied to the AADT from ATR data by vehicle class.
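
The source does not spell out the f_d,m formula, so the following is only one plausible reading: a factor that, multiplied by AADT, gives the ADT for a month and day-of-week cell. The function name dow_month_factors, the input layout, and the approximation of AADT as the mean of the supplied days are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def dow_month_factors(daily_counts):
    """Compute factor[(month, dow)] = mean daily count in that cell / AADT.

    daily_counts: iterable of (month, dow, count) tuples for one vehicle
    category. AADT is approximated here as the mean over all days supplied,
    which is an assumption of this sketch.
    """
    daily_counts = list(daily_counts)
    aadt = mean(count for _, _, count in daily_counts)
    cells = defaultdict(list)
    for month, dow, count in daily_counts:
        cells[(month, dow)].append(count)
    return {key: mean(values) / aadt for key, values in cells.items()}


# Made-up daily counts: (month, day-of-week label, count).
sample = [(1, "Wkd", 100), (1, "Wkd", 120), (1, "Sat", 80), (1, "Sun", 60)]
factors = dow_month_factors(sample)
```

Under this reading, multiplying a station's AADT by factors[(month, dow)] recovers the average daily traffic for that month and day-of-week group.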

vmtmix_fy23.ii_dow_by_cls_fact_calc.fun_region_episode(atr_data, episode_index, region_cat_name): Calculate the ADT of a given region (region_cat_name, e.g., district, MPO) and episode (episode_index, e.g., weekend, summer). region_cat_name should be a field in the dataframe atr_data; episode_index is a column of logical values whose length is the same as that of atr_data.

vmtmix_fy23.ii_dow_by_cls_fact_calc.unique_datetimes_test(perm_countr_fil_)

vmtmix_fy23.ii_dow_by_cls_fact_calc.year_range_test(perm_countr_fil_)

vmtmix_fy23.iii_mvc_hpms_counts module

Get FHWA VMT Mix from Card 4.

class vmtmix_fy23.iii_mvc_hpms_counts.MVCVmtMix(tod_map_, path_inp=WindowsPath('E:/Texas A&M Transportation Institute/TxDOT_TPP_Projects - Task 5.3 Activity Forecasting Factors/Data/fy23_vmt_mix/input'), path_interm=WindowsPath('E:/Texas A&M Transportation Institute/TxDOT_TPP_Projects - Task 5.3 Activity Forecasting Factors/Data/fy23_vmt_mix/intermediate'), min_yr_=2013, max_yr_=2019)

Bases: object

agg_vtype_cols = ['MC', 'PC', 'PT_LCT', 'Bus', 'SU_MH_RT_HDV', 'CT_HDV']

cntr_lev_agg(): Mean MVC counts by TOD and then by across year for each counter.

get_mvc_sample_size(spatial_level): Get the sample size (# of counters) per spatial_level, road type, and tod.

map_ra = {2: 'r_ra', 3: 'r_ura', 4: 'u_ra', 5: 'u_ura', 'ALL': 'ALL'}

region_mvc_agg(spatial_level='district')

set_conv_aadt2dow_by_vehcat(): Read the AADT to DOW factor by vehicle category.

set_mvc(): Read the manual vehicle count parquet file into a pandas dataframe. Extract date tim parameters such as year, hour, month, dow. Filter the data to be between the min_yr and max_yr years. For FY22 the min_yr was 2013 and max_yr was 2019. Drop the rows where the MVC doesn’t have road type info. Create a copy of MVC data and assign road, area, and access type as “ALL”. Map the data to MOVES road types. Create new columns to represent counts by HPMS vehicle categories.

set_txdist(): Read TxDOT district shapefile.

vehclscntcols = {'class1': 'MC', 'class10': 'CT_HDV', 'class11': 'CT_HDV', 'class12': 'CT_HDV', 'class13': 'CT_HDV', 'class14': 'Unk', 'class15': 'Unk', 'class2': 'PC', 'class3': 'PT_LCT', 'class4': 'Bus', 'class5': 'SU_MH_RT_HDV', 'class6': 'SU_MH_RT_HDV', 'class7': 'SU_MH_RT_HDV', 'class8': 'CT_HDV', 'class9': 'CT_HDV'}
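
As an illustration of how the vehclscntcols mapping turns class counts into HPMS category counts, the sketch below sums one made-up record of per-class counts. The helper name to_hpms_counts is hypothetical, and set_mvc operates on DataFrame columns rather than a plain dict.

```python
from collections import defaultdict

# Mapping copied from the vehclscntcols attribute above.
VEHCLSCNTCOLS = {
    "class1": "MC", "class2": "PC", "class3": "PT_LCT", "class4": "Bus",
    "class5": "SU_MH_RT_HDV", "class6": "SU_MH_RT_HDV", "class7": "SU_MH_RT_HDV",
    "class8": "CT_HDV", "class9": "CT_HDV", "class10": "CT_HDV",
    "class11": "CT_HDV", "class12": "CT_HDV", "class13": "CT_HDV",
    "class14": "Unk", "class15": "Unk",
}


def to_hpms_counts(class_counts):
    """Sum per-class counts into the categories named in VEHCLSCNTCOLS."""
    totals = defaultdict(int)
    for col, count in class_counts.items():
        totals[VEHCLSCNTCOLS[col]] += count
    return dict(totals)


# Made-up counts: class1 -> 1, class2 -> 2, ..., class15 -> 15.
sample = {f"class{i}": i for i in range(1, 16)}
hpms = to_hpms_counts(sample)
```

Note that the “Unk” category produced here is not listed in agg_vtype_cols.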

vmtmix_fy23.iii_mvc_hpms_counts.compute_vmtmix_dow(mvc_agg_dist_imputed_, mvcvmtmix_)

Computes the vehicle miles traveled (VMT) surrogate (counts-based) distribution by day of week and vehicle category for each district + road type group using the MVC counts.

Parameters:

mvc_agg_dist_imputed (pandas.DataFrame) – A DataFrame containing the imputed MVC counts at the district and road type level. Must contain the columns ‘district’, ‘mvs_rdtype_nm’, ‘hour’, ‘based_on_dg’, ‘MC_adt’, ‘PC_adt’, ‘PT_LCT_adt’, ‘Bus_adt’, ‘SU_MH_RT_HDV_adt’, and ‘CT_HDV_adt’.
mvcvmtmix (MVCVmtMix) – A MVCVmtMix object containing data and functions to compute counts by day of week and vehicle category.

Returns:

A DataFrame containing the counts by day of week and vehicle category for each district + road type group. The DataFrame has the columns ‘dgcode’, ‘district’, ‘based_on_dg’, ‘mvs_rdtype_nm’, ‘mvs_rdtype’, ‘dowagg’, ‘hour’, ‘MC_dow’, ‘PC_dow’, ‘PT_LCT_dow’, ‘Bus_dow’, ‘SU_MH_RT_HDV_dow’, ‘CT_HDV_dow’, ‘Total_dow’, ‘MC_frac’, ‘PC_frac’, ‘PT_LCT_frac’, ‘Bus_frac’, ‘SU_MH_RT_HDV_frac’, and ‘CT_HDV_frac’, representing the district group code, district identifier, Boolean flag indicating whether the imputation is based on district groups, road type group, road type code, day of week, hour of the day, counts by day of week and vehicle category, total VMT by day of week, and count (VMT) fraction by day of week and vehicle category.

Return type:

pandas.DataFrame

vmtmix_fy23.iii_mvc_hpms_counts.get_min_ss_per_loc(mvcvmtmix_, spatial_level_): Create spatial_rdtyp_lng dataframe of all combinations of districts or district groups and road types. Call get_mvc_sample_size to get the sample size. Check if there are at least 5 counters available for each analysis group: spatial_level_, road type, and hour.

vmtmix_fy23.iii_mvc_hpms_counts.handle_low_district_ss(all_district_sta_counts_, mvcvmtmix_)

Impute missing counts for district + road type groups that have fewer than 5 stations by aggregating the counts of district groups + road types that have at least 5 stations.

Parameters:

all_district_sta_counts (pandas.DataFrame) – A DataFrame containing station count information at the district and road type level. Must contain the columns ‘district’, ‘mvs_rdtype_nm’, and ‘min_avg_sta_count’.
mvcvmtmix (MVCVmtMix) – A MVCVmtMix object containing data and function to compute MVC counts at the district and/or district group level.

Returns:

A DataFrame containing imputed counts at the district and road type level. The DataFrame has the columns ‘district’, ‘mvs_rdtype_nm’, ‘hour’, ‘based_on_dg’, ‘MC_adt’, ‘PC_adt’, ‘PT_LCT_adt’, ‘Bus_adt’, ‘SU_MH_RT_HDV_adt’, and ‘CT_HDV_adt’, representing the district identifier, the road type group, the hour of the day, a Boolean flag indicating whether the imputation is based on district groups, and the MVC counts for the district and road type group.

Return type:

pandas.DataFrame

Raises:

AssertionError – If the resulting DataFrame does not have the expected number of rows or if there is more than one count for the “district”, “mvs_rdtype_nm”, “hour” group.
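
The fallback rule can be sketched in plain Python. This is a simplified illustration, not the package's implementation: the function name pick_counts, the dict-based inputs, and the sample values are all hypothetical; only the threshold of 5 stations and the based_on_dg flag come from the description.

```python
MIN_STATIONS = 5  # threshold stated in the description above


def pick_counts(district_rows, group_rows):
    """Keep district counts when enough stations exist, else use group counts.

    district_rows: {(district, rdtype): (n_stations, count)}
    group_rows:    {(district, rdtype): count from the district's group}
    """
    out = {}
    for key, (n_sta, count) in district_rows.items():
        based_on_dg = n_sta < MIN_STATIONS
        out[key] = {
            "count": group_rows[key] if based_on_dg else count,
            "based_on_dg": based_on_dg,
        }
    return out


# Made-up inputs for one district and two road types.
district_rows = {("Austin", "u_ra"): (8, 1200.0), ("Austin", "r_ura"): (2, 300.0)}
group_rows = {("Austin", "u_ra"): 1100.0, ("Austin", "r_ura"): 450.0}
imputed = pick_counts(district_rows, group_rows)
```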

vmtmix_fy23.iii_mvc_hpms_counts.mvc_hpms_cnt(out_fi, min_yr, max_yr): Compute the HPMS category counts from the MVC data and apply the above conversion factors.

vmtmix_fy23.iv_SU_CT_sh_lh_dist module

Get short-haul vs. long-haul distribution from the FAF4 Network Assignment data for Texas.

class vmtmix_fy23.iv_SU_CT_sh_lh_dist.TrucksDist(path_faf_, path_inp_)

Bases: object

Get VMT distribution between long-haul and short-haul for combination trucks (CT) and single unit trucks (SU)

static compute_vmt_dist(ass_faf4_, erg_crc_a88_vius2002_SULhT_pct=0.103) → DataFrame

Static method, so does not have an instance of the class (self) in it.

faf12 – Year 2012 FAF long distance truck volume estimated based on the FAF 4 Origin-Destination truck tonnage and includes empty trucks. Volume/day/section.
nonfaf12 – Year 2012 local truck traffic that is not part of the FAF 4 O-D database. Volume/day/section.
su_aadt12 – Single unit truck traffic, year 2012.
comb_aadt1 – Combination unit truck traffic, year 2012.

Parameters:

ass_faf4 – Combined FAF4 assignment and metadata.
erg_crc_a88_vius2002_SULhT_pct – Single Unit Long Haul (SULhT) vs. Long Haul (Lh) trucks fraction based on the VIUS 2002 survey.

Returns:

Distribution of the CLhT, CShT, SULhT, and SUShT.

Return type:

pd.DataFrame

get_vmt_dist() → dict[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Get the distribution of the CLhT, CShT, SULhT, and SUShT by district and at the statewide level. There are missing values at the district level, so we use the statewide values in the analysis. The key processing steps of this function are:

Create ass_faf4_tx_distr and ass_faf4_tx_state by merging FAF4 metadata and assignment data.
ass_faf4_tx_state is used for statewide distribution, thus we overwrite all the txdot_dist values with 0.
Call compute_vmt_dist to compute the distribution of the CLhT, CShT, SULhT, and SUShT.
vmt_dist_tx can be used directly. vmt_dist_tx_distr has missing values, so we merge it with other datasets to help identify discrepancies.
vmt_dist_tx_distr is not used.

Parameters:: self (object) – Instance of TrucksDist class.
Returns:: vmt_dist_tx has the distribution of the CLhT, CShT, SULhT, and SUShT at the statewide level. vmt_dist_txdist has the distribution of the CLhT, CShT, SULhT, and SUShT by district.
Return type:: dict[pd.DataFrame, pd.DataFrame]

prc_meta_faf4(): Process the FAF4 metadata that has route information. We already filtered the FAF4 metadata to keep Texas only data in the read_data function. Following are keys processing steps of this function: - Assign county FIPS to the FAF4 metadata and the assignment data. - FAF metadata: Filter the data to just keep faf4 ID, road type, area type, county FIPS. FAF4 ID links metadata with assignment data. - FAF metadata: Check for percent of rows with missing functional class or area code. - FAF metadata: Filter out the rows with missing functional class and area codes. - FAF metadata: Map MOVES road types to the FAF4 area and road type codes. - FAF metadata: Add district column by merging with county shapefile. - Filter out the data for districts where we only have 50 miles of total route miles. :param self: Instance of TrucksDist class. :type self: object

read_data(): Read FAF4 assignment and metadata, county geodata, and urbanized area geodata. Filter the FAF metadata to only keep Texas data.

vmtmix_fy23.iv_SU_CT_sh_lh_dist.faf4_su_ct_lh_sh_pct(out_fi): Get the SU and CT, Sh and Lh splits from FAF4 assignment and metadata using ERG methodology and VIUS 2002 factor.

vmtmix_fy23.utils module

Common functions that will be used by all modules.

Created by: Apoorba Bibeka
Created on: 2/6/2023

class vmtmix_fy23.utils.ChainedAssignent(chained=None)

Bases: object

This class ChainedAssignment is used to control the behavior of chained assignment in pandas. It can be used as a context manager to temporarily change the chained_assignment option in pandas and then revert to the original value when the context is exited.

chained: A string that specifies the behavior of chained assignment. Acceptable values are None, “warn”, and “raise”. If None, no warning or exception will be raised when a chained assignment is encountered. If “warn”, a warning will be raised. If “raise”, an exception will be raised.

saved_swcw: A variable that stores the original value of the chained_assignment option in pandas.

The __enter__ method sets the chained_assignment option to the value specified by the chained argument and returns the context manager instance.

The __exit__ method sets the chained_assignment option back to the value stored in saved_swcw.

This class allows for a more convenient and readable way to control the behavior of chained assignment in pandas, compared to setting and resetting the option manually in separate statements.
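
The save-and-restore pattern behind __enter__ and __exit__ can be shown without pandas. In this sketch a plain dict named OPTIONS stands in for the pandas option registry, and the class name TemporaryOption is hypothetical; it illustrates the mechanism only and is not the ChainedAssignent class itself.

```python
# Stand-in for a library's global options; pandas itself is not used here.
OPTIONS = {"chained_assignment": "warn"}


class TemporaryOption:
    """Set OPTIONS[name] on entry and restore the saved value on exit."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __enter__(self):
        self.saved = OPTIONS[self.name]  # remember the original value
        OPTIONS[self.name] = self.value
        return self

    def __exit__(self, *exc_info):
        OPTIONS[self.name] = self.saved  # restore even if an error occurred
        return False  # do not swallow exceptions


with TemporaryOption("chained_assignment", None):
    inside = OPTIONS["chained_assignment"]
after = OPTIONS["chained_assignment"]
```

Because the restore happens in __exit__, the original setting comes back even when the body of the with block raises.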

vmtmix_fy23.utils.connect_to_server_db(database_nm, user_nm='moves', port_=3308): Function to connect to a particular database on the server.

Returns:: conn_ – Connection object to access the data in MariaDB Server.
Return type:: mariadb.connection

vmtmix_fy23.utils.create_sut_fueltype_map(): Create the mapping between SUT and fuel type (23 combinations).

vmtmix_fy23.utils.get_engine_to_output_to_db(db): Get engine to output data to out_database using pd.to_sql().

vmtmix_fy23.utils.get_snake_case_dict(columns): Get columns in snake_case.
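
The exact conversion rules of get_snake_case_dict are not documented here, so the following is a guess at the general idea: build a mapping from original column names to lowercase, underscore-separated names. The function name snake_case_dict and the sample column names are hypothetical.

```python
import re


def snake_case_dict(columns):
    """Map each column name to a lowercase, underscore-separated version."""
    mapping = {}
    for col in columns:
        # Collapse every run of non-alphanumeric characters to one underscore.
        snake = re.sub(r"[^0-9a-zA-Z]+", "_", col.strip()).strip("_").lower()
        mapping[col] = snake
    return mapping


renames = snake_case_dict(["Start Date", "LOCAL_ID", "Master Local-ID"])
```

A mapping of this shape can be passed to a DataFrame's rename method through its columns argument.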

vmtmix_fy23.utils.timing(f): timing(f) is a decorator that measures and prints execution time of a function f. It takes a function as an argument and returns a wrapped version with timing functionality.
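
A decorator of this kind is commonly written as below. This is a generic sketch rather than the package's source; the message format and the use of time.perf_counter are assumptions.

```python
import functools
import time


def timing(f):
    """Print how long each call to f takes and pass its result through."""

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = f(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{f.__name__} took {elapsed:.4f} s")
        return result

    return wrapper


@timing
def add(a, b):
    return a + b


total = add(2, 3)
```

functools.wraps keeps the wrapped function's name and docstring, so decorated functions still show up correctly in generated documentation.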

vmtmix_fy23.v_sut_nd_fuel_mix module

Use the following data to impute missing data:

MVS303 default runs
MVS 3 samplevehiclepopulation and sourcetypeagedistribution distribution data
2018 vehicle registration data

Created by: Apoorb Created on: 02/14/2022

vmtmix_fy23.v_sut_nd_fuel_mix.get_mvs303defaultsutdist() → DataFrame: mvs303_1990_2000to2060_splits_out is the output database based on default MOVES run for Texas for 1990 and between 2000 and 2060. Read the activity from this database. movesdb20220105 is the default MOVES database, read the SUT and HPMS vehicle type code and description from this table. Sum-up the activity ID 1 (VMT) by HPMS. Create a modified set of SUTs, sum-up the activity by the modified set of SUTs. Get the distribution of SUTs within the modified HPMS parent category. Do the above for no-road (population) also.

vmtmix_fy23.v_sut_nd_fuel_mix.get_mvs303fueldist(anlyr=[1990, 2000, 2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045, 2050, 2055, 2060]): Read the default MOVES fuel type distribution by model year from 1960 to 2060. Read the default MOVES age distribution for the analysis years. Merge the two and get an age-weighted distribution of fuel type for each analysis year. Re-normalize the age-weighted fuel type distribution to just two fuel types: gasoline and diesel.
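
The age weighting and re-normalization can be sketched for a single source type and analysis year. The function name age_weighted_fuel_dist, the dict-based inputs, and all numbers are made up for illustration; the real function reads these distributions from the MOVES default database.

```python
def age_weighted_fuel_dist(age_frac, fuel_frac, keep=("gasoline", "diesel")):
    """Weight model-year fuel fractions by age fractions, then renormalize.

    age_frac:  {model_year: share of the fleet in the analysis year}
    fuel_frac: {model_year: {fuel: share of that model year}}
    """
    weighted = {}
    for model_year, age_share in age_frac.items():
        for fuel, fuel_share in fuel_frac[model_year].items():
            weighted[fuel] = weighted.get(fuel, 0.0) + age_share * fuel_share
    # Keep only the requested fuels and rescale them to sum to 1.
    kept_total = sum(weighted[fuel] for fuel in keep)
    return {fuel: weighted[fuel] / kept_total for fuel in keep}


# Made-up inputs: two model years and three fuel types.
age_frac = {2019: 0.6, 2020: 0.4}
fuel_frac = {
    2019: {"gasoline": 0.7, "diesel": 0.2, "other": 0.1},
    2020: {"gasoline": 0.5, "diesel": 0.3, "other": 0.2},
}
dist = age_weighted_fuel_dist(age_frac, fuel_frac)
```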

vmtmix_fy23.v_sut_nd_fuel_mix.get_mvs303samvehpop() → DataFrame

Retrieves the aggregated sample vehicle population data from the ‘samplevehiclepopulation’ table in the ‘movesdb20220105’ database, and returns it as a pandas DataFrame. The returned DataFrame contains the sum of stmyFraction values grouped by ‘sourceTypeID’, ‘modelYearID’, and ‘fuelTypeID’.

Returns:: A DataFrame containing the aggregated sample vehicle population data with the following columns: ‘sourceTypeID’, ‘modelYearID’, ‘fuelTypeID’, and ‘stmyFraction’.
Return type:: pandas.DataFrame

vmtmix_fy23.v_sut_nd_fuel_mix.get_mvs303souagedist(anlyr_: list) → DataFrame

Retrieves the source type age distribution data from the ‘sourcetypeagedistribution’ table in the ‘movesdb20220105’ database for the specified year(s), and returns it as a pandas DataFrame.

Parameters:: anlyr (list) – A list of integer values representing the year(s) for which to retrieve the source type age distribution data.
Returns:: A DataFrame containing the source type age distribution data for the specified year(s).
Return type:: pandas.DataFrame

vmtmix_fy23.v_sut_nd_fuel_mix.mvs_sut_nd_fuel_mx(fueldist_outfi, sut_hpms_dist_outfi): Get the SUT dist within HPMS and the fuel dist from MOVES default database.

vmtmix_fy23.vi_vmt_mix_disagg module

Use FAF and the MOVES national-level run VMT mix to get the disaggregated VMT mix.

vmtmix_fy23.vi_vmt_mix_disagg.add_yr_mod_cols_mvc(mvc_vmtmix_long_filt_, mvs303defaultsutdist_): Add analysis year column to the MVC data that does not have year column for the vehicle types that were not merged with national default data.

vmtmix_fy23.vi_vmt_mix_disagg.apply_faf4_fac(mvc_su_ct_, faf4_fac_): Merge the MVC data for SU and CT with the FAF4-based factors to split the SU and CT MVC counts to SUShT, SULhT, CShT, and CLhT.
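
The arithmetic of the split is simple once the factors are merged onto the counts. The sketch below assumes the FAF4-based factors are expressed as fractions between 0 and 1; the helper name split_by_haul and the numbers are hypothetical.

```python
def split_by_haul(count, pct_long_haul):
    """Split a truck count into (short-haul, long-haul) parts.

    pct_long_haul is assumed to be a fraction between 0 and 1.
    """
    long_haul = count * pct_long_haul
    return count - long_haul, long_haul


# Made-up counts and factors for single unit (SU) and combination (CT) trucks.
su_short, su_long = split_by_haul(200.0, 0.25)  # SUShT, SULhT
ct_short, ct_long = split_by_haul(400.0, 0.75)  # CShT, CLhT
```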

vmtmix_fy23.vi_vmt_mix_disagg.apply_fuel_dist(mvc_suts_, mvs303fueldist_): Apply the mvs303fueldist_ fuel type distribution from default MOVES run to the MVC counts by SUT dataframe obtained from the concat_suts function.

vmtmix_fy23.vi_vmt_mix_disagg.concat_suts(mvc_mc_pc, mvc_sut_pt_lct, mvc_sut_ob_sb_tb, mvc_modsut_rt_mh, mvc_su_ct_sut): Concat the split-out SUT counts from different groups of SUTs to obtain a unified dataframe. Add some descriptive columns to this dataframe.

vmtmix_fy23.vi_vmt_mix_disagg.fac_sutdist_natdef(mvc_: DataFrame, mvc_vtype_cat: dict, mvs303defaultsutdist_: dict, modhpmsvehcat: dict)

The initial part of the code filters the MVC data to keep only the HPMS vehicle categories specified in the mvc_vtype_cat dict. We filter mvs303defaultsutdist_ to the modified vehicle category that corresponds to the HPMS vehicle category specified in the mvc_vtype_cat dict and create a mapping between the two that is used for merging. We then merge the two dataframes and apply the default MOVES SUT splits to the MVC data.

Parameters:

mvc –
mvc_vtype_cat (dict) – MVC vehicle type column name and the value of interest. {‘mvc_vtype_cat’: ‘PT_LCT’} {‘mvc_vtype_cat’: ‘Bus’} {‘mvc_vtype_cat’: ‘SU_MH_RT_HDV’}
mvs303defaultsutdist (pd.DataFrame) – Distribution of SUTs within the modified HPMS parent category. This dataframe is filtered to the modhpmsvehcat value. For instance, for {‘modhpms_vtype_name’: ‘PT_LCT’}, we filter this dataframe to ‘PT_LCT’.
modhpmsvehcat (dict) – MOVES modified HPMS vehicle type column name and the value of interest. We map the mvc_vtype_cat names to the values specified in this dict. So, “PT_LCT” maps to “PT_LCT”. {‘modhpms_vtype_name’: ‘PT_LCT’} {‘modhpms_vtype_name’: ‘Buses’} {‘modhpms_vtype_name’: ‘SU_MH_RT_HDV’}

Return type:

pd.DataFrame

vmtmix_fy23.vi_vmt_mix_disagg.fin_vmt_mix(in_file_nm, out_file_nm): Apply the FAF4, and MOVES dist to the HPMS counts, filter data to different TODs, and normalize the final counts to get the SUT-FT dist.

vmtmix_fy23.vi_vmt_mix_disagg.norm_vmt_mix_by_tod(mvc_suts_ftype_, txdist_): Filter the values from apply_fuel_dist to different TOD hours and normalize the counts to get the Count distribution or the “VMT-Mix”.
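
Normalizing counts into a distribution amounts to dividing each count by the group total. The sketch below does this for one made-up district, road type, and TOD group; the helper name normalize and the key names are hypothetical.

```python
def normalize(counts):
    """Scale a mapping of counts so the values sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


# Made-up counts for one district, road type, and TOD group.
group_counts = {"PC_gasoline": 600.0, "PT_gasoline": 300.0, "CLhT_diesel": 100.0}
vmt_mix = normalize(group_counts)
```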

vmtmix_fy23.vi_vmt_mix_disagg.prc_faf4_fac(faf4_su_ct_lh_sh_pct_: DataFrame) → DataFrame

Process FAF4 data-based factors by SU (Single Unit) and CT (Combination Truck). Connect FAF4 data-based factors with the MVC (Manual vehicle count) data columns and fields. Change the mapping to allow for merging.

Parameters:: faf4_su_ct_lh_sh_pct (pd.DataFrame) – A pandas DataFrame containing FAF4 data-based factors by SU and CT. It should have the following columns: - mvs_rdtype: str - pct_CLhT_vs_CT: float - pct_CShT_vs_CT: float - pct_SULhT_vs_SU: float - pct_SUShT_vs_SU: float
Returns:: A pandas DataFrame containing the filtered and mapped factors. It has the following columns: - mvs_rdtype: str - modsutname: str - sourceTypeName: str - su_ct_sh_lh_pcts: float
Return type:: pd.DataFrame

vmtmix_fy23.vi_vmt_mix_disagg.prc_mvc(mvc_vmtmix_): Transform MVC data into long format.
