Genome-scale models of metabolism can illuminate the molecular basis of cell phenotypes. Because some enzymes are active only in specific cell types, several algorithms use omics data to construct cell-line- and tissue-specific metabolic models from genome-scale models. However, these methods are often not rigorously benchmarked, and it is unclear how algorithm and parameter selection (e.g., gene expression thresholds, metabolic constraints) affects model content and predictive accuracy. To investigate this, we built hundreds of models of four different cancer cell lines using six algorithms, four gene expression thresholds, and three sets of metabolic constraints. Model content varied substantially across parameter sets, yet the algorithms generally improved the accuracy of gene essentiality predictions. Among the choices examined, the model extraction method had the largest impact on model accuracy. We further highlight how assumptions made during model development influence prediction accuracy. These insights will guide further development of context-specific models and thus help resolve genotype-phenotype relationships more accurately.