# FeatureListAsync.py
import os
import json
import time
import asyncio
import warnings

from dotenv import load_dotenv
from langchain_community.chat_models import ChatOpenAI
# Import from the explicit submodules; the top-level `from langchain import ...`
# shortcuts are deprecated.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
import nest_asyncio

# Apply the nest_asyncio patch to allow nested event loops
nest_asyncio.apply()

# Suppress all warnings
warnings.filterwarnings("ignore")

# Load environment variables
load_dotenv()
openai_api_key = os.getenv('OPENAI_API_KEY')


# Step 1: Load the JSON file synchronously
def load_json_file(file_path):
    with open(file_path, 'r') as json_file:
        data = json.load(json_file)
    print('Loaded data')
    return data


# Define an asynchronous function to run Langchain on an individual invention
async def process_invention(featureListExtractionChain, key, value):
    print(f'Generating Feature List for {key}')
    invention_text = value["invention"]

    # Process the invention text with the Langchain chain asynchronously
    result = await featureListExtractionChain.arun({"invention": invention_text})

    # Clean the result to remove code block markers and extra formatting
    cleaned_result = result.replace("```json", "").replace("```", "").strip()

    # Try to parse the cleaned result as JSON
    try:
        parsed_result = json.loads(cleaned_result)
    except json.JSONDecodeError as e:
        print(f"Error parsing result for {key}: {e}")
        parsed_result = cleaned_result  # Fallback to raw string if parsing fails

    # Return the key and result for later collection
    return key, {"invention": invention_text, "result": parsed_result}


# Step 2: Run Langchain prompt on each invention asynchronously
async def run_langchain_on_inventions(data, model_name='gpt-4o-mini'):
    # Prompt template
    prompt_template = PromptTemplate(
        input_variables=["invention"],
        template="""
Break down the provided text, which may describe a technical solution, invention claim, or methodology, into distinct and well-defined technical features. Each feature must adhere to the following guidelines:

1. **Technical Precision**:
- Capture the structural, functional, or process-related elements described in the text.
- For apparatus/technical claims, identify each structural component, its configuration, and its specific role within the system.
- For methodology claims, outline each step in the exact sequence presented, ensuring that dependencies and technical details are preserved.
- Each feature should focus on one unique aspect or functionality of the described solution.

2. **Completeness**:
- Write each feature as a complete, standalone sentence that specifies the component, configuration, or function clearly.
- Avoid vague language or incomplete descriptions. Each feature must include enough context to be meaningful on its own.

3. **Clarity and Consistency**:
- Exclude phrases like "the present invention" or narrative elements that do not contribute directly to the technical details.
- Focus on unique features and avoid unnecessary repetition.

4. **Fallback Instructions**:
- If the provided text is abstract or lacks distinct technical elements, break it down into general purposes, key objectives, and any identifiable components or methodologies. Each feature should focus on specific technical attributes or intended functions.
- If the text does not explicitly list components or steps, infer features based on the described purpose or functionality, ensuring each feature is precise and self-contained.

### Example Inputs and Outputs:

#### Example 1:
**Input**:
The solution proposes a hanger bracket (Fishtail/Y-support bracket) for an exhaust system that includes a fin design to give structural rigidity and allows support for a hanger rod. A fin/plate is attached to the rod and then it will engage a slot in the bracket for tolerance control. The hanger bracket (Fishtail/Y-support bracket) is supported on a square tube or on a triangular tube with adjustment slots, which allows adjustment fore/aft, and up/down for tolerance stacks. The hanger bracket can be provided with a chamfer.

**Output**:
{{
    "F1": "A hanger bracket (Fishtail/Y-support bracket) includes a fin design to give structural rigidity and allows support for a hanger rod.",
    "F2": "A fin/plate is attached to the rod, then it will engage a slot in the bracket for tolerance control.",
    "F3": "The hanger bracket (Fishtail/Y-support bracket) supports on a square tube or on a triangular tube with adjustment slots, which allows adjustment fore/aft, and up/down for tolerance stacks.",
    "F4": "The hanger bracket includes a chamfer."
}}

#### Example 2:
**Input**:
A toilet seat and a cover member disposed behind the toilet seat to cover a rear part of a toilet bowl, the cover member comprising: a first standing part formed on the toilet seat side; an inclined part connected to a rear end of the first standing part and extending upward and downward; a second rising portion connected to a rear end of the inclined portion; and an extending portion connected to a rear end of the second rising portion and extending rearward, wherein the inclined portion is formed to be inclined at a steeper inclination than the extending portion.

**Output**:
{{
    "F1": "A toilet seat and a cover member disposed behind the toilet seat to cover a rear part of a toilet bowl.",
    "F2": "The cover member comprising a first standing part formed on the toilet seat side.",
    "F3": "An inclined part connected to a rear end of the first standing part and extending upward and downward.",
    "F4": "A second rising portion connected to a rear end of the inclined portion.",
    "F5": "An extending portion connected to a rear end of the second rising portion and extending rearward.",
    "F6": "The inclined portion is formed to be inclined at a steeper inclination than the extending portion."
}}

### Analysis Process:
1. Identify distinct components, configurations, or processes described in the text.
2. For each unique aspect, create a feature that captures the key technical detail, ensuring it is specific and complete.
3. Where variations or optional features are described, include them as separate features (e.g., a chamfer being present or absent).
4. If no clear features can be extracted, revert to the fallback approach by breaking down the text line-by-line and using each line as a feature. If the provided text is abstract or lacks distinct technical elements, break it down into general purposes, key objectives, and any identifiable components or methodologies.

Input Text: {invention}

Strictly follow the JSON output structure below, with no extra content.

Output:
{{
    "F1": "First technical feature...",
    "F2": "Second technical feature...",
    ...
}}
"""
    )

    # Initialize the Langchain LLM with the desired model.
    # The key loaded from the environment at module level is passed explicitly.
    featureListExtractionChain = LLMChain(
        llm=ChatOpenAI(model=model_name, openai_api_key=openai_api_key),
        prompt=prompt_template
    )

    # Create a list to hold the asynchronous tasks
    tasks = []

    # Create tasks for each invention
    for key, value in data.items():
        tasks.append(process_invention(featureListExtractionChain, key, value))

    # Run all the tasks concurrently
    results = await asyncio.gather(*tasks)

    # Convert results into a dictionary
    results_dict = {key: value for key, value in results}
    return results_dict


# Step 3: Save the results to a new JSON file synchronously
def save_results_to_json(results, output_file):
    with open(output_file, 'w') as outfile:
        json.dump(results, outfile, indent=4)


# Main function to tie everything together
def main(input_file_path, output_file_path):
    # Start timing
    start_time = time.time()

    # Step 1: Load the JSON file
    data = load_json_file(input_file_path)
    if data is None:
        print("Error: Data not loaded.")
        return

    # Step 2: Process the inventions asynchronously using asyncio.run()
    processed_results = asyncio.run(run_langchain_on_inventions(data))

    # Step 3: Save the processed results to a new JSON file
    save_results_to_json(processed_results, output_file_path)

    # End timing
    end_time = time.time()

    # Calculate and print the total execution time
    execution_time = end_time - start_time
    print(f"Script executed in: {execution_time:.2f} seconds")


# Run the script as a standalone program
if __name__ == "__main__":
    input_file = 'FTO_inventions.json'  # Set the path to your input JSON file
    output_file = 'FTO_GPT_FeatureList3.json'  # Set the path to your output JSON file
    main(input_file, output_file)